List Duplicates

Post by collectordave »

I have two lists. Each entry in each list is an integer value. These can run into hundreds of thousands of entries.

List 1 is the master list and has to remain intact.

Is there any quick way of removing any entries in list 2 that appear in list 1?

Regards

CD
Any intelligent fool can make things bigger and more complex. It takes a touch of genius — and a lot of courage to move in the opposite direction.

Post by RSBasic »

Option 1: Sort your list with SortList, then check it with ForEach.
Option 2: Insert your list values into a Map as keys. Duplicates are overwritten.

Or do you mean something else?

Post by wilbert »

I think the best approach depends a bit on the lists: whether they are sorted, the type of the integer values (16, 32 or 64 bit), and whether a list can contain duplicate values.
Windows (x64)
Raspberry Pi OS (Arm64)

Post by collectordave »

Hi,
I will try the map approach. It seems to make sense to me.

Neither list contains duplicates within itself, but the two lists are not guaranteed to hold the same numbers. I tried the SortList approach: quick, but not infallible.

Thanks to all.

I just thought about it a little more. The idea is to end up with only those numbers in the second list that do not appear in the first list. The map approach may not work.

Regards

CD

Post by wilbert »

collectordave wrote: Each list has no duplicates within the list but they are not guaranteed to hold the same numbers. I tried the sort list approach, quick but not infallible.
If you sort both lists, it should work fine.
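
To illustrate the sorted approach, here is a minimal sketch. It assumes that neither list contains duplicates (as stated above) and that it is acceptable for list 2 to end up sorted. List 1 is copied before sorting so the master list stays intact. The names List1(), List2(), Sorted1() and more1, and the sample values, are made up for the example.

```purebasic
EnableExplicit

NewList List1.i()   ; master list, must stay untouched
NewList List2.i()   ; list to be filtered
NewList Sorted1.i() ; sorted working copy of List1
Define more1.i      ; non-zero while Sorted1() still has a current element to compare

; Hypothetical sample data
AddElement(List1()) : List1() = 7
AddElement(List1()) : List1() = 2
AddElement(List1()) : List1() = 9
AddElement(List2()) : List2() = 9
AddElement(List2()) : List2() = 4
AddElement(List2()) : List2() = 2
AddElement(List2()) : List2() = 11

; Sort a copy of the master list, and list 2 itself
CopyList(List1(), Sorted1())
SortList(Sorted1(), #PB_Sort_Ascending)
SortList(List2(), #PB_Sort_Ascending)

; Walk both sorted lists in step, so each list is traversed only once
more1 = FirstElement(Sorted1())
ForEach List2()
  ; Skip master values that are smaller than the current List2 value
  While more1 <> 0 And Sorted1() < List2()
    more1 = NextElement(Sorted1())
  Wend
  If more1 <> 0 And Sorted1() = List2()
    DeleteElement(List2()) ; value also exists in List1, so drop it
  EndIf
Next

; Show what is left in list 2
ForEach List2()
  Debug List2()
Next
```

With the sample values this should leave 4 and 11 in list 2. If the original order of list 2 matters, this sketch is not suitable as it stands, because SortList reorders list 2 in place.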

If the numbers are not very high (for example, all in the range 0 to 1,000,000), I would allocate a piece of memory and set and test bits to check whether a value already exists.
That should be much faster.
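
The bit-array idea could look roughly like the sketch below. It assumes all values are non-negative and no larger than an agreed maximum; the constant #MaxValue, the list and variable names, and the sample values are assumptions made for the example, not something given in the thread.

```purebasic
EnableExplicit

#MaxValue = 1000000 ; assumed upper bound for the values in both lists

NewList List1.i()   ; master list, only read
NewList List2.i()   ; list to be filtered
Define *bits, offset.i, mask.i

; Hypothetical sample data
AddElement(List1()) : List1() = 7
AddElement(List1()) : List1() = 2
AddElement(List1()) : List1() = 9
AddElement(List2()) : List2() = 9
AddElement(List2()) : List2() = 4
AddElement(List2()) : List2() = 2
AddElement(List2()) : List2() = 11

; One bit per possible value; AllocateMemory() zero-fills the block by default
*bits = AllocateMemory((#MaxValue >> 3) + 1)
If *bits
  ; Set the bit for every value in the master list
  ; (values outside 0..#MaxValue would write outside the block, so validate real data first)
  ForEach List1()
    offset = List1() >> 3       ; byte that holds the bit
    mask = 1 << (List1() & 7)   ; bit position inside that byte
    PokeA(*bits + offset, PeekA(*bits + offset) | mask)
  Next

  ; Remove every List2 value whose bit is already set
  ForEach List2()
    offset = List2() >> 3
    mask = 1 << (List2() & 7)
    If PeekA(*bits + offset) & mask
      DeleteElement(List2())
    EndIf
  Next

  FreeMemory(*bits)
EndIf

; Show what is left in list 2
ForEach List2()
  Debug List2()
Next
```

For a maximum of 1,000,000 the bit array takes about 125 KB, and neither list needs to be sorted, so list 2 keeps its original order.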

Post by JHPJHP »

Hi collectordave,

See the following:
- Services, Stuff & Shellhook
-- Stuff\SQLiteDatabase\SELECT.pb

NB: Instead of using an in-memory database, you could create the database with list one, and then only need to replace list two for the comparison.
Last edited by JHPJHP on Thu Mar 29, 2018 8:32 pm, edited 4 times in total.

Post by collectordave »

Hi All

Sorry, it might have helped if I had written that this was a database app in the first place.

JHPJHP's solution works very well: no need for lists or maps, it gives the right answer each time, and it is so much quicker than my puny effort.

Thanks to all.

CD

Post by skywalk »

If you are still interested, depending on implementation, a map approach may be faster.

Define.s DatabaseMemory, dbSQL, List1_Filename, List2_Filename, List3_Filename, ListValue
UseSQLiteDatabase()
DatabaseMemory = ":memory:"
List1_Filename = "Files/List1.txt"
List2_Filename = "Files/List2.txt"
List3_Filename = "Files/List3.txt"
If 0 ; SQLite branch disabled so the map branch below runs; original condition: OpenDatabase(0, DatabaseMemory, "", "", #PB_Database_SQLite)
  dbSQL = "BEGIN TRANSACTION;" + #LF$
  dbSQL + "CREATE TABLE list1 (value INTEGER);" + #LF$
  dbSQL + "CREATE TABLE list2 (value INTEGER);" + #LF$
  If ReadFile(0, List1_Filename)
    While Not Eof(0)
      ListValue = ReadString(0)
      dbSQL + "INSERT INTO list1 (value) VALUES (" + ListValue + ");" + #LF$
    Wend
    CloseFile(0)
  EndIf
  If ReadFile(0, List2_Filename)
    While Not Eof(0)
      ListValue = ReadString(0)
      dbSQL + "INSERT INTO list2 (value) VALUES (" + ListValue + ");" + #LF$
    Wend
    CloseFile(0)
  EndIf
  dbSQL + "COMMIT;"
  DatabaseUpdate(0, dbSQL) : ListValue = #Null$
  If DatabaseQuery(0, "SELECT value FROM list2 WHERE value NOT IN (SELECT value FROM list1);")
    While NextDatabaseRow(0)
      ListValue + Str(GetDatabaseLong(0, 0)) + #LF$
    Wend
    FinishDatabaseQuery(0)
  EndIf
  CloseDatabase(0)
  If CreateFile(0, List3_Filename)
    WriteString(0, ListValue)
    CloseFile(0)
  EndIf
Else  ; nosql
  Define NewMap mapL1.i()
  Define NewMap mapL3.i()
  Define.s r$
  If ReadFile(0, List1_Filename)
    While Not Eof(0)
      AddMapElement(mapL1(), ReadString(0))
    Wend
    CloseFile(0)
  EndIf
  If ReadFile(0, List2_Filename)
    While Not Eof(0)
      r$ = ReadString(0)
      If FindMapElement(mapL1(), r$) = 0
        AddMapElement(mapL3(), r$)
      EndIf
    Wend
    CloseFile(0)
  EndIf
  Debug "-- List 2 entries NOT in List 1 --"
  ForEach mapL3()
    Debug MapKey(mapL3()); + ", " + mapL3()
  Next
EndIf
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum