I have two lists. Each entry in each list is an integer value. These can run into hundreds of thousands of entries.
List 1 is the master list and has to remain intact.
Is there any quick way of removing any entries in list 2 that appear in list 1?
Regards
CD
List Duplicates
-
- Addict
- Posts: 1309
- Joined: Fri Aug 28, 2015 6:10 pm
- Location: Portugal
List Duplicates
Any intelligent fool can make things bigger and more complex. It takes a touch of genius — and a lot of courage to move in the opposite direction.
- RSBasic
- Moderator
- Posts: 1218
- Joined: Thu Dec 31, 2009 11:05 pm
- Location: Gernsbach (Germany)
Re: List Duplicates
Option 1: Check your List with SortList and ForEach.
Option 2: Insert your List in Map as Index. Duplicates are overwritten.
Or do you mean something else?
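A rough sketch of how Option 2 might look with both lists already in memory. The procedure name `RemoveMatches` and the sample values are made up for illustration, and the map key is simply the number converted to text with `Str()`. List 1 is only read, so it stays intact; matching entries are deleted from list 2 in place.

```purebasic
EnableExplicit

; Hypothetical helper: deletes from Target() every value that also
; appears in Master(). Master() is only read, never changed.
Procedure RemoveMatches(List Master.i(), List Target.i())
  Protected NewMap Seen.b()

  ; Index the master list, using the number as text for the map key.
  ForEach Master()
    Seen(Str(Master())) = #True
  Next

  ; Walk the second list and drop every value found in the index.
  ForEach Target()
    If FindMapElement(Seen(), Str(Target()))
      DeleteElement(Target())
    EndIf
  Next
EndProcedure

; Sample data, purely illustrative.
NewList List1.i()
NewList List2.i()
Define i.i

For i = 1 To 10
  AddElement(List1())
  List1() = i
Next

AddElement(List2()) : List2() = 5
AddElement(List2()) : List2() = 12
AddElement(List2()) : List2() = 7
AddElement(List2()) : List2() = 30

RemoveMatches(List1(), List2())

; Show what is left in list 2.
ForEach List2()
  Debug List2()
Next
```

Each lookup is a single map access, so the total work grows roughly with the combined length of the two lists rather than with their product.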
Re: List Duplicates
I think the best approach depends a bit on the lists: whether they are sorted, the type of the integer values (16, 32 or 64 bit), and whether a list can contain duplicate values.
Windows (x64)
Raspberry Pi OS (Arm64)
Re: List Duplicates
Hi
Will try the map approach. Seems to make sense to me.
Each list has no duplicates within the list but they are not guaranteed to hold the same numbers. I tried the sort list approach, quick but not infallible.
Thanks to all
Just thought a little more. The idea is to end up with a list of numbers in the second list that do not appear in the first list. The map approach may not work.
Regards
CD
Re: List Duplicates
collectordave wrote: Each list has no duplicates within the list but they are not guaranteed to hold the same numbers. I tried the sort list approach, quick but not infallible.
If you sort both lists, it should work fine.
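One way the sorted approach could be made to cope with lists that do not hold the same numbers is to walk both lists in step. This is only a sketch: `RemoveMatchesSorted` is a made-up name, it sorts a copy of list 1 so the master keeps its original order, and it does reorder list 2.

```purebasic
EnableExplicit

; Hypothetical helper: sorts a copy of Master() and Target() itself,
; then walks both in step and deletes the values they share from Target().
Procedure RemoveMatchesSorted(List Master.i(), List Target.i())
  Protected NewList Sorted.i()
  Protected more.i

  ; Work on a copy so the master list keeps its original order.
  CopyList(Master(), Sorted())
  If ListSize(Sorted()) = 0
    ProcedureReturn ; nothing to compare against
  EndIf

  SortList(Sorted(), #PB_Sort_Ascending)
  SortList(Target(), #PB_Sort_Ascending)

  more = FirstElement(Sorted())
  ForEach Target()
    ; Advance the master cursor until it reaches or passes the current value.
    While more And Sorted() < Target()
      more = NextElement(Sorted())
    Wend
    ; Equal values mean the entry also exists in the master list.
    If more And Sorted() = Target()
      DeleteElement(Target())
    EndIf
  Next
EndProcedure

; Sample data, purely illustrative.
NewList List1.i()
NewList List2.i()
Define i.i

For i = 1 To 10
  AddElement(List1())
  List1() = i
Next

AddElement(List2()) : List2() = 5
AddElement(List2()) : List2() = 12
AddElement(List2()) : List2() = 7
AddElement(List2()) : List2() = 30

RemoveMatchesSorted(List1(), List2())

ForEach List2()
  Debug List2()
Next
```

Because the master cursor only ever moves forward, a value that exists in just one of the lists is skipped over instead of throwing the comparison out of step.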
If the numbers are not very high (for example all in the range of 0 - 1,000,000), I would allocate a piece of memory and set and test bits to check if a value already exists.
Should be much faster.
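A minimal sketch of the bit approach, assuming every value lies between 0 and 1,000,000. The constant `#MaxValue` and the procedure name `RemoveMatchesBits` are invented for the example. Values outside that range are left alone, so this only gives correct results if the assumption really holds for list 1.

```purebasic
EnableExplicit

#MaxValue = 1000000 ; assumed upper bound for the values in both lists

; Hypothetical helper: one bit per possible value, set from Master(),
; tested for each entry of Target().
Procedure RemoveMatchesBits(List Master.i(), List Target.i())
  Protected *bits, v.i

  ; AllocateMemory() returns zero-filled memory by default.
  *bits = AllocateMemory((#MaxValue >> 3) + 1)
  If *bits = 0
    ProcedureReturn #False
  EndIf

  ; Set the bit that belongs to each master value.
  ForEach Master()
    v = Master()
    If v >= 0 And v <= #MaxValue
      PokeA(*bits + (v >> 3), PeekA(*bits + (v >> 3)) | (1 << (v & 7)))
    EndIf
  Next

  ; Delete every entry of the second list whose bit is set.
  ForEach Target()
    v = Target()
    If v >= 0 And v <= #MaxValue
      If PeekA(*bits + (v >> 3)) & (1 << (v & 7))
        DeleteElement(Target())
      EndIf
    EndIf
  Next

  FreeMemory(*bits)
  ProcedureReturn #True
EndProcedure

; Sample data, purely illustrative.
NewList List1.i()
NewList List2.i()
Define i.i

For i = 1 To 10
  AddElement(List1())
  List1() = i
Next

AddElement(List2()) : List2() = 5
AddElement(List2()) : List2() = 12
AddElement(List2()) : List2() = 7
AddElement(List2()) : List2() = 30

If RemoveMatchesBits(List1(), List2())
  ForEach List2()
    Debug List2()
  Next
EndIf
```

For a range of 0 to 1,000,000 the buffer is about 125 KB, and neither list has to be sorted or converted to strings.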
Re: List Duplicates
Hi collectordave,
See the following:
- Services, Stuff & Shellhook
-- Stuff\SQLiteDatabase\SELECT.pb
NB*: Instead of using an in-memory database, you could create the database with list one, and then only need to replace list two for the compare.
Last edited by JHPJHP on Thu Mar 29, 2018 8:32 pm, edited 4 times in total.
Re: List Duplicates
Hi All
Sorry, it may have helped if I had written that it was a database app in the first place.
JHPJHP's solution works very well: no need for lists or maps, it gives the right answer each time, and it is so much quicker than my puny effort.
Thanks to all.
CD
Re: List Duplicates
If you are still interested, depending on implementation, a map approach may be faster.
Code:
Define.s DatabaseMemory, dbSQL, List1_Filename, List2_Filename, List3_Filename, ListValue
UseSQLiteDatabase()
DatabaseMemory = ":memory:"
List1_Filename = "Files/List1.txt"
List2_Filename = "Files/List2.txt"
List3_Filename = "Files/List3.txt"

If 0 ;OpenDatabase(0, DatabaseMemory, "", "", #PB_Database_SQLite)
  dbSQL = "BEGIN TRANSACTION;" + #LF$
  dbSQL + "CREATE TABLE list1 (value INTEGER);" + #LF$
  dbSQL + "CREATE TABLE list2 (value INTEGER);" + #LF$

  If ReadFile(0, List1_Filename)
    While Not Eof(0)
      ListValue = ReadString(0)
      dbSQL + "INSERT INTO list1 (value) VALUES (" + ListValue + ");" + #LF$
    Wend
    CloseFile(0)
  EndIf

  If ReadFile(0, List2_Filename)
    While Not Eof(0)
      ListValue = ReadString(0)
      dbSQL + "INSERT INTO list2 (value) VALUES (" + ListValue + ");" + #LF$
    Wend
    CloseFile(0)
  EndIf
  dbSQL + "COMMIT;"
  DatabaseUpdate(0, dbSQL) : ListValue = #Null$

  If DatabaseQuery(0, "SELECT value FROM list2 WHERE value NOT IN (SELECT value FROM list1);")
    While NextDatabaseRow(0)
      ListValue + Str(GetDatabaseLong(0, 0)) + #LF$
    Wend
    FinishDatabaseQuery(0)
  EndIf
  CloseDatabase(0)

  If CreateFile(0, List3_Filename)
    WriteString(0, ListValue)
    CloseFile(0)
  EndIf
Else ; nosql
  Define NewMap mapL1.i()
  Define NewMap mapL3.i()
  Define.s r$

  If ReadFile(0, List1_Filename)
    While Not Eof(0)
      AddMapElement(mapL1(), ReadString(0))
    Wend
    CloseFile(0)
  EndIf

  If ReadFile(0, List2_Filename)
    While Not Eof(0)
      r$ = ReadString(0)
      If FindMapElement(mapL1(), r$) = 0
        AddMapElement(mapL3(), r$)
      EndIf
    Wend
    CloseFile(0)
  EndIf

  Debug "-- List 2 entries NOT in List 1 --"
  ForEach mapL3()
    Debug MapKey(mapL3()) ; + ", " + mapL3()
  Next
EndIf
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum