Compare String
-
- User
- Posts: 46
- Joined: Sat Jan 08, 2005 7:46 pm
Compare String
Hi,
I am looking for a procedure to compare strings very, very fast.
example:
I have to lists.
List 1: 100000 Stings (each string has about 140 different characters)
List 2= 100100 Strings (each string has about 140 different characters)
Some strings are different between List 1 and List 2 (i.g. 123)
The problem is to find out the different strings between List 2 and List 1.
The following procedure ist to slow.
ForEach List1()
For Each Liste2()
If List1() <> List2())
AddElement(List3())
List3() = List1()
endif
Next
Next
Thank you very much for your help.
I am looking for a codeexample.
Martin
I am looking for a procedure to compare strings very, very fast.
example:
I have to lists.
List 1: 100000 Stings (each string has about 140 different characters)
List 2= 100100 Strings (each string has about 140 different characters)
Some strings are different between List 1 and List 2 (i.g. 123)
The problem is to find out the different strings between List 2 and List 1.
The following procedure ist to slow.
ForEach List1()
For Each Liste2()
If List1() <> List2())
AddElement(List3())
List3() = List1()
endif
Next
Next
Thank you very much for your help.
I am looking for a codeexample.
Martin
Re: Compare String
You could probably at least double the speed if you used string arrays rather than lists.
DE AA EB
Re: Compare String
use a map with the string as the key
that way you can check for duplications as your filling it
Another way it to use a trie though whether it would be faster than a map
is more a case of code optimization.
that way you can check for duplications as your filling it
Another way it to use a trie though whether it would be faster than a map
is more a case of code optimization.
Windows 11, Manjaro, Raspberry Pi OS


Re: Compare String
Use arrays and CompareMemoryString().
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
-
- User
- Posts: 46
- Joined: Sat Jan 08, 2005 7:46 pm
Re: Compare String
Thank you very much for your help.
First of all: I am a beginner and am I not familar with
Ergebnis = CompareMemoryString(*String1, *String2 [, Modus [, Länge [, Flags]]])
Right now I have two list
ReadFile(1,FileName1$)
While Eof(1)=0
Zeile.s=ReadString(1)
AddElement(Liste2())
Liste1() = Zeile.s
Wend
ReadFile(1,FileName2$)
While Eof(1)=0
Zeile.s=ReadString(1)
AddElement(Liste2())
Liste2() = Zeile.s
Wend
How can I use the CompareMemoryString(*String1, *String2 ) to compare both List in a loop.
Sorry for the question
First of all: I am a beginner and am I not familar with
Ergebnis = CompareMemoryString(*String1, *String2 [, Modus [, Länge [, Flags]]])
Right now I have two list
ReadFile(1,FileName1$)
While Eof(1)=0
Zeile.s=ReadString(1)
AddElement(Liste2())
Liste1() = Zeile.s
Wend
ReadFile(1,FileName2$)
While Eof(1)=0
Zeile.s=ReadString(1)
AddElement(Liste2())
Liste2() = Zeile.s
Wend
How can I use the CompareMemoryString(*String1, *String2 ) to compare both List in a loop.
Sorry for the question
Re: Compare String
martin66119 forgot to mention the corresponding thread in the german forum:
http://www.purebasic.fr/german/viewtopi ... 90#p311612
Greetings ... Kiffi
http://www.purebasic.fr/german/viewtopi ... 90#p311612
Greetings ... Kiffi
Hygge
-
- User
- Posts: 46
- Joined: Sat Jan 08, 2005 7:46 pm
Re: Compare String
Yes you are right. I have forgotten to mentioned that.
Re: Compare String
well if your reading from a file you could adapt this
list 1 holds the main list from s1
list 2 stores the duplicates from s2
list 3 stores the unique from s2
list 4 the order of the second list pointing to items in list2 and list 3
list 1 holds the main list from s1
list 2 stores the duplicates from s2
list 3 stores the unique from s2
list 4 the order of the second list pointing to items in list2 and list 3
Code: Select all
Global NewMap MA.s()
Global NewList list1() ;main list
Global NewList List2() ;duplicates
Global NewList list3() ;unique
Global NewList list4() ;order of the second list pointing to items in list2() or list3()
Procedure DeDup(s1.s,s2.s)
AddMapElement(MA(),s1)
AddElement(list1())
MA() = s1
list1() = @MA()
If FindMapElement(ma(),s2)
AddElement(list2())
AddElement(list4())
list2() = @MA()
list4() = @list2()
Else
AddMapElement(MA(),s2)
MA() = s2
AddElement(list3())
AddElement(list4())
list3() = @MA()
list4() = @list3()
EndIf
EndProcedure
Procedure FillLists()
For a = 1 To 100
If a % 2
b= a + 100
Else
b=a
EndIf
DeDup("some string " + Str(a),"some string "+ Str(b))
Next
EndProcedure
FillLists()
*ps.string
ForEach list2()
*ps = list2()
Debug "duplicate " + *ps\s
Next
ForEach list3()
*ps = list3()
Debug "unique " + *ps\s
Next
ForEach list4()
*ps = PeekI(list4())
Debug "order " + *ps\s
Next
Windows 11, Manjaro, Raspberry Pi OS


-
- User
- Posts: 46
- Joined: Sat Jan 08, 2005 7:46 pm
Re: Compare String
Thanks a lot for the example.
Do you have an other example using CompareMemoryString.
I do not know if the following lines are ok!
Do you have an other example using CompareMemoryString.
I do not know if the following lines are ok!
Code: Select all
Dim Array1 (20000)
Dim Array2 (20000)
Dim Array3 (20000)
j = 0
For i = 1 to LastElement Array1
*String1 = @Array(i)
For j = 1 to LastElement Array2
*String2 = @Array2(j)
Result = CompareMemoryString(*String1, *String2))
If Result = #PB_String_Equal
else
Array3(j) = *String1
j = j +1
Endif
Next
Next
Re: Compare String
If you're only comparing two strings... If FindString() = 0. 

www.posemotion.com
PureBasic Tools for OS X: PureMonitor, plist Tool, Data Maker & App Chef
Even the vine knows it surroundings but the man with eyes does not.
PureBasic Tools for OS X: PureMonitor, plist Tool, Data Maker & App Chef
Even the vine knows it surroundings but the man with eyes does not.
Re: Compare String
If you're reading the strings in from two files it will be more effective to sort the items as you read in the second file
rather than sorting them out after you've loaded the files.
Do you just want a merged list or do you need to keep references of duplicates or unique items?
if you need a reference to a list of duplicates and unique items
or if you just want a combined list
rather than sorting them out after you've loaded the files.
Do you just want a merged list or do you need to keep references of duplicates or unique items?
if you need a reference to a list of duplicates and unique items
Code: Select all
Global NewMap Ma.s()
Global NewList list1() ;main list
Global NewList ListDuplicate() ;duplicates
Global NewList listUnique() ;unique
Procedure Filllist1(s1.s)
AddElement(list1())
If FindMapElement(Ma(),s1)
list1() = @MA()
Else
AddMapElement(Ma(),s1)
Ma() = s1
list1() = @ma()
EndIf
EndProcedure
Procedure FillList2(s2.s)
If FindMapElement(ma(),s2)
AddElement(listDuplicate())
listDuplicate() = @MA()
Else
AddMapElement(MA(),s2)
Ma() = s2
AddElement(listUnique())
listunique() = @Ma()
EndIf
EndProcedure
Define a
;read in file 1
For a = 1 To 100
Filllist1("some string " + Str(a))
Next
;read in file 2
For a = 50 To 150
FillList2("some string " + Str(a))
Next
Define *ps.string
ForEach listUnique()
*ps = listUnique()
Debug "unique " + *ps\s
Next
Code: Select all
Global NewMap Ma.s()
Global NewList listCombined()
Procedure Filllist(s1.s)
If Not FindMapElement(Ma(),s1)
AddMapElement(Ma(),s1)
AddElement(listCombined())
Ma() = s1
listCombined() = @ma()
EndIf
EndProcedure
Define a
For a = 1 To 100
Filllist("some string " + Str(a))
Next
For a = 50 To 150
FillList("some string " + Str(a))
Next
Define *ps.string
Debug "the combined lists output"
ForEach listCombined()
*ps = listCombined()
Debug *ps\s
Next
Windows 11, Manjaro, Raspberry Pi OS


-
- Enthusiast
- Posts: 252
- Joined: Mon Aug 16, 2010 4:29 am
Re: Compare String
@martin66119
Here is your CompareMemoryString snippet tweaked to summarize Strings_in_Array1_not_in_Array2 and vice versa using two different methods. idle's map should be faster.
Here is your CompareMemoryString snippet tweaked to summarize Strings_in_Array1_not_in_Array2 and vice versa using two different methods. idle's map should be faster.

Code: Select all
Na1 = 2000 : Na2 = 3004 : Na3 = Na1 + Na2 : Na4 = Na2
Dim Array1.s (Na1) : Dim Array2.s (Na2) : Dim Matches.i(Na4) : Dim Array3.s (Na3)
DefaultStr.s = "DefStr" : DiffStr.s = "DiffStr"
For i = 1 To Na1
Array1(i) = DefaultStr ; Initialize
Next
For i = 1 To Na2
Array2(i) = DefaultStr
Next
Array1(1) = DiffStr + "11" : Array1(Na1/2) = DiffStr + "100001" : Array1(Na1) = DiffStr + "200001" ; Set up mismatches to find
Array2(1) = DiffStr + "12" : Array2(Na2/2) = DiffStr + "100002" : Array2(Na2 -2) = DiffStr + "200002" : Array2(Na2) = DiffStr + "200022"
k = 1
For i = 1 To Na1 ; Test for differences
*String1 = @Array1(i)
Ifound = 0
For j = 1 To Na2
*String2 = @Array2(j)
Result = CompareMemoryString(*String1, *String2)
If Result = #PB_String_Equal
Matches(j) = 1
Ifound = 1
EndIf
Next j
If Ifound = 0
Array3(k) = Array1(i) + " From Array1("+Str(i)+")" ; Save Strings_in_Array1_not_in_2
k = k +1
EndIf
Next i
For i = 1 To Na2
If Matches(i) = 0
Array3(k) = Array2(i) + " From Array2("+Str(i)+")" ; Save Strings_in_Array2_not_in_1
k = k +1
EndIf
Next
For i = 1 To k-1
Debug Array3(i)
Next
Re: Compare String
Here is a variation of idle's code that I think results in a speed improvement:
The main point on this code is that the strings don't need to be stored in the map because they are already stored as the key. If you wanted to stored something in the map then you might store the line number where it occurs. The elements could then be retrieved and placed in a list and sorted by the line# to get the results in order.
Code: Select all
;set number of simulated lines each file to be read
CompilerIf #PB_Compiler_Debugger
#tests = 100 ;limit the size for debugged output
CompilerElse
#tests = 100000
CompilerEndIf
;speed will increase for a larger number of slots in the map, but use more memory (as written it uses about 48k/map for 100000 lines in file)
Global NewMap map1.i(#tests / 8)
Global NewMap noMatch(#tests / 8)
;if the strings are needed in the order they first occur in the file then set line = to the line where the string occurs
Procedure recordUniqueString(S1.s, line = 0)
If Not FindMapElement(map1(), S1)
noMatch(S1) = line
EndIf
EndProcedure
RandomSeed(0) ;ensure the results are for the same random collection each time, if comparing methods
Define a, t = ElapsedMilliseconds()
;read file 1 and add it to map 1 to keep a collection of unique strings for the file, each string is simulated
For a = 1 To #tests
map1("some string " + Str(Random(a)))
Next
;read file 2 and create a list of only unique strings that aren't in file 1, each string is simulated
For a = 1 To #tests
recordUniqueString("some string " + Str(Random(a)), a)
Next
Define output$ = "Lines in each file: " + Str(#tests) + #LF$
output$ + "Unique strings in file 1 = " + Str(MapSize(map1())) + #LF$
output$ + "Unique strings in file 2 not in file 1 = " + Str(MapSize(noMatch())) + #LF$
output$ + "Time: " + Str(ElapsedMilliseconds() - t)
MessageRequester("results", output$)
;to display the strings which are unique retrieve the map keys of noMatch()
CompilerIf #PB_Compiler_Debugger
Debug "unique lines in file 2, unsorted"
ForEach noMatch()
Debug MapKey(noMatch()) + " on line " + Str(noMatch())
Next
;sorted order
Structure lineEntries
s.s ;string
n.i ;line number
EndStructure
NewList results.lineEntries()
ForEach noMatch()
AddElement(results()): results()\s = MapKey(noMatch()): results()\n = noMatch()
Next
SortStructuredList(results(), #PB_Sort_Ascending, OffsetOf(lineEntries\n), #PB_Sort_Integer)
Debug "unique lines in file 2, sorted by line order"
ForEach results()
Debug results()\s + " on line " + Str(results()\n)
Next
CompilerEndIf
-
- User
- Posts: 46
- Joined: Sat Jan 08, 2005 7:46 pm
Re: Compare String
Hi,
thank you very much for your help and sorry for my late answer. But a had not time yesterday.
I have tried your proposal but they are not faster or I do not integrate the code not correctly in my code and therefore the proposal wasn´t faster. I suppose that the integration in my code (I have made) was not ok and therefore there is no improvement.
Here is my code a bit modified for testing.
thank you very much for your help and sorry for my late answer. But a had not time yesterday.
I have tried your proposal but they are not faster or I do not integrate the code not correctly in my code and therefore the proposal wasn´t faster. I suppose that the integration in my code (I have made) was not ok and therefore there is no improvement.
Here is my code a bit modified for testing.
Code: Select all
NewList Listing1.s()
NewList Listing2.s()
NewMap File1.s(1000000)
NewMap File2.s(1000000)
NewMap File3.s(1000000)
NewMap File4.s(1000000)
Count = 500000
For i = 1 To count
Line.s = "This is the test string to verify the speed of different codes"
AddMapElement(File1(), Line.s,#PB_Map_ElementCheck)
AddElement(Listing1())
Listing1() = Line.s
Next i
For i = 1 To count
Line.s = "This is the test string to verify the speed of different codes" + Str(Random(10))
AddMapElement(File2(), Line.s,#PB_Map_ElementCheck)
AddElement(Listing2())
Listing2() = Line.s
Next i
;-----------------------------------
MessageRequester ("","Start")
StartTime = ElapsedMilliseconds()
;-----------------------------------
ForEach Listing1()
If FindMapElement(File2(), Listing1())
Else
AddMapElement(File3(), Listing1(),#PB_Map_ElementCheck)
EndIf
Next
;--------------------------------
ForEach Listing2()
If FindMapElement(File1(), Listing2())
Else
AddMapElement(File4(), Listing2(),#PB_Map_ElementCheck)
EndIf
Next
;---------------------------------
Time =ElapsedMilliseconds() - StartTime
CountFile3 = MapSize(File3())
CountFile4 = MapSize(File4())
MessageRequester("Execution Time", "Time: " + Str(Time) + " ms")
MessageRequester("File3", "File3: " + Str(CountFile3) + " File4: " +Str(CountFile4))
-
- User
- Posts: 46
- Joined: Sat Jan 08, 2005 7:46 pm
Re: Compare String
Hi again,
in following code I use FindMapElement(....) to find the corresponding strings
but I like to use "comparememorystring" in the relevant loops. I do not know to change the code taht it works using "comparememorystring". Please could you helb me.
Thank you very much in advance
in following code I use FindMapElement(....) to find the corresponding strings
but I like to use "comparememorystring" in the relevant loops. I do not know to change the code taht it works using "comparememorystring". Please could you helb me.
Thank you very much in advance
Code: Select all
ewList Listing1.s()
NewList Listing2.s()
NewMap File1.s(1000000)
NewMap File2.s(1000000)
NewMap File3.s(1000000)
NewMap File4.s(1000000)
Count = 500000
For i = 1 To count
Line.s = "This is the test string to verify the speed of different codes"
AddMapElement(File1(), Line.s,#PB_Map_ElementCheck)
AddElement(Listing1())
Listing1() = Line.s
Next i
For i = 1 To count
Line.s = "This is the test string to verify the speed of different codes" + Str(Random(10))
AddMapElement(File2(), Line.s,#PB_Map_ElementCheck)
AddElement(Listing2())
Listing2() = Line.s
Next i
;-----------------------------------
MessageRequester ("","Start")
StartTime = ElapsedMilliseconds()
;-----------------------------------
ForEach Listing1()
If FindMapElement(File2(), Listing1())
Else
AddMapElement(File3(), Listing1(),#PB_Map_ElementCheck)
EndIf
Next
;--------------------------------
ForEach Listing2()
If FindMapElement(File1(), Listing2())
Else
AddMapElement(File4(), Listing2(),#PB_Map_ElementCheck)
EndIf
Next
;---------------------------------
Time =ElapsedMilliseconds() - StartTime
CountFile3 = MapSize(File3())
CountFile4 = MapSize(File4())
MessageRequester("Execution Time", "Time: " + Str(Time) + " ms")
MessageRequester("File3", "File3: " + Str(CountFile3) + " File4: " +Str(CountFile4))