Page 1 of 2

Make string comparing faster

Posted: Mon Mar 29, 2021 11:49 pm
by Josh
String comparisons (is equal only) could be much faster by comparing only the pointers. This makes the comparison faster by a factor of 2-3 already for short strings, for longer strings the factor increases more and more.

Code: Select all

;Test with short strings
;a$="x"
;b$="x"

;Test with longer strings
a$="jjkgkdjghklsdghsdklfghkldghlgögjäfjeögheögtegdfkvbjsdgkljsdfhgkdhgösdfhgklsdghskdlghksldfghskldfghsdklfghskldfghklsdghsdklfghskldfghsdklfghsdklghklsdfghksldfghksdghksdfghskdghklsdg"
b$="jjkgkdjghklsdghsdklfghkldghlgögjäfjeögheögtegdfkvbjsdgkljsdfhgkdhgösdfhgklsdghskdlghksldfghskldfghsdklfghskldfghklsdghsdklfghskldfghsdklfghsdklghklsdfghksldfghksdghksdfghskdghklsdg"


time1 = ElapsedMilliseconds()
For i = 1 To 10000000
  If a$ = b$
    x1 + 1
  EndIf
Next
time1 = ElapsedMilliseconds() - time1


time2 = ElapsedMilliseconds()
For i = 1 To 10000000
  If PeekI (@a$) = PeekI (@b$)
    x2 + 1
  EndIf
Next
time2 = ElapsedMilliseconds() - time2


MessageRequester ("", "" + x1 + " - " + time1 + #CRLF$ + x2 + " - " + time2)

I am not 100% sure if a negative comparison is valid in every case. A positive comparison is valid in any case and the time reduction would even be worth to make a precheck for a positiv result.

Re: Make string comparing faster

Posted: Tue Mar 30, 2021 12:02 am
by Saki
Hi,
you can use CompareMemoryString() or Comparememory()

Re: Make string comparing faster

Posted: Tue Mar 30, 2021 12:10 am
by Josh
Saki wrote:Hi,
you can use CompareMemoryString() or Comparememory()
None of this comes close to a simple comparison of the string pointers.

Re: Make string comparing faster

Posted: Tue Mar 30, 2021 12:16 am
by BarryG
Huh? How does PeekI() know if the strings are the same? Your amended code below says they are, when they're clearly not (I just added a new last char to one string):

Code: Select all

a$="jjkgkdjghklsdghsdklfghkldghlgögjäfjeögheögtegdfkvbjsdgkljsdfhgkdhgösdfhgklsdghskdlghksldfghskldfghsdklfghskldfghklsdghsdklfghskldfghsdklfghsdklghklsdfghksldfghksdghksdfghskdghklsdg"
b$="jjkgkdjghklsdghsdklfghkldghlgögjäfjeögheögtegdfkvbjsdgkljsdfhgkdhgösdfhgklsdghskdlghksldfghskldfghsdklfghskldfghklsdghsdklfghskldfghsdklfghsdklghklsdfghksldfghksdghksdfghskdghklsdgz"

time2 = ElapsedMilliseconds()
For i = 1 To 10000000
  If PeekI (@a$) = PeekI (@b$)
    x2 + 1
  EndIf
Next
time2 = ElapsedMilliseconds() - time2

MessageRequester ("", "" + x2 + " - " + time2) ; x2 = 10000000 when it should be 0

Re: Make string comparing faster

Posted: Tue Mar 30, 2021 12:28 am
by Josh
BarryG wrote:Huh? How does PeekI() know if the strings are the same?
A string is a pointer to a pointer. PurbeBasic keeps an internal management over string literals. If a string literal is used more than once, it is still only created once in memory. This is even true for a combination of strings and constants.

Re: Make string comparing faster

Posted: Tue Mar 30, 2021 12:29 am
by BarryG
So my amended code should show 0 then, instead of 10000000, because the two literal strings are different.

This code says the two literal strings are the same too, when they're clearly not:

Code: Select all

a$="1234567890"
b$="12345678901234567890"
Debug PeekI(@a$) ; 3276849
Debug PeekI(@b$) ; 3276849

Re: Make string comparing faster

Posted: Tue Mar 30, 2021 12:51 am
by Josh
All back. I don't know where I made my mental error. Maybe I should not think too much at this time :D

Re: Make string comparing faster

Posted: Tue Mar 30, 2021 4:27 am
by kenmo
If you PeekI() at a string address, you're just getting an integer of the first 32 or 64 bits of its character contents!

You're right (IIRC) the PB compiler groups together string literal constants of the same text.
But not for string variables, at runtime, like A$ and B$. They are separate vars at separate memory addresses, even if they currently hold the same contents.

Code: Select all

A$ = "1_2345678"
B$ = "1_2345678"

Debug "A$: " + A$
Debug "B$: " + B$

Debug ""
Debug "Address of A: " + Hex(@A$)
Debug "Address of B: " + Hex(@B$)

Debug ""
Debug "PeekI @A$: " + Hex(PeekI(@A$))
Debug "PeekI @B$: " + Hex(PeekI(@B$))

Re: Make string comparing faster

Posted: Tue Mar 30, 2021 5:07 am
by skywalk
Josh admitted this was a brain fart. We've all had them. :oops:

My most recent was trying to add a feature while refactoring some unrelated code and then debugging a monsoon of errors. Set me back almost 4 days. Had to pull out the feature to solve it. Oooh baby, dare I go back in?

Re: Make string comparing faster

Posted: Tue Mar 30, 2021 6:47 am
by BarryG
skywalk wrote:Had to pull out the feature to solve it. Oooh baby, dare I go back in?
Oh yes, I've had that before. Sometimes spend days doing something and then ripping it all out because it wasn't working. Still kept the non-working version for one of those days when I feel like having another go.

Re: Make string comparing faster

Posted: Tue Mar 30, 2021 10:22 am
by Saki
You do it in a completely different way.

First you only check if the first two characters are identical.
If these first two characters are identical, then you check if the rest is identical.

That goes off like a rocket, for sure. :wink:

There is no faster method for hashes, names, or address data...

This simple procedure is amazingly hardly known,
where one can say that one often before many forest the trees no longer sees.

Re: Make string comparing faster

Posted: Tue Mar 30, 2021 1:16 pm
by davido
@Saki,
How clever! I never thought of that! :)
How do you check the 1st two characters? Perhaps by the Left() function or so you have something better?

Re: Make string comparing faster

Posted: Tue Mar 30, 2021 1:40 pm
by Saki
Hi Davido
You can simply use PeekL for example.

But I don't think that Left() is noticeably slower,
because the end of the string does not have to be searched first.

Re: Make string comparing faster

Posted: Tue Mar 30, 2021 1:40 pm
by Keya
davido wrote:How do you check the 1st two characters? Perhaps by the Left() function or so you have something better?
PeekW() to read 2 bytes. I'm not sure why check only the first two bytes though, as opposed to the first one or four or eight etc, and you still need to know the length of the string and Len() is slow because it checks every byte until it gets to a null byte

Re: Make string comparing faster

Posted: Tue Mar 30, 2021 1:47 pm
by Saki
Hi Keya,
With Left() it would be illogical if BP would search the end of the string first,
you will find that anyway if you want to truncate the first two characters.

But PeekL has probably advantages, because you set a pointer to the first 4 bytes, not more.

Due to the double-zero termination, strings which are only one character long are also recognized correctly.