Make string comparing faster

Got an idea for enhancing PureBasic? New command(s) you'd like to see?
User avatar
Josh
Addict
Addict
Posts: 1183
Joined: Sat Feb 13, 2010 3:45 pm

Make string comparing faster

Post by Josh »

String comparisons (is equal only) could be much faster by comparing only the pointers. This makes the comparison faster by a factor of 2-3 already for short strings, for longer strings the factor increases more and more.

Code: Select all

;Test with short strings
;a$="x"
;b$="x"

;Test with longer strings
a$="jjkgkdjghklsdghsdklfghkldghlgögjäfjeögheögtegdfkvbjsdgkljsdfhgkdhgösdfhgklsdghskdlghksldfghskldfghsdklfghskldfghklsdghsdklfghskldfghsdklfghsdklghklsdfghksldfghksdghksdfghskdghklsdg"
b$="jjkgkdjghklsdghsdklfghkldghlgögjäfjeögheögtegdfkvbjsdgkljsdfhgkdhgösdfhgklsdghskdlghksldfghskldfghsdklfghskldfghklsdghsdklfghskldfghsdklfghsdklghklsdfghksldfghksdghksdfghskdghklsdg"


time1 = ElapsedMilliseconds()
For i = 1 To 10000000
  If a$ = b$
    x1 + 1
  EndIf
Next
time1 = ElapsedMilliseconds() - time1


time2 = ElapsedMilliseconds()
For i = 1 To 10000000
  If PeekI (@a$) = PeekI (@b$)
    x2 + 1
  EndIf
Next
time2 = ElapsedMilliseconds() - time2


MessageRequester ("", "" + x1 + " - " + time1 + #CRLF$ + x2 + " - " + time2)

I am not 100% sure if a negative comparison is valid in every case. A positive comparison is valid in any case and the time reduction would even be worth to make a precheck for a positiv result.
Last edited by Josh on Tue Mar 30, 2021 12:02 am, edited 1 time in total.
sorry for my bad english
User avatar
Saki
Addict
Addict
Posts: 830
Joined: Sun Apr 05, 2020 11:28 am
Location: Pandora

Re: Make string comparing faster

Post by Saki »

Hi,
you can use CompareMemoryString() or Comparememory()
地球上の平和
User avatar
Josh
Addict
Addict
Posts: 1183
Joined: Sat Feb 13, 2010 3:45 pm

Re: Make string comparing faster

Post by Josh »

Saki wrote:Hi,
you can use CompareMemoryString() or Comparememory()
None of this comes close to a simple comparison of the string pointers.
sorry for my bad english
BarryG
Addict
Addict
Posts: 4138
Joined: Thu Apr 18, 2019 8:17 am

Re: Make string comparing faster

Post by BarryG »

Huh? How does PeekI() know if the strings are the same? Your amended code below says they are, when they're clearly not (I just added a new last char to one string):

Code: Select all

a$="jjkgkdjghklsdghsdklfghkldghlgögjäfjeögheögtegdfkvbjsdgkljsdfhgkdhgösdfhgklsdghskdlghksldfghskldfghsdklfghskldfghklsdghsdklfghskldfghsdklfghsdklghklsdfghksldfghksdghksdfghskdghklsdg"
b$="jjkgkdjghklsdghsdklfghkldghlgögjäfjeögheögtegdfkvbjsdgkljsdfhgkdhgösdfhgklsdghskdlghksldfghskldfghsdklfghskldfghklsdghsdklfghskldfghsdklfghsdklghklsdfghksldfghksdghksdfghskdghklsdgz"

time2 = ElapsedMilliseconds()
For i = 1 To 10000000
  If PeekI (@a$) = PeekI (@b$)
    x2 + 1
  EndIf
Next
time2 = ElapsedMilliseconds() - time2

MessageRequester ("", "" + x2 + " - " + time2) ; x2 = 10000000 when it should be 0
User avatar
Josh
Addict
Addict
Posts: 1183
Joined: Sat Feb 13, 2010 3:45 pm

Re: Make string comparing faster

Post by Josh »

BarryG wrote:Huh? How does PeekI() know if the strings are the same?
A string is a pointer to a pointer. PurbeBasic keeps an internal management over string literals. If a string literal is used more than once, it is still only created once in memory. This is even true for a combination of strings and constants.
Last edited by Josh on Tue Mar 30, 2021 12:29 am, edited 1 time in total.
sorry for my bad english
BarryG
Addict
Addict
Posts: 4138
Joined: Thu Apr 18, 2019 8:17 am

Re: Make string comparing faster

Post by BarryG »

So my amended code should show 0 then, instead of 10000000, because the two literal strings are different.

This code says the two literal strings are the same too, when they're clearly not:

Code: Select all

a$="1234567890"
b$="12345678901234567890"
Debug PeekI(@a$) ; 3276849
Debug PeekI(@b$) ; 3276849
User avatar
Josh
Addict
Addict
Posts: 1183
Joined: Sat Feb 13, 2010 3:45 pm

Re: Make string comparing faster

Post by Josh »

All back. I don't know where I made my mental error. Maybe I should not think too much at this time :D
sorry for my bad english
User avatar
kenmo
Addict
Addict
Posts: 2033
Joined: Tue Dec 23, 2003 3:54 am

Re: Make string comparing faster

Post by kenmo »

If you PeekI() at a string address, you're just getting an integer of the first 32 or 64 bits of its character contents!

You're right (IIRC) the PB compiler groups together string literal constants of the same text.
But not for string variables, at runtime, like A$ and B$. They are separate vars at separate memory addresses, even if they currently hold the same contents.

Code: Select all

A$ = "1_2345678"
B$ = "1_2345678"

Debug "A$: " + A$
Debug "B$: " + B$

Debug ""
Debug "Address of A: " + Hex(@A$)
Debug "Address of B: " + Hex(@B$)

Debug ""
Debug "PeekI @A$: " + Hex(PeekI(@A$))
Debug "PeekI @B$: " + Hex(PeekI(@B$))
User avatar
skywalk
Addict
Addict
Posts: 4211
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Make string comparing faster

Post by skywalk »

Josh admitted this was a brain fart. We've all had them. :oops:

My most recent was trying to add a feature while refactoring some unrelated code and then debugging a monsoon of errors. Set me back almost 4 days. Had to pull out the feature to solve it. Oooh baby, dare I go back in?
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
BarryG
Addict
Addict
Posts: 4138
Joined: Thu Apr 18, 2019 8:17 am

Re: Make string comparing faster

Post by BarryG »

skywalk wrote:Had to pull out the feature to solve it. Oooh baby, dare I go back in?
Oh yes, I've had that before. Sometimes spend days doing something and then ripping it all out because it wasn't working. Still kept the non-working version for one of those days when I feel like having another go.
User avatar
Saki
Addict
Addict
Posts: 830
Joined: Sun Apr 05, 2020 11:28 am
Location: Pandora

Re: Make string comparing faster

Post by Saki »

You do it in a completely different way.

First you only check if the first two characters are identical.
If these first two characters are identical, then you check if the rest is identical.

That goes off like a rocket, for sure. :wink:

There is no faster method for hashes, names, or address data...

This simple procedure is amazingly hardly known,
where one can say that one often before many forest the trees no longer sees.
地球上の平和
davido
Addict
Addict
Posts: 1890
Joined: Fri Nov 09, 2012 11:04 pm
Location: Uttoxeter, UK

Re: Make string comparing faster

Post by davido »

@Saki,
How clever! I never thought of that! :)
How do you check the 1st two characters? Perhaps by the Left() function or so you have something better?
DE AA EB
User avatar
Saki
Addict
Addict
Posts: 830
Joined: Sun Apr 05, 2020 11:28 am
Location: Pandora

Re: Make string comparing faster

Post by Saki »

Hi Davido
You can simply use PeekL for example.

But I don't think that Left() is noticeably slower,
because the end of the string does not have to be searched first.
Last edited by Saki on Tue Mar 30, 2021 1:42 pm, edited 1 time in total.
地球上の平和
User avatar
Keya
Addict
Addict
Posts: 1890
Joined: Thu Jun 04, 2015 7:10 am

Re: Make string comparing faster

Post by Keya »

davido wrote:How do you check the 1st two characters? Perhaps by the Left() function or so you have something better?
PeekW() to read 2 bytes. I'm not sure why check only the first two bytes though, as opposed to the first one or four or eight etc, and you still need to know the length of the string and Len() is slow because it checks every byte until it gets to a null byte
User avatar
Saki
Addict
Addict
Posts: 830
Joined: Sun Apr 05, 2020 11:28 am
Location: Pandora

Re: Make string comparing faster

Post by Saki »

Hi Keya,
With Left() it would be illogical if BP would search the end of the string first,
you will find that anyway if you want to truncate the first two characters.

But PeekL has probably advantages, because you set a pointer to the first 4 bytes, not more.

Due to the double-zero termination, strings which are only one character long are also recognized correctly.
地球上の平和
Post Reply