Make string comparing faster

Got an idea for enhancing PureBasic? New command(s) you'd like to see?
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Make string comparing faster

Post by wilbert »

Saki wrote:Due to the double-zero termination, strings which are only one character long are also recognized correctly.
Where did you find PB strings use a double-zero termination ?
As far as I know only a single Null character is used to terminate a string. :?
Windows (x64)
Raspberry Pi OS (Arm64)
User avatar
Saki
Addict
Addict
Posts: 830
Joined: Sun Apr 05, 2020 11:28 am
Location: Pandora

Re: Make string comparing faster

Post by Saki »

It corresponds to the logic to search the Unicode string word by word for the termination,
so a word is also expected as termination.

Code: Select all

a$=" "
ShowMemoryViewer(@a$, 100)
地球上の平和
User avatar
Keya
Addict
Addict
Posts: 1891
Joined: Thu Jun 04, 2015 7:10 am

Re: Make string comparing faster

Post by Keya »

Unicode uses two null bytes (0x0000) for string termination - a single null byte is not sufficient because many Unicode characters contain null bytes as either the high or the low byte
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Make string comparing faster

Post by wilbert »

I misunderstood.
I thought with double-zero termination you meant two null characters (four zero bytes).

An empty string only contains a null character. In this case comparing the first two characters in theory can result in a crash if the address of the second character you try to read is on a memory page that doesn't allow to access its memory.
So if you want to compare the first two characters at once, you need to be sure that both strings have a minimum length of one.
Windows (x64)
Raspberry Pi OS (Arm64)
User avatar
Saki
Addict
Addict
Posts: 830
Joined: Sun Apr 05, 2020 11:28 am
Location: Pandora

Re: Make string comparing faster

Post by Saki »

Primarily, I don't think Left(2) offers any noticeable speed advantage.
I myself had always used Left(2) and for the rest the simple comparison.
The rest is not big, it's always lightning fast.
地球上の平和
User avatar
Saki
Addict
Addict
Posts: 830
Joined: Sun Apr 05, 2020 11:28 am
Location: Pandora

Re: Make string comparing faster

Post by Saki »

Hi, if you have tested these procedures, you can post the results here, then all have something from it :wink:
地球上の平和
User avatar
Mijikai
Addict
Addict
Posts: 1360
Joined: Sun Sep 11, 2016 2:17 pm

Re: Make string comparing faster

Post by Mijikai »

I remember several threads on fast strings and im sure there was also talk about string comparisons.
Besides that i distinctively remember some search pattern code which does basically the same - inline asm, very fast.
I suggest to do some searching.
infratec
Always Here
Always Here
Posts: 6817
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: Make string comparing faster

Post by infratec »

My try:

Code: Select all

CompilerIf #PB_Compiler_IsMainFile
  EnableExplicit
CompilerEndIf


Procedure.i StringCompare(*String1.Character, *String2.Character)
  
  While *String1\c = *String2\c
    If *String1\c = 0 Or *String2\c = 0
      Break
    EndIf
    *String1 + SizeOf(Character)
    *String2 + SizeOf(Character)
  Wend
  
  ProcedureReturn Bool(*String1\c = *String2\c)
  
EndProcedure



CompilerIf #PB_Compiler_IsMainFile
  
  Define a$, b$
  
  a$ = "jjkgkdjghklsdghsdklfghkldghlgögjäfjeögheögtegdfkvbjsdgkljsdfhgkdhgösdfhgklsdghskdlghksldfghskldfghsdklf"
  b$ = "jjkgkdjghklsdghsdklfghkldghlgögjäfjeögheögtegdfkvbjsdgkljsdfhgkdhgösdfhgklsdghskdlghksldfghskldfghsdklf"
  
  MessageRequester("Info", Str(StringCompare(@a$, @b$)))
  
CompilerEndIf
It is for PB strings and not for UTF8 or ASCII strings.
Of course if I convert it to asm it's faster, but this is more educative for 'normal' PB users.
Last edited by infratec on Wed Mar 31, 2021 2:50 pm, edited 3 times in total.
User avatar
Keya
Addict
Addict
Posts: 1891
Joined: Thu Jun 04, 2015 7:10 am

Re: Make string comparing faster

Post by Keya »

for 1/2/4/8 bytes the Peek functions are always going to be faster because they're not string functions like Left(), but obviously you won't notice any speed difference if you're not calling it very often.

But you can see here why Peek is always going to be faster...

Code: Select all

mystr$ = "ab"
twobytes.w = PeekW(@mystr$)
=

Code: Select all

mov eax, dword ptr [403168]  ;ASCII "ab"
call 00402030                ;call PeekW()
	movsx eax, word ptr [eax]
	retn
mov word ptr [40316C], ax
You could easily inline that to remove the call/retn -- the upcoming C version of PB will probably automatically do that.

But now the same, using a string function...

Code: Select all

mystr$ = "ab"
twobytes$ = Left(mystr$,2)
=

Code: Select all

mov edx, dword ptr [40316C]
push edx
push edx
push 2                   ;2 bytes
push dword ptr [403168]  ;ASCII "ab"
call 00402040            ;call Left()
	push ebx
	push ebp
	push esi
	push edi
	mov edi, dword ptr [esp+14]
	push edi
	call 004020B0
		...
	xor ecx, ecx
	mov esi, eax
	mov eax, dword ptr [esp+18]
	test eax, eax
	sets cl
	dec ecx
	and ecx, eax
	cmp esi, ecx
	jle short 00402064
	mov esi, ecx
	push edi
	call 004022E0                                        
		...
	push dword ptr [esp+1C]                              
	mov ebp, eax                                         
	push esi                                             
	call 00402320                                        
		...
	mov ebx, eax
	test ebp, ebp
	jz short 00402084
	push ebp                                             
	call 004023A0                                        
		...
	mov edi, eax
	test edi, edi
	jz short 0040209B
	test esi, esi
	jle short 0040209B
	push esi
	push edi
	push ebx
	call 004023C
		...
	pop edi
	pop esi
	pop ebp
	pop ebx
	retn 0C
	pop edi
	pop esi
	pop ebp
	mov byte ptr [ebx], 0
	pop ebx
	retn 0C
push offset 00403164
call 00402160            ;store result
	push ebp                                            
	mov ebp, esp
	push ecx
	mov eax, dword ptr [40316C]
	sub eax, dword ptr [ebp+0C]
	mov dword ptr [ebp-4], eax
	mov ecx, dword ptr [ebp+8]
	cmp dword ptr [ecx], 0
	jne short 00402193
	mov edx, dword ptr [ebp-4]
	add edx, 5
	push edx                                             ; /Size
	push 0                                               ; |Flags = 0
	mov eax, dword ptr [403170]                          ; |
	push eax                                             ; |Heap
	call dword ptr [<&KERNEL32.HeapAlloc>]               ; \NTDLL.RtlAllocateHeap
		...
	mov ecx, dword ptr [ebp+8]
	mov dword ptr [ecx], eax
	jmp short 004021B4
	mov edx, dword ptr [ebp-4]
	add edx, 5
	push edx                                             ; /Size
	mov eax, dword ptr [ebp+8]                           ; |
	mov ecx, dword ptr [eax]                             ; |
	push ecx                                             ; |pMem
	push 0                                               ; |Flags = 0
	mov edx, dword ptr [403170]                          ; |
	push edx                                             ; |Heap
	call dword ptr [<&KERNEL32.HeapReAlloc>]             ; \NTDLL.RtlReAllocateHeap
		...
	mov ecx, dword ptr [ebp+8]
	mov dword ptr [ecx], eax
	mov edx, dword ptr [ebp-4]
	push edx           
	mov eax, dword ptr [40302C]                     
	add eax, dword ptr [ebp+0C]                     
	push eax                                        
	mov ecx, dword ptr [ebp+8]                      
	mov edx, dword ptr [ecx]                        
	push edx                                        
	call 004023C0                                   
		...
	mov eax, dword ptr [ebp+0C]
	mov dword ptr [40316C], eax
	mov esp, ebp
	pop ebp
	retn 8
(there's actually even more code to it than this as i'm only showing the one call layer)
User avatar
skywalk
Addict
Addict
Posts: 3972
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: Make string comparing faster

Post by skywalk »

Now do CompareMemoryString()?
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum
Post Reply