Get the position of the last occurrence of a character in a string


Post by XCoder »

On several occasions I have needed to find the position of the last occurrence of a character in a string. Although this can be done using BASIC code, I decided to write a procedure using assembler to achieve this.

Code: Select all

CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    Macro eax : rax : EndMacro
    Macro ebx : rbx : EndMacro
    Macro ecx : rcx : EndMacro
    Macro edx : rdx : EndMacro
    Macro esi : rsi : EndMacro
    Macro edi : rdi : EndMacro
    Macro ebp : rbp : EndMacro
    Macro esp : rsp : EndMacro
    Macro dword : qword :EndMacro
    Macro octets : 8 :EndMacro
    Debug "x64 mode"
CompilerElse
  Macro octets : 4 :EndMacro
  Debug "x86 mode"
CompilerEndIf

Procedure.l GetPosOfLastChar(*searchThisString, *CharToFind)
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64 ; If compiler is in x64 mode
    !push rsi				 ;preserve registers
    !push rdi
    !mov rsi, [p.p_searchThisString+16]	;Get address of string in rsi-add 16 to the address of the string, ie 8 for each push instruction
    !mov rdi, [p.p_CharToFind+16]
    !mov rdx, rsi
  CompilerElse
    !push esi				 ;preserve registers
    !push edi
    !mov esi, [p.p_searchThisString+8] ;Get address of string in esi-add 8 to the address of the string, ie 4 for each push instruction
    !mov edi, [p.p_CharToFind+8] 
    !mov edx, esi
  CompilerEndIf      
  
  !or ecx, $ffffffff      ;Set counter to -1
  !cld                    ;makes esi count upwards when lodsw is used (hence fetches next character in string) 
  
!l_CountChars:  
  !inc ecx             	  ; increment counter (on first run this makes ecx = 0)
  !lodsw                  ; get word pointed to by esi in ax then inc esi - use word for unicode strings [lodsb for ascii strings]
  !test ax, ax            ; check if the whole 16-bit character is zero ie the string terminator (testing al alone would stop early at characters such as U+0100)
  !jnz l_CountChars       ; get next character in string
  !mov eax, ecx 		      ; copy count of characters into eax
  
  !mov esi, edx   ; restore esi - (lodsw has changed its value)
  
  !dec ecx        ; decrease count of characters by 1 
  !shl ecx, 1     ; multiply count of characters by 2
  !add esi, ecx   ; esi now points to last char in string
  !std            ; makes esi count downwards when lodsw is used (hence fetches previous character in string)
  !mov ecx, eax   ; copy count of characters to ecx
  
!l_CheckNextChar:
  !lodsw          ; get word pointed to by esi in ax then dec esi- use word for unicode strings [lodsb for ascii strings]
  !dec ecx        ; decrease counter
  !cmp ecx, -1
  !jz l_NotFound  ;If counter is -1 then character has not been found, so return a not found value of 0
  !cmp ax, [edi]
  !jne l_CheckNextChar
  !mov eax, ecx   ;copy the full 32-bit index so that positions in long strings are not truncated to 16 bits
  !inc eax        ;Ensures position of character found starts at 1
  !jmp l_exit
  
!l_NotFound:
  !mov eax, 0
  
!l_exit:  
  !cld            ; clear the direction flag again (std set it above) - calling code expects it to be clear

  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64 ; If compiler is in x64 mode
    !pop rdi			 ;restore registers
    !pop rsi
  CompilerElse
    !pop edi			 ;restore registers
    !pop esi
  CompilerEndIf 
  
  ProcedureReturn	;return eax
EndProcedure

CharToFind$ = "\"
a$ = "1234567890123456789\D\"
Debug GetPosOfLastChar(@a$, @CharToFind$)

a$ = "123456789012345C\"
Debug GetPosOfLastChar(@a$, @CharToFind$)

a$ = "\123456789"
Debug GetPosOfLastChar(@a$, @CharToFind$)

a$ = "123456789ABC"
Debug GetPosOfLastChar(@a$, @CharToFind$)
My knowledge of assembly language is a little rusty, so the code can probably be improved. However, I have posted this code in case others may find it useful.

Post by mk-soft »

This works without ASM and should be no slower with the C backend.

Code: Select all

Procedure GetPosOfLastChar(*String.character, *Char.character)
  Protected char, *pos.character, *found
  
  If *String = 0 Or *Char = 0
    ProcedureReturn 0
  EndIf
  
  char = *char\c
  *pos = *String
  Repeat
    If *pos\c = 0
      Break
    EndIf
    If *pos\c = char
      *found = *pos
    EndIf
    *pos + SizeOf(character)
  ForEver
  If *found
    ProcedureReturn (*found - *String) / SizeOf(character) + 1
  Else
    ProcedureReturn 0
  EndIf
EndProcedure

CharToFind$ = "\"
a$ = "1234567890123456789\D\"
r1 =  GetPosOfLastChar(@a$, @CharToFind$)
Debug r1

a$ = "123456789012345C\"
Debug GetPosOfLastChar(@a$, @CharToFind$)

a$ = "\123456789"
Debug GetPosOfLastChar(@a$, @CharToFind$)

a$ = "123456789ABC"
Debug GetPosOfLastChar(@a$, @CharToFind$)
Last edited by mk-soft on Fri Mar 04, 2022 3:29 pm, edited 1 time in total.

Post by BarryG »

I don't know how good/fast this is for large strings, but it's simple and short:

Code: Select all

Procedure GetPosOfLastChar(text$,char$)
  f=FindString(ReverseString(text$),char$)
  If f
    p=Len(text$)-f+1
  EndIf
  ProcedureReturn p
EndProcedure

CharToFind$ = "\"
a$ = "1234567890123456789\D\"
r1 =  GetPosOfLastChar(a$, CharToFind$)
Debug r1

a$ = "123456789012345C\"
Debug GetPosOfLastChar(a$, CharToFind$)

a$ = "\123456789"
Debug GetPosOfLastChar(a$, CharToFind$)

a$ = "123456789ABC"
Debug GetPosOfLastChar(a$, CharToFind$)
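
For large strings, one way to avoid the reversed copy that ReverseString() creates is to call FindString() repeatedly with its optional start position. The following is only a sketch under that assumption, not a measured improvement; the procedure name GetPosOfLastString and the variable names are hypothetical.

```purebasic
; Sketch only: repeated forward search instead of reversing the string.
; GetPosOfLastString is a hypothetical name, not part of PureBasic.
Procedure GetPosOfLastString(text$, find$)
  Protected pos, last, length

  If find$ = ""
    ProcedureReturn 0                 ; nothing to search for
  EndIf

  length = Len(text$)
  pos = FindString(text$, find$)      ; first match, 0 if there is none
  While pos > 0
    last = pos                        ; remember the most recent match
    If pos >= length
      Break                           ; match at the last character, nothing left to scan
    EndIf
    pos = FindString(text$, find$, pos + 1)  ; continue after this match
  Wend

  ProcedureReturn last                ; 0 if the text was never found
EndProcedure

Debug GetPosOfLastString("1234567890123456789\D\", "\")   ; 22
Debug GetPosOfLastString("123456789ABC", "\")             ; 0
```

Because it never reverses anything, this variant also accepts a search string longer than one character.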

Post by Paul »

@BarryG - Nice! (and simple)

Post by breeze4me »

If you really want to use asm code, you can do it like this.

Edit:
+ Added an improved asm version and a version for the C backend.

However, my benchmarks suggest that in some cases it may be better to use mk-soft's PB code with the C backend and the optimization option enabled.
With the C backend and the optimization option, mk-soft's PB code was almost the same speed as the asm version, even in the worst case.

With the C backend WITH optimization, use mk-soft's PB code.
With the C backend WITHOUT optimization, use the asm code.
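
For anyone who wants to repeat this kind of comparison, a minimal timing sketch might look like the following. It is not the benchmark that produced the results above: the iteration count, the test string and the condensed copy of mk-soft's procedure are all assumptions chosen for illustration.

```purebasic
; Timing sketch only - not the benchmark behind the results above.
; #Iterations, the test string and the condensed procedure are assumptions.
; Run it with the debugger disabled, otherwise the timings are distorted.

Procedure GetPosOfLastChar(*String.Character, *Char.Character)
  ; condensed form of mk-soft's single-pass PB code
  Protected *pos.Character = *String, *found, target

  If *String = 0 Or *Char = 0
    ProcedureReturn 0
  EndIf

  target = *Char\c
  While *pos\c <> 0
    If *pos\c = target
      *found = *pos                   ; remember the latest match
    EndIf
    *pos + SizeOf(Character)
  Wend

  If *found
    ProcedureReturn (*found - *String) / SizeOf(Character) + 1
  EndIf
  ProcedureReturn 0
EndProcedure

#Iterations = 100000

Define text$ = Space(10000) + "\"     ; long string with the match at the very end
Define find$ = "\"
Define i, result, start, elapsed

start = ElapsedMilliseconds()
For i = 1 To #Iterations
  result = GetPosOfLastChar(@text$, @find$)
Next
elapsed = ElapsedMilliseconds() - start

MessageRequester("Benchmark", "Position " + Str(result) + " found in " + Str(elapsed) + " ms")
```

Swapping in one of the asm procedures and compiling once per backend and optimization setting gives figures that can be compared directly.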

Code: Select all

Procedure GetPosOfLastChar(*searchThisString, *CharToFind)
  
  If *searchThisString = 0 Or *CharToFind = 0
    ProcedureReturn 0
  EndIf
  
  CompilerIf #PB_Compiler_Backend = #PB_Backend_Asm
    CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
      !mov edx, [p.p_searchThisString]
      !mov eax, [p.p_CharToFind]
      
      !push ebx
      !push esi
      
      !movzx esi, word [eax]  ;load a character to find
      
      !xor ebx, ebx
      !mov eax, edx
      !mov ecx, edx           ;copy the source string address (1)
      
      !@@:
      !mov bx, word [edx]     ;load a character from source string
      !test ebx, ebx
      !jz @f                  ;if null, exit
      
      !add edx, 2             ;address to the next character
      
      !cmp ebx, esi
      !cmovz eax, edx         ;if found, copy the address of the next character (2)
      !jmp @r
      
      !@@:
      !sub eax, ecx           ;address(2) - address(1)
      !shr eax, 1             ;divide by 2
      
      !pop esi
      !pop ebx
    CompilerElse
      !mov r9, [p.p_searchThisString]
      !mov rdx, [p.p_CharToFind]
      
      !movzx r8, word [rdx]   ;load a character to find
      
      !xor rdx, rdx
      !mov rax, r9
      !mov rcx, r9            ;copy the source string address (1)
      
      !@@:
      !mov dx, word [r9]      ;load a character from source string
      !test edx, edx
      !jz @f                  ;if null, exit
      
      !add r9, 2              ;address to the next character
      
      !cmp rdx, r8
      !cmovz rax, r9          ;if found, copy the address of the next character (2)
      !jmp @r
      
      !@@:
      !sub rax, rcx           ;address(2) - address(1)
      !shr rax, 1             ;divide by 2
    CompilerEndIf
  CompilerElse
    CompilerError "Use Asm backend"
  CompilerEndIf
  
  ProcedureReturn
EndProcedure

CharToFind$ = "\"
a$ = "1234567890123456789\D\"
Debug GetPosOfLastChar(@a$, @CharToFind$)

a$ = "123456789012345C\"
Debug GetPosOfLastChar(@a$, @CharToFind$)

a$ = "\123456789"
Debug GetPosOfLastChar(@a$, @CharToFind$)

a$ = "123456789ABC"
Debug GetPosOfLastChar(@a$, @CharToFind$)

Debug "---------"

a$ = ""
Debug GetPosOfLastChar(@a$, @CharToFind$)   ;0

CharToFind$ = ""
a$ = "1234567890123456789\D\"
Debug GetPosOfLastChar(@a$, @CharToFind$)   ;0

a$ = ""
Debug GetPosOfLastChar(@a$, @CharToFind$)   ;0

Code: Select all

Procedure GetPosOfLastChar(*searchThisString, *CharToFind)
  Protected Result
  
  If *searchThisString = 0 Or *CharToFind = 0
    ProcedureReturn 0
  EndIf
  
  !__asm__ __volatile__ (".intel_syntax noprefix;"
  
  CompilerIf #PB_Compiler_Backend = #PB_Backend_C
    CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
      !"mov ebx, %1;"
      !"movzx esi, word ptr [%2];"   //load a character to find
      
      !"xor edx, edx;"
      !"mov %0, ebx;"
      !"mov ecx, ebx;"               //copy the source string address (1)
      
      !"_continue_to_search%=:;"
      !"mov dx, word ptr [ebx];"     //load a character from source string
      !"test edx, edx;"              //if null, exit
      !"jz _loop_exit%=;"
      
      !"add ebx, 2;"                 //address to the next character
      
      !"cmp edx, esi;"
      !"cmovz %0, ebx;"              //if found, copy the address of the next character (2)
      !"jmp _continue_to_search%=;"
      
      !"_loop_exit%=:;"
      !"sub %0, ecx;"                //address(2) - address(1)
      !"shr %0, 1;"                  //divide by 2
      
      !".att_syntax" 
      !:"=a" (v_result)
      !:"r" (p_searchthisstring), "r" (p_chartofind)
      !: "ebx", "ecx", "edx", "esi"
      !);
    CompilerElse
      !"mov rbx, %1;"
      !"movzx rsi, word ptr [%2];"   //load a character to find
      
      !"xor rdx, rdx;"
      !"mov %0, rbx;"
      !"mov rcx, rbx;"               //copy the source string address (1)
      
      !"_continue_to_search%=:;"
      !"mov dx, word ptr [rbx];"     //load a character from source string
      !"test edx, edx;"              //if null, exit
      !"jz _loop_exit%=;"
      
      !"add rbx, 2;"                 //address to the next character
      
      !"cmp rdx, rsi;"
      !"cmovz %0, rbx;"              //if found, copy the address of the next character (2)
      !"jmp _continue_to_search%=;"
      
      !"_loop_exit%=:;"
      !"sub %0, rcx;"                //address(2) - address(1)
      !"shr %0, 1;"                  //divide by 2
      
      !".att_syntax" 
      !:"=a" (v_result)
      !:"r" (p_searchthisstring), "r" (p_chartofind)
      !: "rbx", "rcx", "rdx", "rsi"
      !);
    CompilerEndIf
  CompilerElse
    CompilerError "Use C backend"
  CompilerEndIf
  
  ProcedureReturn Result
EndProcedure


CharToFind$ = "\"
a$ = "1234567890123456789\D\"
Debug GetPosOfLastChar(@a$, @CharToFind$)

a$ = "123456789012345C\"
Debug GetPosOfLastChar(@a$, @CharToFind$)

a$ = "\123456789"
Debug GetPosOfLastChar(@a$, @CharToFind$)

a$ = "123456789ABC"
Debug GetPosOfLastChar(@a$, @CharToFind$)

Debug "---------"

a$ = ""
Debug GetPosOfLastChar(@a$, @CharToFind$)   ;0

CharToFind$ = ""
a$ = "1234567890123456789\D\"
Debug GetPosOfLastChar(@a$, @CharToFind$)   ;0

a$ = ""
Debug GetPosOfLastChar(@a$, @CharToFind$)   ;0



Old version.

Code: Select all

Procedure GetPosOfLastChar(*searchThisString, *CharToFind)
  
  If *searchThisString = 0 Or *CharToFind = 0
    ProcedureReturn 0
  EndIf
  
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov edx, [p.p_searchThisString]
    !mov eax, [p.p_CharToFind]
    
    !push ebx
    !push esi
    !movzx esi, word [eax]   ;load a character to find
    
    !xor eax, eax
    !xor ecx, ecx
    !xor ebx, ebx
    
    !@@:
    !mov bx, word [edx]     ;load a character from source string
    !test ebx, ebx
    !jz @f                  ;if null, exit
    
    !add edx, 2             ;address to the next character
    !inc ecx                ;current position + 1
    
    !cmp ebx, esi
    !cmovz eax, ecx         ;if found, copy current position
    !jmp @r
    
    !@@:
    !pop esi
    !pop ebx
  CompilerElse
    !mov r9, [p.p_searchThisString]
    !mov rdx, [p.p_CharToFind]
    
    !movzx r8, word [rdx]   ;load a character to find
    
    !xor rax, rax
    !xor rcx, rcx
    !xor rdx, rdx
    
    !@@:
    !mov dx, word [r9]      ;load a character from source string
    !test edx, edx
    !jz @f                  ;if null, exit
    
    !add r9, 2              ;address to the next character
    !inc ecx                ;current position + 1
    
    !cmp rdx, r8
    !cmovz eax, ecx         ;if found, copy current position
    !jmp @r
    
    !@@:
  CompilerEndIf
  
  ProcedureReturn
EndProcedure

CharToFind$ = "\"
a$ = "1234567890123456789\D\"
Debug GetPosOfLastChar(@a$, @CharToFind$)

a$ = "123456789012345C\"
Debug GetPosOfLastChar(@a$, @CharToFind$)

a$ = "\123456789"
Debug GetPosOfLastChar(@a$, @CharToFind$)

a$ = "123456789ABC"
Debug GetPosOfLastChar(@a$, @CharToFind$)

Debug "---------"

a$ = ""
Debug GetPosOfLastChar(@a$, @CharToFind$)   ;0

CharToFind$ = ""
a$ = "1234567890123456789\D\"
Debug GetPosOfLastChar(@a$, @CharToFind$)   ;0
Last edited by breeze4me on Sun Mar 06, 2022 12:20 am, edited 6 times in total.

Post by XCoder »

Having seen the code posted by breeze4me, I realised that the code I posted above is inefficient: it goes through searchThisString once to find its length and then goes backwards to find the required character, whereas breeze4me's code goes through searchThisString only once.

I have made a slight adjustment to breeze4me's code so that it uses lodsw, which (I believe) improves the efficiency of the search, particularly for long strings.

Code: Select all

Procedure GetPosOfLastChar(*searchThisString, *CharToFind)
  
  If *searchThisString = 0 Or *CharToFind = 0
    ProcedureReturn 0
  EndIf
  
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x86
    !mov eax, [p.p_CharToFind]
    
    !push ebx
    !push esi
    !push edi
    
    !mov esi, [p.p_searchThisString+12]
    !movzx edi, word [eax]   ;load the character to find
    
    !xor eax, eax
    !xor ecx, ecx
    !xor ebx, ebx
    
    !cld		                ;makes esi count upwards when lodsw is used (hence fetches next character in string) 
    
    !@@:
    !lodsw                  ;get word pointed to by esi in ax then inc esi
    !test eax, eax          ;End of string?
    !jz @f                  ;if null, exit
    
    !inc ecx                ;current position + 1
    
    !cmp eax, edi           ;compare string character with character to find
    !cmovz ebx, ecx         ;if found, copy current position into ebx
    !jmp @r
    
    !@@:
    !mov eax, ebx           ;copy position of character found to eax (0 if character not found)
    !pop edi
    !pop esi
    !pop ebx
    
  CompilerElse
    !mov rdx, [p.p_CharToFind]
    !push rsi
    !mov rsi, [p.p_searchThisString+8]

    !movzx r8, word [rdx]   ;load the character to find
    
    !xor rax, rax
    !xor rcx, rcx
    !xor rdx, rdx
    !xor r9, r9
    
    !@@:
    !lodsw                  ;load a character from source string into ax
    !test eax, eax
    !jz @f                  ;if null, exit
    
    !inc rcx                ;current position + 1
    
    !cmp rax, r8
    !cmovz r9, rcx          ;if found, copy current position
    !jmp @r
    
    !@@:
    !mov rax, r9            ;copy position of character found to rax (0 if character not found)
    !pop rsi
  CompilerEndIf
  
  ProcedureReturn
EndProcedure

CharToFind$ = "\"
a$ = "1234567890123456789\D\"
Debug GetPosOfLastChar(@a$, @CharToFind$)

a$ = "123456789012345C\"
Debug GetPosOfLastChar(@a$, @CharToFind$)

a$ = "\123456789"
Debug GetPosOfLastChar(@a$, @CharToFind$)

a$ = "123456789ABC"
Debug GetPosOfLastChar(@a$, @CharToFind$)

Debug "---------"

a$ = ""
Debug GetPosOfLastChar(@a$, @CharToFind$)   ;0

CharToFind$ = ""
a$ = "1234567890123456789\D\"
Debug GetPosOfLastChar(@a$, @CharToFind$)   ;0

Post by AZJIO »

BarryG

In theory your code should run slower. Even if you take mk-soft's example and rework it to search from the end of the string, Len() still has to run through the entire string to calculate its length. As a result, the ideal option is to search from the beginning of the string. If it were a file of known length loaded into memory, however, it would be possible to search from the end.

Or maybe it does make sense to search from the end of the string. Searching for a character involves two checks per position, one for 0 and one for the character. You could instead run to the end checking only for 0, and then search from the end back to the beginning checking only for the character.

Code: Select all

#RegExp = 0

Text$ = "\1234567890123456789DF"

If CreateRegularExpression(#RegExp, "(?m).*\\", #PB_RegularExpression_MultiLine | #PB_RegularExpression_NoCase)
	If ExamineRegularExpression(#RegExp, Text$)
		If NextRegularExpressionMatch(#RegExp)
			Debug RegularExpressionMatchLength(#RegExp)
		Else
			Debug 0
		EndIf
	EndIf
Else
	Debug RegularExpressionError()
EndIf
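
The two-phase idea described above (run to the terminator checking only for 0, then walk back checking only for the character) could be sketched in plain PB as follows. This is an untimed illustration, not a claim that it is faster; the procedure name GetPosOfLastCharBackward and the variable target are hypothetical.

```purebasic
; Sketch of the two-phase search; GetPosOfLastCharBackward is a hypothetical name.
Procedure GetPosOfLastCharBackward(*String.Character, *Char.Character)
  Protected *pos.Character, target

  If *String = 0 Or *Char = 0
    ProcedureReturn 0
  EndIf

  target = *Char\c
  If target = 0
    ProcedureReturn 0                 ; empty search string
  EndIf

  ; Phase 1: find the terminator, checking only for 0
  *pos = *String
  While *pos\c <> 0
    *pos + SizeOf(Character)
  Wend

  ; Phase 2: walk back towards the start, checking only for the character
  While *pos > *String
    *pos - SizeOf(Character)
    If *pos\c = target
      ProcedureReturn (*pos - *String) / SizeOf(Character) + 1
    EndIf
  Wend

  ProcedureReturn 0                   ; character not found
EndProcedure

Define a$ = "1234567890123456789\D\"
Define c$ = "\"
Debug GetPosOfLastCharBackward(@a$, @c$)   ; 22
```

The backward phase stops at the first hit, so it pays off most when the character sits near the end of the string; when the character is absent, every position is visited twice.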

Post by idle »

Look for the finddata module. I'm pretty sure it returns the positions; then use ssefind.