UpdateString: destructive insert

Bare metal programming in PureBasic, for experienced users
User avatar
netmaestro
PureBasic Bullfrog
PureBasic Bullfrog
Posts: 8425
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

UpdateString: destructive insert

Post by netmaestro »

I need this for a project I'm working on and I discovered that PB's InsertString doesn't have a #PB_String_Inplace flag. And so I wrote my own. The target string is overwritten at pos (1-based, not 0 to conform with PB string commands) with the insert string and the procedure returns the number of characters successfully altered. Improvements are welcome, thanks for any time you spend on it.

Code: Select all

Procedure.i UpdateString(*target, *insert, pos.i)
  
  CompilerIf #PB_Compiler_Unicode
    
    !extrn _PB_Len_UNICODE@4
    !push dword [esp + 8]
    !call _PB_Len_UNICODE@4
    !mov ecx, eax
    
    !push edi
    !push esi
    !push ecx
    
    !mov edi, [esp + 16]
    !add edi, [esp + 24]
    !add edi, [esp + 24]
    !mov esi, [esp + 20]
    !sub edi, 2
    
    !rep movsw
    
    !pop eax
    !pop esi  
    !pop edi
    
  CompilerElse
    
    !extrn _PB_Len@4
    !push dword [esp + 8]
    !call _PB_Len@4
    !mov ecx, eax
    
    !push edi
    !push esi
    !push ecx
    
    !mov edi, [esp + 16]
    !add edi, [esp + 24]
    !mov esi, [esp + 20]
    !dec edi
    
    !rep movsb
    
    !pop eax
    !pop esi  
    !pop edi
    
  CompilerEndIf
  
  ProcedureReturn 
  
EndProcedure

a$ = "Hello World!"
b$ = "Girls"

Debug UpdateString(@a$, @b$, 7)

Debug a$
BERESHEIT
User avatar
netmaestro
PureBasic Bullfrog
PureBasic Bullfrog
Posts: 8425
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

Re: UpdateString: destructive insert

Post by netmaestro »

This code is executing one billion times in 13.7 seconds on my computer, 3.2ghz AMD Phenom. I'm going to guess it can't be optimized very much from here.. but see what you can do :D
BERESHEIT
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: UpdateString: destructive insert

Post by wilbert »

This should be faster

Code: Select all

Procedure.l UpdateString(*target, *insert, pos.l)
  EnableASM
  MOV edx, *target
  MOV eax, pos
  CompilerIf #PB_Compiler_Unicode
    !lea edx, [edx + eax * 2 - 2]
    MOV eax, *insert
    !jmp us_unic_entry
    !us_unic_loop:
    !mov [edx], cx
    !add eax, 2
    !add edx, 2
    !us_unic_entry:
    !mov cx, [eax]
    !and cx, cx
    !jnz us_unic_loop
    SUB eax, *insert
    !shr eax, 1
  CompilerElse
    !lea edx, [edx + eax - 1]
    MOV eax, *insert
    !jmp us_ascii_entry
    !us_ascii_loop:
    !mov [edx], cl
    !inc eax
    !inc edx
    !us_ascii_entry:
    !mov cl, [eax]
    !and cl, cl
    !jnz us_ascii_loop
    SUB eax, *insert
  CompilerEndIf
  DisableASM
  ProcedureReturn 
EndProcedure
It does the same as your function.
There is a possible problem however with this approach (both your function and my alternative) and that is that there is no check to assure that there is only data set within the boundaries of the target string.
User avatar
netmaestro
PureBasic Bullfrog
PureBasic Bullfrog
Posts: 8425
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

Re: UpdateString: destructive insert

Post by netmaestro »

It is faster, by about 25% here! I must study why now. No checking is done for speed's sake.
BERESHEIT
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: UpdateString: destructive insert

Post by wilbert »

The call to the PB procedure to get the length of *insert is probably what is slowing things down.
The procedure I created doesn't determine the length in advance.

If you want to get the length in advance like you did, it probably is faster to do it yourself using ASM instead of calling the PB procedure.
Something like this

Edit:
Okay, this was a bad idea :?
I assumed determining the string length using scasb / scasw would be faster but it's actually a lot slower.
My first attempt was much faster

Code: Select all

Procedure.i UpdateString(*target, *insert, pos.i)
  EnableASM
  !mov edx, edi
  MOV edi, *insert
  !mov ecx, -1
  !xor eax, eax
  !cld
  CompilerIf #PB_Compiler_Unicode
    !repne scasw
    !not ecx
    !dec ecx
    !mov eax, esi
    MOV esi, pos
    MOV edi, *target
    !lea edi, [edi + esi * 2 - 2]
    MOV esi, *insert
    !rep movsw
    SUB esi, *insert
    !shr esi, 1
  CompilerElse
    !repne scasb
    !not ecx
    !dec ecx
    !mov eax, esi
    MOV esi, pos
    MOV edi, *target
    !lea edi, [edi + esi - 1]
    MOV esi, *insert
    !rep movsb
    SUB esi, *insert
  CompilerEndIf
  !mov edi, edx
  !xchg eax, esi
  DisableASM
  ProcedureReturn 
EndProcedure
User avatar
netmaestro
PureBasic Bullfrog
PureBasic Bullfrog
Posts: 8425
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

Re: UpdateString: destructive insert

Post by netmaestro »

Okay, this was a bad idea
Yep. Your first code takes 9 seconds here to do one billion updates, your second code 19 seconds. I like your first one better :mrgreen:

My code is taking 13.7 seconds, so your code is cutting my time by 35%, not the 25 I had estimated when I was in a rush.

Code: Select all

  EnableASM
  MOV edx, *target
  MOV eax, pos
You're doing it this way to get around the stack bug on Mac, is that right?
BERESHEIT
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: UpdateString: destructive insert

Post by wilbert »

It's indeed a workaround to get things working on Mac as long as the bug related to using inline ASM isn't fixed.
Thorium
Addict
Addict
Posts: 1271
Joined: Sat Aug 15, 2009 6:59 pm

Re: UpdateString: destructive insert

Post by Thorium »

Would be cool if you could make two versions of your asm codes one for x86 and one for x64.
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: UpdateString: destructive insert

Post by wilbert »

The version I created converted to x64 should look like this I think

Code: Select all

Procedure.l UpdateString(*target, *insert, pos.l)
  EnableASM
  MOV rdx, *target
  MOV eax, pos
  CompilerIf #PB_Compiler_Unicode
    !lea rdx, [rdx + rax * 2 - 2]
    MOV rax, *insert
    !jmp us_unic_entry
    !us_unic_loop:
    !mov [rdx], cx
    !add rax, 2
    !add rdx, 2
    !us_unic_entry:
    !mov cx, [rax]
    !and cx, cx
    !jnz us_unic_loop
    SUB rax, *insert
    !shr rax, 1
  CompilerElse
    !lea rdx, [rdx + rax - 1]
    MOV rax, *insert
    !jmp us_ascii_entry
    !us_ascii_loop:
    !mov [rdx], cl
    !inc rax
    !inc rdx
    !us_ascii_entry:
    !mov cl, [rax]
    !and cl, cl
    !jnz us_ascii_loop
    SUB rax, *insert
  CompilerEndIf
  DisableASM
  ProcedureReturn 
EndProcedure
User avatar
netmaestro
PureBasic Bullfrog
PureBasic Bullfrog
Posts: 8425
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

Re: UpdateString: destructive insert

Post by netmaestro »

@wilbert: Your version is more accurate than mine because my procedure returns the number of intended char replacements, while yours returns the actual number of chars replaced.

I find that it's very inexpensive timewise to add some safety to this procedure. Basically what I'm doing is not only checking for the insertstring terminating null, but also for the targetstring null. Execution is stopped if either is found and so you'll never write past the end of the target string. It isn't going to protect you from say, passing an offset that starts past the end of the target string, passing a negative offset, etc. but the error protection it does provide is the most useful one imho.

With the following protection adjustments, I find the speed penalty well less than 5%, which at the speed this is going, shouldn't faze anyone.

Code: Select all

Procedure.l UpdateString(*target, *insert, pos.l)
  EnableASM
  MOV edx, *target
  MOV eax, pos
  CompilerIf #PB_Compiler_Unicode
    !lea edx, [edx + eax * 2 - 2]
    mov eax, *insert
    !jmp us_unic_entry
    !us_unic_loop:
    !mov [edx], cx
    !add eax, 2
    !add edx, 2
    !us_unic_entry:
    !mov cx, [eax]
    !and cx, cx
    !mov bx, [edx]    ; <--- write-past-end protection
    !and bx, cx       ; <---   "     "   "      "     
    !jnz us_unic_loop
    sub eax, *insert
    !shr eax, 1
  CompilerElse
    !lea edx, [edx + eax - 1]
    mov eax, *insert
    !jmp us_ascii_entry
    !us_ascii_loop:
    !mov [edx], cl
    !inc eax
    !inc edx
    !us_ascii_entry:
    !mov cl, [eax]
    !and cl, cl
    !mov ch, [edx]   ; <--- write-past-end protection 
    !and ch, cl      ; <---   "     "   "      "     
    !jnz us_ascii_loop
    sub eax, *insert
  CompilerEndIf
  DisableASM
  ProcedureReturn 
EndProcedure

a$ = "Hello World!"
b$ = "Girls"

Debug UpdateString(@a$, @b$, 10)
Debug a$
Note that I'm not preserving ebx:
http://www.swansontec.com/sregisters.html wrote: EBX - In 16-bit mode, the base register was useful as a pointer. Now it is completely free for extra storage space.

So, of all the general-purpose registers, EBX is the only register without an important dedicated purpose. It is a good place to store an extra pointer or calculation step, but not much more.
Having read this reference I concluded that it's probably ok not to preserve the ebx register. I use ebx in the unicode version of this procedure.
BERESHEIT
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: UpdateString: destructive insert

Post by wilbert »

You can't do it like that Netmaestro.
And is a bitwise And, not a logical and.
If one value would be 32 and the other one 64, the result of a bitwise and will be 0 while no end is reached.

It's not the most elegant solution but probably something like

Code: Select all

Procedure.l UpdateString(*target, *insert, pos.l)
  EnableASM
  MOV edx, *target
  MOV eax, pos
  CompilerIf #PB_Compiler_Unicode
    !lea edx, [edx + eax * 2 - 2]
    MOV eax, *insert
    !jmp us_unic_entry
    !us_unic_loop:
    !mov [edx], cx
    !add eax, 2
    !add edx, 2
    !us_unic_entry:
    !cmp word [edx], 0
    !jz us_unic_exit
    !mov cx, [eax]
    !and cx, cx
    !jnz us_unic_loop
    !us_unic_exit:
    SUB eax, *insert
    !shr eax, 1
  CompilerElse
    !lea edx, [edx + eax - 1]
    MOV eax, *insert
    !jmp us_ascii_entry
    !us_ascii_loop:
    !mov [edx], cl
    !inc eax
    !inc edx
    !us_ascii_entry:
    !cmp byte [edx], 0
    !jz us_ascii_exit
    !mov cl, [eax]
    !and cl, cl
    !jnz us_ascii_loop
    !us_ascii_exit:
    SUB eax, *insert
  CompilerEndIf
  DisableASM
  ProcedureReturn 
EndProcedure
would be the easiest way.
Thorium
Addict
Addict
Posts: 1271
Joined: Sat Aug 15, 2009 6:59 pm

Re: UpdateString: destructive insert

Post by Thorium »

netmaestro wrote: Note that I'm not preserving ebx:
http://www.swansontec.com/sregisters.html wrote: EBX - In 16-bit mode, the base register was useful as a pointer. Now it is completely free for extra storage space.

So, of all the general-purpose registers, EBX is the only register without an important dedicated purpose. It is a good place to store an extra pointer or calculation step, but not much more.
Having read this reference I concluded that it's probably ok not to preserve the ebx register. I use ebx in the unicode version of this procedure.
Thats the wrong conclusion. It can be used for anything you want but you need to preserve it. It's specified in the Windows ABI.
Take a look at that official reference if you are not sure about a register: http://msdn.microsoft.com/en-us/library/9z1stfyw.aspx
User avatar
Demivec
Addict
Addict
Posts: 4085
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Re: UpdateString: destructive insert

Post by Demivec »

netmaestro wrote:Note that I'm not preserving ebx:
http://www.swansontec.com/sregisters.html wrote: EBX - In 16-bit mode, the base register was useful as a pointer. Now it is completely free for extra storage space.

So, of all the general-purpose registers, EBX is the only register without an important dedicated purpose. It is a good place to store an extra pointer or calculation step, but not much more.
Having read this reference I concluded that it's probably ok not to preserve the ebx register. I use ebx in the unicode version of this procedure.
I agree with Thorium that this is the wrong conclusion. According to the PB Manual:
- The available volatile registers are: eax, edx and ecx. All others must be always preserved.
User avatar
netmaestro
PureBasic Bullfrog
PureBasic Bullfrog
Posts: 8425
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

Re: UpdateString: destructive insert

Post by netmaestro »

Thorium pointed me to an x64 registers page. Have a look here:

http://msdn.microsoft.com/en-us/library/k1a8ss06.aspx
BERESHEIT
User avatar
Demivec
Addict
Addict
Posts: 4085
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Re: UpdateString: destructive insert

Post by Demivec »

netmaestro wrote:Thorium pointed me to an x64 registers page. Have a look here:

http://msdn.microsoft.com/en-us/library/k1a8ss06.aspx
That's a helpful reference but it deals with C++ inline assembly. How does that relate to PureBasic's inline assembly?
Post Reply