Improving performance of byte pointer copy

Oso
Enthusiast
Posts: 595
Joined: Wed Jul 20, 2022 10:09 am

Improving performance of byte pointer copy

Post by Oso »

Dear all, is there a way I can improve the time for test 3 below?

I've been intending to re-write some of my earlier PureBasic code that uses PokeS(), by instead using a structure pointer (test 3), but I'm finding that my original use of PokeS() is already faster. This is largely because I work with fixed blocks of data, but there are cases where I would still prefer to re-write the code. One reason is that going byte-by-byte lets me apply some logic to each byte as it is copied.

Test 3 below is already quite quick if I turn off the debugger and output the results to the console. I get around 33ms for 2,000,000 bytes. So not too bad I suppose.

Code:

EnableExplicit

Define t1, t2, t3

Define cpstring.s    = Space(1000000)
Define lendata.i     = Len(cpstring.s) * 2

Define *src                                                             ; Start address of source data
Define *srcptr.byte                                                     ; Source data pointer
Define *dest                                                            ; Start address of destination
Define *destptr.byte                                                    ; Destination data pointer
Define *limit

*dest = AllocateMemory(lendata.i)

; **
; **  Test 1, PokeS()
; **
t1 = ElapsedMilliseconds()

PokeS(*dest, cpstring.s, lendata.i, #PB_String_NoZero)

Debug "Time 1 = " + Str(ElapsedMilliseconds() - t1)


; **
; **  Test 2, CopyMemory()
; **
t2 = ElapsedMilliseconds()

CopyMemory(@cpstring.s, *dest, lendata.i)

Debug "Time 2 = " + Str(ElapsedMilliseconds() - t2)


; **
; **  Test 3, Byte copy
; **
t3 = ElapsedMilliseconds()

*src     = @cpstring.s                                                  ; Start address of source data
*srcptr  = *src                                                         ; Set starting pointer
*destptr = *dest                                                        ; Set destination pointer

*limit = *dest + lendata.i
While *destptr < *limit
  *destptr\b = *srcptr\b                                                ; Copy the byte
  *srcptr + 1                                                           ; Advance source pointer
  *destptr + 1                                                          ; Advance destination pointer
Wend

Debug "Time 3 = " + Str(ElapsedMilliseconds() - t3)
Olli
Addict
Posts: 1267
Joined: Wed May 27, 2020 12:26 pm

Re: Improving performance of byte pointer copy

Post by Olli »

It will be hard to improve this.

My suggestion: switch the C optimizer on and test it, then test again for every new loop you write. It is not certain that the algorithm will stay slow.

Nothing is faster than an integer variable, and that is what you are already using.

What is faster is a CPU register. But I do not know whether the C backend supports "cross-hardware" registers (you type a register name, and the compiler maps it to a specific register depending on the x86/ARM hardware).

You also have a question of quantity. If the number of bytes to copy is a multiple of 2, 4 or 8, the copy can be done in packs of bytes. Example: a quad copy moves 8 bytes at a time.

Note that in the ASM backend, you could have this:

Code:

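; ASM backend, x64 only. Assumes global integer variables quantity, source and dest,
; with quantity > 0. rsi and rdi must be preserved in PureBasic inline ASM.
! push rsi
! push rdi
! mov rcx, [v_quantity]
! mov rsi, [v_source]
! mov rdi, [v_dest]
! cld                ; string instructions move forwards

! mybytecopy:        ; equivalent to "While"
! lodsb              ; LOaD String Byte: AL = [rsi], then rsi + 1
; your "intelligent logic", working on the byte in AL
! stosb              ; STOre String Byte: [rdi] = AL, then rdi + 1
! sub rcx, 1
! jnz mybytecopy     ; equivalent to "Wend"
! pop rdi
! pop rsi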
pjay
Enthusiast
Posts: 277
Joined: Thu Mar 30, 2006 11:14 am

Re: Improving performance of byte pointer copy

Post by pjay »

If you're working with strings, then why not use the unicode type (.u) and step through the data 2 bytes at a time?
Oso

Re: Improving performance of byte pointer copy

Post by Oso »

Olli wrote: Thu Dec 21, 2023 4:59 pm It will be hard to improve this. My suggestion: switch the C optimizer on and test it, then test again for every new loop you write. It is not certain that the algorithm will stay slow. Nothing is faster than an integer variable, and that is what you are already using.
Thanks Olli, I've got some amazing results with the C compiler, and even more when optimised :

1. ASM — 34ms
2. C — 11ms
3. Optimised C — 2ms

This suggests that the optimisation process is very worthwhile and has effectively reduced the code down to the equivalent of the PokeS() and CopyMemory().
pjay wrote: Thu Dec 21, 2023 5:15 pm If you're working with strings then why not use the unicode type and +2 through the data?
It will often be raw data, but can \u be used anyway, to copy two bytes?
Sicro
Enthusiast
Posts: 563
Joined: Wed Jun 25, 2014 5:25 pm
Location: Germany

Re: Improving performance of byte pointer copy

Post by Sicro »

@Oso

It would be better if you included a test code with console output instead of debug output, so that not everyone has to adapt your code to experiment with it and measure the speed correctly.

Please note that PokeS() requires the length in characters:

Code:

PokeS(*dest, cpstring.s, lendata.i / 2, #PB_String_NoZero)

Also note that ElapsedMilliseconds() returns a value of type Quad (.q).
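
A minimal sketch of the two points above, assuming a Unicode build of PureBasic (2 bytes per character). The variable names are made up for illustration:

```purebasic
EnableExplicit

Define text.s  = "ABCDEFGH"
Define chars.i = Len(text)                 ; length in characters
Define bytes.i = StringByteLength(text)    ; length in bytes, without the terminating zero

; Zero-filled buffer with room for a terminating zero character
Define *buf = AllocateMemory(bytes + SizeOf(Character))
If *buf
  ; The Length parameter counts characters, not bytes
  PokeS(*buf, text, chars, #PB_String_NoZero)
  Debug PeekS(*buf, chars)                 ; reads the same number of characters back
  FreeMemory(*buf)
EndIf

; ElapsedMilliseconds() returns a quad, so the timer variable is declared as .q
Define started.q = ElapsedMilliseconds()
Debug "Elapsed: " + Str(ElapsedMilliseconds() - started) + " ms"
```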
idle
Always Here
Posts: 6040
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: Improving performance of byte pointer copy

Post by idle »

In my utf16 module I step through strings an integer at a time and shift the chars out. Assuming the memory is aligned, it's really quick.

Also be very careful when testing the optimization in the C backend: you need to introduce a dependency on the result, so that the optimizer doesn't eliminate the very code you're trying to test.
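
One way to introduce such a dependency, as a rough sketch rather than a definitive benchmark: fill the source with varying data, run the byte copy, then compute and print a checksum of the destination. The constant #Size and the variable names are assumptions for this example, and an optimizer may still turn the loop into a block copy.

```purebasic
EnableExplicit
OpenConsole()

#Size = 2000000                       ; assumed test size in bytes

Define *src  = AllocateMemory(#Size)
Define *dest = AllocateMemory(#Size)
Define *s.Byte, *d.Byte, *limit
Define t.q, checksum.q, i.i

If *src = 0 Or *dest = 0
  End 1                               ; allocation failed
EndIf

; Fill the source with varying data so the result depends on the input
For i = 0 To #Size - 1
  PokeA(*src + i, i & $FF)
Next

t = ElapsedMilliseconds()
*s = *src : *d = *dest : *limit = *dest + #Size
While *d < *limit
  *d\b = *s\b                         ; the byte copy being measured
  *s + 1
  *d + 1
Wend
t = ElapsedMilliseconds() - t

; Use the copied data: the printed checksum depends on every byte written
*d = *dest
While *d < *limit
  checksum + (*d\b & $FF)             ; mask because .b is signed
  *d + 1
Wend

PrintN("Time = " + Str(t) + " ms, checksum = " + Str(checksum))
```

If the printed checksum changes when the source data changes, the copy has really been carried out.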
Oso

Re: Improving performance of byte pointer copy

Post by Oso »

Sicro wrote: Thu Dec 21, 2023 7:02 pm It would be better if you included a test code with console output instead of debug output, so that not everyone has to adapt your code to experiment with it and measure the speed correctly.
Okay, thanks. Below is the test code with corrections. I'm glad that you mentioned the PokeS() data length in characters, it's been a while since I used it :?

Code:

EnableExplicit
OpenConsole()

Define t1.q, t2.q, t3.q

Define cpstring.s    = Space(1000000)
Define lendata.i     = Len(cpstring.s) * 2

Define *src                                                             ; Start address of source data
Define *srcptr.byte                                                     ; Source data pointer
Define *dest                                                            ; Start address of destination
Define *destptr.byte                                                    ; Destination data pointer
Define *limit

*dest = AllocateMemory(lendata.i)

; **
; **  Test 1, PokeS()
; **
t1 = ElapsedMilliseconds()

PokeS(*dest, cpstring.s, lendata.i / 2, #PB_String_NoZero)

PrintN( "Time 1 = " + Str(ElapsedMilliseconds() - t1))


; **
; **  Test 2, CopyMemory()
; **
t2 = ElapsedMilliseconds()

CopyMemory(@cpstring.s, *dest, lendata.i)

PrintN("Time 2 = " + Str(ElapsedMilliseconds() - t2))


; **
; **  Test 3, Byte copy
; **
t3 = ElapsedMilliseconds()

*src     = @cpstring.s                                                  ; Start address of source data
*srcptr  = *src                                                         ; Set starting pointer
*destptr = *dest                                                        ; Set destination pointer

*limit = *dest + lendata.i
While *destptr < *limit
  *destptr\b = *srcptr\b                                                ; Copy the byte
  *srcptr + 1                                                           ; Advance source pointer
  *destptr + 1                                                          ; Advance destination pointer
Wend

PrintN( "Time 3 = " + Str(ElapsedMilliseconds() - t3))
Print("Press <ENTER> : ")
Input()
Oso

Re: Improving performance of byte pointer copy

Post by Oso »

idle wrote: Thu Dec 21, 2023 7:38 pm In my utf16 module I step through strings by integers and shift chars out. Assuming the memory is aligned it's really quick.
This is for transferring a character at a time, with variable number of bytes?
Also be very careful when testing the optimization in the C backend: you need to introduce a dependency on the result, so that the optimizer doesn't eliminate the very code you're trying to test.
Okay, because I was only testing with spaces... :D I've just introduced a control, if you like, which I guess is what you're referring to. I've checked at two or three points in the output, just to make sure it has something in there. I could have generated something better, by way of test, but it's 2.20am here :x

idle

Re: Improving performance of byte pointer copy

Post by idle »

CopyMemory will be the fastest method, but if you need to modify the strings or do case conversion, then reading a whole integer and shifting the characters out of it is much better than reading individual bytes, words or longs. You just have to test for the null terminator as you shift.
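
A minimal sketch of that idea, not the code from the utf16 module: it reads 8 bytes at a time and shifts four UTF-16 characters out of each quad, stopping at the null. CountNonSpace() is a hypothetical helper, and the sketch assumes a little-endian CPU and a buffer whose size is a multiple of 8, so that a quad read never runs past the end.

```purebasic
EnableExplicit

; Hypothetical helper: counts the non-space characters in a zero-terminated
; UTF-16 buffer, reading 8 bytes (4 characters) per memory access.
Procedure.i CountNonSpace(*buffer)
  Protected *p.Quad = *buffer
  Protected chunk.q, ch.i, i.i, count.i

  Repeat
    chunk = *p\q                      ; one 8-byte read
    For i = 1 To 4
      ch = chunk & $FFFF              ; lowest 16 bits hold the next character
      If ch = 0
        ProcedureReturn count         ; null terminator reached
      EndIf
      If ch <> ' '
        count + 1
      EndIf
      chunk = chunk >> 16             ; shift the next character down
    Next
    *p + 8
  ForEver
EndProcedure

Define text.s = "a b c d e f"
; Room for the terminator, rounded up to a multiple of 8 bytes
Define size.i = (StringByteLength(text) + SizeOf(Character) + 7) & ~7
Define *buf = AllocateMemory(size)    ; zero-filled by default
If *buf
  PokeS(*buf, text)
  Debug CountNonSpace(*buf)           ; 6 letters in the sample string
  FreeMemory(*buf)
EndIf
```

The same loop shape would work for case conversion: modify ch, rebuild the quad, and write it back in one store.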
infratec
Always Here
Posts: 7664
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: Improving performance of byte pointer copy

Post by infratec »

If you want to handle only strings:

Code:

; **
; **  Test 3, Character copy
; **
; Note: for \c to compile, *srcptr and *destptr must be declared as .Character instead of .byte
t3 = ElapsedMilliseconds()

*srcptr  = @cpstring.s                                                  ; Set starting pointer
*destptr = *dest                                                        ; Set destination pointer

*limit = *dest + lendata.i
While *destptr < *limit
  *destptr\c = *srcptr\c                                                ; Copy the character (2 bytes)
  *srcptr + 2                                                           ; Advance source pointer
  *destptr + 2                                                          ; Advance destination pointer
Wend

PrintN( "Time 3 = " + Str(ElapsedMilliseconds() - t3))
In my test it is 2 times faster.
Oso

Re: Improving performance of byte pointer copy

Post by Oso »

infratec wrote: Thu Dec 21, 2023 8:59 pm If you want to handle only strings:
In my test it is 2 times faster.
Thanks, there's definitely some great performance to be had from doing this.

What would happen if the data was binary/raw data? Does \c not work in that case, because characters cannot be determined from it? Or would it just passively move 2-byte groups from memory to memory?
infratec

Re: Improving performance of byte pointer copy

Post by infratec »

Code:

; **
; **  Test 3, Chunked copy (8, 4, 2 or 1 bytes per step)
; **
Structure TypeArray_Structure
  StructureUnion
    a.a
    u.u
    l.l
    q.q
  EndStructureUnion
EndStructure

; These replace the earlier .byte declarations of the two pointers
Define *srcptr.TypeArray_Structure        ; Source data pointer
Define *destptr.TypeArray_Structure       ; Destination data pointer
Define CopyLen.i                          ; Bytes still to copy

t3 = ElapsedMilliseconds()

*srcptr  = @cpstring.s        ; Set starting pointer
*destptr = *dest              ; Set destination pointer

CopyLen = lendata.i
While CopyLen
  If CopyLen >= 8
    *destptr\q = *srcptr\q      ; Copy 8 bytes
    *srcptr + 8                 ; Advance source pointer
    *destptr + 8                ; Advance destination pointer
    CopyLen - 8
  ElseIf CopyLen >= 4
    *destptr\l = *srcptr\l      ; Copy 4 bytes
    *srcptr + 4                 ; Advance source pointer
    *destptr + 4                ; Advance destination pointer
    CopyLen - 4
  ElseIf CopyLen >= 2
    *destptr\u = *srcptr\u      ; Copy 2 bytes
    *srcptr + 2                 ; Advance source pointer
    *destptr + 2                ; Advance destination pointer
    CopyLen - 2
  Else
    *destptr\a = *srcptr\a      ; Copy 1 byte
    *srcptr + 1                 ; Advance source pointer
    *destptr + 1                ; Advance destination pointer
    CopyLen - 1
  EndIf
Wend

PrintN( "Time 3 = " + Str(ElapsedMilliseconds() - t3))
5 times faster
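
On the earlier question about raw data: the \u, \l and \q fields simply move 2, 4 or 8 bytes, whatever they contain. A small sketch that checks this on random binary data, using the same union idea wrapped in a procedure. ChunkView and ChunkCopy() are hypothetical names made up for this example:

```purebasic
EnableExplicit

; Hypothetical names: ChunkView and ChunkCopy() are made up for this sketch
Structure ChunkView
  StructureUnion
    a.a   ; 1 byte
    u.u   ; 2 bytes
    l.l   ; 4 bytes
    q.q   ; 8 bytes
  EndStructureUnion
EndStructure

Procedure ChunkCopy(*source, *target, length.i)
  Protected *s.ChunkView = *source
  Protected *d.ChunkView = *target

  While length >= 8             ; bulk of the data, 8 bytes per step
    *d\q = *s\q
    *s + 8 : *d + 8 : length - 8
  Wend
  If length >= 4                ; then at most one 4-byte, 2-byte and 1-byte step
    *d\l = *s\l
    *s + 4 : *d + 4 : length - 4
  EndIf
  If length >= 2
    *d\u = *s\u
    *s + 2 : *d + 2 : length - 2
  EndIf
  If length >= 1
    *d\a = *s\a
  EndIf
EndProcedure

#TestSize = 1000003             ; odd on purpose, so the tail steps are exercised

Define *src = AllocateMemory(#TestSize)
Define *dst = AllocateMemory(#TestSize)
If *src And *dst
  RandomData(*src, #TestSize)   ; arbitrary binary content, not text
  ChunkCopy(*src, *dst, #TestSize)
  If CompareMemory(*src, *dst, #TestSize)
    Debug "Buffers are identical"
  Else
    Debug "Buffers differ"
  EndIf
  FreeMemory(*src)
  FreeMemory(*dst)
EndIf
```

Taking the 8-byte loop out of the If chain means the size checks for the tail run only once, at the end.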
infratec

Re: Improving performance of byte pointer copy

Post by infratec »

Hm...

With the C backend, optimized and without the debugger, it results in 0 ms.
idle

Re: Improving performance of byte pointer copy

Post by idle »

infratec wrote: Thu Dec 21, 2023 9:23 pm Hm...

with C backend, optimized and without debugger it results in 0ms
It's probably optimizing the test away. You can examine the generated asm with my tool pbcex, using the flags
!//gccflags -S -march=native;
Then look at PB.obj to see whether it's actually doing the copy or has vectorized it.
Oso

Re: Improving performance of byte pointer copy

Post by Oso »

idle wrote: Thu Dec 21, 2023 9:37 pm It's probably optimizing the test away. You can examine the generated asm with my tool pbcex, using the flags
I'm consistently getting a 1ms result, regardless of whether I use ASM, C, or optimised C. I also added a control total, of the bytes copied for each...

Code:

While CopyLen
  If CopyLen > 8
    *destptr\q = *srcptr\q      ; Copy the character
    *srcptr + 8                 ; Advance source pointer
    *destptr + 8                ; Advance destination pointer
    CopyLen - 8
    control + 8
         ... etc.
At the end, I get :
Time 3 = 1 (ms)
Control = 1,998,000

1,998,000 is the precise length.
Last edited by Oso on Thu Dec 21, 2023 10:20 pm, edited 1 time in total.