Unicode string Len and Mid procedures using SSE2

Share your advanced PureBasic knowledge/code with the community.
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Unicode (UCS2) string length using SSE2

Post by wilbert »

Thanks everyone again for testing :)
It's nice to see there's so much room for speed improvement using SSE2 instructions compared to the default implementation.
It probably is possible in a similar way to also create faster unicode versions of procedures like Mid or FindString.
Tenaja wrote:Is there a way to force (or ensure) string alignment on the Word boundaries? Fred?
If you want to use this, I recommend using the second version since in almost every case it is faster compared to my first attempt.
Even with non aligned strings, it should perform similar to the first version.
Normally when memory is allocated, it always is allocated on a DWord (4 byte) boundary which of course is also a Word boundary.
So I presume all PB strings with the .s type will be aligned. Maybe Fred can confirm this.
The thing is that PB supports non aligned structures. The code below is an example where the fixed length string is not aligned on a Word boundary.

Code: Select all

Structure MyStruct
  a.a
  s.s{20}
EndStructure

Debug @A.MyStruct\s
Windows (x64)
Raspberry Pi OS (Arm64)
davido
Addict
Addict
Posts: 1890
Joined: Fri Nov 09, 2012 11:04 pm
Location: Uttoxeter, UK

Re: Unicode (UCS2) string length using SSE2

Post by davido »

@wilbert

Thank you. Very useful.

Would it be advisable to use a dummy, in structures, to force word alignment?
DE AA EB
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Unicode (UCS2) string length using SSE2

Post by wilbert »

davido wrote:Would it be advisable to use a dummy, in structures, to force word alignment?
It's only important if the structure contains fixed length strings.
One way is do use a dummy, another solution to use align

Code: Select all

Structure MyStruct Align 2
  a.a
  s.s{20}
EndStructure

Debug @A.MyStruct\s
Windows (x64)
Raspberry Pi OS (Arm64)
Fred
Administrator
Administrator
Posts: 18161
Joined: Fri May 17, 2002 4:39 pm
Location: France
Contact:

Re: Unicode (UCS2) string length using SSE2

Post by Fred »

Strings uses system memory allocator which usually align memory on a DWORD boundary (4 bytes).
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Unicode (UCS2) string length using SSE2

Post by wilbert »

Here's an updated version of my code with an additional procedure to get the length of a string with the ability to specify a maximum value and a mid procedure.
Unfortunately the mid procedure is only faster for longer strings (at least on OSX)

Code: Select all

; Procedure.i UCS2Len(*UCS2String)
; Procedure.i UCS2LenM(*UCS2String, MaxLen.i = -1)
; Procedure.s UCS2Mid(*UCS2String, StartPos.i, Length.i = -1)


CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
  ; x64 Macro definitions
  Macro UCS2Len_MOV
    !movdqa xmm0, [rdx]
  EndMacro
  Macro UCS2Len_ADD
    !lea rdx, [rdx + 16]
  EndMacro    
  Macro UCS2Len_CMP
    !cmp rdx, r8
  EndMacro  
CompilerElse
  ; x86 Macro definitions  
  Macro UCS2Len_MOV
    !movdqa xmm0, [edx]
  EndMacro
  Macro UCS2Len_ADD
    !lea edx, [edx + 16]
  EndMacro    
  Macro UCS2Len_CMP
    !cmp edx, ebx
  EndMacro  
CompilerEndIf  


Procedure.i UCS2Len(*UCS2String)
  ; init and check word boundary alignment
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !mov rdx, [p.p_UCS2String]
    !mov ecx, edx
    !and rdx, -16
  CompilerElse
    !mov edx, [p.p_UCS2String]
    !mov ecx, edx
    !and edx, -16
  CompilerEndIf
  UCS2Len_MOV
  !pxor xmm1, xmm1
  !sub ecx, edx
  !test ecx, 1
  !jnz ucs2len1
  
  ; handle strings aligned to word boundary
  !pcmpeqw xmm0, xmm1
  !pmovmskb eax, xmm0
  !shr eax, cl
  !shl eax, cl
  UCS2Len_ADD
  !and eax, eax
  !jnz ucs2len3
  !ucs2len0:
  UCS2Len_MOV
  !pcmpeqw xmm0, xmm1
  !pmovmskb eax, xmm0
  UCS2Len_ADD
  !and eax, eax
  !jz ucs2len0
  !jmp ucs2len3
  
  ; handle strings not aligned to word boundary
  !ucs2len1:
  !movdqa xmm3, xmm0
  !pslldq xmm0, 1
  !inc cl
  !pcmpeqw xmm0, xmm1
  !pmovmskb eax, xmm0
  !shr eax, cl
  !shl eax, cl
  UCS2Len_ADD
  !and eax, eax
  !jnz ucs2len3
  !ucs2len2:
  UCS2Len_MOV
  !movdqa xmm2, xmm0
  !pslldq xmm0, 1
  !psrldq xmm3, 15
  !por xmm0, xmm3
  !pcmpeqw xmm0, xmm1
  !pmovmskb eax, xmm0
  UCS2Len_ADD
  !movdqa xmm3, xmm2
  !and eax, eax
  !jz ucs2len2
  
  ; exit procedure
  !ucs2len3:
  !bsf ecx, eax
  !lea eax, [ecx + edx - 16]
  !sub eax, [p.p_UCS2String]
  !shr eax, 1
  ProcedureReturn
EndProcedure


Procedure.i UCS2LenM(*UCS2String, MaxLen.i = -1)
  ; init and check word boundary alignment
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !mov rdx, [p.p_UCS2String]
    !mov rax, [p.v_MaxLen]
    !lea r8, [rdx + rax * 2]
    !sar rax, 63
    !or r8, rax
    !mov ecx, edx
    !and rdx, -16
  CompilerElse
    !mov edx, [p.p_UCS2String]
    !mov eax, [p.v_MaxLen]
    !push ebx
    !lea ebx, [edx + eax * 2]
    !sar eax, 31
    !or ebx, eax
    !mov ecx, edx
    !and edx, -16
  CompilerEndIf
  UCS2Len_MOV
  !pxor xmm1, xmm1
  !sub ecx, edx
  !test ecx, 1
  !jnz ucs2lenm1
  
  ; handle strings aligned to word boundary
  !pcmpeqw xmm0, xmm1
  !pmovmskb eax, xmm0
  !shr eax, cl
  !shl eax, cl
  UCS2Len_ADD
  !and eax, eax
  !jnz ucs2lenm3
  !ucs2lenm0:
  UCS2Len_MOV
  !pcmpeqw xmm0, xmm1
  !pmovmskb eax, xmm0
  UCS2Len_CMP
  UCS2Len_ADD
  !ja ucs2lenm3
  !and eax, eax
  !jz ucs2lenm0
  !jmp ucs2lenm3
  
  ; handle strings not aligned to word boundary
  !ucs2lenm1:
  !movdqa xmm3, xmm0
  !pslldq xmm0, 1
  !inc cl
  !pcmpeqw xmm0, xmm1
  !pmovmskb eax, xmm0
  !shr eax, cl
  !shl eax, cl
  UCS2Len_ADD
  !and eax, eax
  !jnz ucs2lenm3
  !ucs2lenm2:
  UCS2Len_MOV
  !movdqa xmm2, xmm0
  !pslldq xmm0, 1
  !psrldq xmm3, 15
  !por xmm0, xmm3
  !pcmpeqw xmm0, xmm1
  !pmovmskb eax, xmm0
  UCS2Len_CMP
  UCS2Len_ADD
  !ja ucs2lenm3
  !movdqa xmm3, xmm2
  !and eax, eax
  !jz ucs2lenm2
  
  ; exit procedure
  !ucs2lenm3:
  !bsf ecx, eax
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !lea rax, [rcx + rdx - 16]
    !cmp rax, r8
    !cmova rax, r8
  CompilerElse
    !lea eax, [ecx + edx - 16]
    !cmp eax, ebx
    !cmova eax, ebx
    !pop ebx  
  CompilerEndIf  
  !sub eax, [p.p_UCS2String]
  !shr eax, 1
  ProcedureReturn
EndProcedure


Procedure.s UCS2Mid(*UCS2String, StartPos.i, Length.i = -1)
  If StartPos > 0
    ProcedureReturn PeekS(*UCS2String + UCS2LenM(*UCS2String, StartPos - 1) << 1, Length)
  Else
    ProcedureReturn PeekS(*UCS2String + UCS2LenM(*UCS2String, 0) << 1, Length)
  EndIf
EndProcedure
Test code for the UCS2Mid procedure (important to compile with unicode enabled and debugger disabled !)

Code: Select all

S.s = "Unicode mid string, SSE2 vs PB" + #CRLF$ + #CRLF$  

t1 = ElapsedMilliseconds()
For i = 1 To 1000000
  r.s = UCS2Mid(@"Short", 3)
Next
t2 = ElapsedMilliseconds()
For i = 1 To 1000000
  r.s = Mid("Short", 3)
Next
t3 = ElapsedMilliseconds()
S + "Short string : " + Str(t2 - t1) + "(SSE2) vs " + Str(t3 - t2) + "(PB)" + #CRLF$


t1 = ElapsedMilliseconds()
For i = 1 To 1000000
  r.s = UCS2Mid(@"A bit longer string to test with", 19, 7)
Next
t2 = ElapsedMilliseconds()
For i = 1 To 1000000
  r.s = Mid("A bit longer string to test with", 19, 7)
Next
t3 = ElapsedMilliseconds()
S + "Longer string : " + Str(t2 - t1) + "(SSE2) vs " + Str(t3 - t2) + "(PB)" + #CRLF$


SpaceString.s = Space(20000)
t1 = ElapsedMilliseconds()
For i = 1 To 1000000
  r.s = UCS2Mid(@SpaceString, 3500, 20)
Next
t2 = ElapsedMilliseconds()
For i = 1 To 1000000
  r.s = Mid(SpaceString, 3500, 20)
Next
t3 = ElapsedMilliseconds()
S + "Very long string : " + Str(t2 - t1) + "(SSE2) vs " + Str(t3 - t2) + "(PB)" + #CRLF$

S = Str(OSVersion()) + #CRLF$ + CPUName() + #CRLF$ + #CRLF$ + S
SetClipboardText("[quote]" + S + "[/quote]")

MessageRequester("Test results", S)
Last edited by wilbert on Tue Aug 19, 2014 4:09 pm, edited 1 time in total.
Windows (x64)
Raspberry Pi OS (Arm64)
davido
Addict
Addict
Posts: 1890
Joined: Fri Nov 09, 2012 11:04 pm
Location: Uttoxeter, UK

Re: Unicode (UCS2) string length using SSE2

Post by davido »

I got the following results:

Unicode mid string, SSE2 vs PB

Short string : 114(SSE2) vs 99(PB)
Longer string : 101(SSE2) vs 96(PB)
Very long string : 562(SSE2) vs 2399(PB)
Intel(R) Core(TM) i3 CPU 540 @ 3.07GHz
80 - Windows 7 64 bit - PureBasic 5.30

Might I suggest adding:

Code: Select all

S + CPUName() +  #CRLF$
S + Str(OSVersion())
SetClipboardText(S) 
DE AA EB
User avatar
ts-soft
Always Here
Always Here
Posts: 5756
Joined: Thu Jun 24, 2004 2:44 pm
Location: Berlin - Germany

Re: Unicode (UCS2) string length using SSE2

Post by ts-soft »

---------------------------
Test results
---------------------------
Unicode mid string, SSE2 vs PB



Windows 7 x64

AMD FX(tm)-6300 Six-Core Processor



Short string : 111(SSE2) vs 99(PB)

Longer string : 109(SSE2) vs 102(PB)

Very long string : 688(SSE2) vs 2269(PB)


---------------------------
OK
---------------------------
---------------------------
Test results
---------------------------
Unicode mid string, SSE2 vs PB



Windows 7 x86

AMD FX(tm)-6300 Six-Core Processor



Short string : 110(SSE2) vs 99(PB)

Longer string : 107(SSE2) vs 107(PB)

Very long string : 688(SSE2) vs 1936(PB)


---------------------------
OK
---------------------------
PureBasic 5.73 | SpiderBasic 2.30 | Windows 10 Pro (x64) | Linux Mint 20.1 (x64)
Old bugs good, new bugs bad! Updates are evil: might fix old bugs and introduce no new ones.
Image
User avatar
flaith
Enthusiast
Enthusiast
Posts: 704
Joined: Mon Apr 25, 2005 9:28 pm
Location: $300:20 58 FC 60 - Rennes
Contact:

Re: Unicode string Len and Mid procedures using SSE2

Post by flaith »

Windows 8.1 x64-PB 5.30 x86
Intel(R) Core(TM) i3-3227U CPU @ 1.90GHz

Unicode mid string, SSE2 vs PB

Short string : 146(SSE2) vs 135(PB)
Longer string : 152(SSE2) vs 158(PB)
Very long string : 690(SSE2) vs 4085(PB)
“Fear is a reaction. Courage is a decision.” - WC
User avatar
idle
Always Here
Always Here
Posts: 5836
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: Unicode string Len and Mid procedures using SSE2

Post by idle »

Linux x64-PB 5.30
AMD FX(tm)-6100 Six-Core

Unicode mid string, SSE2 vs PB

Short string : 93(SSE2) vs 64(PB)
Longer string : 83(SSE2) vs 88(PB)
Very long string : 680(SSE2) vs 2273(PB)
Windows 11, Manjaro, Raspberry Pi OS
Image
Foz
Addict
Addict
Posts: 1359
Joined: Tue Nov 13, 2007 12:42 pm
Location: Manchester, UK

Re: Unicode string Len and Mid procedures using SSE2

Post by Foz »

Windows 7 SP1, PB 5.30 x64
Intel(R) Core(TM) i5-3450 CPU @ 3.10GHz

Unicode mid string, SSE2 vs PB

Short string : 101(SSE2) vs 86(PB)
Longer string : 82(SSE2) vs 82(PB)
Very long string : 391(SSE2) vs 2229(PB)
I think I'd prefer this version that takes a hit on small strings but I get a performance equivlaence or gain on medium to long strings.

It's when I've got to start meddling with big strings that I want the high performance :)
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Unicode string Len and Mid procedures using SSE2

Post by wilbert »

Foz wrote:I think I'd prefer this version that takes a hit on small strings but I get a performance equivlaence or gain on medium to long strings.

It's when I've got to start meddling with big strings that I want the high performance :)
I agree :)

I'll try if I can improve the timing for short strings a bit more but I guess there's not much room left for improvement.
Windows (x64)
Raspberry Pi OS (Arm64)
spacebuddy
Enthusiast
Enthusiast
Posts: 356
Joined: Thu Jul 02, 2009 5:42 am

Re: Unicode string Len and Mid procedures using SSE2

Post by spacebuddy »

Thanks wilbert, this is real good :D
User avatar
RichAlgeni
Addict
Addict
Posts: 935
Joined: Wed Sep 22, 2010 1:50 am
Location: Bradenton, FL

Re: Unicode string Len and Mid procedures using SSE2

Post by RichAlgeni »

Do you have any idea Wilbert how this code would compare with your SSE42 code you had posted previously in the Assembler section?
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Unicode string Len and Mid procedures using SSE2

Post by wilbert »

RichAlgeni wrote:Do you have any idea Wilbert how this code would compare with your SSE42 code you had posted previously in the Assembler section?
I'm sure someone else created the SSE4.2 code you are mentioning since my computer doesn't support it. :wink:
SSE4.2 was designed for string handling so it probably would be faster.
The main problem with it would be that there are still a lot of computers that don't support it while there are very few that don't support SSE2 nowadays.
What I also like about SSE2 is that it is part of the x64 standard. All x64 processors support it. With SSE3 and later this isn't always the case.
Windows (x64)
Raspberry Pi OS (Arm64)
User avatar
RichAlgeni
Addict
Addict
Posts: 935
Joined: Wed Sep 22, 2010 1:50 am
Location: Bradenton, FL

Re: Unicode string Len and Mid procedures using SSE2

Post by RichAlgeni »

Understood.

Thanks for your time and efforts Wilbert!

Rich
Post Reply