Unicode string Len and Mid procedures using SSE2

Share your advanced PureBasic knowledge/code with the community.
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Unicode string Len and Mid procedures using SSE2

Post by wilbert »

I tried to create a procedure to get the length of a unicode string using SSE2 instructions.

Since results might be different on different platforms, it would be great if someone wants to test it.
On my iMac with OSX (Core2Duo) the PB version is faster for short strings and SSE2 for longer strings.

Edit:
I posted an updated code a few posts below.
It's faster and also contains a Mid procedure.
http://www.purebasic.fr/english/viewtop ... 25#p451425

Code: Select all

Procedure.i UCS2Len(*UCS2String)
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !mov rdx, [p.p_UCS2String]
    !mov ecx, edx
    !and rdx, -16
    !movdqa xmm0, [rdx]
    !push rbx
  CompilerElse
    !mov edx, [p.p_UCS2String]
    !mov ecx, edx
    !and edx, -16
    !movdqa xmm0, [edx]
    !push ebx
  CompilerEndIf  
  !pxor xmm1, xmm1
  !pcmpeqb xmm0, xmm1
  !pmovmskb eax, xmm0
  !mov ebx, 0xaaaaaaaa
  !sub ecx, edx
  !shr eax, cl
  !add ecx, 16
  !shl eax, cl  
  !and ecx, 1
  !shr ebx, cl
  !jmp ucs2len1
  !ucs2len0:
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !add rdx, 16
    !movdqa xmm0, [rdx]
  CompilerElse
    !add edx, 16
    !movdqa xmm0, [edx]
  CompilerEndIf
  !pcmpeqb xmm0, xmm1
  !pmovmskb ecx, xmm0
  !shr eax, 16
  !shl ecx, 16
  !or eax, ecx
  !ucs2len1:
  !lea ecx, [eax + eax]
  !and ecx, eax
  !and ecx, ebx 
  !jz ucs2len0
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !pop rbx
  CompilerElse
    !pop ebx
  CompilerEndIf
  !bsf eax, ecx
  !lea eax, [eax + edx - 16]
  !sub eax, [p.p_UCS2String]
  !shr eax, 1
  ProcedureReturn
EndProcedure
Test code (important to compile with unicode enabled and debugger disabled !)

Code: Select all

S.s = "Unicode string length, SSE2 vs PB" + #CRLF$ + #CRLF$  

t1 = ElapsedMilliseconds()
For i = 1 To 10000000
  c = UCS2Len(@"Short")
Next
t2 = ElapsedMilliseconds()
For i = 1 To 10000000
  c = MemoryStringLength(@"Short")
Next
t3 = ElapsedMilliseconds()
S + "Short string : " + Str(t2 - t1) + "(SSE2) vs " + Str(t3 - t2) + "(PB)" + #CRLF$


t1 = ElapsedMilliseconds()
For i = 1 To 10000000
  c = UCS2Len(@"A bit longer string to test with")
Next
t2 = ElapsedMilliseconds()
For i = 1 To 10000000
  c = MemoryStringLength(@"A bit longer string to test with")
Next
t3 = ElapsedMilliseconds()
S + "Longer string : " + Str(t2 - t1) + "(SSE2) vs " + Str(t3 - t2) + "(PB)" + #CRLF$


SpaceString.s = Space(20000)
t1 = ElapsedMilliseconds()
For i = 1 To 100000
  c = UCS2Len(@SpaceString)
Next
t2 = ElapsedMilliseconds()
For i = 1 To 100000
  c = MemoryStringLength(@SpaceString)
Next
t3 = ElapsedMilliseconds()
S + "Very long string : " + Str(t2 - t1) + "(SSE2) vs " + Str(t3 - t2) + "(PB)" + #CRLF$

MessageRequester("Test results", S)
Last edited by wilbert on Tue Aug 19, 2014 5:04 pm, edited 3 times in total.
Windows (x64)
Raspberry Pi OS (Arm64)
User avatar
ts-soft
Always Here
Always Here
Posts: 5756
Joined: Thu Jun 24, 2004 2:44 pm
Location: Berlin - Germany

Re: Unicode (UCS2) string length using SSE2

Post by ts-soft »

Windows 7 x64:
Test results wrote:---------------------------
Test results
---------------------------
Short string : 81(SSE2) vs 160(PB)

Longer string : 159(SSE2) vs 446(PB)

Very long string : 471(SSE2) vs 2053(PB)


---------------------------
OK
---------------------------
Windows 7 x86:
Test results wrote:---------------------------
Test results
---------------------------
Short string : 72(SSE2) vs 70(PB)

Longer string : 154(SSE2) vs 285(PB)

Very long string : 476(SSE2) vs 1158(PB)


---------------------------
OK
---------------------------
PureBasic 5.73 | SpiderBasic 2.30 | Windows 10 Pro (x64) | Linux Mint 20.1 (x64)
Old bugs good, new bugs bad! Updates are evil: might fix old bugs and introduce no new ones.
Image
User avatar
idle
Always Here
Always Here
Posts: 5835
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: Unicode (UCS2) string length using SSE2

Post by idle »

linux x64
---------------------------
Test results
---------------------------
Short string : 107(SSE2) vs 67(PB)

Longer string : 180(SSE2) vs 259(PB)

Very long string : 416(SSE2) vs 1043(PB)
Windows 11, Manjaro, Raspberry Pi OS
Image
Fred
Administrator
Administrator
Posts: 18153
Joined: Fri May 17, 2002 4:39 pm
Location: France
Contact:

Re: Unicode (UCS2) string length using SSE2

Post by Fred »

so you actually beat the strlen() routine on linux, that's quite a nice work and you should submit a patch for the libc ! :)
User avatar
netmaestro
PureBasic Bullfrog
PureBasic Bullfrog
Posts: 8451
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

Re: Unicode (UCS2) string length using SSE2

Post by netmaestro »

sse2 vs. pb

39 vs. 35

78 vs. 127

224 vs. 669

Windows 8.1 x64, pb 5.30x86, i7 4770 @ 3.4 ghz

fastest code overall: wilbert
fastest computer: netmaestro (yay!)
BERESHEIT
User avatar
graph100
Enthusiast
Enthusiast
Posts: 115
Joined: Tue Aug 10, 2010 3:17 pm

Re: Unicode (UCS2) string length using SSE2

Post by graph100 »

avec mon : i7-3630QM @2.40GHz

window 8, PB5.22x86 :
Short string : 48(SSE2) vs 59(PB)
Longer string : 106(SSE2) vs 297(PB)
Very long string : 285(SSE2) vs 929(PB)
Mandriva (VM), PB5.22x86 :
Short string : 72(SSE2) vs 66(PB)
Longer string : 125(SSE2) vs 231(PB)
Very long string : 283(SSE2) vs 839(PB)
Mac (VM), PB5.21x86 :
Short string : 91(SSE2) vs 77(PB)
Longer string : 124(SSE2) vs 257(PB)
Very long string : 288(SSE2) vs 850(PB)
_________________________________________________
My Website : CeriseCode (Warning : perpetual changes & not completed ;))
User avatar
Demivec
Addict
Addict
Posts: 4259
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Re: Unicode (UCS2) string length using SSE2

Post by Demivec »

Using: intel i5-4570 (4 cores) CPU @ 3.20GHz

Windows 8.1 5.30 x64

Code: Select all

Short string : 56(SSE2) vs 200(PB)
Longer string : 95(SSE2) vs 362(PB)
Very long string : 254(SSE2) vs 1179(PB)
Windows 8.1 5.30 x86

Code: Select all

Short string : 44(SSE2) vs 39(PB)
Longer string : 88(SSE2) vs 142(PB)
Very long string : 252(SSE2) vs 751(PB)
wilbert
PureBasic Expert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: Unicode (UCS2) string length using SSE2

Post by wilbert »

Thanks a lot for testing :-)
This helps to get a better perspective.
Fred wrote:so you actually beat the strlen() routine on linux, that's quite a nice work and you should submit a patch for the libc ! :)
I don't know if Linux is optimized for handling 16 bit unicode. As far as I understand it uses 32 bit internally.

My code in the first post, was a general approach being able to handle both strings aligned on a word boundary and not.
If most strings are aligned on a word boundary (the usually are) and it isn't a problem the source becomes a little bigger, the following code should be even faster.
It splits the source in a part to handle word aligned and a part to handle non word aligned strings.
It can be tested with the same test code from the first post.

Code: Select all

CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
  ; x64 Macro definitions
  Macro UCS2Len_MOV
    !movdqa xmm0, [rdx]
  EndMacro
  Macro UCS2Len_ADD
    !add rdx, 16
  EndMacro    
CompilerElse
  ; x86 Macro definitions  
  Macro UCS2Len_MOV
    !movdqa xmm0, [edx]
  EndMacro
  Macro UCS2Len_ADD
    !add edx, 16
  EndMacro    
CompilerEndIf  

Procedure.i UCS2Len(*UCS2String)
  ; init and check word boundary alignment
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !mov rdx, [p.p_UCS2String]
    !mov ecx, edx
    !and rdx, -16
  CompilerElse
    !mov edx, [p.p_UCS2String]
    !mov ecx, edx
    !and edx, -16
  CompilerEndIf
  UCS2Len_MOV
  !pxor xmm1, xmm1
  !sub ecx, edx
  !test ecx, 1
  !jnz ucs2len1
  
  ; handle strings aligned to word boundary
  !pcmpeqw xmm0, xmm1
  !pmovmskb eax, xmm0
  !shr eax, cl
  !shl eax, cl
  UCS2Len_ADD
  !and eax, eax
  !jnz ucs2len3
  !ucs2len0:
  UCS2Len_MOV
  !pcmpeqw xmm0, xmm1
  !pmovmskb eax, xmm0
  UCS2Len_ADD
  !and eax, eax
  !jz ucs2len0
  !jmp ucs2len3
  
  ; handle strings not aligned to word boundary
  !ucs2len1:
  !movdqa xmm3, xmm0
  !pslldq xmm0, 1
  !inc cl
  !pcmpeqw xmm0, xmm1
  !pmovmskb eax, xmm0
  !shr eax, cl
  !shl eax, cl
  UCS2Len_ADD
  !and eax, eax
  !jnz ucs2len3
  !ucs2len2:
  UCS2Len_MOV
  !movdqa xmm2, xmm0
  !pslldq xmm0, 1
  !psrldq xmm3, 15
  !por xmm0, xmm3
  !pcmpeqw xmm0, xmm1
  !pmovmskb eax, xmm0
  UCS2Len_ADD
  !movdqa xmm3, xmm2
  !and eax, eax
  !jz ucs2len2
  
  ; exit procedure
  !ucs2len3:
  !bsf ecx, eax
  !lea eax, [ecx + edx - 16]
  !sub eax, [p.p_UCS2String]
  !shr eax, 1
  ProcedureReturn
EndProcedure
Test code (important to compile with unicode enabled and debugger disabled !)

Code: Select all

S.s = "Unicode string length, SSE2 vs PB" + #CRLF$ + #CRLF$  

t1 = ElapsedMilliseconds()
For i = 1 To 10000000
  c = UCS2Len(@"Short")
Next
t2 = ElapsedMilliseconds()
For i = 1 To 10000000
  c = MemoryStringLength(@"Short")
Next
t3 = ElapsedMilliseconds()
S + "Short string : " + Str(t2 - t1) + "(SSE2) vs " + Str(t3 - t2) + "(PB)" + #CRLF$


t1 = ElapsedMilliseconds()
For i = 1 To 10000000
  c = UCS2Len(@"A bit longer string to test with")
Next
t2 = ElapsedMilliseconds()
For i = 1 To 10000000
  c = MemoryStringLength(@"A bit longer string to test with")
Next
t3 = ElapsedMilliseconds()
S + "Longer string : " + Str(t2 - t1) + "(SSE2) vs " + Str(t3 - t2) + "(PB)" + #CRLF$


SpaceString.s = Space(20000)
t1 = ElapsedMilliseconds()
For i = 1 To 100000
  c = UCS2Len(@SpaceString)
Next
t2 = ElapsedMilliseconds()
For i = 1 To 100000
  c = MemoryStringLength(@SpaceString)
Next
t3 = ElapsedMilliseconds()
S + "Very long string : " + Str(t2 - t1) + "(SSE2) vs " + Str(t3 - t2) + "(PB)" + #CRLF$

MessageRequester("Test results", S)
Windows (x64)
Raspberry Pi OS (Arm64)
User avatar
blueb
Addict
Addict
Posts: 1111
Joined: Sat Apr 26, 2003 2:15 pm
Location: Cuernavaca, Mexico

Re: Unicode (UCS2) string length using SSE2

Post by blueb »

Using the new code....
Using: intel i7-2600 (4 cores) CPU @ 3.40GHz

Windows 7 Pro 5.30 x86

Code: Select all

Short string : 35(SSE2) vs 46(PB)
Longer string : 60(SSE2) vs 204(PB)
Very long string : 157(SSE2) vs 709(PB)
- It was too lonely at the top.

System : PB 6.21(x64) and Win 11 Pro (x64)
Hardware: AMD Ryzen 9 5900X w/64 gigs Ram, AMD RX 6950 XT Graphics w/16gigs Mem
User avatar
flaith
Enthusiast
Enthusiast
Posts: 704
Joined: Mon Apr 25, 2005 9:28 pm
Location: $300:20 58 FC 60 - Rennes
Contact:

Re: Unicode (UCS2) string length using SSE2

Post by flaith »

Windows 8.1 x64:
---------------------------
Test results V1
---------------------------
Short string : 99(SSE2) vs 100(PB)
Longer string : 181(SSE2) vs 391(PB)
Very long string : 479(SSE2) vs 1563(PB)
---------------------------
Test results V2
---------------------------
Short string : 69(SSE2) vs 102(PB)
Longer string : 121(SSE2) vs 392(PB)
Very long string : 331(SSE2) vs 1563(PB)
“Fear is a reaction. Courage is a decision.” - WC
User avatar
idle
Always Here
Always Here
Posts: 5835
Joined: Fri Sep 21, 2007 5:52 am
Location: New Zealand

Re: Unicode (UCS2) string length using SSE2

Post by idle »

v2 Linux x64

Twice as fast as V1 on the very long string, nice work Wilbert
---------------------------
Test results
---------------------------
Short string : 99(SSE2) vs 70(PB)

Longer string : 140(SSE2) vs 260(PB)

Very long string : 206(SSE2) vs 1041(PB)
Windows 11, Manjaro, Raspberry Pi OS
Image
User avatar
ts-soft
Always Here
Always Here
Posts: 5756
Joined: Thu Jun 24, 2004 2:44 pm
Location: Berlin - Germany

Re: Unicode (UCS2) string length using SSE2

Post by ts-soft »

Version 2

Windows 7 x64:
---------------------------
Test results
---------------------------
Unicode string length, SSE2 vs PB



Short string : 64(SSE2) vs 143(PB)

Longer string : 109(SSE2) vs 405(PB)

Very long string : 190(SSE2) vs 1896(PB)


---------------------------
OK
---------------------------
Windows 7 x86:
---------------------------
Test results
---------------------------
Unicode string length, SSE2 vs PB



Short string : 56(SSE2) vs 60(PB)

Longer string : 85(SSE2) vs 260(PB)

Very long string : 225(SSE2) vs 1057(PB)


---------------------------
OK
---------------------------
PureBasic 5.73 | SpiderBasic 2.30 | Windows 10 Pro (x64) | Linux Mint 20.1 (x64)
Old bugs good, new bugs bad! Updates are evil: might fix old bugs and introduce no new ones.
Image
User avatar
Tenaja
Addict
Addict
Posts: 1959
Joined: Tue Nov 09, 2010 10:15 pm

Re: Unicode (UCS2) string length using SSE2

Post by Tenaja »

wilbert wrote:If most strings are aligned on a word boundary (the usually are) and it isn't a problem the source becomes a little bigger, the following code should be even faster.
It splits the source in a part to handle word aligned and a part to handle non word aligned strings.
Is there a way to force (or ensure) string alignment on the Word boundaries? Fred?
User avatar
Demivec
Addict
Addict
Posts: 4259
Joined: Mon Jul 25, 2005 3:51 pm
Location: Utah, USA

Re: Unicode (UCS2) string length using SSE2

Post by Demivec »

Newer (and longer) code

Using: intel i5-4570 (4 cores) CPU @ 3.20GHz

Windows 8.1 5.30 x64

Code: Select all

Short string : 40(SSE2) vs 199(PB)
Longer string : 60(SSE2) vs 360(PB)
Very long string : 133(SSE2) vs 1166(PB))
Windows 8.1 5.30 x86

Code: Select all

Short string : 34(SSE2) vs 41(PB)
Longer string : 59(SSE2) vs 138(PB)
Very long string : 182(SSE2) vs 751(PB)
User avatar
netmaestro
PureBasic Bullfrog
PureBasic Bullfrog
Posts: 8451
Joined: Wed Jul 06, 2005 5:42 am
Location: Fort Nelson, BC, Canada

Re: Unicode (UCS2) string length using SSE2

Post by netmaestro »

new code:

Windows 8.1 x64 PB 5.30 x86

sse2 vs. PB
-------------
31 vs. 39
54 vs. 150
162 vs. 676
BERESHEIT
Post Reply