Page 1 of 3

Unicode string Len and Mid procedures using SSE2

Posted: Sun Aug 17, 2014 3:19 pm
by wilbert
I tried to create a procedure to get the length of a unicode string using SSE2 instructions.

Since results might be different on different platforms, it would be great if someone wants to test it.
On my iMac with OSX (Core2Duo) the PB version is faster for short strings and SSE2 for longer strings.

Edit:
I posted an updated code a few posts below.
It's faster and also contains a Mid procedure.
http://www.purebasic.fr/english/viewtop ... 25#p451425

Code: Select all

Procedure.i UCS2Len(*UCS2String)
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !mov rdx, [p.p_UCS2String]
    !mov ecx, edx
    !and rdx, -16
    !movdqa xmm0, [rdx]
    !push rbx
  CompilerElse
    !mov edx, [p.p_UCS2String]
    !mov ecx, edx
    !and edx, -16
    !movdqa xmm0, [edx]
    !push ebx
  CompilerEndIf  
  !pxor xmm1, xmm1
  !pcmpeqb xmm0, xmm1
  !pmovmskb eax, xmm0
  !mov ebx, 0xaaaaaaaa
  !sub ecx, edx
  !shr eax, cl
  !add ecx, 16
  !shl eax, cl  
  !and ecx, 1
  !shr ebx, cl
  !jmp ucs2len1
  !ucs2len0:
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !add rdx, 16
    !movdqa xmm0, [rdx]
  CompilerElse
    !add edx, 16
    !movdqa xmm0, [edx]
  CompilerEndIf
  !pcmpeqb xmm0, xmm1
  !pmovmskb ecx, xmm0
  !shr eax, 16
  !shl ecx, 16
  !or eax, ecx
  !ucs2len1:
  !lea ecx, [eax + eax]
  !and ecx, eax
  !and ecx, ebx 
  !jz ucs2len0
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !pop rbx
  CompilerElse
    !pop ebx
  CompilerEndIf
  !bsf eax, ecx
  !lea eax, [eax + edx - 16]
  !sub eax, [p.p_UCS2String]
  !shr eax, 1
  ProcedureReturn
EndProcedure
Test code (important to compile with unicode enabled and debugger disabled !)

Code: Select all

S.s = "Unicode string length, SSE2 vs PB" + #CRLF$ + #CRLF$  

t1 = ElapsedMilliseconds()
For i = 1 To 10000000
  c = UCS2Len(@"Short")
Next
t2 = ElapsedMilliseconds()
For i = 1 To 10000000
  c = MemoryStringLength(@"Short")
Next
t3 = ElapsedMilliseconds()
S + "Short string : " + Str(t2 - t1) + "(SSE2) vs " + Str(t3 - t2) + "(PB)" + #CRLF$


t1 = ElapsedMilliseconds()
For i = 1 To 10000000
  c = UCS2Len(@"A bit longer string to test with")
Next
t2 = ElapsedMilliseconds()
For i = 1 To 10000000
  c = MemoryStringLength(@"A bit longer string to test with")
Next
t3 = ElapsedMilliseconds()
S + "Longer string : " + Str(t2 - t1) + "(SSE2) vs " + Str(t3 - t2) + "(PB)" + #CRLF$


SpaceString.s = Space(20000)
t1 = ElapsedMilliseconds()
For i = 1 To 100000
  c = UCS2Len(@SpaceString)
Next
t2 = ElapsedMilliseconds()
For i = 1 To 100000
  c = MemoryStringLength(@SpaceString)
Next
t3 = ElapsedMilliseconds()
S + "Very long string : " + Str(t2 - t1) + "(SSE2) vs " + Str(t3 - t2) + "(PB)" + #CRLF$

MessageRequester("Test results", S)

Re: Unicode (UCS2) string length using SSE2

Posted: Sun Aug 17, 2014 3:36 pm
by ts-soft
Windows 7 x64:
Test results wrote:---------------------------
Test results
---------------------------
Short string : 81(SSE2) vs 160(PB)

Longer string : 159(SSE2) vs 446(PB)

Very long string : 471(SSE2) vs 2053(PB)


---------------------------
OK
---------------------------
Windows 7 x86:
Test results wrote:---------------------------
Test results
---------------------------
Short string : 72(SSE2) vs 70(PB)

Longer string : 154(SSE2) vs 285(PB)

Very long string : 476(SSE2) vs 1158(PB)


---------------------------
OK
---------------------------

Re: Unicode (UCS2) string length using SSE2

Posted: Sun Aug 17, 2014 8:41 pm
by idle
linux x64
---------------------------
Test results
---------------------------
Short string : 107(SSE2) vs 67(PB)

Longer string : 180(SSE2) vs 259(PB)

Very long string : 416(SSE2) vs 1043(PB)

Re: Unicode (UCS2) string length using SSE2

Posted: Sun Aug 17, 2014 8:45 pm
by Fred
so you actually beat the strlen() routine on linux, that's quite a nice work and you should submit a patch for the libc ! :)

Re: Unicode (UCS2) string length using SSE2

Posted: Sun Aug 17, 2014 9:09 pm
by netmaestro
sse2 vs. pb

39 vs. 35

78 vs. 127

224 vs. 669

Windows 8.1 x64, pb 5.30x86, i7 4770 @ 3.4 ghz

fastest code overall: wilbert
fastest computer: netmaestro (yay!)

Re: Unicode (UCS2) string length using SSE2

Posted: Sun Aug 17, 2014 11:47 pm
by graph100
avec mon : i7-3630QM @2.40GHz

window 8, PB5.22x86 :
Short string : 48(SSE2) vs 59(PB)
Longer string : 106(SSE2) vs 297(PB)
Very long string : 285(SSE2) vs 929(PB)
Mandriva (VM), PB5.22x86 :
Short string : 72(SSE2) vs 66(PB)
Longer string : 125(SSE2) vs 231(PB)
Very long string : 283(SSE2) vs 839(PB)
Mac (VM), PB5.21x86 :
Short string : 91(SSE2) vs 77(PB)
Longer string : 124(SSE2) vs 257(PB)
Very long string : 288(SSE2) vs 850(PB)

Re: Unicode (UCS2) string length using SSE2

Posted: Mon Aug 18, 2014 12:09 am
by Demivec
Using: intel i5-4570 (4 cores) CPU @ 3.20GHz

Windows 8.1 5.30 x64

Code: Select all

Short string : 56(SSE2) vs 200(PB)
Longer string : 95(SSE2) vs 362(PB)
Very long string : 254(SSE2) vs 1179(PB)
Windows 8.1 5.30 x86

Code: Select all

Short string : 44(SSE2) vs 39(PB)
Longer string : 88(SSE2) vs 142(PB)
Very long string : 252(SSE2) vs 751(PB)

Re: Unicode (UCS2) string length using SSE2

Posted: Mon Aug 18, 2014 11:53 am
by wilbert
Thanks a lot for testing :-)
This helps to get a better perspective.
Fred wrote:so you actually beat the strlen() routine on linux, that's quite a nice work and you should submit a patch for the libc ! :)
I don't know if Linux is optimized for handling 16 bit unicode. As far as I understand it uses 32 bit internally.

My code in the first post, was a general approach being able to handle both strings aligned on a word boundary and not.
If most strings are aligned on a word boundary (the usually are) and it isn't a problem the source becomes a little bigger, the following code should be even faster.
It splits the source in a part to handle word aligned and a part to handle non word aligned strings.
It can be tested with the same test code from the first post.

Code: Select all

CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
  ; x64 Macro definitions
  Macro UCS2Len_MOV
    !movdqa xmm0, [rdx]
  EndMacro
  Macro UCS2Len_ADD
    !add rdx, 16
  EndMacro    
CompilerElse
  ; x86 Macro definitions  
  Macro UCS2Len_MOV
    !movdqa xmm0, [edx]
  EndMacro
  Macro UCS2Len_ADD
    !add edx, 16
  EndMacro    
CompilerEndIf  

Procedure.i UCS2Len(*UCS2String)
  ; init and check word boundary alignment
  CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
    !mov rdx, [p.p_UCS2String]
    !mov ecx, edx
    !and rdx, -16
  CompilerElse
    !mov edx, [p.p_UCS2String]
    !mov ecx, edx
    !and edx, -16
  CompilerEndIf
  UCS2Len_MOV
  !pxor xmm1, xmm1
  !sub ecx, edx
  !test ecx, 1
  !jnz ucs2len1
  
  ; handle strings aligned to word boundary
  !pcmpeqw xmm0, xmm1
  !pmovmskb eax, xmm0
  !shr eax, cl
  !shl eax, cl
  UCS2Len_ADD
  !and eax, eax
  !jnz ucs2len3
  !ucs2len0:
  UCS2Len_MOV
  !pcmpeqw xmm0, xmm1
  !pmovmskb eax, xmm0
  UCS2Len_ADD
  !and eax, eax
  !jz ucs2len0
  !jmp ucs2len3
  
  ; handle strings not aligned to word boundary
  !ucs2len1:
  !movdqa xmm3, xmm0
  !pslldq xmm0, 1
  !inc cl
  !pcmpeqw xmm0, xmm1
  !pmovmskb eax, xmm0
  !shr eax, cl
  !shl eax, cl
  UCS2Len_ADD
  !and eax, eax
  !jnz ucs2len3
  !ucs2len2:
  UCS2Len_MOV
  !movdqa xmm2, xmm0
  !pslldq xmm0, 1
  !psrldq xmm3, 15
  !por xmm0, xmm3
  !pcmpeqw xmm0, xmm1
  !pmovmskb eax, xmm0
  UCS2Len_ADD
  !movdqa xmm3, xmm2
  !and eax, eax
  !jz ucs2len2
  
  ; exit procedure
  !ucs2len3:
  !bsf ecx, eax
  !lea eax, [ecx + edx - 16]
  !sub eax, [p.p_UCS2String]
  !shr eax, 1
  ProcedureReturn
EndProcedure
Test code (important to compile with unicode enabled and debugger disabled !)

Code: Select all

S.s = "Unicode string length, SSE2 vs PB" + #CRLF$ + #CRLF$  

t1 = ElapsedMilliseconds()
For i = 1 To 10000000
  c = UCS2Len(@"Short")
Next
t2 = ElapsedMilliseconds()
For i = 1 To 10000000
  c = MemoryStringLength(@"Short")
Next
t3 = ElapsedMilliseconds()
S + "Short string : " + Str(t2 - t1) + "(SSE2) vs " + Str(t3 - t2) + "(PB)" + #CRLF$


t1 = ElapsedMilliseconds()
For i = 1 To 10000000
  c = UCS2Len(@"A bit longer string to test with")
Next
t2 = ElapsedMilliseconds()
For i = 1 To 10000000
  c = MemoryStringLength(@"A bit longer string to test with")
Next
t3 = ElapsedMilliseconds()
S + "Longer string : " + Str(t2 - t1) + "(SSE2) vs " + Str(t3 - t2) + "(PB)" + #CRLF$


SpaceString.s = Space(20000)
t1 = ElapsedMilliseconds()
For i = 1 To 100000
  c = UCS2Len(@SpaceString)
Next
t2 = ElapsedMilliseconds()
For i = 1 To 100000
  c = MemoryStringLength(@SpaceString)
Next
t3 = ElapsedMilliseconds()
S + "Very long string : " + Str(t2 - t1) + "(SSE2) vs " + Str(t3 - t2) + "(PB)" + #CRLF$

MessageRequester("Test results", S)

Re: Unicode (UCS2) string length using SSE2

Posted: Mon Aug 18, 2014 12:17 pm
by blueb
Using the new code....
Using: intel i7-2600 (4 cores) CPU @ 3.40GHz

Windows 7 Pro 5.30 x86

Code: Select all

Short string : 35(SSE2) vs 46(PB)
Longer string : 60(SSE2) vs 204(PB)
Very long string : 157(SSE2) vs 709(PB)

Re: Unicode (UCS2) string length using SSE2

Posted: Mon Aug 18, 2014 12:19 pm
by flaith
Windows 8.1 x64:
---------------------------
Test results V1
---------------------------
Short string : 99(SSE2) vs 100(PB)
Longer string : 181(SSE2) vs 391(PB)
Very long string : 479(SSE2) vs 1563(PB)
---------------------------
Test results V2
---------------------------
Short string : 69(SSE2) vs 102(PB)
Longer string : 121(SSE2) vs 392(PB)
Very long string : 331(SSE2) vs 1563(PB)

Re: Unicode (UCS2) string length using SSE2

Posted: Mon Aug 18, 2014 9:09 pm
by idle
v2 Linux x64

Twice as fast as V1 on the very long string, nice work Wilbert
---------------------------
Test results
---------------------------
Short string : 99(SSE2) vs 70(PB)

Longer string : 140(SSE2) vs 260(PB)

Very long string : 206(SSE2) vs 1041(PB)

Re: Unicode (UCS2) string length using SSE2

Posted: Mon Aug 18, 2014 9:17 pm
by ts-soft
Version 2

Windows 7 x64:
---------------------------
Test results
---------------------------
Unicode string length, SSE2 vs PB



Short string : 64(SSE2) vs 143(PB)

Longer string : 109(SSE2) vs 405(PB)

Very long string : 190(SSE2) vs 1896(PB)


---------------------------
OK
---------------------------
Windows 7 x86:
---------------------------
Test results
---------------------------
Unicode string length, SSE2 vs PB



Short string : 56(SSE2) vs 60(PB)

Longer string : 85(SSE2) vs 260(PB)

Very long string : 225(SSE2) vs 1057(PB)


---------------------------
OK
---------------------------

Re: Unicode (UCS2) string length using SSE2

Posted: Mon Aug 18, 2014 9:39 pm
by Tenaja
wilbert wrote:If most strings are aligned on a word boundary (the usually are) and it isn't a problem the source becomes a little bigger, the following code should be even faster.
It splits the source in a part to handle word aligned and a part to handle non word aligned strings.
Is there a way to force (or ensure) string alignment on the Word boundaries? Fred?

Re: Unicode (UCS2) string length using SSE2

Posted: Mon Aug 18, 2014 11:58 pm
by Demivec
Newer (and longer) code

Using: intel i5-4570 (4 cores) CPU @ 3.20GHz

Windows 8.1 5.30 x64

Code: Select all

Short string : 40(SSE2) vs 199(PB)
Longer string : 60(SSE2) vs 360(PB)
Very long string : 133(SSE2) vs 1166(PB))
Windows 8.1 5.30 x86

Code: Select all

Short string : 34(SSE2) vs 41(PB)
Longer string : 59(SSE2) vs 138(PB)
Very long string : 182(SSE2) vs 751(PB)

Re: Unicode (UCS2) string length using SSE2

Posted: Tue Aug 19, 2014 12:23 am
by netmaestro
new code:

Windows 8.1 x64 PB 5.30 x86

sse2 vs. PB
-------------
31 vs. 39
54 vs. 150
162 vs. 676