Page 1 of 3
Unicode string Len and Mid procedures using SSE2
Posted: Sun Aug 17, 2014 3:19 pm
by wilbert
I tried to create a procedure to get the length of a unicode string using SSE2 instructions.
Since results might be different on different platforms, it would be great if someone wants to test it.
On my iMac with OSX (Core2Duo) the PB version is faster for short strings and SSE2 for longer strings.
Edit:
I posted an updated code a few posts below.
It's faster and also contains a Mid procedure.
http://www.purebasic.fr/english/viewtop ... 25#p451425
Code: Select all
Procedure.i UCS2Len(*UCS2String)
CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
!mov rdx, [p.p_UCS2String]
!mov ecx, edx
!and rdx, -16
!movdqa xmm0, [rdx]
!push rbx
CompilerElse
!mov edx, [p.p_UCS2String]
!mov ecx, edx
!and edx, -16
!movdqa xmm0, [edx]
!push ebx
CompilerEndIf
!pxor xmm1, xmm1
!pcmpeqb xmm0, xmm1
!pmovmskb eax, xmm0
!mov ebx, 0xaaaaaaaa
!sub ecx, edx
!shr eax, cl
!add ecx, 16
!shl eax, cl
!and ecx, 1
!shr ebx, cl
!jmp ucs2len1
!ucs2len0:
CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
!add rdx, 16
!movdqa xmm0, [rdx]
CompilerElse
!add edx, 16
!movdqa xmm0, [edx]
CompilerEndIf
!pcmpeqb xmm0, xmm1
!pmovmskb ecx, xmm0
!shr eax, 16
!shl ecx, 16
!or eax, ecx
!ucs2len1:
!lea ecx, [eax + eax]
!and ecx, eax
!and ecx, ebx
!jz ucs2len0
CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
!pop rbx
CompilerElse
!pop ebx
CompilerEndIf
!bsf eax, ecx
!lea eax, [eax + edx - 16]
!sub eax, [p.p_UCS2String]
!shr eax, 1
ProcedureReturn
EndProcedure
Test code (important to compile with
unicode enabled and
debugger disabled !)
Code: Select all
S.s = "Unicode string length, SSE2 vs PB" + #CRLF$ + #CRLF$
t1 = ElapsedMilliseconds()
For i = 1 To 10000000
c = UCS2Len(@"Short")
Next
t2 = ElapsedMilliseconds()
For i = 1 To 10000000
c = MemoryStringLength(@"Short")
Next
t3 = ElapsedMilliseconds()
S + "Short string : " + Str(t2 - t1) + "(SSE2) vs " + Str(t3 - t2) + "(PB)" + #CRLF$
t1 = ElapsedMilliseconds()
For i = 1 To 10000000
c = UCS2Len(@"A bit longer string to test with")
Next
t2 = ElapsedMilliseconds()
For i = 1 To 10000000
c = MemoryStringLength(@"A bit longer string to test with")
Next
t3 = ElapsedMilliseconds()
S + "Longer string : " + Str(t2 - t1) + "(SSE2) vs " + Str(t3 - t2) + "(PB)" + #CRLF$
SpaceString.s = Space(20000)
t1 = ElapsedMilliseconds()
For i = 1 To 100000
c = UCS2Len(@SpaceString)
Next
t2 = ElapsedMilliseconds()
For i = 1 To 100000
c = MemoryStringLength(@SpaceString)
Next
t3 = ElapsedMilliseconds()
S + "Very long string : " + Str(t2 - t1) + "(SSE2) vs " + Str(t3 - t2) + "(PB)" + #CRLF$
MessageRequester("Test results", S)
Re: Unicode (UCS2) string length using SSE2
Posted: Sun Aug 17, 2014 3:36 pm
by ts-soft
Windows 7 x64:
Test results wrote:---------------------------
Test results
---------------------------
Short string : 81(SSE2) vs 160(PB)
Longer string : 159(SSE2) vs 446(PB)
Very long string : 471(SSE2) vs 2053(PB)
---------------------------
OK
---------------------------
Windows 7 x86:
Test results wrote:---------------------------
Test results
---------------------------
Short string : 72(SSE2) vs 70(PB)
Longer string : 154(SSE2) vs 285(PB)
Very long string : 476(SSE2) vs 1158(PB)
---------------------------
OK
---------------------------
Re: Unicode (UCS2) string length using SSE2
Posted: Sun Aug 17, 2014 8:41 pm
by idle
linux x64
---------------------------
Test results
---------------------------
Short string : 107(SSE2) vs 67(PB)
Longer string : 180(SSE2) vs 259(PB)
Very long string : 416(SSE2) vs 1043(PB)
Re: Unicode (UCS2) string length using SSE2
Posted: Sun Aug 17, 2014 8:45 pm
by Fred
so you actually beat the strlen() routine on linux, that's quite a nice work and you should submit a patch for the libc !

Re: Unicode (UCS2) string length using SSE2
Posted: Sun Aug 17, 2014 9:09 pm
by netmaestro
sse2 vs. pb
39 vs. 35
78 vs. 127
224 vs. 669
Windows 8.1 x64, pb 5.30x86, i7 4770 @ 3.4 ghz
fastest code overall: wilbert
fastest computer: netmaestro (yay!)
Re: Unicode (UCS2) string length using SSE2
Posted: Sun Aug 17, 2014 11:47 pm
by graph100
avec mon : i7-3630QM @2.40GHz
window 8, PB5.22x86 :
Short string : 48(SSE2) vs 59(PB)
Longer string : 106(SSE2) vs 297(PB)
Very long string : 285(SSE2) vs 929(PB)
Mandriva (VM), PB5.22x86 :
Short string : 72(SSE2) vs 66(PB)
Longer string : 125(SSE2) vs 231(PB)
Very long string : 283(SSE2) vs 839(PB)
Mac (VM), PB5.21x86 :
Short string : 91(SSE2) vs 77(PB)
Longer string : 124(SSE2) vs 257(PB)
Very long string : 288(SSE2) vs 850(PB)
Re: Unicode (UCS2) string length using SSE2
Posted: Mon Aug 18, 2014 12:09 am
by Demivec
Using: intel i5-4570 (4 cores) CPU @ 3.20GHz
Windows 8.1 5.30 x64
Code: Select all
Short string : 56(SSE2) vs 200(PB)
Longer string : 95(SSE2) vs 362(PB)
Very long string : 254(SSE2) vs 1179(PB)
Windows 8.1 5.30 x86
Code: Select all
Short string : 44(SSE2) vs 39(PB)
Longer string : 88(SSE2) vs 142(PB)
Very long string : 252(SSE2) vs 751(PB)
Re: Unicode (UCS2) string length using SSE2
Posted: Mon Aug 18, 2014 11:53 am
by wilbert
Thanks a lot for testing

This helps to get a better perspective.
Fred wrote:so you actually beat the strlen() routine on linux, that's quite a nice work and you should submit a patch for the libc !

I don't know if Linux is optimized for handling 16 bit unicode. As far as I understand it uses 32 bit internally.
My code in the first post, was a general approach being able to handle both strings aligned on a word boundary and not.
If most strings are aligned on a word boundary (the usually are) and it isn't a problem the source becomes a little bigger, the following code should be even faster.
It splits the source in a part to handle word aligned and a part to handle non word aligned strings.
It can be tested with the same test code from the first post.
Code: Select all
CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
; x64 Macro definitions
Macro UCS2Len_MOV
!movdqa xmm0, [rdx]
EndMacro
Macro UCS2Len_ADD
!add rdx, 16
EndMacro
CompilerElse
; x86 Macro definitions
Macro UCS2Len_MOV
!movdqa xmm0, [edx]
EndMacro
Macro UCS2Len_ADD
!add edx, 16
EndMacro
CompilerEndIf
Procedure.i UCS2Len(*UCS2String)
; init and check word boundary alignment
CompilerIf #PB_Compiler_Processor = #PB_Processor_x64
!mov rdx, [p.p_UCS2String]
!mov ecx, edx
!and rdx, -16
CompilerElse
!mov edx, [p.p_UCS2String]
!mov ecx, edx
!and edx, -16
CompilerEndIf
UCS2Len_MOV
!pxor xmm1, xmm1
!sub ecx, edx
!test ecx, 1
!jnz ucs2len1
; handle strings aligned to word boundary
!pcmpeqw xmm0, xmm1
!pmovmskb eax, xmm0
!shr eax, cl
!shl eax, cl
UCS2Len_ADD
!and eax, eax
!jnz ucs2len3
!ucs2len0:
UCS2Len_MOV
!pcmpeqw xmm0, xmm1
!pmovmskb eax, xmm0
UCS2Len_ADD
!and eax, eax
!jz ucs2len0
!jmp ucs2len3
; handle strings not aligned to word boundary
!ucs2len1:
!movdqa xmm3, xmm0
!pslldq xmm0, 1
!inc cl
!pcmpeqw xmm0, xmm1
!pmovmskb eax, xmm0
!shr eax, cl
!shl eax, cl
UCS2Len_ADD
!and eax, eax
!jnz ucs2len3
!ucs2len2:
UCS2Len_MOV
!movdqa xmm2, xmm0
!pslldq xmm0, 1
!psrldq xmm3, 15
!por xmm0, xmm3
!pcmpeqw xmm0, xmm1
!pmovmskb eax, xmm0
UCS2Len_ADD
!movdqa xmm3, xmm2
!and eax, eax
!jz ucs2len2
; exit procedure
!ucs2len3:
!bsf ecx, eax
!lea eax, [ecx + edx - 16]
!sub eax, [p.p_UCS2String]
!shr eax, 1
ProcedureReturn
EndProcedure
Test code (important to compile with
unicode enabled and
debugger disabled !)
Code: Select all
S.s = "Unicode string length, SSE2 vs PB" + #CRLF$ + #CRLF$
t1 = ElapsedMilliseconds()
For i = 1 To 10000000
c = UCS2Len(@"Short")
Next
t2 = ElapsedMilliseconds()
For i = 1 To 10000000
c = MemoryStringLength(@"Short")
Next
t3 = ElapsedMilliseconds()
S + "Short string : " + Str(t2 - t1) + "(SSE2) vs " + Str(t3 - t2) + "(PB)" + #CRLF$
t1 = ElapsedMilliseconds()
For i = 1 To 10000000
c = UCS2Len(@"A bit longer string to test with")
Next
t2 = ElapsedMilliseconds()
For i = 1 To 10000000
c = MemoryStringLength(@"A bit longer string to test with")
Next
t3 = ElapsedMilliseconds()
S + "Longer string : " + Str(t2 - t1) + "(SSE2) vs " + Str(t3 - t2) + "(PB)" + #CRLF$
SpaceString.s = Space(20000)
t1 = ElapsedMilliseconds()
For i = 1 To 100000
c = UCS2Len(@SpaceString)
Next
t2 = ElapsedMilliseconds()
For i = 1 To 100000
c = MemoryStringLength(@SpaceString)
Next
t3 = ElapsedMilliseconds()
S + "Very long string : " + Str(t2 - t1) + "(SSE2) vs " + Str(t3 - t2) + "(PB)" + #CRLF$
MessageRequester("Test results", S)
Re: Unicode (UCS2) string length using SSE2
Posted: Mon Aug 18, 2014 12:17 pm
by blueb
Using the new code....
Using: intel i7-2600 (4 cores) CPU @ 3.40GHz
Windows 7 Pro 5.30 x86
Code: Select all
Short string : 35(SSE2) vs 46(PB)
Longer string : 60(SSE2) vs 204(PB)
Very long string : 157(SSE2) vs 709(PB)
Re: Unicode (UCS2) string length using SSE2
Posted: Mon Aug 18, 2014 12:19 pm
by flaith
Windows 8.1 x64:
---------------------------
Test results V1
---------------------------
Short string : 99(SSE2) vs 100(PB)
Longer string : 181(SSE2) vs 391(PB)
Very long string : 479(SSE2) vs 1563(PB)
---------------------------
Test results V2
---------------------------
Short string : 69(SSE2) vs 102(PB)
Longer string : 121(SSE2) vs 392(PB)
Very long string : 331(SSE2) vs 1563(PB)
Re: Unicode (UCS2) string length using SSE2
Posted: Mon Aug 18, 2014 9:09 pm
by idle
v2 Linux x64
Twice as fast as V1 on the very long string, nice work Wilbert
---------------------------
Test results
---------------------------
Short string : 99(SSE2) vs 70(PB)
Longer string : 140(SSE2) vs 260(PB)
Very long string : 206(SSE2) vs 1041(PB)
Re: Unicode (UCS2) string length using SSE2
Posted: Mon Aug 18, 2014 9:17 pm
by ts-soft
Version 2
Windows 7 x64:
---------------------------
Test results
---------------------------
Unicode string length, SSE2 vs PB
Short string : 64(SSE2) vs 143(PB)
Longer string : 109(SSE2) vs 405(PB)
Very long string : 190(SSE2) vs 1896(PB)
---------------------------
OK
---------------------------
Windows 7 x86:
---------------------------
Test results
---------------------------
Unicode string length, SSE2 vs PB
Short string : 56(SSE2) vs 60(PB)
Longer string : 85(SSE2) vs 260(PB)
Very long string : 225(SSE2) vs 1057(PB)
---------------------------
OK
---------------------------
Re: Unicode (UCS2) string length using SSE2
Posted: Mon Aug 18, 2014 9:39 pm
by Tenaja
wilbert wrote:If most strings are aligned on a word boundary (the usually are) and it isn't a problem the source becomes a little bigger, the following code should be even faster.
It splits the source in a part to handle word aligned and a part to handle non word aligned strings.
Is there a way to force (or ensure) string alignment on the Word boundaries? Fred?
Re: Unicode (UCS2) string length using SSE2
Posted: Mon Aug 18, 2014 11:58 pm
by Demivec
Newer (and longer) code
Using: intel i5-4570 (4 cores) CPU @ 3.20GHz
Windows 8.1 5.30 x64
Code: Select all
Short string : 40(SSE2) vs 199(PB)
Longer string : 60(SSE2) vs 360(PB)
Very long string : 133(SSE2) vs 1166(PB))
Windows 8.1 5.30 x86
Code: Select all
Short string : 34(SSE2) vs 41(PB)
Longer string : 59(SSE2) vs 138(PB)
Very long string : 182(SSE2) vs 751(PB)
Re: Unicode (UCS2) string length using SSE2
Posted: Tue Aug 19, 2014 12:23 am
by netmaestro
new code:
Windows 8.1 x64 PB 5.30 x86
sse2 vs. PB
-------------
31 vs. 39
54 vs. 150
162 vs. 676