It's very easy to use, and I just wrote a small procedure that uses it.
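It isn't optimized much (not unrolled, etc.), but it's already about 3 times faster than the PB routine. However, the CRC32 calculated by PB and the CRC32 calculated by SSE4.2 are not comparable: they differ. The reason is that the SSE4.2 crc32 instruction implements CRC-32C (the Castagnoli polynomial, 0x1EDC6F41), not the common CRC-32 polynomial (0x04C11DB7) that the PB routine presumably uses, so the two give different checksums for the same data.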
Note: Your CPU must support SSE4.2. The code is just an example and does not check whether your CPU supports it!
Note 2: In order to compile it you need the newest stable version of FAsm. Download it from http://www.flatassembler.net and replace the old one in the "compilers" folder.
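The support check that the example leaves out could look roughly like the sketch below. It relies on the documented CPUID feature flag for SSE4.2 (leaf 1, ECX bit 20). The name HasSSE42 is hypothetical, and the empty ProcedureReturn assumes, as the CRC procedure in this post does, that whatever is left in eax/rax is returned to the caller.

```purebasic
; Hypothetical helper (not part of the original post):
; returns 1 if CPUID reports SSE4.2, otherwise 0.
Procedure.i HasSSE42()
  CompilerSelect #PB_Compiler_Processor
    CompilerCase #PB_Processor_x86
      !push ebx          ; CPUID overwrites ebx, which must be preserved
      !mov eax,1         ; CPUID leaf 1: feature flags
      !cpuid
      !xor eax,eax
      !bt ecx,20         ; carry flag = ECX bit 20 (SSE4.2)
      !setc al
      !pop ebx
    CompilerCase #PB_Processor_x64
      !push rbx          ; CPUID overwrites rbx, which must be preserved
      !mov eax,1         ; CPUID leaf 1: feature flags
      !cpuid
      !xor eax,eax       ; also clears the upper half of rax
      !bt ecx,20         ; carry flag = ECX bit 20 (SSE4.2)
      !setc al
      !pop rbx
  CompilerEndSelect
  ProcedureReturn        ; assumption: eax/rax is returned unchanged
EndProcedure

If HasSSE42()
  Debug "SSE4.2 available: the crc32 instruction can be used"
Else
  Debug "No SSE4.2: use a software CRC routine instead"
EndIf
```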
Code:
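Procedure.i CRC32_SSE4(*Buffer, Size.i, InitValue.i = -1)
  ; Returns the raw CRC register (no final inversion).
  ; With Size = 0 the loop is skipped and InitValue is returned unchanged.
  CompilerSelect #PB_Compiler_Processor
    CompilerCase #PB_Processor_x86
      !push esi                   ; preserve esi; the push shifts the stack offsets by 4
      !mov esi,[p.p_Buffer+4]
      !mov ecx,[p.v_Size+4]
      !mov eax,[p.v_InitValue+4]
      !xor edx,edx
      !test ecx,ecx               ; empty buffer: skip the loop (it would wrap around otherwise)
      !jz CRC32_SSE4_Done
      !align 4
      !CRC32_SSE4_LoopStart:
      !mov dl,[esi]               ; load the next byte
      !crc32 eax,dl               ; fold it into the CRC
      !inc esi
      !dec ecx
      !jne CRC32_SSE4_LoopStart
      !CRC32_SSE4_Done:
      !pop esi
    CompilerCase #PB_Processor_x64
      !push rsi                   ; preserve rsi; the push shifts the stack offsets by 8
      !mov rsi,[p.p_Buffer+8]
      !mov rcx,[p.v_Size+8]
      !mov rax,[p.v_InitValue+8]
      !xor rdx,rdx
      !test rcx,rcx               ; empty buffer: skip the loop (it would wrap around otherwise)
      !jz CRC32_SSE4_Done
      !align 8
      !CRC32_SSE4_LoopStart:
      !mov dl,[rsi]               ; load the next byte
      !crc32 rax,dl               ; fold it into the CRC (result is in the low 32 bits)
      !inc rsi
      !dec rcx
      !jne CRC32_SSE4_LoopStart
      !CRC32_SSE4_Done:
      !pop rsi
  CompilerEndSelect
  ProcedureReturn                 ; the result is returned in eax/rax
EndProcedure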
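To check the results of CRC32_SSE4 without relying on the PB routine, a plain software version of the same checksum can help. The crc32 instruction computes CRC-32C, so the sketch below uses the bit-reflected Castagnoli polynomial $82F63B78 and processes one bit at a time. The name CRC32C_Soft and the test string are assumptions for illustration only. Quads are used so the value stays non-negative, because >> is an arithmetic shift in PureBasic.

```purebasic
; Hypothetical reference routine (not from the original post):
; bitwise CRC-32C, slow but easy to verify.
Procedure.q CRC32C_Soft(*Buffer, Size.i, InitValue.q = $FFFFFFFF)
  Protected crc.q = InitValue & $FFFFFFFF
  Protected i.i, k.i
  For i = 0 To Size - 1
    crc = crc ! PeekA(*Buffer + i)        ; XOR the next byte into the low 8 bits
    For k = 1 To 8
      If crc & 1
        crc = (crc >> 1) ! $82F63B78      ; reflected Castagnoli polynomial
      Else
        crc = crc >> 1
      EndIf
    Next
  Next
  ProcedureReturn crc                     ; raw register value, no final inversion
EndProcedure

; Usage sketch: 9 ASCII characters plus the terminating zero written by PokeS.
Define *Text = AllocateMemory(10)
PokeS(*Text, "123456789", -1, #PB_Ascii)
Define raw.q = CRC32C_Soft(*Text, 9)
Debug Hex(raw)                  ; raw value, comparable to the low 32 bits of CRC32_SSE4(*Text, 9)
Debug Hex(raw ! $FFFFFFFF)      ; conventional CRC-32C, with the final inversion applied
FreeMemory(*Text)
```

The published check value of CRC-32C for the string "123456789" is E3069283, which is what the second Debug line should show if the sketch is right. The standard CRC-32 check value for the same string is CBF43926, which shows how far apart the two checksums are.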