calculate CRC32 with SSE4.2

Thorium · Post by **Thorium** » Sun Oct 25, 2009 3:34 pm

As i am learning SIMD and take a look at the newer instructions of SSE4 i saw a interessting instruction: CRC32.
It's very easy to use and i just wrote a little procedure that uses it.
It's not much optimized, not unroled, etc. But it's allready 3 times faster than the PB routine. But the PB calculated CRC32 and the SSE4.2 calculated CRC32 are not comparable! They differ, i think this is because SSE4.2 calculates the CRC32 in reversed bit order.

Note: Your CPU must support SSE4.2. The code is just a example and is not checking if your CPU is supporting it!

Note 2: In order to compile it you need the newest stable version of FAsm. Download it from http://www.flatassembler.net and replace the old one in the "compilers" folder.

Code: Select all

Procedure.i CRC32_SSE4(*Buffer, Size.i, InitValue.i = -1 )

  CompilerSelect #PB_Compiler_Processor
  
    CompilerCase #PB_Processor_x86

      !push esi
    
      !mov esi,[p.p_Buffer+4]
      !mov ecx,[p.v_Size+4]
      !mov eax,[p.v_InitValue+4]
      !xor edx,edx
      
      !align 4
      !CRC32_SSE4_LoopStart:
        
        !mov dl,[esi]
        !crc32 eax,dl
        
        !inc esi
        !dec ecx
      
      !jne CRC32_SSE4_LoopStart
    
      !pop esi
    
    CompilerCase #PB_Processor_x64
    
      !push rsi
    
      !mov rsi,[p.p_Buffer+8]
      !mov rcx,[p.v_Size+8]
      !mov rax,[p.v_InitValue+8]
      !xor rdx,rdx
      
      !align 8
      !CRC32_SSE4_LoopStart:
        
        !mov dl,[rsi]
        !crc32 rax,dl
        
        !inc rsi
        !dec rcx
      
      !jne CRC32_SSE4_LoopStart
    
      !pop rsi

  CompilerEndSelect

  ProcedureReturn

EndProcedure

DoubleDutch · Post by **DoubleDutch** » Sun Oct 25, 2009 3:51 pm

So which is the correct method? If either then maybe the official version should have a flag to select the direction?

Thorium · Post by **Thorium** » Sun Oct 25, 2009 3:57 pm

DoubleDutch wrote:So which is the correct method? If either then maybe the official version should have a flag to select the direction?

Thats not easy to say. ^^
I read a bit about it on the internet and it turned out that on network protocols the bit order is reversed. And the CRC32 instruction on SSE4.2 is mainly developed for use with network protocols.

DoubleDutch · Post by **DoubleDutch** » Sun Oct 25, 2009 3:59 pm

ic, then maybe there should be a flag to calculate the correct type of crc.

Thorium · Post by **Thorium** » Sun Oct 25, 2009 4:05 pm

DoubleDutch wrote:ic, then maybe there should be a flag to calculate the correct type of crc.

I think this would be a little bit complicated. Just read on wikipedia, it seams there are even more types of CRC32.

Wikipedia wrote: Commonly used and standardized CRCs

While cyclic redundancy checks form part of several standards, they are not themselves standardized to the point of adopting one algorithm of each degree worldwide: there are three polynomials reported for CRC-12[6], ten conflicting definitions of CRC-16, and four of CRC-32[7]. The polynomials usually seen are not the most efficient ones possible. Between 1993 and 2004, Koopman, Castagnoli and others surveyed the space of polynomials up to 16 bits[6], and of 24 and 32 bits,[8][9] finding examples that have much better performance (in terms of Hamming distance for a given message size) than the polynomials of earlier protocols, and publishing the best of these with the aim of improving the error detection capacity of future standards[9]. In particular, iSCSI has adopted one of the findings of this research.

The popular and IEEE-recommended CRC-32 polynomial, used by Ethernet, FDDI and others, is the generating polynomial of a Hamming code and, far from being arbitrarily chosen, was selected for its error detection performance[3]. Even so, the Castagnoli CRC-32C polynomial used in iSCSI matches its performance on messages from 58 bits–131 kbits, and outperforms it in several size ranges including the two most common sizes of Internet packet[9]. The ITU-T G.hn standard also uses CRC-32C to detect errors in the payload (although it uses CRC-16-CCITT for PHY headers).

The table below lists only the polynomials of the various algorithms in use. Any particular protocol can impose pre-inversion, post-inversion and reversed bit ordering as described above. CRCs in proprietary protocols might use a complicated initial value and final XOR for obfuscation but this does not add cryptographic strength to the algorithm.

Post by **Fred** » Sun Oct 25, 2009 11:12 pm

CRC32 is kind of standard, it's the first time i heard than you can get 2 several result depending of the implementation. Many application use them to interchange and validate data, so you have to respect that or it will be void. The PB one is the 'correct' one (ie: world wide used one

).

Thorium · Post by **Thorium** » Sun Oct 25, 2009 11:48 pm

Fred wrote:CRC32 is kind of standard, it's the first time i heard than you can get 2 several result depending of the implementation. Many application use them to interchange and validate data, so you have to respect that or it will be void. The PB one is the 'correct' one (ie: world wide used one ).

Maybe we just need to reverse the bit's and then they are the same. I will check that tomorow.

Thorium · Post by **Thorium** » Mon Oct 26, 2009 11:24 am

Well reversing the bits don't make them compatible.

And there are definitv different implementations of CRC32 with different results.
I just read a bit more about CRC32 and not every implementation uses the same polynomial. Therefor the polynomial is defined in technical specifications about systems that use CRC32. Another thing is that the data can be processed in reversed bit order or not.

So CRC32 is not generaly word wide compatible.

And of course there is the init value. Which is at the most implementations $FFFFFFFF = -1. But not on PB. CRC32Fingerprint(*Buffer, BfferLen) calculates a different CRC32 than CRC32Fingerprint(*Buffer, BfferLen, -1).

DoubleDutch · Post by **DoubleDutch** » Mon Oct 26, 2009 11:53 am

Which is at the most implementations $FFFFFFFF = -1. But not on PB. CRC32Fingerprint(*Buffer, BfferLen) calculates a different CRC32 than CRC32Fingerprint(*Buffer, BfferLen, -1).

Did you try an initial value of -1 with the bits reversed?

Post by **freak** » Mon Oct 26, 2009 6:45 pm

PB uses a different polynomial than the crc32 instruction, so you cannot get a compatible output out of it.

See here for a list of popular polynomials: http://en.wikipedia.org/wiki/Cyclic_red ... dized_CRCs

The CRC32 instruction uses the "CRC-32C" polynomial (0x1EDC6F41). PureBasic uses the "CRC-32-IEEE 802.3" polynomial (0xEDB88320). So no matter what you do, the output won't be the same. The CRC32 calculation defined in IEEE 802.3 is quite popular, so it is a good choise for PB.

btw, the "InitialValue" parameter to CRC32FingerPrint() does not directly specify the initial value for the CRC calculation. It is supposed to be the result of a previous call of CRC32FingerPrint() so that the new CRC is the checksum for both the old and new buffer. Because the result of CRC32FingerPrint() is complemented to be compatible to the IEEE method, the "InitialValue" is also complemented before it is used in the further CRC calculation. So pass 0 if you want to start with 0xFFFFFFFF. This is also the default for CRC32FingerPrint() without an initial value.

Thorium · Post by **Thorium** » Tue Oct 27, 2009 3:19 am

freak wrote: btw, the "InitialValue" parameter to CRC32FingerPrint() does not directly specify the initial value for the CRC calculation. It is supposed to be the result of a previous call of CRC32FingerPrint() so that the new CRC is the checksum for both the old and new buffer. Because the result of CRC32FingerPrint() is complemented to be compatible to the IEEE method, the "InitialValue" is also complemented before it is used in the further CRC calculation. So pass 0 if you want to start with 0xFFFFFFFF. This is also the default for CRC32FingerPrint() without an initial value.

But that means calculating the CRC32 of one big memory block and calculating the CRC32 a second time from the same memory block but chunk wise and use the CRC32 of the previous chunk as initial value of the next chunk will result in 2 different checksums. Or i am wrong? I have to test this. This would make absolutly no sense in my opinion. Well if it is defined in IEEE that it have to be done that way, then it's correct but it makes no sense anyway.

Post by **Fred** » Tue Oct 27, 2009 9:24 am

Sure you're wrong, both will produce exactly the same result (what would be the sense if not ?).

Thorium · Post by **Thorium** » Tue Oct 27, 2009 11:32 am

Fred wrote:Sure you're wrong, both will produce exactly the same result (what would be the sense if not ?).

Tested it and i am wrong. The whole thing confuses me. I don't get why the checksum is the same if the initial value (-1) is applied on every call, even if i define a initial value.

Post by **freak** » Tue Oct 27, 2009 2:28 pm

The chaining of CRC32FingerPrint() calls is not defined by IEEE, its a PB feature. As i explained above, the 'InitialValue' is _not_ the initial value of the CRC calculation. Just look at the help example to see how it is used.

PureBasic Forums - English

calculate CRC32 with SSE4.2

calculate CRC32 with SSE4.2

Re: calculate CRC32 with SSE4.2

Re: calculate CRC32 with SSE4.2

Re: calculate CRC32 with SSE4.2

Re: calculate CRC32 with SSE4.2

Re: calculate CRC32 with SSE4.2

Re: calculate CRC32 with SSE4.2

Re: calculate CRC32 with SSE4.2

Re: calculate CRC32 with SSE4.2

Re: calculate CRC32 with SSE4.2

Re: calculate CRC32 with SSE4.2

Re: calculate CRC32 with SSE4.2

Re: calculate CRC32 with SSE4.2

Re: calculate CRC32 with SSE4.2