little SSE example for aligned instructions

pcfreak
User
Posts: 75
Joined: Sat May 22, 2004 1:38 am

Post by pcfreak »

Since this is about assembler, not everyone will find it useful. Lately I got curious about what MMX and SSE are all about. MMX was no real problem (just some other instructions on other registers, plus the need to reset the FPU state with the EMMS instruction), but I hit some problems with the SSE instructions. So before someone else gets stuck on the same problem, the alignment required by some instructions, here is the solution. :wink:
The problem with some SSE instructions is that they need aligned addresses, meaning the address must be a multiple of a specific value. For SSE instructions that value is (as far as I can see) 16. So if you use those SSE instructions with memory operands, the address must be divisible by 16, or put another way, the memory address modulo 16 has to be 0. Well, enough theory, here is an example.
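To make the modulo rule concrete before the full example, here is a minimal sketch of an alignment check. IsAligned is a hypothetical helper name, not a PureBasic function, and the sketch assumes that align is a power of two so the test can be written with a bit mask. The addresses 32 and 40 are made-up numbers, not real allocations.

Code: Select all

; IsAligned is a hypothetical helper, not part of PureBasic.
; Assumes align is a power of two (16 for the SSE instructions discussed here).
Procedure.i IsAligned(*ptr, align.i)
 If (*ptr & (align - 1)) = 0
  ProcedureReturn #True
 EndIf
 ProcedureReturn #False
EndProcedure

Debug IsAligned(32, 16) ; 1, because 32 is a multiple of 16
Debug IsAligned(40, 16) ; 0, because 40 modulo 16 is 8

*buf = AllocateMemory(64)
Debug IsAligned(*buf, 16) ; 0 or 1, depending on the address AllocateMemory happened to return
FreeMemory(*buf)

The full example with the aligned SSE instructions follows.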

Code: Select all

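Procedure aligned_malloc(size.l, align.l)
 ; Over-allocate: room for the alignment padding plus one pointer-sized slot
 *mem.INTEGER = AllocateMemory(size + align + SizeOf(INTEGER) - 1)
 Debug *mem % 16 ; alignment of the raw block, for illustration only
 ; Distance from *mem to the next multiple of align (1 to align)
 a.l = align - (*mem % align)
 ; Make sure the slot for the original pointer fits in front of the aligned address
 If a < SizeOf(INTEGER) : a + align : EndIf
 ; Store the original pointer just below the aligned address, for aligned_free
 PokeI(*mem + a - SizeOf(INTEGER), *mem)
 ProcedureReturn *mem + a
EndProcedure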

Procedure aligned_free(*mem)
 FreeMemory(PeekI(*mem - SizeOf(INTEGER)))
EndProcedure

*a = aligned_malloc(16, 16)
*b = aligned_malloc(16, 16)
*c = aligned_malloc(16, 16)

PokeQ(*a, $0101010101010101)
PokeQ(*a+8, $0101010101010101)
PokeQ(*b, $1010101010101010)
PokeQ(*b+8, $1010101010101010)

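; x86 (32-bit) compiler only: the pointers are loaded into 32-bit registers.
; edx is used instead of ebx because PureBasic requires ebx to be preserved.
!MOV eax, dword [p_a]
!MOV edx, dword [p_b]
!MOV ecx, dword [p_c]
!MOVAPS xmm0, [eax]   ; load 16 bytes from *a (address must be 16-byte aligned)
!XORPS xmm0, [edx]    ; XOR with 16 bytes at *b (memory operand must be aligned too)
!MOVAPS [ecx], xmm0   ; store the result to *c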
Debug RSet(Hex(PeekQ(*c)), 16, "0") + RSet(Hex(PeekQ(*c + 8)), 16, "0")

aligned_free(*a)
aligned_free(*b)
aligned_free(*c)
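The offset arithmetic inside aligned_malloc can be checked on its own with made-up addresses. A small sketch: AlignOffset is a hypothetical helper that just repeats the two lines computing a, and the addresses 1000 and 1021 are arbitrary illustration values, not real allocations.

Code: Select all

; AlignOffset is a hypothetical helper that mirrors the computation of "a" in aligned_malloc.
Procedure.i AlignOffset(address.i, align.i)
 Protected a.i = align - (address % align) ; distance to the next multiple of align
 If a < SizeOf(INTEGER) : a + align : EndIf ; keep room for the stored pointer
 ProcedureReturn a
EndProcedure

Debug AlignOffset(1000, 16) ; 8, because 1000 % 16 = 8
Debug AlignOffset(1021, 16) ; 19, because 16 - 13 = 3 is too small for the pointer slot
Debug (1021 + AlignOffset(1021, 16)) % 16 ; 0, so the returned address is aligned
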
Because we don't know which address AllocateMemory returns, we reserve enough extra space to move forward to the aligned address we need. As we are already allocating extra memory, we also add room to store the original pointer to the memory block; without it we could not free the block later. Some environments such as VC already have an allocation function (with a different name, of course) that can return aligned addresses. The example uses the SSE register xmm0 to load the value from *a and XOR it with the value at *b. The result is stored in *c. You could have done the same with unaligned addresses, e.g.

Code: Select all

*a = AllocateMemory(16)
*b = AllocateMemory(16)
*c = AllocateMemory(16)

PokeQ(*a, $0101010101010101)
PokeQ(*a+8, $0101010101010101)
PokeQ(*b, $1010101010101010)
PokeQ(*b+8, $1010101010101010)

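; x86 (32-bit) compiler only; edx replaces ebx, which PureBasic requires to be preserved.
!MOV eax, dword [p_a]
!MOV edx, dword [p_b]
!MOV ecx, dword [p_c]
!MOVUPS xmm0, [eax]   ; unaligned load from *a
!MOVUPS xmm1, [edx]   ; unaligned load from *b (XORPS with a memory operand would need alignment)
!XORPS xmm0, xmm1
!MOVUPS [ecx], xmm0   ; unaligned store to *c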
Debug RSet(Hex(PeekQ(*c)), 16, "0") + RSet(Hex(PeekQ(*c + 8)), 16, "0")
but that is more than two times slower.
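How big the difference is can be measured on your own machine. Below is a rough timing sketch, assuming the x86 (32-bit) compiler like the examples above. The loop count, the assembler labels and the variable names are arbitrary choices for this sketch, and it only times the loads (MOVAPS against MOVUPS), so treat the numbers as a coarse impression.

Code: Select all

; Rough timing sketch, x86 (32-bit) compiler only. Labels and names are made up for this example.
*buf = AllocateMemory(64)
*al = (*buf + 15) & ~15 ; first 16-byte aligned address inside the buffer
*un = *al + 1 ; deliberately unaligned address, still inside the buffer

t1.q = ElapsedMilliseconds()
!MOV eax, dword [p_al]
!MOV ecx, 100000000
!sse_aligned_loop:
!MOVAPS xmm0, [eax]
!DEC ecx
!JNZ sse_aligned_loop
t1 = ElapsedMilliseconds() - t1

t2.q = ElapsedMilliseconds()
!MOV eax, dword [p_un]
!MOV ecx, 100000000
!sse_unaligned_loop:
!MOVUPS xmm0, [eax]
!DEC ecx
!JNZ sse_unaligned_loop
t2 = ElapsedMilliseconds() - t2

Debug "MOVAPS (aligned): " + Str(t1) + " ms"
Debug "MOVUPS (unaligned): " + Str(t2) + " ms"
FreeMemory(*buf)
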
Well, I hope someone finds this information useful. :roll:
Thorium
Addict
Posts: 1271
Joined: Sat Aug 15, 2009 6:59 pm

Re: little SSE example for aligned instructions

Post by Thorium »

Yes, it is useful.
Especially the fact that unaligned memory accesses are 2 times slower than aligned accesses, so it makes a big difference.

I cannot test it on my PC because I have a Core i7, and unaligned memory accesses were optimized on that CPU; in fact, there is no difference between aligned and unaligned accesses. So I wrote my procedures without alignment. Well, losing 50% speed on other CPUs is a lot, so I have to rewrite my procedures with alignment.