little SSE example for aligned instructions
Posted: Tue Feb 16, 2010 8:09 pm
As this is about assembler, not everyone will find it useful, but anyway. Lately I got a bit curious what that MMX and SSE stuff is about. MMX wasn't a real problem (just some other instructions working on other registers, plus the need to reset the FPU state with the EMMS instruction afterwards), but I hit some problems with SSE instructions. So before someone else gets stuck on the same problem, namely the alignment some instructions need, here is the solution.
The problem with some SSE instructions is that they need aligned addresses, which means the address has to be a multiple of a specific value. For SSE that value is 16: MOVAPS, and packed instructions like XORPS when they take a memory operand, require that 16-byte operand to start at an address divisible by 16. Put another way, the memory address modulo 16 has to be 0. If it isn't, the CPU raises a general-protection exception and the program crashes. (MOVUPS is the unaligned counterpart and accepts any address.) Well, enough theory, the first listing is an example.

Because we don't know which address AllocateMemory() returns, we reserve enough extra space to move the start up to the next aligned address. As we are allocating extra memory anyway, we add room for one more pointer and save the original address just in front of the aligned block. If we didn't, we couldn't free the memory later, because FreeMemory() needs the address AllocateMemory() actually returned. Some environments already have an allocation function that returns aligned addresses, e.g. _aligned_malloc() and _aligned_free() in Visual C++. The example loads the 16 bytes at *a into the SSE register xmm0, XORs them with the 16 bytes at *b and stores the result at *c.

You could do the same with unaligned addresses, as the second listing does: plain AllocateMemory() and MOVUPS instead of MOVAPS, with *b loaded into xmm1 first because XORPS cannot take an unaligned memory operand. But that slows it down by more than a factor of two.

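The alignment rule is plain integer arithmetic on the address. Here is a minimal C sketch of the test and of the "round up to the next multiple" step; the helper names `is_aligned` and `align_up` are made up for illustration and are not part of any library.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if addr is a multiple of align
 * (for SSE operands: addr % 16 == 0). */
static int is_aligned(uintptr_t addr, uintptr_t align)
{
    assert(align != 0);
    return addr % align == 0;
}

/* Hypothetical helper: the smallest multiple of align that is >= addr. */
static uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0);
    uintptr_t rem = addr % align;            /* how far past the last boundary */
    return rem == 0 ? addr : addr + (align - rem);
}
```

For example, 0x1004 % 16 is 4, so align_up(0x1004, 16) adds 12 bytes of padding and returns 0x1010. When align is a power of two, `(addr + align - 1) & ~(align - 1)` gives the same result without a division. The PureBasic aligned_malloc differs slightly on purpose: it always moves forward by 1 to align bytes, even from an address that is already aligned, so there is always room in front of the block for the saved pointer.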
Code:
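; Returns a 'size' byte block whose address is a multiple of 'align'.
; 'align' must be >= SizeOf(INTEGER): the pointer returned by AllocateMemory()
; is stored in the SizeOf(INTEGER) bytes just below the aligned address.
Procedure.i aligned_malloc(size.i, align.i)
  *mem = AllocateMemory(size + align + SizeOf(INTEGER) - 1)
  If *mem = 0 : ProcedureReturn 0 : EndIf
  Debug *mem % align                          ; misalignment of the raw block
  a.i = align - (*mem % align)                ; offset to the next aligned address (1..align)
  If a < SizeOf(INTEGER) : a + align : EndIf  ; make room for the stored pointer
  PokeI(*mem + a - SizeOf(INTEGER), *mem)     ; remember the original pointer
  ProcedureReturn *mem + a
EndProcedure

Procedure aligned_free(*mem)
  If *mem
    FreeMemory(PeekI(*mem - SizeOf(INTEGER))) ; free the block AllocateMemory() returned
  EndIf
EndProcedure

; Three 16-byte operands on 16-byte boundaries, as MOVAPS/XORPS require
*a = aligned_malloc(16, 16)
*b = aligned_malloc(16, 16)
*c = aligned_malloc(16, 16)
PokeQ(*a, $0101010101010101)
PokeQ(*a+8, $0101010101010101)
PokeQ(*b, $1010101010101010)
PokeQ(*b+8, $1010101010101010)
; x86 (32-bit) only. eax, ecx and edx may be used freely in inline ASM.
!MOV eax, dword [p_a]
!MOV edx, dword [p_b]
!MOV ecx, dword [p_c]
!MOVAPS xmm0, [eax]   ; aligned 16-byte load of *a
!XORPS xmm0, [edx]    ; XOR with *b (this memory operand must be 16-byte aligned)
!MOVAPS [ecx], xmm0   ; aligned 16-byte store to *c
; $01 XOR $10 = $11 in every byte, so this shows 32 times "1"
Debug RSet(Hex(PeekQ(*c)), 16, "0") + RSet(Hex(PeekQ(*c + 8)), 16, "0")
aligned_free(*a)
aligned_free(*b)
aligned_free(*c)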
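For comparison, here is the same over-allocate-and-save trick as a C sketch, with malloc()/free() standing in for AllocateMemory()/FreeMemory(). The function names mirror the PureBasic procedures but are otherwise hypothetical; the sketch assumes align is at least sizeof(void *), just as the PureBasic version needs align >= SizeOf(INTEGER).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns size bytes starting at a multiple of align, or NULL on failure.
 * The pointer malloc() returned is stored in the sizeof(void *) bytes
 * just below the aligned address so aligned_free() can find it. */
void *aligned_malloc(size_t size, size_t align)
{
    assert(align >= sizeof(void *));
    if (size > SIZE_MAX - align - sizeof(void *))
        return NULL;                                 /* size computation would overflow */

    unsigned char *raw = malloc(size + align + sizeof(void *) - 1);
    if (raw == NULL)
        return NULL;

    size_t a = align - (uintptr_t)raw % align;       /* 1..align, as in the PureBasic code */
    if (a < sizeof(void *))
        a += align;                                  /* make room for the stored pointer */

    unsigned char *aligned = raw + a;
    memcpy(aligned - sizeof(void *), &raw, sizeof raw);  /* save the original pointer */
    return aligned;
}

/* Frees a block returned by aligned_malloc(); NULL is ignored like free(NULL). */
void aligned_free(void *p)
{
    if (p == NULL)
        return;
    unsigned char *raw;
    memcpy(&raw, (unsigned char *)p - sizeof(void *), sizeof raw);
    free(raw);
}
```

memcpy is used for the saved pointer so the code doesn't depend on that slot being pointer-aligned. If you don't need to roll your own: C11 and later provide aligned_alloc(16, n) in <stdlib.h> (keep n a multiple of 16 for portability), released with plain free(), and Visual C++ has _aligned_malloc()/_aligned_free().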
Code:
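; Same calculation with plain AllocateMemory(): the addresses are not
; guaranteed to be 16-byte aligned, so every memory access uses MOVUPS.
*a = AllocateMemory(16)
*b = AllocateMemory(16)
*c = AllocateMemory(16)
PokeQ(*a, $0101010101010101)
PokeQ(*a+8, $0101010101010101)
PokeQ(*b, $1010101010101010)
PokeQ(*b+8, $1010101010101010)
; x86 (32-bit) only. eax, ecx and edx may be used freely in inline ASM.
!MOV eax, dword [p_a]
!MOV edx, dword [p_b]
!MOV ecx, dword [p_c]
!MOVUPS xmm0, [eax]   ; unaligned load of *a
!MOVUPS xmm1, [edx]   ; XORPS needs an aligned memory operand, so load *b into a register
!XORPS xmm0, xmm1
!MOVUPS [ecx], xmm0   ; unaligned store to *c
Debug RSet(Hex(PeekQ(*c)), 16, "0") + RSet(Hex(PeekQ(*c + 8)), 16, "0")
FreeMemory(*a)
FreeMemory(*b)
FreeMemory(*c)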
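In C, the two listings map onto the SSE intrinsics from <xmmintrin.h>: _mm_load_ps/_mm_store_ps normally compile to MOVAPS, _mm_loadu_ps/_mm_storeu_ps to MOVUPS, and _mm_xor_ps to XORPS. The following is a sketch under assumptions, not a drop-in: the function names are made up, and on a target without SSE it falls back to a plain byte loop so it still compiles.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
/* Portable fallback when SSE is not available: c[i] = a[i] ^ b[i]. */
static void xor16_bytes(const void *a, const void *b, void *c)
{
    const unsigned char *pa = a, *pb = b;
    unsigned char *pc = c;
    for (int i = 0; i < 16; i++)
        pc[i] = pa[i] ^ pb[i];
}
#endif

/* Like the first listing: all three pointers must be 16-byte aligned. */
void xor16_aligned(const void *a, const void *b, void *c)
{
    assert((uintptr_t)a % 16 == 0 && (uintptr_t)b % 16 == 0 && (uintptr_t)c % 16 == 0);
#ifdef HAVE_SSE
    __m128 x = _mm_load_ps((const float *)a);           /* MOVAPS */
    x = _mm_xor_ps(x, _mm_load_ps((const float *)b));   /* XORPS; the load may be folded in */
    _mm_store_ps((float *)c, x);                        /* MOVAPS */
#else
    xor16_bytes(a, b, c);
#endif
}

/* Like the second listing: any alignment is fine. */
void xor16_unaligned(const void *a, const void *b, void *c)
{
#ifdef HAVE_SSE
    __m128 x = _mm_loadu_ps((const float *)a);          /* MOVUPS */
    x = _mm_xor_ps(x, _mm_loadu_ps((const float *)b));  /* MOVUPS, then XORPS */
    _mm_storeu_ps((float *)c, x);                       /* MOVUPS */
#else
    xor16_bytes(a, b, c);
#endif
}
```

Only the loads and stores differ between the two functions; the XOR itself is the same bitwise operation, so it doesn't matter that the bytes aren't really floats. The assert in xor16_aligned turns a misaligned pointer into a readable error in debug builds instead of a crash inside MOVAPS.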
Well, I hope someone finds this information useful. :roll: