
The problem with some SSE instructions (MOVAPS, for example) is that they need aligned addresses. That means the address has to be a multiple of a specific value; for SSE instructions that value is (as far as I can tell) 16. So if you use those instructions with a memory operand, the address must be divisible by 16, or put another way, the address modulo 16 has to be 0. If it is not, the CPU raises an exception and the program crashes. Well, enough theory, here is an example.
Code:
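; Allocates "size" bytes whose start address is a multiple of "align".
; The pointer returned by AllocateMemory() is stored just below the
; aligned address so that aligned_free() can find it again.
Procedure aligned_malloc(size.l, align.l)
  *mem.INTEGER = AllocateMemory(size + align + SizeOf(INTEGER) - 1)
  If *mem = 0 : ProcedureReturn 0 : EndIf
  ; Alignment of the raw block, shown for illustration only
  Debug *mem % 16
  ; Number of bytes to skip to reach the next aligned address
  a.l = align - (*mem % align)
  ; Make sure there is room in front for the stored pointer
  If a < SizeOf(INTEGER) : a + align : EndIf
  PokeI(*mem + a - SizeOf(INTEGER), *mem)
  ProcedureReturn *mem + a
EndProcedure

; Frees a block that was obtained from aligned_malloc().
Procedure aligned_free(*mem)
  FreeMemory(PeekI(*mem - SizeOf(INTEGER)))
EndProcedure
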
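*a = aligned_malloc(16, 16)
*b = aligned_malloc(16, 16)
*c = aligned_malloc(16, 16)
; Fill the two 16-byte inputs
PokeQ(*a, $0101010101010101)
PokeQ(*a+8, $0101010101010101)
PokeQ(*b, $1010101010101010)
PokeQ(*b+8, $1010101010101010)
; 32-bit x86 only. EDX is used instead of EBX because PureBasic's
; inline ASM rules only list EAX, ECX and EDX as freely usable.
!MOV eax, dword [p_a]
!MOV edx, dword [p_b]
!MOV ecx, dword [p_c]
; Aligned load, XOR with an aligned memory operand, aligned store
!MOVAPS xmm0, [eax]
!XORPS xmm0, [edx]
!MOVAPS [ecx], xmm0
; $01 XOR $10 = $11 in every byte
Debug RSet(Hex(PeekQ(*c)), 16, "0") + RSet(Hex(PeekQ(*c + 8)), 16, "0")
aligned_free(*a)
aligned_free(*b)
aligned_free(*c)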
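To make the "modulo 16 has to be 0" rule a bit more concrete, here is a minimal sketch of a check you could run before handing a pointer to MOVAPS. IsAligned is a hypothetical helper name of my own, not a PureBasic function, and the 64-byte buffer is only there for the demonstration.

```purebasic
; Hypothetical helper (an assumption, not part of PureBasic):
; returns #True when *ptr is a multiple of align, otherwise #False.
Procedure.i IsAligned(*ptr, align.i)
  If *ptr % align = 0
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure

*p = AllocateMemory(64)
If *p
  ; Both results depend on the address AllocateMemory() happened to return
  Debug IsAligned(*p, 16)
  Debug IsAligned(*p + 1, 16)
  FreeMemory(*p)
EndIf
```

If the check fails you can either allocate through something like aligned_malloc or fall back to the unaligned instructions.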
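If you cannot guarantee the alignment, the unaligned variant MOVUPS works with any address, as in this second example:
Code: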
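*a = AllocateMemory(16)
*b = AllocateMemory(16)
*c = AllocateMemory(16)
; Fill the two 16-byte inputs
PokeQ(*a, $0101010101010101)
PokeQ(*a+8, $0101010101010101)
PokeQ(*b, $1010101010101010)
PokeQ(*b+8, $1010101010101010)
; 32-bit x86 only. EDX is used instead of EBX because PureBasic's
; inline ASM rules only list EAX, ECX and EDX as freely usable.
!MOV eax, dword [p_a]
!MOV edx, dword [p_b]
!MOV ecx, dword [p_c]
; Unaligned loads into registers, XOR between registers, unaligned store
!MOVUPS xmm0, [eax]
!MOVUPS xmm1, [edx]
!XORPS xmm0, xmm1
!MOVUPS [ecx], xmm0
; $01 XOR $10 = $11 in every byte
Debug RSet(Hex(PeekQ(*c)), 16, "0") + RSet(Hex(PeekQ(*c + 8)), 16, "0")
FreeMemory(*a)
FreeMemory(*b)
FreeMemory(*c)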
Well, I hope someone finds this information useful.