I've been having some unexpected memory access violations with InterlockedCompareExchange. I believe this is due to memory alignment, according to MSDN:
The parameters for this function must be aligned on a 32-bit boundary; otherwise, the function will behave unpredictably on multiprocessor x86 systems and any non-x86 systems. See _aligned_malloc.
I've been looking at pcfreak's code as a solution:
However, if I want to align a long on a 32-bit system, what do I specify for the alignment in bytes? Also, what would I specify for an integer on a 64-bit system? I don't follow exactly how alignment works.
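As a rough sketch of the 32-bit case asked about here: the alignment to ask for is the size of the operand in bytes, so 4 for a 32-bit long (a Win32 LONG is 32 bits on both 32-bit and 64-bit Windows) and 8 for a 64-bit value such as the one InterlockedCompareExchange64 takes. The example below is written in C and uses the portable C11 atomics rather than the Win32 call itself. `cas32` is a hypothetical helper that only mimics the "return the initial value" behaviour of InterlockedCompareExchange.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for InterlockedCompareExchange (not the Win32 API).
 * If *target equals comparand, store exchange into *target.
 * Returns the value *target held before the call. */
static int32_t cas32(_Atomic int32_t *target, int32_t exchange, int32_t comparand)
{
    /* A 32-bit operand must sit on a 4-byte boundary. */
    assert((uintptr_t)target % sizeof(int32_t) == 0);

    int32_t expected = comparand;
    /* On failure, expected is overwritten with the current value of *target. */
    atomic_compare_exchange_strong(target, &expected, exchange);
    return expected;
}
```

Calling `cas32(&v, 9, 5)` on a variable holding 5 stores 9 and returns 5. A second call with the same comparand leaves the value untouched and returns 9.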
Alignment means the memory address is divisible by a fixed number.
Alignment is for performance. A memory access is fastest if the address is aligned to a number equal to the size of the data type being accessed. So if you access a long, the memory should be aligned to 4 bytes = 32 bits.
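The "divisible by a fixed number" rule can be written down directly. This is a minimal C sketch. Both helper names are made up for illustration, and `align_up` assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if addr is divisible by alignment. */
static int is_aligned(uintptr_t addr, uintptr_t alignment)
{
    return addr % alignment == 0;
}

/* Hypothetical helper: round addr up to the next multiple of alignment.
 * Assumes alignment is a power of two (1, 2, 4, 8, 16, ...). */
static uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

For example, `align_up(0x1001, 4)` gives 0x1004, and an address that is already aligned is returned unchanged.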
You can also align code, which can speed up loops. For code you should use the native register size for alignment.
Isn't all memory that AllocateMemory returns already aligned? We only need to worry about alignment when doing pointer math (for example, *mem+k in a loop will not always be aligned). Here on a 64-bit OS everything is aligned to a 64-bit boundary, so there should be no problem with a function that needs memory aligned to a 32-bit boundary.
EnableExplicit
DisableDebugger
Define k,count,*mem
For k=0 To 1000000
  *mem=AllocateMemory(20+Random(20))
  If *mem%8<>0 ;check if it is not aligned to a 64-bit boundary
    count+1
  EndIf
  ;FreeMemory(*mem) ;don't free memory!
Next
EnableDebugger
Debug count
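The same experiment can be sketched in C, where malloc stands in for AllocateMemory (an assumption: the two allocators are not guaranteed to behave alike). The block sizes mirror the 20 to 40 byte range above, and both function names are hypothetical. The second helper illustrates the pointer-math point: the base address is aligned, but base plus an arbitrary offset may not be.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate n small blocks and count how many start addresses are NOT a
 * multiple of alignment. Blocks are kept alive until the end so that the
 * allocator cannot hand back the same address twice. Returns -1 on failure. */
static int count_misaligned_allocs(int n, size_t alignment)
{
    void **blocks = malloc((size_t)n * sizeof *blocks);
    int count = 0;

    if (blocks == NULL)
        return -1;
    for (int k = 0; k < n; k++) {
        blocks[k] = malloc(20 + (size_t)(rand() % 21));
        if (blocks[k] != NULL && (uintptr_t)blocks[k] % alignment != 0)
            count++;
    }
    for (int k = 0; k < n; k++)
        free(blocks[k]);
    free(blocks);
    return count;
}

/* Nonzero if base + offset lands on a multiple of alignment. */
static int offset_is_aligned(const void *base, size_t offset, size_t alignment)
{
    return ((uintptr_t)base + offset) % alignment == 0;
}
```

With a 4-byte alignment the count should come out as zero, since malloc returns memory suitably aligned for a 32-bit integer, while `offset_is_aligned(p, 1, 4)` is false for any such block.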
Thorium wrote: The size of the datatype you want to access is what tells you the best alignment.
Would you give an example of how this would apply to a structure which uses several differently sized data types? I've never understood where padding should be applied and why.
Structure TestAlign
  Blub.b
  ;Blub breaks the alignment because it is only 1 byte in size
  Align1.b[3]
  ;we just insert a byte array with 3 elements to get 4-byte alignment for the next structure element
  Bla.l
EndStructure
Structure Test
  A.w
  B.b
  C.b ; <- no alignment needed?
  ..[3].b
  D.l
  ..[4].b ; <- should be aligned for 8 bytes?
  E.q
  F.b
  ..[7].b ; <- should be aligned for 8 bytes?
  G.q
EndStructure
Also, why is alignment so important for 64-bit compiling but didn't seem to be an issue with 32-bit?
Wikipedia also mentions:
It is important to note that the last member is padded with the number of bytes required that the total size of the structure should be a least common multiple of the size of a largest structure member.
Structure Test
  A.w
  B.b
  C.b ; <- no alignment needed? Correct.
  ..[3].b ; This is not needed, it's already aligned to 4 bytes: A (2) + B (1) + C (1) = 4
  D.l
  ..[4].b ; <- should be aligned for 8 bytes? Yes, but that's only needed on x64 or if you use MMX registers on x86
  E.q
  F.b
  ..[7].b ; <- should be aligned for 8 bytes? Correct, but same as E
  G.q
EndStructure
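One way to check where padding belongs is to let a C compiler lay out the same fields and read the offsets back. This is only a sketch: it assumes a common 64-bit ABI where each member is aligned to its own size, and the struct name is made up. Note that once the 3-byte pad is dropped, A, B, C and D together occupy exactly 8 bytes, so E already starts on an 8-byte boundary without a pad in front of it. Only F needs 7 bytes after it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same fields as the Test structure, with padding left to the compiler. */
struct test_natural {
    int16_t A;  /* offset 0                                   */
    int8_t  B;  /* offset 2                                   */
    int8_t  C;  /* offset 3                                   */
    int32_t D;  /* offset 4, already 4-byte aligned           */
    int64_t E;  /* offset 8, already 8-byte aligned           */
    int8_t  F;  /* offset 16                                  */
    int64_t G;  /* offset 24 when int64_t is 8-byte aligned   */
};

/* Hypothetical helper: bytes of padding the compiler placed after F. */
static size_t padding_after_F(void)
{
    return offsetof(struct test_natural, G)
         - (offsetof(struct test_natural, F) + sizeof(int8_t));
}
```

On such an ABI `padding_after_F()` is 7 and the whole structure is 32 bytes. A 32-bit x86 Linux build may place G at offset 20 instead, because that ABI only aligns 64-bit integers inside structures to 4 bytes.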
Also, why is alignment so important for 64-bit compiling but didn't seem to be an issue with 32-bit?
It isn't important. It's just for performance, and Intel improved misaligned memory accesses a lot with the Core i7 and is still improving them. I don't know how things are on AMD.
Mistrel wrote:
Wikipedia also mentions:
It is important to note that the last member is padded with the number of bytes required that the total size of the structure should be a least common multiple of the size of a largest structure member.
As far as I understand, that text does not make any sense.
A structure doesn't need size padding. It could improve the performance of structure copying, but it is not important at all, except if an API function or library specifically asks for size padding or alignment. Code can be written in a way that it only works with aligned data. That is only done to gain some performance.
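For what it's worth, the rule C compilers follow is simpler than the quoted wording: the total size is rounded up to a multiple of the structure's own alignment, so that every element of an array of such structures stays aligned. A small C sketch, with a made-up struct and helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 9 bytes of data; the compiler adds tail padding after b. */
struct tail {
    int64_t q;
    int8_t  b;
};

/* Hypothetical check: in an array, the second element's q member is still
 * aligned, which only works because sizeof(struct tail) includes the tail
 * padding. */
static int second_element_is_aligned(void)
{
    static struct tail arr[2];
    return (uintptr_t)&arr[1].q % _Alignof(int64_t) == 0;
}
```

Without the tail padding, `arr[1].q` would start 9 bytes after `arr[0].q` and would be misaligned.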
Mistrel wrote:
Is this more of a requirement of programming with Win32 on a 64-bit processor rather than a restriction imposed by the architecture itself?
There is no restriction on the CPU; there are even special instructions to load xmm registers with unaligned data. So even SSE2 doesn't need 16-byte alignment, but it's faster if the data is aligned.
Thorium wrote: There is no restriction on the CPU; there are even special instructions to load xmm registers with unaligned data. So even SSE2 doesn't need 16-byte alignment, but it's faster if the data is aligned.
movups permits unaligned data, but try addps with a memory operand after that, and you will see that you still need aligned data.
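Both points can hold at once: in the legacy SSE encoding, addps faults on a memory operand that is not 16-byte aligned, but it has no such requirement when both operands are already in registers. The C sketch below loads with the unaligned-load intrinsic first and only then adds. `add4_unaligned` is a hypothetical name, the instruction names in the comments are what a compiler typically emits rather than a guarantee, and a plain loop is used when SSE is not available.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Add four floats from a and b into out. None of the three pointers needs
 * to be 16-byte aligned. */
static void add4_unaligned(const float *a, const float *b, float *out)
{
#if HAVE_SSE
    __m128 va = _mm_loadu_ps(a);             /* unaligned load (movups)     */
    __m128 vb = _mm_loadu_ps(b);             /* unaligned load (movups)     */
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  /* register add, unaligned store */
#else
    for (size_t i = 0; i < 4; i++)           /* scalar fallback without SSE */
        out[i] = a[i] + b[i];
#endif
}
```

Swapping in `_mm_load_ps` and `_mm_store_ps` would be the aligned variant, which requires every pointer to be a multiple of 16.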