I understand that using native 32- or 64-bit data (on x86 and x64) is usually more efficient, at least for speed.
We have eax, ax, ah and al on 32-bit, and the same plus rax on 64-bit.
What are the guidelines for choosing one register size over another?
For example, using AH instead of AX can sometimes be beneficial if I have to manipulate one byte, since it keeps AL available for something else.
Also, if I'm not mistaken, instructions using rax are longer than instructions using eax.
And when writing code for both x86 and x64, if you need just 32 bits it could be simpler to use eax in both cases instead of switching between eax and rax.
For example, if I have to store something in a .l variable (32 bits), does it make sense to use rax to hold that value initially when eax would be enough?
In short, what criteria should one keep in mind?
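On the instruction-length point, here is a small sketch in C that just holds the machine-code bytes of a few register instructions so the sizes can be compared. The byte sequences are one valid encoding of each instruction (an assembler may pick an equivalent form of the same length), and the array and function names are made up for this illustration. The 64-bit forms carry an extra REX.W prefix byte (0x48 here), which is where the extra length comes from.

```c
#include <stddef.h>
#include <stdint.h>

/* One valid encoding of each instruction (hypothetical names, for illustration). */
static const uint8_t mov_eax_ebx[] = { 0x89, 0xD8 };             /* mov eax, ebx */
static const uint8_t mov_rax_rbx[] = { 0x48, 0x89, 0xD8 };       /* mov rax, rbx */
static const uint8_t add_eax_1[]   = { 0x83, 0xC0, 0x01 };       /* add eax, 1   */
static const uint8_t add_rax_1[]   = { 0x48, 0x83, 0xC0, 0x01 }; /* add rax, 1   */

/* Returns 1 if the first byte is a REX prefix with the W bit set
   (binary 01001xxx), which selects a 64-bit operand size.
   Assumes the instruction has no legacy prefixes in front of it. */
static int starts_with_rex_w(const uint8_t *code) {
    return (code[0] & 0xF8) == 0x48;
}

/* Extra bytes needed by the 64-bit form compared with the 32-bit form. */
static size_t extra_bytes(size_t len32, size_t len64) {
    return len64 - len32;
}
```

For these pairs, extra_bytes(sizeof(mov_eax_ebx), sizeof(mov_rax_rbx)) and extra_bytes(sizeof(add_eax_1), sizeof(add_rax_1)) both come out as 1.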
Registers size choices question
Re: Registers size choices question
When you assign a value to ah, al or ax, the CPU has to preserve part of the rax/eax register while changing another part.
When you assign a value to eax in 64-bit mode, the upper 32 bits of rax are set to zero, so the CPU doesn't need to preserve anything. In this case it isn't faster to use rax.
If you are working with a 32-bit value, just use eax, even in 64-bit mode.
If you are working with 8- or 16-bit values and don't need the upper bits, use movzx or movsx with the eax register.
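The rules above can be sketched as a small model in C. This is only an illustration of which bits of rax change on each kind of write, not real register access, and the function names are invented for the example. How much the "preserve" step actually costs depends on the CPU.

```c
#include <stdint.h>

/* mov eax, v : a 32-bit write zeroes bits 32..63, nothing is preserved. */
static uint64_t write_eax(uint64_t rax, uint32_t v) {
    (void)rax;                      /* the old contents do not matter */
    return (uint64_t)v;
}

/* mov ax, v : bits 16..63 of the old value must be preserved. */
static uint64_t write_ax(uint64_t rax, uint16_t v) {
    return (rax & ~(uint64_t)0xFFFF) | v;
}

/* mov al, v : bits 8..63 of the old value must be preserved. */
static uint64_t write_al(uint64_t rax, uint8_t v) {
    return (rax & ~(uint64_t)0xFF) | v;
}

/* mov ah, v : only bits 8..15 change, everything else is preserved. */
static uint64_t write_ah(uint64_t rax, uint8_t v) {
    return (rax & ~(uint64_t)0xFF00) | ((uint64_t)v << 8);
}

/* movzx eax, byte v : zero-extend to 32 bits; the upper 32 bits become zero. */
static uint64_t movzx_eax_byte(uint64_t rax, uint8_t v) {
    (void)rax;
    return (uint64_t)v;
}

/* movsx eax, byte v : sign-extend to 32 bits; the upper 32 bits become zero. */
static uint64_t movsx_eax_byte(uint64_t rax, int8_t v) {
    (void)rax;
    return (uint64_t)(uint32_t)(int32_t)v;
}
```

The functions that ignore their rax argument are the ones where the CPU has nothing to merge, which is the case recommended above.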
Windows (x64)
Raspberry Pi OS (Arm64)
Re: Registers size choices question
Thank you
Just to see if I have understood correctly:
wilbert wrote: When you are assigning a value to ah, al or ax, part of the rax/eax register needs to be preserved and another part changed by the cpu.
Yes, so it's not particularly efficient, is that what you mean? Is it something tangible I should consider then?
wilbert wrote: When you are assigning a value to eax in 64 bit mode, the upper 32 bits of the rax register are set to zero so nothing needs to be preserved by the cpu. In this case it isn't faster to use rax.
OK, so it's ok to use eax without penalties compared to rax.
wilbert wrote: If you are working with a 32 bits value, just use eax, even in 64 bit mode.
OK
wilbert wrote: If you are working with 8 or 16 bits values and don't need the upper bits, use movzx or movsx with the eax register.
So are you saying this is preferable to using the 8/16-bit registers, for the reason explained in the first line of your answer?
Let's say I want to copy a single char from one address to another. Is this OK?
movzx eax, byte [addr1] ; read one byte, using movzx to clear the upper 24 bits
mov [addr2], al ; is it OK to use al here since it's only being read?
Or is there a better way? I want to preserve what is at addr2+1.
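As a rough model of those two instructions, here is the same copy written in C. The function name is made up for the example, and the local variable is called eax only to mirror the assembly. The point it illustrates is that a byte-sized store writes exactly one byte, so the neighbouring byte at addr2+1 is left alone, and that reading al does not involve any merging.

```c
#include <stdint.h>

/* Copies one byte from src to dst.
   Mirrors: movzx eax, byte [addr1]  /  mov [addr2], al */
static void copy_one_byte(uint8_t *dst, const uint8_t *src) {
    uint32_t eax = *src;      /* zero-extended load, like movzx eax, byte [src] */
    *dst = (uint8_t)eax;      /* one-byte store of the low 8 bits, like mov [dst], al */
}
```

Calling copy_one_byte on a two-byte destination changes only the first byte; the second keeps its previous value.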
Re: Registers size choices question
It all depends on how important the speed is.
Here's a test you can run on PB x64
On my computer, the movzx version (first result) is slightly faster but it's not a very big difference.
When using AH however, there is a major speed impact (third result).
DisableDebugger

; Allocate 2 MB: each loop copies the first 1 MB ($100000 bytes) into the second 1 MB
*mem = AllocateMemory($200000)

; Test 1: load with movzx into eax (full register write), store from al
t1 = ElapsedMilliseconds()
For i = 1 To 1000
  !mov rdx, [p_mem]
  !mov ecx, 0x100000
  !loop1:
  !movzx eax, byte [rdx]
  !mov [rdx + 0x100000], al
  !add rdx, 1
  !sub ecx, 1
  !jnz loop1
Next

; Test 2: load into al (partial register write), store from al
t2 = ElapsedMilliseconds()
For i = 1 To 1000
  !mov rdx, [p_mem]
  !mov ecx, 0x100000
  !loop2:
  !mov al, [rdx]
  !mov [rdx + 0x100000], al
  !add rdx, 1
  !sub ecx, 1
  !jnz loop2
Next

; Test 3: load into ah (partial register write), store from ah
t3 = ElapsedMilliseconds()
For i = 1 To 1000
  !mov rdx, [p_mem]
  !mov ecx, 0x100000
  !loop3:
  !mov ah, [rdx]
  !mov [rdx + 0x100000], ah
  !add rdx, 1
  !sub ecx, 1
  !jnz loop3
Next
t4 = ElapsedMilliseconds()

; Elapsed milliseconds for test 1 vs test 2 vs test 3
MessageRequester("results", Str(t2-t1) + " vs " + Str(t3-t2) + " vs " + Str(t4-t3))
Re: Registers size choices question
Thanks for the example.
On my Intel Core i7 all the loops run at the same speed. Sometimes one is a little slower and the next time it's a little faster (probably just ElapsedMilliseconds() wandering). I tried multiple times, even after shuffling the code around.
What CPU do you have? How much slower was the third loop?
In any case I'll try to use the extending mov (movzx/movsx) when possible; maybe it's better on some CPUs.
Re: Registers size choices question
Intel(R) Core(TM) i5-4570R CPU @ 2.70GHz
405 vs 489 vs 670
Re: Registers size choices question
Thank you, I'm learning a lot from your examples.
Intel(R) Core(TM)i7-6700K CPU 4.00GHz
259 vs 287 vs 257
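To compare results from different machines, it can help to express each loop's time as a percentage of the first (movzx) loop. A minimal sketch in C, with a made-up function name, using whole-number percentages with the fraction dropped:

```c
/* Elapsed time of one loop as a whole-number percentage of the first
   (movzx) loop's time. The fractional part is truncated. */
static unsigned percent_of_baseline(unsigned baseline_ms, unsigned loop_ms) {
    return loop_ms * 100u / baseline_ms;
}
```

With the figures posted in this thread, 405 vs 489 vs 670 on the i5-4570R works out to 100, 120 and 165, while 259 vs 287 vs 257 on the i7-6700K works out to 100, 110 and 99. So the AH loop took roughly 65% longer on the first machine and showed no slowdown on the second.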