Registers size choices question

sys64802 · Post by **sys64802** » Mon Dec 28, 2015 5:06 pm

I understand that using native 32- or 64-bit data (on x86 and x64 respectively) is usually more efficient, at least for speed.

We have eax, ax, ah and al in 32-bit mode, and the same plus rax in 64-bit mode.

What are the guidelines for choosing one register size over another?

For example, using AH instead of AX sometimes can be beneficial if I have to manipulate one byte and it keeps AL available for something else.

Also, if I'm not mistaken, instructions using rax are longer than instructions using eax.

And when writing code for both x86 and x64, if you need just 32 bits it could be simpler to use eax in both cases instead of eax on x86 and rax on x64.

For example, if I have to store something in a .l variable (32 bits), does it make sense to use rax to hold that value initially when eax would be enough?

In short, what are the criteria one has to keep in mind ?

wilbert · Post by **wilbert** » Mon Dec 28, 2015 6:26 pm

When you are assigning a value to ah, al or ax, part of the rax/eax register needs to be preserved and another part changed by the cpu.
When you are assigning a value to eax in 64 bit mode, the upper 32 bits of the rax register are set to zero so nothing needs to be preserved by the cpu. In this case it isn't faster to use rax.
If you are working with a 32 bits value, just use eax, even in 64 bit mode.
If you are working with 8 or 16 bits values and don't need the upper bits, use movzx or movsx with the eax register.
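
To make the difference between a 32-bit write and an 8-bit write concrete, here is a minimal sketch. It assumes PureBasic x64 with the ASM (FASM) backend; the variable names `after_eax` and `after_al` are made up for illustration and are not from the thread.

```purebasic
; Sketch only: assumes PureBasic x64 with the ASM (FASM) backend.
DisableDebugger

; Hypothetical names; inline asm sees them as v_after_eax / v_after_al.
Define.q after_eax, after_al

; 32-bit write: the CPU clears the upper 32 bits of rax automatically.
!mov rax, -1              ; rax = 0xFFFFFFFFFFFFFFFF
!mov eax, 1               ; rax = 0x0000000000000001
!mov [v_after_eax], rax

; 8-bit write: only al changes, so the other 56 bits have to be preserved.
!mov rax, -1              ; rax = 0xFFFFFFFFFFFFFFFF
!mov al, 1                ; rax = 0xFFFFFFFFFFFFFF01
!mov [v_after_al], rax

MessageRequester("rax after write", Hex(after_eax) + " vs " + Hex(after_al))
```

The second case is the one where the processor has to merge the new byte with the old register contents, which is why a movzx into eax is suggested above for 8- and 16-bit values.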

sys64802 · Post by **sys64802** » Mon Dec 28, 2015 8:42 pm

Thank you

Just to see if I have understood correctly:

wilbert wrote:When you are assigning a value to ah, al or ax, part of the rax/eax register needs to be preserved and another part changed by the cpu.

Yes, so it's not particularly efficient, is that what you mean ? Is it something tangible I should consider then ?

wilbert wrote: When you are assigning a value to eax in 64 bit mode, the upper 32 bits of the rax register are set to zero so nothing needs to be preserved by the cpu. In this case it isn't faster to use rax.

OK, so it's ok to use eax without penalties compared to rax.

wilbert wrote: If you are working with a 32 bits value, just use eax, even in 64 bit mode.

OK

wilbert wrote: If you are working with 8 or 16 bits values and don't need the upper bits, use movzx or movsx with the eax register.

So are you saying this is preferable to using the 8/16 bits registers for the reason explained in the first line of your answer ?

Let's say I want to copy a single char from an address to another, is this ok:

movzx eax, byte [addr1] ; read one byte using movzx to clear the 24 upper bits

mov [addr2], al ; is it ok to use al here since it's 'read-only' ?

Or is there a better way? I want to preserve what is there at addr2+1.

wilbert · Post by **wilbert** » Tue Dec 29, 2015 8:31 am

It all depends on how important the speed is.

Here's a test you can run on PB x64

```purebasic
DisableDebugger

; 2 MB buffer: each loop copies the first 1 MB into the second 1 MB, byte by byte
*mem = AllocateMemory($200000)

; Loop 1: load with movzx into eax, store from al
t1 = ElapsedMilliseconds()
For i = 1 To 1000
  !mov rdx, [p_mem]
  !mov ecx, 0x100000
  !loop1:
  !movzx eax, byte [rdx]
  !mov [rdx + 0x100000], al
  !add rdx, 1
  !sub ecx, 1
  !jnz loop1
Next
; Loop 2: load and store through al
t2 = ElapsedMilliseconds()
For i = 1 To 1000
  !mov rdx, [p_mem]
  !mov ecx, 0x100000
  !loop2:
  !mov al, [rdx]
  !mov [rdx + 0x100000], al
  !add rdx, 1
  !sub ecx, 1
  !jnz loop2
Next
; Loop 3: load and store through ah
t3 = ElapsedMilliseconds()
For i = 1 To 1000
  !mov rdx, [p_mem]
  !mov ecx, 0x100000
  !loop3:
  !mov ah, [rdx]
  !mov [rdx + 0x100000], ah
  !add rdx, 1
  !sub ecx, 1
  !jnz loop3
Next
t4 = ElapsedMilliseconds()

; Elapsed milliseconds: movzx vs al vs ah
MessageRequester("results", Str(t2-t1) + " vs " + Str(t3-t2) + " vs " + Str(t4-t3))
```

On my computer, the movzx version (first result) is slightly faster but it's not a very big difference.
When using AH however, there is a major speed impact (third result).

sys64802 · Post by **sys64802** » Tue Dec 29, 2015 2:46 pm

Thanks for the example.

On my Intel Core i7 all the loops run at the same speed; sometimes one is a little slower and the next time it's a little faster (probably just ElapsedMilliseconds() jitter). I tried multiple times, even after shuffling the code around.

What CPU do you have? How much slower was the third loop?

In any case I'll try to use the mov with extension (movzx/movsx) when possible; maybe it's better on some CPUs.

wilbert · Post by **wilbert** » Tue Dec 29, 2015 3:28 pm

Intel(R) Core(TM) i5-4570R CPU @ 2.70GHz
405 vs 489 vs 670

GoodNPlenty · Post by **GoodNPlenty** » Wed Dec 30, 2015 8:54 am

Thank you, I'm learning a lot from your examples.
Intel(R) Core(TM)i7-6700K CPU 4.00GHz
259 vs 287 vs 257
