Registers size choices question

Bare metal programming in PureBasic, for experienced users
sys64802
Enthusiast
Posts: 105
Joined: Sat Sep 12, 2015 6:55 pm

Post by sys64802 »

I understand that using native 32-bit or 64-bit data (on x86 and x64 respectively) is usually more efficient, at least for speed.

We have eax, ax, ah and al on 32 bit, and the same plus rax on 64 bit.

What are the guidelines for choosing one register size over another?

For example, using AH instead of AX can sometimes be beneficial if I have to manipulate one byte, because it keeps AL available for something else.

Also, if I'm not mistaken, instructions using rax are longer than instructions using eax.

And when writing code for both x86 and x64, if you need just 32 bits it could be simpler to use eax in both cases instead of switching between eax and rax.

For example, if I have to store something in a .l variable (32 bits), does it make sense to use rax to initially hold that value when eax would be enough?

In short, what criteria should one keep in mind?
wilbert
PureBasic Expert
Posts: 3870
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Post by wilbert »

When you are assigning a value to ah, al or ax, part of the rax/eax register needs to be preserved and another part changed by the CPU.
When you are assigning a value to eax in 64-bit mode, the upper 32 bits of the rax register are set to zero, so nothing needs to be preserved by the CPU. In this case it isn't faster to use rax.
If you are working with a 32-bit value, just use eax, even in 64-bit mode.
If you are working with 8-bit or 16-bit values and don't need the upper bits, use movzx or movsx with the eax register as the destination.
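
To make the difference between a full and a partial register write concrete, here is a minimal sketch (it was not part of the original reply). It assumes PureBasic x64 with the ASM backend, where a main-scope variable such as full is reachable from inline assembly as v_full. The variable names full and partial are only illustrative.

```purebasic
; Sketch only: assumes PureBasic x64 with the ASM (FASM) backend.
; "full" and "partial" are illustrative names, not from the thread.
Define.q full, partial

!mov rax, -1           ; rax = 0xFFFFFFFFFFFFFFFF
!mov eax, 1            ; 32-bit write: upper 32 bits of rax are zeroed
!mov [v_full], rax     ; store the whole 64-bit register

!mov rax, -1           ; rax = 0xFFFFFFFFFFFFFFFF again
!mov al, 1             ; 8-bit write: the other 56 bits must be preserved
!mov [v_partial], rax  ; store the whole 64-bit register

Debug Hex(full)        ; upper bits cleared by the eax write
Debug Hex(partial)     ; upper bits still set after the al write
```

Run with the debugger enabled, the first value should show that the eax write replaced the whole register, while the second should show that the al write changed only the lowest byte.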
Windows (x64)
Raspberry Pi OS (Arm64)

Post by sys64802 »

Thank you

Just to see if I have understood correctly:
wilbert wrote: When you are assigning a value to ah, al or ax, part of the rax/eax register needs to be preserved and another part changed by the CPU.
Yes, so it's not particularly efficient, is that what you mean? Is it something tangible I should take into account, then?
wilbert wrote: When you are assigning a value to eax in 64-bit mode, the upper 32 bits of the rax register are set to zero, so nothing needs to be preserved by the CPU. In this case it isn't faster to use rax.
OK, so eax can be used without any penalty compared to rax.
wilbert wrote: If you are working with a 32-bit value, just use eax, even in 64-bit mode.
OK
wilbert wrote: If you are working with 8-bit or 16-bit values and don't need the upper bits, use movzx or movsx with the eax register.
So are you saying this is preferable to using the 8/16-bit registers, for the reason explained in the first line of your answer?


Let's say I want to copy a single char from one address to another. Is this OK:

movzx eax, byte [addr1] ; read one byte using movzx to clear the 24 upper bits

mov [addr2], al ; is it ok to use al here since it's 'read-only' ?

Or is there a better way? I want to preserve what is at addr2+1.

Post by wilbert »

It all depends on how important the speed is.

Here's a test you can run on PB x64:

Code:

DisableDebugger

*mem = AllocateMemory($200000)

t1 = ElapsedMilliseconds()
; Loop 1: zero-extending load into eax (movzx), store from al
For i = 1 To 1000
  !mov rdx, [p_mem]
  !mov ecx, 0x100000
  !loop1:
  !movzx eax, byte [rdx]
  !mov [rdx + 0x100000], al
  !add rdx, 1
  !sub ecx, 1
  !jnz loop1
Next
t2 = ElapsedMilliseconds()
; Loop 2: partial-register load into al, store from al
For i = 1 To 1000
  !mov rdx, [p_mem]
  !mov ecx, 0x100000
  !loop2:
  !mov al, [rdx]
  !mov [rdx + 0x100000], al
  !add rdx, 1
  !sub ecx, 1
  !jnz loop2
Next
t3 = ElapsedMilliseconds()
; Loop 3: partial-register load into ah, store from ah
For i = 1 To 1000
  !mov rdx, [p_mem]
  !mov ecx, 0x100000
  !loop3:
  !mov ah, [rdx]
  !mov [rdx + 0x100000], ah
  !add rdx, 1
  !sub ecx, 1
  !jnz loop3
Next
t4 = ElapsedMilliseconds()

MessageRequester("results", Str(t2-t1) + " vs " + Str(t3-t2) + " vs " + Str(t4-t3))
On my computer, the movzx version (first result) is slightly faster but it's not a very big difference.
When using AH however, there is a major speed impact (third result).

Post by sys64802 »

Thanks for the example.

On my Intel Core i7 all the loops run at the same speed. Sometimes one is a little slower and the next time it's a little faster (probably just ElapsedMilliseconds() wandering). I tried multiple times, even after shuffling the code around.

What CPU do you have? How much slower was the third loop?

In any case I'll try to use movzx/movsx when possible; maybe it's better on some CPUs.

Post by wilbert »

Intel(R) Core(TM) i5-4570R CPU @ 2.70GHz
405 vs 489 vs 670
GoodNPlenty
Enthusiast
Posts: 107
Joined: Wed May 13, 2009 8:38 am
Location: Arizona, USA

Post by GoodNPlenty »

Thank you, I'm learning a lot from your examples.
Intel(R) Core(TM) i7-6700K CPU 4.00GHz
259 vs 287 vs 257