wilbert wrote:
The FPU can handle three different types of float.
32 bit (single precision)
64 bit (double precision)
80 bit (extended precision http://en.wikipedia.org/wiki/Extended_precision )
I know that. And you are referring to the x87 FPU, which is legacy on x64 (it is still in every CPU, but x64 code is meant to use SSE2 for floating point).
Also, if you look around Wikipedia a bit more, you will see that 80-bit float is now deprecated in user mode on Windows,
and 80-bit float is not allowed at all in the Windows kernel.
wilbert wrote:
Internally it always uses 80 bit. When 32 or 64 bit are used, it converts them to 80 bit when the value is loaded into an FPU register or stored from such a register.
Also, just because an x87 FPU supports 80-bit floats does not mean you get to use them. By default an x87 FPU is set to 80-bit (64-bit mantissa) precision,
but Windows (before they deprecated it, etc.) set it to 64-bit precision instead, effectively "truncating" 80-bit results to 64-bit floats (strictly speaking, the mantissa is rounded to 53 bits).
I don't feel like diving into the huge x64 manuals from Intel and AMD, but I wouldn't be surprised to find that 80-bit precision is not guaranteed, only 32-bit or 64-bit float.
wilbert wrote:
PureBasic only supports 32 and 64 bit variables. The FPU is capable of loading and storing 80 bit also but PB has no support for extended precision variables.
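Being capable of handling 80-bit floats does not mean it actually performs 80-bit float math. Also, 80-bit floating point is not really standardized the way 32-bit and 64-bit IEEE floats are: IEEE 754 only sets minimum requirements for a "double extended" format, so the exact layout is up to the vendor.
128-bit floats (IEEE 754 binary128) are much more interesting in that regard.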
wilbert wrote:
If you compile using /commented and look at the generated ASM code, you will see it uses the exact same opcodes to handle the floats on x86 and on x64.
Um. *points* >>
Rescator wrote:
I'm just curious about the speed difference here, and I checked the ASM source: the exact same instructions are used for doubles on x64 and x86, so they are not the reason for the speed difference.
wilbert wrote:
There is no such thing as a x64 float. The difference between x86 and x64 is that x64 handles quad variables natively and those are simulated when using x86.
Typo: by "x64 float" I meant 64-bit float.
Also when you look at the generated code, you will see the opcodes to handle the loop are different on x86 and x64.
Therefore speed differences between the two modes are most likely due to the code that handles the loop itself and not due to the FPU opcodes.
Really?
Code:
; x64 build: 64-bit loop counter (rax, qword)
; For i=1 To l
MOV qword [v_i],1
_For1:
MOV rax,qword [v_l]
CMP rax,qword [v_i]
JL _Next2
; s3=Test(s1,s2)
FLD dword [v_s1]
FMUL dword [v_s2]
FSTP dword [v_s3]
; Next
_NextContinue2:
INC qword [v_i]
JNO _For1
_Next2:
vs
Code:
; x86 build: 32-bit loop counter (eax, dword)
; For i=1 To l
MOV dword [v_i],1
_For1:
MOV eax,dword [v_l]
CMP eax,dword [v_i]
JL _Next2
; s3=Test(s1,s2)
FLD dword [v_s1]
FMUL dword [v_s2]
FSTP dword [v_s3]
; Next
_NextContinue2:
INC dword [v_i]
JNO _For1
_Next2:
Looks pretty damn similar if you ask me, apart from the obvious parts: eax vs rax and dword vs qword for the integer loop counter. The FLD/FMUL/FSTP lines are identical.
I didn't do an empty-loop test, but I'm pretty sure these differences are so minuscule that any overhead/bias will hardly impact the results, if at all.
But I'll do such a test anyway later today for my own sake; feel free to do the same.
wilbert wrote:
The other way of handling floats is using SSE2.
SSE2 can use single or double precision but has no support for extended precision.
The advantage of SSE2 is that you can do parallel calculations on up to 4 single-precision values at once, improving speed, but SSE2 is the same on x86 and x64.
*points* >>
Rescator wrote:
it might look like "x87" float is used, but it could instead be SSE2
And yeah, I've seen and read the same articles you have.
And I've read enough about the x87 instruction set and the 80-bit float to know that I'll stay the hell away from it. NTSC has been nicknamed "Never Twice the Same Color",
and 80-bit float could easily be nicknamed "Never Twice the Same Number". That is what many have discovered with JIT compiling: the conversion between 80-bit float and 32-bit or 64-bit float is sometimes done differently (sometimes truncated(?), sometimes not), so two identical lines of code executed at different times end up with slightly different numbers, on.the.same.machine... O.o wow!
As I said, I have not dived into the Intel and AMD CPU architecture manuals yet, but I would not be surprised if what they say about 80-bit float is rather dismal too.
Quote:
Section 12.1.3 (Virtual Execution System: Supported data types: Handling of floating-point data types): an implementation is free to use an internal representation available on the machine, provided that there are at least 32 (single) or 64 (double) bits.
Also, many compilers produce executables where "32-bit is using the 80-bit FPU registers, 64-bit is using the 128-bit SSE registers. Neither is using SIMD instructions despite this being obviously parallelizable".
Then there's the ironic fact that all x64 CPUs have SSE and SSE2, and those from 2005 and later also have SSE3. I'm hoping PureBasic x64 will take advantage of the x64 features that are guaranteed to be there, like SSE2, in its x64 commands.
Trond wrote:
You have a big problem: When the debugger is enabled in the menu the peephole optimizer of the compiler is turned off, even if you have DisableDebugger in the source. The debugger needs to be off globally for a speed test.
Ah crap, right, I forgot about that quirk, thanks. But there's not much I can do about that really; DisableDebugger is the closest you can get in a source posting. (There's no way to copy'n'paste config lines along with the code and have them work in the IDE, is there?)