Keya wrote: hrm, I see, thank you. However, the actual number is 2.1 or 2.1000000000, not 2.0999999046, and I want to display full precision without any trailing zeros. I was assuming StrF() without specifying decimal places would achieve this, but seemingly it doesn't...
This has been requested before:
http://www.purebasic.fr/english/viewtop ... =3&t=53614
I agree with you; I don't much like how StrF() and StrD() work.
When I want to print a float I use something like this (it's essentially the same approach proposed by BasicallyPure):
Code:
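Procedure.s StrFa(x.f) ; StrF auto: full precision, no trailing zeros
Protected s$ = RTrim(StrF(x, 38), "0") ; 38 decimal digits at most, then strip the trailing zeros
If Right(s$,1) = "."
s$ + "0" ; keep one digit after the point, e.g. "0.0" instead of "0."
EndIf
ProcedureReturn s$
EndProcedure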
Debug StrF(1.0 + 1.1)
Debug StrF(0.0)
Debug StrF(-3.1415)
Debug ""
Debug StrFa(1.0 + 1.1)
Debug StrFa(0.0)
Debug StrFa(-3.1415)
I prefer 0.0 for the output instead of 0, but it's just a preference.
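StrFa() prints the whole stored value. If what you want is closer to what Keya asked for (2.1 rather than 2.0999999046...), a different technique is to print the shortest decimal string that reads back as the same single-precision value. Below is a minimal sketch of that idea in Python rather than PureBasic; the helper names are hypothetical, and it assumes the input fits in the float32 range (struct raises OverflowError otherwise):

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (an IEEE 754 double) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def shortest_float32_str(x: float) -> str:
    """Shortest decimal string that reads back as the same float32 value as x."""
    f = to_float32(x)
    s = repr(f)  # fallback: full double-precision text
    for digits in range(1, 10):  # 9 significant digits always suffice for a finite float32
        candidate = f"{f:.{digits}g}"
        if to_float32(float(candidate)) == f:
            s = candidate
            break
    # like StrFa(), show whole numbers as "N.0" (exponent notation and inf/nan are left alone)
    if s.lstrip("-").isdigit():
        s += ".0"
    return s

print(shortest_float32_str(2.1))      # 2.1
print(shortest_float32_str(0.0))      # 0.0
print(shortest_float32_str(-3.1415))  # -3.1415
```

The stored value is still 2.0999999046...; this only chooses the shortest text that maps back to it, which is usually what people expect to see.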
For fun I've put together a small PB program which:
1) takes a float stored in memory
2) prints its binary representation
3) shows how to decompose it into sign, exponent and mantissa (or significand) to obtain the normalized floating-point number in binary
4) from there, gets the denormalized number
5) converts the binary integer and fractional parts to decimal and sums them to get the final number
6) prints the fractional part as a sum of component fractions with power-of-2 denominators, to show why not every number can be represented exactly in floating point and why most of them can only be approximated
You can try it with other numbers, but keep them reasonably small: the program is simple and has limited formatting capabilities. It also assumes a normal number, so zero, subnormals, infinities and NaN are not decoded correctly.
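Steps 1 and 2 can be cross-checked outside PureBasic. This is a minimal Python sketch (an illustration, not the program below; the function names are hypothetical) that gets the 32-bit pattern of a single-precision float and splits it into the same three fields that GetFloatSignStr(), GetFloatExponentStr() and GetFloatMantissaStr() extract. Packing with the big-endian format `'>f'` puts the most significant byte first, so no byte flipping is needed:

```python
import struct

def float32_bits(x: float) -> str:
    """Return the 32-bit IEEE 754 single-precision pattern of x as a '0'/'1' string."""
    # '>f' rounds x to float32 and packs it big-endian (most significant byte first)
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    return format(word, "032b")

def split_fields(bits: str) -> tuple[str, str, str]:
    """Split the pattern into sign (1 bit), exponent (8 bits) and mantissa (23 bits)."""
    return bits[0], bits[1:9], bits[9:]

bits = float32_bits(10.75)
print(bits)                # 01000001001011000000000000000000
print(split_fields(bits))  # ('0', '10000010', '01011000000000000000000')
```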
Floating-point binary numbers are stored in normalized form to maximize the precision of the significand.
To normalize a floating point binary number, you shift the binary point until a single 1 appears to the left of the binary point.
The leading 1 of 1.xxx...xxx is not stored, which saves one bit: in a normalized number it is always there implicitly.
The exponent indicates the number of positions the binary point is moved to the left (positive exponent) or to the right (negative exponent).
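The normalization step can be sketched on a plain binary string. This hypothetical Python helper (a sketch, assuming a positive number written with an explicit binary point and at least one 1 bit) moves the point so that a single 1 is left of it, and returns the exponent that records how far the point moved:

```python
def normalize(binary: str) -> tuple[str, int]:
    """Normalize a positive binary fixed-point string such as '1010.11'.

    Returns (significand, exponent) with the significand in '1.xxx' form,
    so that value = significand * 2**exponent.
    """
    int_part, _, frac_part = binary.partition(".")
    digits = int_part + frac_part        # all bits, binary point removed
    first_one = digits.index("1")        # position of the leading 1
    # the point sat after len(int_part) digits; it must end up right after the leading 1
    exponent = len(int_part) - (first_one + 1)
    rest = digits[first_one + 1:].rstrip("0") or "0"
    return "1." + rest, exponent

print(normalize("1010.11"))  # ('1.01011', 3)
print(normalize("0.01"))     # ('1.0', -2)
```

Moving the point to the left gives a positive exponent and moving it to the right gives a negative one, as described above.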
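To denormalize a binary floating-point number, just shift the binary point until the exponent is zero. (Here "denormalized" simply means the number written out in plain positional notation; in IEEE 754 terminology, denormalized or subnormal numbers are something else: the tiny values stored with an exponent field of zero.)
If the exponent is +n, you shift the binary point n positions to the right; if the exponent is -n, you shift it n positions to the left, filling with zeros if required.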
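The reverse operation, sketched in Python with a hypothetical helper name: take the '1.xxx' significand and the unbiased exponent, move the binary point and pad with zeros on whichever side needs them:

```python
def denormalize(significand: str, exponent: int) -> str:
    """Shift the binary point of '1.xxx' by exponent places (right if positive, left if negative)."""
    int_part, _, frac_part = significand.partition(".")
    digits = int_part + frac_part
    point = len(int_part) + exponent      # new position of the binary point
    if point <= 0:
        # moving left past the first digit: pad with leading zeros
        digits = "0" * (1 - point) + digits
        point = 1
    elif point > len(digits):
        # moving right past the last digit: pad with trailing zeros
        digits += "0" * (point - len(digits))
    int_out = digits[:point].lstrip("0") or "0"
    frac_out = digits[point:].rstrip("0") or "0"
    return int_out + "." + frac_out

print(denormalize("1.01011", 3))  # 1010.11
print(denormalize("1.0", -2))     # 0.01
print(denormalize("1.0", 0))      # 1.0
```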
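The exponent is biased: exponents are stored as 8-bit unsigned integers with a bias of 127.
This means 127 is added to the actual exponent before storing it (for example, an exponent of 3 is stored as 130).
For normal numbers the biased exponent is always positive, between 1 and 254, so the actual exponent ranges from -126 to +127. The stored values 0 and 255 are reserved for zero and subnormal numbers, and for infinities and NaN, respectively.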
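Putting the fields and the bias together, a normal single-precision number decodes as (-1)^sign × 1.mantissa × 2^(exponent - 127). Here is a minimal Python sketch of that formula (hypothetical function name), assuming a normal number, i.e. a stored exponent between 1 and 254; it uses `fractions.Fraction` so the result is exact rather than another rounded float:

```python
import struct
from fractions import Fraction

def decode_normal_float32(x: float) -> Fraction:
    """Rebuild the exact value of float32(x) from its sign, exponent and mantissa fields."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))
    sign = word >> 31
    biased_exp = (word >> 23) & 0xFF      # 8-bit exponent field
    mantissa = word & 0x7FFFFF            # 23-bit mantissa field
    if not 1 <= biased_exp <= 254:
        raise ValueError("zero, subnormal, infinity or NaN: not a normal number")
    unbiased_exp = biased_exp - 127       # remove the bias
    significand = 1 + Fraction(mantissa, 2**23)  # restore the implicit leading 1
    value = significand * Fraction(2) ** unbiased_exp
    return -value if sign else value

print(decode_normal_float32(10.75))   # 43/4
print(decode_normal_float32(-10.75))  # -43/4
print(decode_normal_float32(0.25))    # 1/4
```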
Code:
; to be readable, the debug window must have a fixed pitch font like Courier, no variable pitch
Procedure.s StripTrailingZeros(s$)
s$ = RTrim(s$, "0") ; remove all trailing zeros
If Right(s$,1) = "."
s$ + "0" ; keep one digit after the point, e.g. "1.0" instead of "1."
EndIf
If Len(s$) = 0
s$ = "0" ; the string contained only zeros
EndIf
ProcedureReturn s$
EndProcedure
Procedure.s StrFa(x.f)
s$ = StrF(x, 38) ; 38 decimal digits top
s$ = StripTrailingZeros(s$)
ProcedureReturn s$
EndProcedure
Procedure.s GetFloatBinaryStr(x.f)
Protected float$
Protected b1, b2, b3, b4
; it's little endian, so the most significant byte is at the end, let's flip them to build the string
b1 = PeekB(@x+3)
b2 = PeekB(@x+2)
b3 = PeekB(@x+1)
b4 = PeekB(@x)
; build a string for the 32 bit binary representation of the passed float
float$ = RSet(Bin(b1,#PB_Byte),8,"0") + RSet(Bin(b2,#PB_Byte),8,"0") + RSet(Bin(b3,#PB_Byte),8,"0") + RSet(Bin(b4,#PB_Byte),8,"0")
ProcedureReturn float$
EndProcedure
Procedure.s GetFloatSignStr(x$)
ProcedureReturn Left(x$, 1)
EndProcedure
Procedure.s GetFloatExponentStr(x$)
ProcedureReturn Mid(x$, 2, 8)
EndProcedure
Procedure.s GetFloatMantissaStr(x$)
ProcedureReturn Right(x$, 23)
EndProcedure
Procedure.i GetFloatExponentBiased(e$)
ProcedureReturn Val("%" + e$)
EndProcedure
Procedure.i GetFloatExponentUnbiased(e$)
ProcedureReturn GetFloatExponentBiased(e$) - 127
EndProcedure
Procedure.s BuildNormalizedFloatStr(s$, e$, m$)
Protected x$
If s$ = "1"
x$ = "-1."
Else
x$ = "+1."
EndIf
x$ + StripTrailingZeros(m$)
x$ + " x 2^" + Str(GetFloatExponentUnbiased(e$))
ProcedureReturn x$
EndProcedure
Procedure.s GetDenormalizedFromNormalizedStr(nfs$)
Protected dfs$, sign$, exp$, l$, r$
Protected x, exp, dot
sign$ = Left(nfs$,1) ; save sign
nfs$ = Mid(nfs$,2) ; remove sign
x = FindString(nfs$,"x")
exp$ = Mid(nfs$, x+4) ; save exponent
exp = Val(exp$)
nfs$ = Left(nfs$, x-2) ; remove exponent
dot = FindString(nfs$, ".")
l$ = Left(nfs$, dot-1)
r$ = Mid(nfs$, dot+1)
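If exp >= 0
; positive exponent: move the binary point exp positions to the right, padding with zeros
r$ + LSet("",exp,"0")
r$ = InsertString(r$,".", exp+1)
dfs$ = sign$ + l$ + StripTrailingZeros(r$)
Else
; negative exponent: move the binary point -exp positions to the left, padding with leading zeros
l$ = LSet("",-exp,"0") + l$
l$ = InsertString(l$,".", Len(l$)+exp+1)
dfs$ = sign$ + StripTrailingZeros(l$ + r$) ; strip after joining, so 0.25 gives 0.01 and not 0.010
EndIf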
ProcedureReturn dfs$
EndProcedure
Procedure SplitDenormalized(dfs$, *int_part.String, *frac_part.String)
Protected dot
dot = FindString(dfs$, ".")
*int_part\s = Mid(dfs$, 2, dot-2)
*frac_part\s = Mid(dfs$, dot+1)
EndProcedure
Procedure DumpFloat(x.f)
Debug "Stored float value = " + StrFa(x)
fs$ = GetFloatBinaryStr(x)
Debug "Binary float representation = " + fs$
Debug ""
; split the binary float representation into its components
s$ = GetFloatSignStr(fs$) ; sign (1 bit)
e$ = GetFloatExponentStr(fs$) ; exponent (8 bits)
m$ = GetFloatMantissaStr(fs$) ; mantissa (23 bits)
; show
Debug "S ESP MANTISSA"
Debug s$ + " " + e$ + " " + m$
Debug ""
; let's check the sign
If s$ = "1"
sign$ = "-"
Debug "Number is negative"
Else
sign$ = "+"
Debug "Number is positive"
EndIf
Debug "Biased exponent is " + GetFloatExponentBiased(e$) ; exponent with bias
Debug "Unbiased exponent is " + GetFloatExponentUnbiased(e$) ; exponent without bias
Debug ""
; let's combine the 3 components into a normalized binary float string
nfs$ = BuildNormalizedFloatStr(s$, e$, m$)
; show
Debug "Normalized float in binary is " + nfs$
Debug ""
; transform the normalized float into a denormalized one
dfs$ = GetDenormalizedFromNormalizedStr(nfs$)
; show
Debug "Denormalized float in binary is " + dfs$
Debug ""
SplitDenormalized(dfs$, @int_part.String, @frac_part.String)
ip = Val("%"+int_part\s) ; integer part
fpn = Val("%"+frac_part\s) ; fractional part numerator
fpd = Pow(2, Len(frac_part\s)) ; fractional part denominator (2 ^ number of bits)
Debug "Integer part in binary = " + int_part\s
Debug "Integer part in decimal = " + ip + ".0"
Debug ""
Debug "Fractional part in binary = " + frac_part\s
Debug "Fractional part in decimal = " + fpn + "/" + fpd + " = " + StrFa(fpn / fpd)
Debug ""
Debug "Result = " + ip + ".0 + " + StrFa(fpn / fpd) + " = " + sign$ + StrFa(ip + fpn / fpd)
Debug ""
Debug "Breakout of the denormalized fractional part bit per bit :"
Debug ""
fract_total.f = 0.0
tab = 30
For i = 1 To Len(frac_part\s)
bit = Val(Mid(frac_part\s , i, 1))
bitval$ = "Bit " + RSet(Str(i),2,"0") + " -> "
bitval$ + Str(bit) + " = " + Str(bit) + "/" + Str(Pow(2,i)) ; bit to fractional (power of 2)
If bit
fract.f = 1 / Pow(2,i)
fract_total + fract
bitval$ = Left(bitval$ + Space(tab), tab) + StrFa(fract)
EndIf
Debug bitval$
Next
Debug "--------------------------------------------------------"
Debug Left("Total " + Space(tab), tab) + StrFa(fract_total)
Debug ""
EndProcedure
DumpFloat(10.75) ; this CAN be represented perfectly
DumpFloat(-10.75) ; this CAN be represented perfectly
DumpFloat(0.25) ; this CAN be represented perfectly
DumpFloat(3.14159265) ; this CANNOT be represented perfectly
DumpFloat(65000.14159265) ; this CANNOT be represented perfectly; the big integer part leaves fewer bits for the fractional part, so precision drops
DumpFloat(1.0) ; this CAN be represented perfectly
DumpFloat(1.1) ; this CANNOT be represented perfectly
For example: 10.75
Code:
[20:00:12] Stored float value = 10.75
[20:00:12] Binary float representation = 01000001001011000000000000000000
[20:00:12]
[20:00:12] S ESP MANTISSA
[20:00:12] 0 10000010 01011000000000000000000
[20:00:12]
[20:00:12] Number is positive
[20:00:12] Biased exponent is 130
[20:00:12] Unbiased exponent is 3
[20:00:12]
[20:00:12] Normalized float in binary is +1.01011 x 2^3
[20:00:12]
[20:00:12] Denormalized float in binary is +1010.11
[20:00:12]
[20:00:12] Integer part in binary = 1010
[20:00:12] Integer part in decimal = 10.0
[20:00:12]
[20:00:12] Fractional part in binary = 11
[20:00:12] Fractional part in decimal = 3/4 = 0.75
[20:00:12]
[20:00:12] Result = 10.0 + 0.75 = +10.75
[20:00:12]
[20:00:12] Breakout of the denormalized fractional part bit per bit :
[20:00:12]
[20:00:12] Bit 01 -> 1 = 1/2 0.5
[20:00:12] Bit 02 -> 1 = 1/4 0.25
[20:00:12] --------------------------------------------------------
[20:00:12] Total 0.75
As you can see, the denormalized number is +1010.11.
The integer part is 1010, i.e. 10 (ten); the fractional part is 11, which must be decoded as 1/2 + 1/4:
10 + 1/2 + 1/4 = 10.75
Neat, huh?
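The same decoding, written as a sum of power-of-two fractions: a small Python sketch (hypothetical helper name) that takes a denormalized binary string like the one printed above and adds up each bit's weight exactly:

```python
from fractions import Fraction

def binary_fixed_to_fraction(binary: str) -> Fraction:
    """Exact value of a binary fixed-point string such as '1010.11' or '-1010.11'."""
    negative = binary.startswith("-")
    int_part, _, frac_part = binary.lstrip("+-").partition(".")
    value = Fraction(int(int_part, 2))    # integer part: an ordinary base-2 integer
    for i, bit in enumerate(frac_part, start=1):
        if bit == "1":
            value += Fraction(1, 2**i)    # bit i after the point is worth 1/2**i
    return -value if negative else value

print(binary_fixed_to_fraction("1010.11"))         # 43/4
print(float(binary_fixed_to_fraction("1010.11")))  # 10.75
```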
For a more complicated number like pi (3.14159265):
Code:
[20:03:34] Stored float value = 3.1415927410125732
[20:03:34] Binary float representation = 01000000010010010000111111011011
[20:03:34]
[20:03:34] S ESP MANTISSA
[20:03:34] 0 10000000 10010010000111111011011
[20:03:34]
[20:03:34] Number is positive
[20:03:34] Biased exponent is 128
[20:03:34] Unbiased exponent is 1
[20:03:34]
[20:03:34] Normalized float in binary is +1.10010010000111111011011 x 2^1
[20:03:34]
[20:03:34] Denormalized float in binary is +11.0010010000111111011011
[20:03:34]
[20:03:34] Integer part in binary = 11
[20:03:34] Integer part in decimal = 3.0
[20:03:34]
[20:03:34] Fractional part in binary = 0010010000111111011011
[20:03:34] Fractional part in decimal = 593883/4194304 = 0.14159274101257324
[20:03:34]
[20:03:34] Result = 3.0 + 0.14159274101257324 = +3.1415927410125732
[20:03:34]
[20:03:34] Breakout of the denormalized fractional part bit per bit :
[20:03:34]
[20:03:34] Bit 01 -> 0 = 0/2
[20:03:34] Bit 02 -> 0 = 0/4
[20:03:34] Bit 03 -> 1 = 1/8 0.125
[20:03:34] Bit 04 -> 0 = 0/16
[20:03:34] Bit 05 -> 0 = 0/32
[20:03:34] Bit 06 -> 1 = 1/64 0.015625
[20:03:34] Bit 07 -> 0 = 0/128
[20:03:34] Bit 08 -> 0 = 0/256
[20:03:34] Bit 09 -> 0 = 0/512
[20:03:34] Bit 10 -> 0 = 0/1024
[20:03:34] Bit 11 -> 1 = 1/2048 0.00048828125
[20:03:34] Bit 12 -> 1 = 1/4096 0.000244140625
[20:03:34] Bit 13 -> 1 = 1/8192 0.0001220703125
[20:03:34] Bit 14 -> 1 = 1/16384 0.00006103515625
[20:03:34] Bit 15 -> 1 = 1/32768 0.000030517578125
[20:03:34] Bit 16 -> 1 = 1/65536 0.0000152587890625
[20:03:34] Bit 17 -> 0 = 0/131072
[20:03:34] Bit 18 -> 1 = 1/262144 0.000003814697265625
[20:03:34] Bit 19 -> 1 = 1/524288 0.0000019073486328125
[20:03:34] Bit 20 -> 0 = 0/1048576
[20:03:34] Bit 21 -> 1 = 1/2097152 0.000000476837158203125
[20:03:34] Bit 22 -> 1 = 1/4194304 0.0000002384185791015625
[20:03:34] --------------------------------------------------------
[20:03:34] Total 0.14159274101257324
Again, see how the fractional part is expressed as a sum of fractions? You get close, but you can't get the original 3.14159265 exactly.
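Why can't 3.14159265 be reached? A number has a finite binary expansion only if its denominator, in lowest terms, is a power of two; anything with a factor of 5 (or 3, 7, ...) in the denominator repeats forever in binary and must be cut off. A minimal Python check of that rule (hypothetical function name; note that fitting into a float32 additionally requires at most 24 significant bits):

```python
from fractions import Fraction

def has_finite_binary_expansion(decimal_text: str) -> bool:
    """True if the decimal number can be written with finitely many binary digits."""
    d = Fraction(decimal_text).denominator  # denominator in lowest terms
    return (d & (d - 1)) == 0               # power-of-two test

for text in ["10.75", "0.25", "1.1", "3.14159265", "65000.14159265"]:
    print(text, has_finite_binary_expansion(text))
```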
Another one: 65000.14159265
Code:
[20:05:38] Stored float value = 65000.140625
[20:05:38] Binary float representation = 01000111011111011110100000100100
[20:05:38]
[20:05:38] S ESP MANTISSA
[20:05:38] 0 10001110 11111011110100000100100
[20:05:38]
[20:05:38] Number is positive
[20:05:38] Biased exponent is 142
[20:05:38] Unbiased exponent is 15
[20:05:38]
[20:05:38] Normalized float in binary is +1.111110111101000001001 x 2^15
[20:05:38]
[20:05:38] Denormalized float in binary is +1111110111101000.001001
[20:05:38]
[20:05:38] Integer part in binary = 1111110111101000
[20:05:38] Integer part in decimal = 65000.0
[20:05:38]
[20:05:38] Fractional part in binary = 001001
[20:05:38] Fractional part in decimal = 9/64 = 0.140625
[20:05:38]
[20:05:38] Result = 65000.0 + 0.140625 = +65000.140625
[20:05:38]
[20:05:38] Breakout of the denormalized fractional part bit per bit :
[20:05:38]
[20:05:38] Bit 01 -> 0 = 0/2
[20:05:38] Bit 02 -> 0 = 0/4
[20:05:38] Bit 03 -> 1 = 1/8 0.125
[20:05:38] Bit 04 -> 0 = 0/16
[20:05:38] Bit 05 -> 0 = 0/32
[20:05:38] Bit 06 -> 1 = 1/64 0.015625
[20:05:38] --------------------------------------------------------
[20:05:38] Total 0.140625
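As you can see, the error is bigger than in the previous cases. That's because the denormalized number is +1111110111101000.001001: the integer part alone uses 16 of the 24 significand bits (23 stored plus the implicit 1), so only 8 bits are left for the fractional part (00100100, printed without its trailing zeros), hence less precision.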
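To put numbers on this: the gap between neighbouring float32 values is 2^(exponent - 23), so it grows as the exponent grows, and round-to-nearest can be off by up to half that gap. A minimal Python sketch (hypothetical helper names, assuming normal non-zero values) using `math.frexp`, which splits a number into m × 2^e with 0.5 <= |m| < 1:

```python
import math
import struct

def to_float32(x: float) -> float:
    """Round a double to the nearest float32 value."""
    return struct.unpack(">f", struct.pack(">f", x))[0]

def float32_spacing(x: float) -> float:
    """Gap between adjacent float32 values around a normal, non-zero x."""
    _, e = math.frexp(to_float32(x))  # to_float32(x) = m * 2**e with 0.5 <= |m| < 1
    return 2.0 ** (e - 1 - 23)        # e - 1 is the exponent of the 1.xxx form

for x in (3.14159265, 65000.14159265):
    print(x, to_float32(x), float32_spacing(x))
```

For pi the gap is 2^-22, while around 65000 it is 2^-8 = 0.00390625, which is why the second example loses so much more of its fractional part.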
It's pretty hairy stuff but very interesting.
I hope the code above can be useful to demystify floating point a little and shed some light on why it works the way it does in our programs.
Edit: fixed a bug when the exponent is zero and added some more info.