Faster ways to convert strings to doubles?
Posted: Tue Sep 06, 2016 12:40 am
by Samuel
I'm loading very large data files and the process of converting the strings to doubles is one of the bigger bottlenecks.
Does anyone know of a faster method or do I just have to live with it as it is?
Code: Select all
;Runs at about 1600 milliseconds for me with the debugger off.
;I haven't noticed any major speed differences between floats and doubles.
EnableExplicit
Define.i CTR
Define.i StartTime
Define.i ElapsedTime
Define.d Value
Define.s String
StartTime = ElapsedMilliseconds()
For CTR = 1 To 1000000
String = "13215.33414554664"
Value = ValD(String)
Next
ElapsedTime = ElapsedMilliseconds() - StartTime
MessageRequester("Time!", "Elapsed Milliseconds : " + Str(ElapsedTime))
Re: Faster ways to convert strings to doubles?
Posted: Tue Sep 06, 2016 12:53 am
by J. Baker
Turning unicode off will give some speed. Otherwise someone who knows asm may be of help.
Re: Faster ways to convert strings to doubles?
Posted: Tue Sep 06, 2016 2:31 am
by JHPJHP
Removed; unhelpful results.
Re: Faster ways to convert strings to doubles?
Posted: Tue Sep 06, 2016 6:46 am
by wilbert
Samuel wrote:Does anyone know of a faster method or do I just have to live with it as it is?
ValD has to be able to handle a lot of different situations. For example
It also has to check for invalid characters.
If the strings you have to parse are all positive decimal values like the example you posted and you don't have to check for invalid characters, you could try to write your own parser.
Otherwise I think you just have to live with it.
Re: Faster ways to convert strings to doubles?
Posted: Tue Sep 06, 2016 7:00 am
by J. Baker
wilbert wrote:
It also has to check for invalid characters.
That's a good point. Maybe a #PB_Numeric_Only optional flag could be requested for speed optimization. The developer would have to know that the string(s) only use numeric characters of course.
Re: Faster ways to convert strings to doubles?
Posted: Tue Sep 06, 2016 7:47 am
by infratec
Hi,
a first step:
Code: Select all
Procedure.d CustomValD(String$)
Protected DoubleValue.d, MainValue.i, DezimalValue.i, Factor.i, *pString.Character, Length.i, i.i, PointFlag.i
*pString = @String$
Length = Len(String$) - 1
Factor = 1
For i = 0 To Length
Debug *pString\c
If PointFlag
DezimalValue * 10
DezimalValue + (*pString\c - '0')
Factor * 10
Else
If *pString\c = '.'
PointFlag = #True
Else
MainValue * 10
MainValue + (*pString\c - '0')
EndIf
EndIf
*pString + 2
Next i
Debug "----"
ProcedureReturn MainValue + (DezimalValue / Factor)
EndProcedure
Bernd
Re: Faster ways to convert strings to doubles?
Posted: Tue Sep 06, 2016 8:18 am
by infratec
Step 2
But as usual the improvements are smaller
Code: Select all
Procedure.d CustomValD(String$)
Protected MainValue.i, DezimalValue.i, Factor.i, *pString.Character
*pString = @String$
Factor = 1
While *pString\c
Debug *pString\c
If *pString\c = '.'
*pString + 2
Break
Else
MainValue * 10 + (*pString\c - '0')
EndIf
*pString + 2
Wend
While *pString\c
DezimalValue * 10 + (*pString\c - '0')
Factor * 10
*pString + 2
Wend
Debug "----"
ProcedureReturn MainValue + (DezimalValue / Factor)
EndProcedure
Bernd
Re: Faster ways to convert strings to doubles?
Posted: Tue Sep 06, 2016 8:27 am
by wilbert
@Bernd,
You used the integer format. This means on x86 you can't have numbers with more than 9 digits.
If you want to support both unicode and ascii, add the character size instead of 2.
*pString + SizeOf(Character)
I could try something asm based as well if someone would be interested.
If so, I need to know if ascii still needs to be supported and if it should use x87 or SSE for floats.
Re: Faster ways to convert strings to doubles?
Posted: Tue Sep 06, 2016 8:44 am
by infratec
Hi, hi,
I'm always up to date.
And in PB 5.50 you don't have to decide: a character is always 2 bytes long.
If he needs larger numbers he has to use .q
It always depends on the needs.
Btw. I also tried shifts and an addition (<< 2, +, << 1) instead of * 10, but in PB it took longer.
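One way to read the shift sequence above, as a rough sketch: shift left by 2, add the original value, then shift left by 1. The exact expression that was tested is not shown in the post, so the grouping and the variable names here are assumptions.

```purebasic
; Illustrative only: x, ByMultiply and ByShift are made-up names.
; For non-negative integers both forms compute the same value.
Define x.q = 1234
Define ByMultiply.q = x * 10
Define ByShift.q = ((x << 2) + x) << 1   ; (x * 4 + x) * 2 = x * 10

Debug ByMultiply
Debug ByShift
```

Whether the shift form is any faster depends on the code the compiler generates; in the test reported above it was slower.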
Bernd
Re: Faster ways to convert strings to doubles?
Posted: Tue Sep 06, 2016 8:53 am
by wilbert
infratec wrote:It always depends on the needs.
I couldn't agree with you more

Re: Faster ways to convert strings to doubles?
Posted: Tue Sep 06, 2016 9:03 am
by infratec
What Wilbert means is:
you have to replace my definition line with:
Code: Select all
Protected MainValue.q, DezimalValue.q, Factor.q, *pString.Character
to get the correct(er) values.
Which slows it down, of course.
Bernd
Re: Faster ways to convert strings to doubles?
Posted: Tue Sep 06, 2016 10:54 am
by Helle
This is an old code from me, changed now for ASCII and unicode:
Code: Select all
;ONLY for test, not optimal! Don't use the debugger! No check for string-correctness! For 64-bit-Windows! Make your own tests!
;ASCII and Unicode
Procedure StringToDouble(PointerToString.q, PointerToDouble.q) ;= ATOF(); Parameters: PointerToString = RCX, PointerToDouble = RDX
;!mov rcx,[p.v_PointerToString] ;for Debug
;!mov rdx,[p.v_PointerToDouble]
!PUSH r12
!PUSH r13
!PUSH r14
CompilerIf #PB_Compiler_Unicode
!MOV r14,2
CompilerElse
!MOV r14,1
CompilerEndIf
!XOR rax,rax
!MOV r8,rax
!MOV r9,rax
!MOV r11,rax ;signum_mantissa
!MOV r12,rax ;signum_exponent
!MOV r13,rax ;decimal_point
!MOV al,[rcx]
!CMP al,'-'
!JE .Signum_Neg
!CMP al,'+'
!JE .Signum_Pos
!JMP .Signum_End
!.Signum_Neg:
!INC r11
!.Signum_Pos:
!ADD rcx,r14
!MOV al,byte[rcx]
!.Signum_End:
!FLDZ
!.Read_Mantissa:
!CMP eax,'E'
!JE .Exponent
!CMP eax,'e'
!JE .Exponent
!CMP eax,'.'
!JE .Decimal_Point
!XOR eax,'0'
!CMP eax,9
!JA .Digits_End
!MOV [rdx],rax ;temp
!FMUL qword[Zehner]
!FIADD word[rdx]
!INC r8
!JMP .No_Decimal_Point
!.Decimal_Point:
!CMP r13,0 ;decimal_point
!JNE .Digits_End
!MOV r13,r8 ;R13 = decimal_point
!.No_Decimal_Point:
!ADD rcx,r14
!MOVZX rax,byte[rcx]
!JMP .Read_Mantissa
!.Exponent:
!ADD rcx,r14
!MOVZX rax,byte[rcx]
!CMP al,'-'
!JE .Signum_Exponent_Neg
!CMP al,'+'
!JE .Signum_Exponent_Pos
!JMP .Read_Exponent
!.Signum_Exponent_Neg:
!INC r12 ;signum_exponent
!.Signum_Exponent_Pos:
!ADD rcx,r14
!MOVZX rax,byte[rcx]
!.Read_Exponent:
!XOR rax,'0'
!CMP rax,9
!JA .Digits_End
!LEA r9,[r9*4+r9] ;R9 = Mul 5
!LEA r9,[r9*2+rax] ;R9 = Mul 10
!ADD rcx,r14
!MOVZX rax,byte[rcx]
!JMP .Read_Exponent
!.Digits_End:
!CMP r12,0 ;signum_exponent
!JE .Exponent_Ready
!NEG r9
!.Exponent_Ready:
!MOV rax,r13 ;R13 = decimal_point
!OR rax,rax
!JE .No_Decimal_Point2
!SUB r8,rax
!SUB r9,r8
!.No_Decimal_Point2:
!Or r9,r9
!JE .No_Exponent
!MOV rax,r9
!CMP rax,0
!JGE .Signum_Exponent_OK
!NEG rax
!.Signum_Exponent_OK:
!FLD1
!MOV r10b,al
!AND r10,0Fh
!JE .Big_Exponent
!LEA r10,[r10+r10*4]
!LEA r8,[Zehner_1]
!FLD tword[r10*2+r8-10]
!FMULP st1,st0
!.Big_Exponent:
!MOV r10b,al
!SHR r10b,4
!AND r10,0Fh
!JE .Bigger_Exponent
!LEA r10,[r10+r10*4]
!LEA r8,[Zehner_16]
!FLD tword[r10*2+r8-10]
!FMULP st1,st0
!.Bigger_Exponent:
!SHR rax,8
!AND rax,1Fh
!JE .Signum_Exponent
!LEA rax,[rax+rax*4]
!LEA r8,[Zehner_256]
!FLD tword[rax*2+r8-10]
!FMULP st1,st0
!.Signum_Exponent:
!CMP r9,0
!JGE .Exponent_Pos
!FDIVP st1,st0
!JMP .No_Exponent
!.Exponent_Pos:
!FMULP st1,st0
!.No_Exponent:
!CMP r11,0 ;signum_mantissa
!JE .Ready
!FCHS
!.Ready:
!FSTP qword[rdx] ;dword for Float
!POP r14
!POP r13
!POP r12
ProcedureReturn
!Zehner dq 10.0
!Zehner_1 dt 1.0e1
! dt 1.0e2
! dt 1.0e3
! dt 1.0e4
! dt 1.0e5
! dt 1.0e6
! dt 1.0e7
! dt 1.0e8
! dt 1.0e9
! dt 1.0e10
! dt 1.0e11
! dt 1.0e12
! dt 1.0e13
! dt 1.0e14
! dt 1.0e15
!Zehner_16 dt 1.0e16
! dt 1.0e32
! dt 1.0e48
! dt 1.0e64
! dt 1.0e80
! dt 1.0e96
! dt 1.0e112
! dt 1.0e128
! dt 1.0e144
! dt 1.0e160
! dt 1.0e176
! dt 1.0e192
! dt 1.0e208
! dt 1.0e224
! dt 1.0e240
!Zehner_256 dt 1.0e256
! dt 1.0e512 ;.... for extended double, for this test not used
EndProcedure
Double.d
StartTime = ElapsedMilliseconds()
For CTR = 1 To 1000000
StringToDouble$ = "13215.33414554664"
StringToDouble(@StringToDouble$, @Double)
Next
ElapsedTime = ElapsedMilliseconds() - StartTime
MessageRequester("Time!", "Elapsed Milliseconds : " + Str(ElapsedTime) + #LFCR$ + StrD(Double, 15))
Please make a test!
Re: Faster ways to convert strings to doubles?
Posted: Tue Sep 06, 2016 5:52 pm
by Samuel
Thank you everyone for the replies!
@infratec
Your procedure gives a nice speed boost. In my example above the loop completes in about 560 milliseconds instead of the original 1600 milliseconds. I am using quads and, as you said, it's a little slower than with integers (350 milliseconds), but it's still a nice boost compared to what I had before.
I did need to add support for negative values. I believe what I added to your example should be good enough to handle the negative values unless someone can spot a problem with it.
Code: Select all
Procedure.d UnicodeValD(String$)
Protected MainValue.q, DezimalValue.q, Factor.q, ValueSign.q, *pString.Character
*pString = @String$
ValueSign = 1
Factor = 1
If *pString\c = '-'
Debug "-"
*pString + 2
ValueSign = -1
EndIf
While *pString\c
Debug *pString\c
If *pString\c = '.'
*pString + 2
Break
Else
MainValue * 10 + (*pString\c - '0')
EndIf
*pString + 2
Wend
While *pString\c
DezimalValue * 10 + (*pString\c - '0')
Factor * 10
*pString + 2
Wend
Debug "----"
ProcedureReturn (MainValue + (DezimalValue / Factor)) * ValueSign
EndProcedure
@wilbert and @Helle
Asm is definitely fast, but I can barely understand any of it. I have a hard time using code when I can't follow it.
I do plan on learning asm, but at the moment my plate is pretty full. Anyways, thanks for the offer/example.
Re: Faster ways to convert strings to doubles?
Posted: Tue Sep 06, 2016 6:05 pm
by wilbert
Samuel wrote:Asm is definitely fast, but I can barely understand any of it. I have a hard time using code when I can't follow it.
I do plan on learning asm, but at the moment my plate is pretty full. Anyways, thanks for the offer/example.
There are still some things you can do to improve the speed without the use of asm.
For user defined procedures that take a string as an argument, PureBasic always makes a temporary copy of the string and passes the copy.
If you use a pointer as procedure argument and pass the address of the string instead, that will increase the performance.
Another thing is that division is very slow, so instead of multiplying Factor by 10 for each digit after the decimal point, I would suggest simply using a counter and a small lookup table of values (0.1, 0.01, 0.001, etc.) so that you multiply instead of divide.
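A minimal sketch of both suggestions, assuming well-formed input (optional '-', digits, optional '.', digits) and at most 18 fraction digits. The names FastValD, InitInvPow10, InvPow10 and #MaxFractionDigits are made up for this illustration and do not come from the thread.

```purebasic
; Sketch only: no validation of the input characters is done.
EnableExplicit

#MaxFractionDigits = 18   ; assumption: 10^18 still fits into a quad

Global Dim InvPow10.d(#MaxFractionDigits)

Procedure InitInvPow10()
  ; Fill the table with 1, 0.1, 0.01, ... once, so the divisions
  ; happen at start-up instead of once per parsed number.
  Protected i.i, p.d = 1.0
  For i = 0 To #MaxFractionDigits
    InvPow10(i) = 1.0 / p
    p * 10.0
  Next
EndProcedure

; Takes the address of the string, so no temporary copy is made.
Procedure.d FastValD(*pString.Character)
  Protected MainValue.q, DezimalValue.q, Digits.i, ValueSign.d = 1.0
  
  If *pString\c = '-'
    ValueSign = -1.0
    *pString + SizeOf(Character)
  EndIf
  
  ; Integer part: stop at the decimal point or at the end of the string.
  While *pString\c And *pString\c <> '.'
    MainValue * 10 + (*pString\c - '0')
    *pString + SizeOf(Character)
  Wend
  
  ; Fraction part: count the digits instead of growing a divisor.
  ; Digits beyond #MaxFractionDigits are ignored.
  If *pString\c = '.'
    *pString + SizeOf(Character)
    While *pString\c And Digits < #MaxFractionDigits
      DezimalValue * 10 + (*pString\c - '0')
      Digits + 1
      *pString + SizeOf(Character)
    Wend
  EndIf
  
  ProcedureReturn (MainValue + DezimalValue * InvPow10(Digits)) * ValueSign
EndProcedure

InitInvPow10()
Define Text.s = "-13215.33414554664"
Debug FastValD(@Text)   ; pass the address, not the string itself
```

Multiplying by a stored 0.1, 0.01, ... is not bit-for-bit identical to dividing by 10, 100, ..., so the last digits of the result may differ slightly from ValD. Whether that matters depends on the data.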
Re: Faster ways to convert strings to doubles?
Posted: Tue Sep 06, 2016 6:35 pm
by skywalk
Infratec's CustomValD() is ~6x faster than ValD() but does not support scientific notation.
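If exponents are needed, one possible extension is a separate step that runs after the mantissa has been parsed. The sketch below is only an assumption of how that could look; ApplyExponent is a made-up name, and the mantissa loops shown earlier in the thread would first have to stop at the first non-digit character instead of running to the end of the string.

```purebasic
; Hypothetical helper: applies an optional exponent suffix such as "e-3" or "E+12".
; *pString must point at the character that follows the mantissa digits.
Procedure.d ApplyExponent(Mantissa.d, *pString.Character)
  Protected Exponent.i, ExponentSign.i = 1
  
  If *pString\c = 'e' Or *pString\c = 'E'
    *pString + SizeOf(Character)
    
    ; Optional sign of the exponent
    If *pString\c = '-'
      ExponentSign = -1
      *pString + SizeOf(Character)
    ElseIf *pString\c = '+'
      *pString + SizeOf(Character)
    EndIf
    
    ; Exponent digits
    While *pString\c >= '0' And *pString\c <= '9'
      Exponent * 10 + (*pString\c - '0')
      *pString + SizeOf(Character)
    Wend
    
    ; Scale the mantissa by 10^Exponent
    Mantissa * Pow(10.0, ExponentSign * Exponent)
  EndIf
  
  ProcedureReturn Mantissa
EndProcedure

Define Suffix.s = "e-3"
Debug ApplyExponent(1.5, @Suffix)
```

Pow() is a general-purpose call and likely costs more than a table lookup, so for large files with many exponents a table of powers of ten, as in the asm example earlier in the thread, may be the better fit.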