Expression Evaluator

Xombie · Post by **Xombie** » Wed Nov 02, 2005 3:29 pm

Can you give a more detailed explanation and post your version of the IsNumber() function so I can see what you mean? Or are you talking about numbers like 2.32 x 10^3?

jack · Post by **jack** » Thu Nov 03, 2005 12:50 am

Sorry for the late reply, Xombie; I just got back from work. Here's my mod.

Code:

Procedure.l IsNumber(inString.s, DecimalCharacter.b, ThousandsSeparator.b, *ReturnValue.l) 
   ; TODO: Update to test for international decimal value and then check for IsNumeric calls to make sure they are 
   ; international ready.  Maybe need a StringToSingle call first? 
   ; 
   ; *ReturnValue (if a LONG address is passed) will contain a 'cleaned' version of the number.  eg, no commas. 
   ; IT MUST POINT TO A PRE-ALLOCATED STRING (EG, s.s = Space(200) : @s) OR ELSE NOTHING CAN SAFELY BE PASSED BACK.  Pass 0 to skip it.
   ; 
   iLoop.l 
   ; 
   HoldCleaned.s 
   ; String used to hold the 'cleaned' value. 
   CountDecimal.l : CountThousands.l : CountNumeric.l : CountDecimalNumerics.l 
   ; The count of the decimal/thousands separator.  Also the count of the numbers. 
   isHex.b = #False 
   isSci.b = #False
   ; 
   PositionDecimal.l : PositionThousands.l 
   ; The location of the decimal/thousands separator. 
   IsNegative.b 
   ; True if the value is negative. 
   HoldChar.b 
   ; This will store an individual character to test if it's numeric. 
   *MemPosition.l = @inString 
   ; 
   HoldLength.l = Len(inString) 
   ; This will store the length of our string in characters. 
   Repeat 
      ; 
      HoldChar = PeekB(*MemPosition) 
      ; Store the current character. 
      If HoldChar > 47 And HoldChar < 58 
         ; Numeral 0 to 9. 
         If CountDecimal : CountDecimalNumerics + 1 : Else : CountNumeric + 1 : EndIf 
         ; 
         HoldCleaned + PeekS(*MemPosition, 1) 
         ; Add the number to the 'clean' string. 
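      ElseIf (HoldChar = '-') Or (HoldChar = '+') 
         ; - or + (sign) 
         If *MemPosition <> @inString 
            ; A sign is only valid as the first character or directly after the exponent marker, as in "1.23e+5". 
            If PeekB(*MemPosition - 1) <> 'e' And PeekB(*MemPosition - 1) <> 'E' : ProcedureReturn #False : EndIf 
         EndIf 
         If (isSci = #False) And (HoldChar = '-') : IsNegative = #True : EndIf
         ; Only a leading minus makes the number negative; a minus after the "e" negates the exponent. 
         HoldCleaned + PeekS(*MemPosition, 1) 
         ; Add the sign to the 'clean' string. 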
      ElseIf HoldChar = 36 
         ; $ (hex sign) 
         If *MemPosition <> @inString : ProcedureReturn #False : EndIf 
         ; If the $ isn't the first character of the string, it's not a hex value. 
         isHex = #True 
         ; 
      ElseIf HoldChar = ThousandsSeparator ; 44 
         ; , (comma) 
         If CountDecimal Or isHex : ProcedureReturn #False : EndIf 
         ; Never thousands after a decimal.  Also, hex values never use a comma. 
         If CountThousands And CountNumeric <> 3 : ProcedureReturn #False : EndIf 
         ; Every group between two thousands separators must have exactly three numbers. 
         CountNumeric = 0 
         ; Reset the number count. 
         CountThousands + 1 
         ; 
      ElseIf HoldChar = DecimalCharacter ; 46 
         ; . (decimal sign) 
         If CountDecimal : ProcedureReturn #False : EndIf 
         ; Only one decimal character is allowed; a second one means it's not a number. 
         If isHex : ProcedureReturn #False : EndIf 
         ; A hexadecimal value will never have a decimal. 
         If CountThousands And CountNumeric <> 3 : ProcedureReturn #False : EndIf 
         ; The group before the decimal must have exactly three numbers when a thousands separator is used. 
         PositionDecimal = *MemPosition - @inString 
         ; Store the location of the decimal character. 
         CountDecimal + 1 
         ; Increment our decimal count. 
         HoldCleaned + PeekS(*MemPosition, 1) 
         ; Add the decimal to the 'clean' string. 
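      ElseIf (HoldChar='e') Or (HoldChar='E')
         ; e or E (exponent marker, as in "1.23e+5")
         If isSci Or isHex Or (CountNumeric + CountDecimalNumerics = 0) : ProcedureReturn #False : EndIf
         ; Only one exponent is allowed, never in a hex value, and it must follow at least one number.
         HoldCleaned + PeekS(*MemPosition, 1)
         isSci = #True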
      Else
         ; 
         ProcedureReturn #False 
         ; Unknown character, non-numeral. 
      EndIf 
      ; 
      *MemPosition + 1 
      ; 
   Until *MemPosition - @inString = HoldLength 
   ; 
   If CountThousands And CountNumeric <> 3 : ProcedureReturn #False : EndIf 
   ; The last group after a thousands separator must have exactly three numbers. 
   If (CountDecimal And CountDecimalNumerics) Or isSci 
      ; 
      If *ReturnValue : PokeS(*ReturnValue, HoldCleaned) : EndIf 
      ; Return a 'cleaned' float value. 
      If IsNegative : ProcedureReturn -2 : Else : ProcedureReturn 2 : EndIf 
      ; Return 2 for a float or -2 for a negative float.  There must be numbers after the decimal to be considered a float. 
   EndIf 
   ; 
   If CountNumeric 
      ; 
      If *ReturnValue : PokeS(*ReturnValue, HoldCleaned) : EndIf 
      ; Return a 'cleaned' integer value. 
      If IsNegative : ProcedureReturn -1 : Else : ProcedureReturn 1 : EndIf 
      ; Return -1 for a negative integer or 1 for a positive integer. 
   EndIf 
   ; 
   ProcedureReturn #False 
   ; If we got this far, it must be a string. 
EndProcedure

s.s=Space(200)
Debug IsNumber("1.23e+5",'.',0,@s)
Debug s
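For comparison, here is a minimal sketch of the same return-code convention (1 / -1 integer, 2 / -2 float, 0 not a number) written in Python with a regular expression instead of a character loop. It is an illustration, not jack's code: the name is_number and its parameters are hypothetical, passing thousands="" plays the role of passing 0 for ThousandsSeparator, and it is deliberately stricter than the mod above (digits are required on both sides of the decimal, and every group after a thousands separator must have exactly three digits).

```python
import re


def is_number(text: str, decimal: str = ".", thousands: str = ",") -> tuple:
    """Return (code, cleaned): 1/-1 integer, 2/-2 float, 0 not a number.

    'cleaned' is the input with thousands separators removed ("" if invalid).
    """
    if thousands:
        # Plain digits, or 1-3 leading digits followed by groups of exactly three.
        int_part = rf"(?:\d+|\d{{1,3}}(?:{re.escape(thousands)}\d{{3}})+)"
    else:
        int_part = r"\d+"
    # Optional sign, integer part, optional fraction, optional exponent like e+5.
    pattern = (
        rf"(?P<sign>[+-]?){int_part}"
        rf"(?P<frac>{re.escape(decimal)}\d+)?"
        r"(?P<exp>[eE][+-]?\d+)?"
    )
    m = re.fullmatch(pattern, text)
    if m is None:
        return 0, ""
    code = 2 if (m.group("frac") or m.group("exp")) else 1
    if m.group("sign") == "-":
        code = -code
    cleaned = text.replace(thousands, "") if thousands else text
    return code, cleaned


print(is_number("1.23e+5"))                               # (2, '1.23e+5')
print(is_number("-1,234"))                                # (-1, '-1234')
print(is_number("1.234,5", decimal=",", thousands="."))   # (2, '1234,5')
```

Because the whole pattern must match (re.fullmatch), inputs such as "1,2345" or "12." fall through to (0, "").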

Xombie · Post by **Xombie** » Sun Dec 25, 2005 12:40 am

Well, adding support for "1.23e+5" as a number would be painful for my expression evaluator, mainly because the "e" and "+" would easily be confused with operators. Just write your numbers like a normal person!
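One common way around that ambiguity (a general technique, not how Xombie's evaluator works) is to have the tokenizer try the longest numeric match first, so an "e" followed by an optional sign and digits is consumed as part of the number before any operator rule gets a chance. A minimal Python sketch; the token set is an assumption, and it also assumes a number never runs directly into an identifier such as a variable named e:

```python
import re

# Numbers are tried before operators, so the "e", the sign and the digits
# after it are swallowed into the number token ("maximal munch").
TOKEN_RE = re.compile(
    r"\s*(?:"
    r"(?P<num>\d+(?:\.\d+)?(?:[eE][+-]?\d+)?)"  # 12, 1.5, 1.23e+5, 3E-1
    r"|(?P<op>[-+*/^()])"                        # single-character operators
    r")"
)


def tokenize(expr: str) -> list:
    """Split an expression into number and operator tokens."""
    tokens, pos = [], 0
    expr = expr.rstrip()
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(m.group("num") or m.group("op"))
        pos = m.end()
    return tokens


print(tokenize("1.23e+5 + 2"))   # ['1.23e+5', '+', '2']
print(tokenize("(4-2)*3E-1"))    # ['(', '4', '-', '2', ')', '*', '3E-1']
```

A plain "1+1" still splits into three tokens, because the exponent rule only applies when the "e" comes first.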

josku_x · Post by **josku_x** » Sun Dec 25, 2005 10:58 am

1+1=2e+0?

jack · Post by **jack** » Sun Dec 25, 2005 11:07 pm

Hi Xombie, I realize that adding scientific notation to your parser is not easy, but since it uses strings as a stack to store intermediate results, I fear a loss of precision.
Perhaps in the future you may want to rewrite the evaluator as a byte-code compiler: the compiler takes care of parsing the expression, extracting literal constants and storing them in memory, which makes the interpreter much simpler and faster.
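To make jack's suggestion concrete, here is a minimal Python sketch (not from the thread) of the two-stage design. A shunting-yard "compiler" turns the expression into a flat list of postfix instructions, converting each literal (scientific notation included) to a number exactly once, and a tiny stack machine runs that list with intermediate results kept as floats rather than strings. The instruction names PUSH and OP are hypothetical, and unary minus, functions and strings are deliberately left out.

```python
import operator
import re

# Precedence and implementation of each binary operator.
OPS = {
    "+": (1, operator.add),
    "-": (1, operator.sub),
    "*": (2, operator.mul),
    "/": (2, operator.truediv),
}
# A number (scientific notation included) or a single operator/parenthesis.
TOKEN_RE = re.compile(r"\s*(\d+(?:\.\d+)?(?:[eE][+-]?\d+)?|[-+*/()])")


def compile_expr(expr: str) -> list:
    """Shunting-yard pass: infix text -> postfix byte code like ("PUSH", 2.0), ("OP", "+")."""
    code, pending, pos = [], [], 0
    expr = expr.strip()
    while pos < len(expr):
        m = TOKEN_RE.match(expr, pos)
        if m is None:
            raise ValueError(f"unexpected input at position {pos}")
        tok, pos = m.group(1), m.end()
        if tok[0].isdigit():
            code.append(("PUSH", float(tok)))  # literal parsed once, stored as a number
        elif tok == "(":
            pending.append(tok)
        elif tok == ")":
            while pending and pending[-1] != "(":
                code.append(("OP", pending.pop()))
            if not pending:
                raise ValueError("unbalanced ')'")
            pending.pop()  # drop the matching "("
        else:
            # Left-associative: flush operators of equal or higher precedence first.
            while pending and pending[-1] != "(" and OPS[pending[-1]][0] >= OPS[tok][0]:
                code.append(("OP", pending.pop()))
            pending.append(tok)
    while pending:
        op = pending.pop()
        if op == "(":
            raise ValueError("unbalanced '('")
        code.append(("OP", op))
    return code


def run(code: list) -> float:
    """Stack machine: intermediate results stay floats, never strings."""
    stack = []
    for kind, arg in code:
        if kind == "PUSH":
            stack.append(arg)
        else:
            right, left = stack.pop(), stack.pop()
            stack.append(OPS[arg][1](left, right))
    return stack.pop()


program = compile_expr("2 + 3 * (4 - 1)")
print(program)
print(run(program))  # 11.0
```

The expected speed-up would come from compiling once and calling run() many times (for example, once per grid cell recalculation), since no text is re-parsed at run time.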

Xombie · Post by **Xombie** » Sun Dec 25, 2005 11:12 pm

jack wrote:...perhaps in the future you may want to rewrite the evaluator as a byte-code compiler, where you have the compiler taking care of parsing the expression, extracting literal constants and storing them in memory thus making the interpreter much simpler and faster.

...byte-code compiler? ('_' ) Eh? ( '_') You'll have to explain that one to me. What exactly is it and how would I go about doing it so that it works with PB? And would it even be worth it for this function?

jack · Post by **jack** » Mon Dec 26, 2005 2:42 am

Xombie wrote:would it even be worth it for this function?

Probably not for this evaluator; I was thinking of the evaluator in your grid control. As for explaining the byte-code compiler stuff, Trond could probably help you.

Xombie · Post by **Xombie** » Wed Jan 18, 2006 11:59 pm

New update. This is the first asm code I've released, aside from that IsOdd() code a long time ago (which was slow anyway).

Code:

Procedure.l IsNumber(InString.s, DecimalCharacter.b, ThousandsSeparator.b)
   ; Stack offsets of the parameters: InString +0, DecimalCharacter +4, ThousandsSeparator +5.
   CaughtSpace.b ; + 6
   ; True if an empty space exists in the string.
   CaughtDecimal.b ; + 7
   ; True if a decimal character exists.
   CountNumeric.l ; + 8
   ; The count of numbers before the decimal place.
   CountDecimal.l ; + 12
   ; The count of numbers after the decimal place.
   IsNegative.b ; + 16
   ; True if the number is negative.
   CaughtThousand.b ; + 17
   ; True if a thousands character exists. 
   !MOV esi, dword [esp]
   ; Store the address of the string in esi.
   !DEC esi
   ; Decrement to the previous byte.  This is only because I wanted to keep the INC instruction at the start of the 
   ; loop instead of having multiple "INC esi" instructions within the code.
   ;/ Character loop start
   !Start:
   ; This is the beginning of the character loop.
   !INC esi
   ; Increment the character pointer.  The characters are stored as bytes.
   !MOVZX eax, byte [esi]
   ; Store the current character.
   !CMP eax, 0
   !JE Finished
   ; Check for the null terminator (end of string).
   !MOVZX edi, byte [esp + 4]
   ; Store the decimal character as a dword value so the compare operation will work.
   !CMP eax, edi
   !JE CharDecimal
   ; Check for the decimal character.
   !CMP eax, 32
   !JE CharSpace
   ; Check for the space character.
   !CMP eax, 45
   !JE CharNegative
   ; Check for the negative sign character.
   !MOVZX edi, byte [esp + 5]
   ; Store the thousands separator character as a dword value so the compare operation will work.
   !CMP eax, edi
   !JE CharThousand
   ; Check for the thousands separator character.
   !CMP eax, 48
   !JB NonNumeric
   !CMP eax, 57
   !JA NonNumeric
   ; Check for a non-numeric character.
   ;/ Character is a numeric character. 
   !CMP byte [esp + 6], 1
   !JE NonNumeric
   ; Check if a space exists mid-string.  If so, the string is non-numeric.
   !CMP byte [esp + 7], 0
   !JE @f
   ; Check if a decimal character exists.
   !INC dword [esp + 12]
   ; Increase the count of numbers after the decimal place.
   !JMP Start
   ; Process the next character.
   !@@:
   ; A decimal character does not exist in the string.  This number is a non-decimal numeric.
   !CMP byte [esp + 17], 0
   !JE @f
   ; Check if a thousands separator exists.
   !CMP dword [esp + 8], 3
   !JA NonNumeric
   ; There can be no more than 3 numbers after a thousands separator.
   !@@:
   ; No thousands separator exists.
   !INC dword [esp + 8]
   ; Increase the count of non-decimal numerics.
   !JMP Start
   ; Process the next character.
   ;/ Character is a decimal.
   !CharDecimal:
   ; Located a decimal character.
   !CMP byte [esp + 6], 1
   !JE NonNumeric
   ; Check if a space exists mid-string.  If so, the string is non-numeric.
   !CMP byte [esp + 7], 1
   !JE NonNumeric
   ; Check if a decimal character already exists.  Only one decimal character is allowed per string.
   !CMP dword [esp + 8], 0
   !JE NonNumeric
   ; There must be numbers before the decimal place.
   !MOV byte [esp + 7], 1
   ; Set CaughtDecimal to True.
   !JMP Start
   ; Process the next character.
   ;/ Character is a thousands separator.
   !CharThousand:
   ; Located a thousands separator character.
   !CMP byte [esp + 6], 1
   !JE NonNumeric
   ; Check if a space exists mid-string.  If so, the string is non-numeric.
   !CMP byte [esp + 7], 1
   !JE NonNumeric
   ; Check if a decimal character already exists.  The thousands separator is invalid after the decimal.
   !CMP byte [esp + 17], 0
   !JE @f
   ; Check if a thousands separator already exists in the string.
   !CMP dword [esp + 8], 3
   !JNE NonNumeric
   ; There must be 3 numbers before a thousands separator when a thousands separator already exists in the string.
   !@@:
   ; No thousands separator exist yet.
   !CMP dword [esp + 8], 3
   !JA NonNumeric
   ; Check the non-decimal numeric count.  There must be no more than three numbers before a thousands character.  The
   ; first group may have 1 to 3 numbers - "3,323" or "38,391" or "849,328". 
   !CMP dword [esp + 8], 0
   !JE NonNumeric
   ; There must be at least 1 number in front of the thousands separator.
   !MOV byte [esp + 17], 1
   ; Set CaughtThousands to true.
   !MOV dword [esp + 8], 0
   ; Reset the non-decimal numeric count.
   !JMP Start
   ; Process the next character.
   ;/ Character is a negative sign.
   !CharNegative:
   ; Located a negative sign.
   !CMP dword [esp], esi
   !JNE NonNumeric
   ; The negative sign must be the first character.
   !MOV byte [esp + 16], 1
   ; The number is negative.
   !JMP Start
   ; Process the next character.
   ;/ Character is a space.
   !CharSpace:
   ; Located an empty character.
   !CMP byte [esp + 7], 1
   !JE @f
   !CMP byte [esp + 17], 1
   !JE @f
   !CMP dword [esp + 8], 0
   !JNE @f
   ; Check if a decimal, thousand or numeric character exists.  If so, the space is either a trailing space or a space mid-string.
   !JMP Start
   ; The space is a leading space.  Ignore it and process the next character.
   !@@:
   ; The space exists mid-string and is not a leading space.
   !MOV byte [esp + 6], 1
   ; Set CaughtSpace to True.
   !JMP Start
   ; Process the next character.
   ;/ Finished processing the string.
   !Finished:
   ; Reached the end of the string.
   !CMP dword [esp + 8], 0
   !JE NonNumeric
   ; There must be some numbers in the string.
   !CMP byte [esp + 17], 0
   !JE @f
   !CMP dword [esp + 8], 3
   !JNE NonNumeric
   ; There must be three numbers after the last thousands separator.
   !@@:
   ; No thousands character exists or there are 3 numbers after the thousand character. 
   !CMP byte [esp + 7], 0
   !JE NonDecimal
   !CMP dword [esp + 12], 0
   !JE NonNumeric
   ; Check if a decimal character exists.  If it does, ensure there are numbers after the decimal place.
   !CMP byte [esp + 16], 0
   !JE @f
   ; Check if the number is negative.
   !MOV eax, -2
   ProcedureReturn
   ; -2 signifies a negative decimal number.
   !@@:
   ; The decimal number is positive.
   !MOV eax, 2
   ProcedureReturn
   ; 2 signifies a positive decimal number.
   !NonDecimal:
   ; No decimal character exists.
   !CMP byte [esp + 16], 0
   !JE @f
   ; Check if the number is negative.
   !MOV eax, -1
   ProcedureReturn
   ; -1 signifies a negative integer.
   !@@:
   ; The integer is positive.
   !MOV eax, 1
   ProcedureReturn
   ; 1 signifies a positive integer.
   !NonNumeric:
   ; The string contains a non-numeric character.
   !MOV eax, 0
   ; The string is non-numeric.  Return False.
   ProcedureReturn
   ;
EndProcedure

I've removed the ability to "clean" a numeric string since I wasn't using it at all.

So, now it is roughly 7 times faster than my original code since I've moved it to ASM. And not only that, but it's also... err... actually producing correct results. The last code I had would report some numbers as valid when they were not.

This may still miss some things, but it should be much more correct now. Let me know if you spot any errors in it.

The asm part makes it faster, but it's one of my first tries, so I'm sure it's still unoptimized. So don't laugh too hard >_>

I'll work on converting the expression evaluator to asm next and see how that goes.
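Since Xombie asks for error reports, one low-effort way to check the ASM version is to restate its rules in a higher-level language and run the same table of test strings through both. The following is a hypothetical Python port of the checks as I read them from the assembly (same flags, same 1 / -1 / 2 / -2 / 0 return codes), not the author's code. Reading the assembly this way, it accepts trailing spaces, rejects a minus sign that follows leading spaces (" -5"), and rejects a leading "+", so any disagreement with the PureBasic output points at a misreading here or a bug there.

```python
def asm_rules(s: str, decimal: str = ".", thousands: str = ",") -> int:
    """Hypothetical Python port of the checks made by the ASM IsNumber().

    Returns 1 / -1 for integers, 2 / -2 for decimals and 0 for non-numeric text.
    """
    caught_space = caught_decimal = caught_thousand = negative = False
    count_numeric = 0  # digits before the decimal, reset after each thousands separator
    count_decimal = 0  # digits after the decimal
    for i, ch in enumerate(s):
        if ch == decimal:
            # One decimal, no earlier mid-string space, and digits before it.
            if caught_space or caught_decimal or count_numeric == 0:
                return 0
            caught_decimal = True
        elif ch == " ":
            # Leading spaces are skipped; any later space is remembered.
            if caught_decimal or caught_thousand or count_numeric:
                caught_space = True
        elif ch == "-":
            if i != 0:
                return 0  # the minus sign must be the very first character
            negative = True
        elif ch == thousands:
            if caught_space or caught_decimal:
                return 0
            if caught_thousand and count_numeric != 3:
                return 0  # groups after a separator need exactly three digits
            if not 1 <= count_numeric <= 3:
                return 0  # the first group has one to three digits
            caught_thousand = True
            count_numeric = 0
        elif "0" <= ch <= "9":
            if caught_space:
                return 0  # a digit after a space means a mid-string space
            if caught_decimal:
                count_decimal += 1
            else:
                count_numeric += 1
        else:
            return 0
    if count_numeric == 0:
        return 0
    if caught_thousand and count_numeric != 3:
        return 0  # the last group also needs exactly three digits
    if caught_decimal:
        if count_decimal == 0:
            return 0
        return -2 if negative else 2
    return -1 if negative else 1


for sample in ["1,234,567", "-1,234.50", "  42", "42  ", "4 2", "1,2345", ".5", " -5"]:
    print(repr(sample), asm_rules(sample))
```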

dracflamloc · Post by **dracflamloc** » Fri Mar 03, 2006 8:57 pm

Hey Xombie, I'm using a slightly modified version of your evaluator for my personal scripting language, which I'm releasing under the LGPL. I found an inconsistency in your evaluator:

If you evaluate something like this: "Hi" & "Sup", the returned string is:
HiSup

However, if you evaluate this: "HiSup", the returned string is:
"HiSup"
(Notice that it includes the quotes.)

In my case I don't want those quotes, so I added this to the end of Evaluate() to fix my version:

Code:


If Left(HoldExpression, 1) = "'" ; For your version, use Chr(34).
  HoldExpression = Right(HoldExpression, Len(HoldExpression) - 1)
EndIf
If Right(HoldExpression, 1) = "'" ; For your version, use Chr(34).
  HoldExpression = Left(HoldExpression, Len(HoldExpression) - 1)
EndIf

If you DO want the quotes on anything that had a string operation, it'll have to be fixed another way. Even this way is a bit of a hack, since a string could still legitimately begin or end with a quote character after all the processing.

But I'm just letting you all know.

Btw, thanks for the code, Xombie. It really saved me a lot of time and debugging!
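On the quote-stripping hack above: stripping the left and right quote independently can also remove a quote that has no partner. A slightly safer variant, sketched in Python as an assumption rather than a fix from the thread, only removes a quote pair when both ends carry one. It still can't tell a quoted literal from a computed string that happens to start and end with quotes; that would need the evaluator to track whether a value is a string instead of inferring it from the quote characters.

```python
def strip_outer_quotes(text: str, quote: str = '"') -> str:
    """Remove one surrounding pair of quotes, but only when BOTH ends have one."""
    if len(text) >= 2 * len(quote) and text.startswith(quote) and text.endswith(quote):
        return text[len(quote):-len(quote)]
    return text


print(strip_outer_quotes('"HiSup"'))       # HiSup
print(strip_outer_quotes('He said "hi'))   # unpaired quote is kept
print(strip_outer_quotes("'Sup'", "'"))    # single-quote variant, as in dracflamloc's version
```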

dracflamloc · Post by **dracflamloc** » Fri Mar 03, 2006 9:17 pm

Also, to ensure it does not crash when it is given a blank string, add this to the very top of Evaluate():

Code:

If Trim(expression)=""
  ProcedureReturn ""
EndIf

Xombie · Post by **Xombie** » Tue Mar 07, 2006 12:18 am

Ah, thanks for the input, but I have this other code here that's a complete rewrite. It's in the early stages, so be careful.

At the very least it should be faster than my old code. It's for PB4 and it should be Unicode compatible.

EDITED: Code removed. Check out the links in my later post.

Let me know if you spot anything else. As it's just starting, there is only one function. Also, I changed string concatenation to use "+" rather than "&". Have a good day ^_^
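With "+" doing double duty, the evaluator has to decide per operation whether to add or concatenate. One simple way to do that, sketched in Python under the assumption that each intermediate value carries its type (rather than being recognized by its quote characters), is to dispatch on the operand types. Rejecting a string mixed with a number is a policy choice made for this sketch, not something stated in the thread.

```python
from numbers import Number


def apply_plus(left, right):
    """'+' for an evaluator whose values are typed: strings concatenate, numbers add.

    Mixing a string with a number is rejected here; that policy is an assumption.
    """
    if isinstance(left, str) and isinstance(right, str):
        return left + right
    if isinstance(left, Number) and isinstance(right, Number):
        return left + right
    raise TypeError(f"cannot apply + to {type(left).__name__} and {type(right).__name__}")


print(apply_plus("Hi", "Sup"))  # HiSup
print(apply_plus(1, 2.5))       # 3.5
```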

dracflamloc · Post by **dracflamloc** » Tue Mar 07, 2006 12:24 am

Good move with the +. I wrote my own concatenation functions to handle strings since I don't like the & and want to make this as PB-like as possible. By the way, I'm making it in PB 3.94. I'll forward-port it once PB4 for Linux is in beta, and at that time I'll check out your new version =)

Xombie · Post by **Xombie** » Tue Mar 07, 2006 12:35 am

No problem. It'd be pretty easy to convert this to 3.94. If you want, I can do it and post a separate version. Otherwise, the only PB4-specific things should be... hmm... "Define", ".d" (double variables), ValD() and StrD().

Just convert the double functions/variables to floats and get rid of the Defines.

dracflamloc · Post by **dracflamloc** » Tue Mar 07, 2006 2:23 am

Looks like there are some macros too.

Xombie · Post by **Xombie** » Tue Mar 07, 2006 7:29 pm

Ah yeah. Those darn macros. So easy to forget.

http://www.seijin.net/Storage/xSolve/xSolve-394.pb

http://www.seijin.net/Storage/xSolve/xSolve-400.pb

There are two separate versions now. One for 3.94 and one for 4.00b5.

Let me know if that 3.94 one works out for you. I'd be curious to see how it does in your speed testing. It uses plain PB floats (single precision), so decimal results will be sketchy. It'd probably take about 5 minutes to convert it to use something like jack's F64 library, but I'll leave it as floats for now.

Take care

(Edit: I uploaded again since I noticed a few bugs. Download from the same spot.)
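To see why single-precision floats make decimal results "sketchy" in the 3.94 version, here is a small Python sketch that rounds values to 32-bit single precision (the size of a PB .f float) using the standard struct module, then compares a running sum in single versus double precision. It only illustrates the float formats; it does not reproduce xSolve itself.

```python
import struct


def to_single(x: float) -> float:
    """Round a 64-bit double to the nearest 32-bit single and widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def sum_single(values):
    """Add values while rounding every partial sum to single precision."""
    total = 0.0
    for v in values:
        total = to_single(total + to_single(v))
    return total


print(to_single(0.1))          # 0.1 is inexact in both formats; single is much coarser
print(to_single(16777217.0))   # 2**24 + 1 has no single-precision representation
print(sum_single([0.1] * 10))  # single-precision running sum drifts away from 1.0
print(sum([0.1] * 10))         # double-precision running sum, far closer to 1.0
```

Single precision carries about 7 significant decimal digits versus about 15 to 16 for doubles, which is the gap a 64-bit library such as jack's F64 would close.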