How do I know when Val[F/D]() reads a string or "0"

Derren · Post by **Derren** » Thu Mar 05, 2020 4:37 pm

Hello everybody,

this might be a feature request or there might be a quick work around that is not obvious to me.

I just found out that there is now a NaN-value in PB. That's awesome.
So why is Val("not a number") not NaN?
Val() is older than NaN in PB, so I understand that it was not changed so that older code doesn't suddenly break (then again, my old 2D games broke, too).
But could there be a flag added or something?

What's your way of determining whether you just read a number or not?

Code: Select all

Define test1.s, test2.s, test3.s

test1 = "33.5"
test2 = "0"
test3 = "string"

Debug ValD(test1) ;33.5
Debug ValD(test2) ;0
Debug ValD(test3) ;0 - should be NaN, imho?

I'm currently reading values from a file and I have no way of telling PB that a certain value "does not exist".
So currently I'm reading strings and configured my input file in such a way, that "NaN" is written there.
I read a string, check if it's "NaN" and if not, I use Val(). Otherwise I assign NaN to the variable and later use IsNaN().
But if for some reason the file contains "?" or "XYZ", it will show as a 0, which is not something that I want. I want a clear way to distinguish between "0" and "whatever".

The only way I could think of that works for all strings would be to check if UCase() = LCase(). This is true for numbers (other than exponentials) and false for strings. And I'd have to do this check every time I read a 0-value.

skywalk · Post by **skywalk** » Thu Mar 05, 2020 5:06 pm

My IsNumeric()...

kenmo · Post by **kenmo** » Thu Mar 05, 2020 5:25 pm

Derren wrote:So why is Val("not a number") not NaN?

There are a few things to consider here...
1. NaN is a special float value, not an integer value. Val() returns an integer so it can't return NaN. ValF() can.
2. StrF(NaN()) = "NaN" so it makes sense that ValF("NaN") returns the value NaN.
3. ValF() understands "NAN" and "nan" etc. but it's not reasonable for it to parse other language variations like "not a number"

Code: Select all

nan.f = NaN() ; NaN is a special float value
Debug nan
Debug Str(nan) ; Integer version of Str doesn't understand NaN value.
Debug StrF(nan)
Debug ValF(StrF(nan))
Debug ValF("NaN")
Debug ValF("NAN")
Debug ValF("nan")
Debug ValF("Not A Num") ; Not understood
Debug ValF("Not A Number") ; Not understood
Debug ValF("not a number") ; Not understood

Derren wrote:The only way I could think of that works for all strings would be to check if Ucase = LCase.

Simple and clever, but it doesn't handle other characters, for example a question mark.
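
A minimal sketch of that case-comparison check, to show where it falls short. HasNoLetters() is a hypothetical name used only for illustration; it is not a built-in function.

```purebasic
; Hypothetical helper: #True when the text contains no cased letters.
Procedure.i HasNoLetters(Text.s)
  If UCase(Text) = LCase(Text)
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure

Debug HasNoLetters("33.5")   ; 1
Debug HasNoLetters("string") ; 0
Debug HasNoLetters("?")      ; 1, although "?" is not a number
Debug HasNoLetters("3E2")    ; 0, although exponent notation is a valid number format
```

The check only proves that no letters are present, so punctuation slips through and exponent notation is rejected.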

Derren wrote:What's your way of determining whether you just read a number or not?

For serious work you should validate the number with a RegEx or procedure, like skywalk's. There are other versions on the forum too.

Derren wrote:this might be a feature request or there might be a quick work around that is not obvious to me.

I'm not aware of a built-in way to validate number strings.
What should the feature request be?
1. A built-in IsNumeric() function? Possible, but it gets complicated quickly... does it just validate integers, or floats too? Allow "e" float notation? Allow "+" prefix? Use "." or European style ","? etc.
2. ValF() always validates the string? I think that would slow down performance in bulk-processing.
3. An optional flag added to ValF() to validate the text? Maybe... return NaN() if invalid? But then the integer version Val() can't do the same.

Since PB can't throw exceptions on invalid input, the best method for now is definitely to validate with a custom procedure or regex.
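
As a rough sketch of what such a custom procedure could look like (this is not skywalk's version, which is not reproduced in this thread): the hypothetical IsNumericStr() below accepts an optional sign, digits with an optional "." fraction, and an optional exponent. The names IsDigitAt(), IsNumericStr() and ValDOrNaN() are assumptions made up for this example, and the accepted syntax is one possible choice among many.

```purebasic
; Hypothetical helper: #True if the character at Pos is a decimal digit.
; Past the end of the string Mid() yields "", whose Asc() is 0, so no bounds check is needed.
Procedure.i IsDigitAt(Text.s, Pos.i)
  Protected c.i = Asc(Mid(Text, Pos, 1))
  ProcedureReturn Bool(c >= '0' And c <= '9')
EndProcedure

; Hypothetical validator for: [sign] digits [. digits] [e|E [sign] digits]
Procedure.i IsNumericStr(Text.s)
  Protected i.i = 1, digits.i = 0, c.s

  ; optional leading sign
  c = Mid(Text, i, 1)
  If c = "+" Or c = "-" : i + 1 : EndIf

  ; integer part
  While IsDigitAt(Text, i) : i + 1 : digits + 1 : Wend

  ; optional fraction
  If Mid(Text, i, 1) = "."
    i + 1
    While IsDigitAt(Text, i) : i + 1 : digits + 1 : Wend
  EndIf

  ; at least one digit is required in the mantissa
  If digits = 0 : ProcedureReturn #False : EndIf

  ; optional exponent, which needs at least one digit of its own
  c = Mid(Text, i, 1)
  If c = "e" Or c = "E"
    i + 1
    c = Mid(Text, i, 1)
    If c = "+" Or c = "-" : i + 1 : EndIf
    If IsDigitAt(Text, i) = #False : ProcedureReturn #False : EndIf
    While IsDigitAt(Text, i) : i + 1 : Wend
  EndIf

  ; valid only if the whole string was consumed
  ProcedureReturn Bool(i > Len(Text))
EndProcedure

; Hypothetical wrapper: NaN for anything that fails validation.
Procedure.d ValDOrNaN(Text.s)
  If IsNumericStr(Text)
    ProcedureReturn ValD(Text)
  EndIf
  ProcedureReturn NaN()
EndProcedure

Debug IsNaN(ValDOrNaN("0"))      ; 0
Debug IsNaN(ValDOrNaN("string")) ; 1
Debug IsNaN(ValDOrNaN("?"))      ; 1
```

This sketch deliberately ignores "," as a decimal separator, thousand separators, and surrounding whitespace; those would have to be added if the input files use them.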

skywalk · Post by **skywalk** » Thu Mar 05, 2020 6:31 pm

Be advised that regex is incredibly slow.
My old speed tests had regex 600x slower than PB custom function.

kenmo · Post by **kenmo** » Fri Mar 06, 2020 2:55 pm

Yes, RegEx can be slow. It also adds RegEx library bloat to your program, and it requires the PCRE license to be included with your program...

Derren · Post by **Derren** » Fri Mar 06, 2020 3:30 pm

kenmo wrote:
Code: Select all
Debug ValF("NAN")

Well, this is a good start. I didn't realize this.

NaN is also not only a float, because ValD() also returns NaN, so why can't an integer be NaN? (rhetorical question, or aimed at the Devs, rather).

The whole dilemma stems from the fact that #False and #True in PB are 0 and 1. If there were a true "false" and "true", then you could return these. That's an issue I face on a regular basis when writing my own procedures. Most of the time "error" or "false" will be -1, but if you work with a negative range of numbers, -1 is not a good error indicator, either.
Then you need an error flag or something. And this is what we need here.
Otherwise it means another check any time Val() reads 0.

kenmo wrote:1. A built-in IsNumeric() function? Possible, but it gets complicated quickly... does it just validate integers, or floats too? Allow "e" float notation? Allow "+" prefix? Use "." or European style ","? etc.

Val() does all of that right now. It even accepts both "." and "," (maybe "," is dependent on a locale setting, no clue)

Code: Select all

Debug ValD("-10.0")
Debug ValD("+10.0")
Debug ValD("+10,0")
Debug ValD("3E2")

The only issue is that it returns "0" on something that is clearly not a number and that integers can't "be NaN".
So if it returns NaN (or anything else that is not a real number) there you already have your isNumeric().

But in the end, "NaN" just like "#True" is just some special number in the range of floats and doubles interpreted in a special way, and Murphy's Law dictates that someday a proper value will be read as NaN.

Code: Select all

a.i =  ValD("NaN")
Debug a

I try to avoid RegEx in PB, because of the speed and the licence issue. Most of the time, things are not that complicated and can be checked with the String Library, just like here.
I could easily write an isNumeric() procedure and I probably will (or use one already posted on the forum, I think there are a few), if that's what's required to deal with this issue.

Since ValF() already checks all the valid notations (and even Hex and Binary), you probably only need to check if the first character of a string is 0 (or - and the 2nd is 0) in order to identify a true 0-number. Anything else is not a number, then.

Thanks for all your input, guys

Marc56us · Post by **Marc56us** » Fri Mar 06, 2020 5:52 pm

Recognizing all numerical values is not that simple, even with a RegEx.
Here is my proposal
^[-+0-9.,'Ee ]+$
(probably incomplete but recognizes also some country formatting (thousand separator))

Adding the regex machine only takes 180kb (not Mb), which is nothing nowadays.
Adding the license file is not a constraint either.
As for the speed, test it.

Code: Select all

If Not CreateRegularExpression(0, "^[-+0-9.,'Ee ]+$")
    Debug RegularExpressionError() : End
EndIf

Procedure IsNumeric(Txt$)
    If MatchRegularExpression(0, Txt$)
        Debug LSet(Txt$, 10) + " Number"
    Else
        Debug LSet(Txt$, 10) + " ------" 
    EndIf
EndProcedure

IsNumeric("33.5")
IsNumeric("0")
IsNumeric("1")
IsNumeric("string")
IsNumeric("string0")
IsNumeric("0string")
IsNumeric("-10.0")
IsNumeric("+10.0")
IsNumeric("+10.0A")
IsNumeric("+10,0")
IsNumeric("3E2")
IsNumeric("3E-2")
IsNumeric("-3E-2")
IsNumeric("M1")
IsNumeric("1M1")
IsNumeric("1 000")
IsNumeric("1 000.30")
IsNumeric("1'000.30")

Code: Select all

33.5       Number
0          Number
1          Number
string     ------
string0    ------
0string    ------
-10.0      Number
+10.0      Number
+10.0A     ------
+10,0      Number
3E2        Number
3E-2       Number
-3E-2      Number
M1         ------
1M1        ------
1 000      Number
1 000.30   Number
1'000.30   Number

Don't forget to differentiate between "is a number" and "contains a number."

kenmo · Post by **kenmo** » Fri Mar 06, 2020 6:46 pm

Marc56us wrote:^[-+0-9.,'Ee ]+$

Lots of false positives!

Code: Select all

IsNumeric("...")
IsNumeric("e+e")
IsNumeric("1, 2, 3")
IsNumeric("------")
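
One way to cut down on those false positives is to describe the structure of a number instead of only its allowed characters. The pattern below is a hedged sketch, not a complete answer: it assumes an optional sign, digits with an optional "." or "," fraction, and an optional exponent, and it drops the thousand-separator support of the original proposal. The constant name #NumberRegex is made up for this example.

```purebasic
#NumberRegex = 0

; [sign] (digits [sep digits] | sep digits) [e|E [sign] digits], where sep is "." or ","
If Not CreateRegularExpression(#NumberRegex, "^[-+]?(\d+([.,]\d*)?|[.,]\d+)([eE][-+]?\d+)?$")
  Debug RegularExpressionError() : End
EndIf

; MatchRegularExpression() returns nonzero on a match, zero otherwise
Debug MatchRegularExpression(#NumberRegex, "+10,0")   ; accepted
Debug MatchRegularExpression(#NumberRegex, "-3E-2")   ; accepted
Debug MatchRegularExpression(#NumberRegex, "...")     ; rejected
Debug MatchRegularExpression(#NumberRegex, "e+e")     ; rejected
Debug MatchRegularExpression(#NumberRegex, "1, 2, 3") ; rejected
Debug MatchRegularExpression(#NumberRegex, "------")  ; rejected
```

The trade-off is that inputs such as "1 000.30" or "1'000.30" are rejected as well, so the pattern has to be matched to the number formats the input actually uses.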

Derren wrote:NaN is also not only a float, because ValD() also returns NaN, so why can't an integer be NaN? (rhetorical question, or aimed at the Devs, rather).

Doubles are floats too: they are double-precision floating-point numbers. PB and most languages just call single-precision a "float".

NaNs are defined for floats: https://en.wikipedia.org/wiki/NaN
"In IEEE 754 standard-conforming floating-point storage formats, NaNs are identified by specific, pre-defined bit patterns unique to NaNs."
"Most fixed-size integer formats cannot explicitly indicate invalid data."

As you showed in your own example, a NaN interpreted as an integer is just... a "random looking" but valid integer.

A PB-friendly solution is, use the result of the procedure to indicate success, and use a pointer to store the actual result.

Code: Select all

; Use the result (True/False) to indicate success, use the pointer to store result

Procedure.i Uppercase(*Char.INTEGER)
  Select *Char\i
    Case 'A' To 'Z'
      ProcedureReturn #True
    Case 'a' To 'z'
      *Char\i = *Char\i + ('A' - 'a')
      ProcedureReturn #True
    Default
      ProcedureReturn #False
  EndSelect
EndProcedure

x.i = 'a'
If Uppercase(@x)
  Debug Chr(x)
Else
  Debug "Not a letter"
EndIf

x.i = 'B'
If Uppercase(@x)
  Debug Chr(x)
Else
  Debug "Not a letter"
EndIf

x.i = '%'
If Uppercase(@x)
  Debug Chr(x)
Else
  Debug "Not a letter"
EndIf
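
The same pattern can be applied to the original question. The sketch below is an assumption-laden example, not a built-in: TryVal() is a hypothetical name, it only accepts plain decimal integers with an optional leading "-", and it does not check for overflow.

```purebasic
; Hypothetical helper: returns #True and stores the value through *Result
; only if Text is an optionally negative decimal integer, nothing else.
Procedure.i TryVal(Text.s, *Result.Integer)
  Protected i.i, n.i = Len(Text), first.i = 1, c.i

  ; optional leading minus sign
  If Asc(Left(Text, 1)) = '-'
    first = 2
  EndIf

  ; empty string or a lone "-" is not a number
  If first > n
    ProcedureReturn #False
  EndIf

  ; every remaining character must be a digit
  For i = first To n
    c = Asc(Mid(Text, i, 1))
    If c < '0' Or c > '9'
      ProcedureReturn #False
    EndIf
  Next

  *Result\i = Val(Text)
  ProcedureReturn #True
EndProcedure

Define value.i

If TryVal("0", @value)
  Debug value            ; a real zero
Else
  Debug "Not a number"
EndIf

If TryVal("XYZ", @value)
  Debug value
Else
  Debug "Not a number"   ; no longer confused with 0
EndIf
```

Because success is reported separately from the value, a genuine "0" and an unparsable string can no longer be mistaken for each other, and no sentinel such as -1 is needed.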
