IsAlpha IsNumeric

Iria
User
Posts: 43
Joined: Sat Nov 29, 2003 8:49 pm


Post by Iria »

Hi,

I came across the need to parse large files of ASCII data and needed a couple of routines to check whether a number or character was present in the data. I made them a little more generic and useful, i.e. they return the position of a character or number if it's found. I wrote them in ASM as well, you know it's more fun that way :) I've commented them so people can use them, change them, etc. Here they are. (If you find any errors or optimisations then please post them, as I use these routines quite a lot!)

Code:

  
  ; *************************************************************************
    ;   Procedure to check a string for numeric characters, useful for
    ;   validating user input, or passing to Windows APIs etc...
    ;   NOTE: despite the name, only the digits 0-9 are detected; spaces,
    ;   punctuation and other non-alpha characters are not reported.
    ; 
    ;   Input:    Expects a null-terminated string - all PureBasic strings 
    ;             are null terminated, which is handy
    ;   Returns : 0 - success i.e. no numerics were found, or XX where XX
    ;             is the 1-based position of the first numeric character
    ;             found in the string
    ;
    ;   NOTE: An input string of ZERO length will return 0 i.e. it hasn't failed
    ;         validation - but may not be what you expected!
    ;
    ; *************************************************************************
    
    Procedure.l IsAlpha(String$)
      
      Shared is_non_alpha_in_this_string.s
      Shared non_alpha_pointer.l
      is_non_alpha_in_this_string = String$
      non_alpha_pointer = 0                     ; assume we will not find any numerics
      
      ! CLD                                       ; Clear Direction - ensure we are incrementing
      ! MOV ESI, [v_is_non_alpha_in_this_string]  ; Point ESI to our null terminated string in memory
   non_alpha_loop:                           ; start loop
      ! lodsb                                   ; Move byte into AL and inc ESI
      ! TEST al, al                               ; check next byte 
      ! JZ  l_is_alpha_null                       ; is it a null/0 i.e. end of string, jump out
      ! CMP al, $30                               ; Ascii 48 ('0')
      ! JB l_non_alpha_loop                       ; this is less than 48 ascii ok try next one
      ! CMP  al, $39                              ; Ascii 57 ('9')
      ! JA l_non_alpha_loop                       ; this is more than 57 ascii ok try next one
      ; ok we're in between, let's find out where and report
      ! SUB ESI, [v_is_non_alpha_in_this_string]  ; take original mem location away from current location to get
      ; position within string where the numeric was found
      ! MOV [v_non_alpha_pointer], ESI            ; then slap it in the return value
   is_alpha_null:                            ; program end
      
      ProcedureReturn non_alpha_pointer         ; don't forget the ESI pointer was incremented anyway, so the position is 1-based
      
    EndProcedure

Code:

    ; *************************************************************************
    ;   Procedure to determine if all the characters within a string
    ;   are numerical, useful for later converting to numerical types, or 
    ;   passing to Windows API's etc...
    ; 
    ;   Input:    Expects a Null terminated string - All Purebasic strings 
    ;             are null terminated,  which is handy
    ;   Returns : 0 - success i.e. no non-numeric chars were found, or XX where XX
    ;             is the 1-based position of the first non-numeric char
    ;             found in the string
    ;
    ;   NOTE: An input string of ZERO length will return 0 i.e. it hasn't failed
    ;         validation - but may not be what you expected!
    ;
    ; *************************************************************************
    
    Procedure.l IsNumeric(String$)
      
      
      Shared is_non_numeric_in_this_string.s
      Shared non_numeric_pointer.l
      is_non_numeric_in_this_string = String$
      non_numeric_pointer = 0                     ; assume we will not find any non-numeric chars
      
      ! CLD                                         ; Clear Direction - ensure we are incrementing
      ! MOV ESI, [v_is_non_numeric_in_this_string]  ; Point ESI to our null terminated string in memory
   is_numeric_loop:                            ; start loop
      ! lodsb                                     ; Move first byte into AL and inc ESI
      ! TEST al, al                                 ; check next byte is not a null/0 i.e. end of string
      ! JZ  l_is_numeric_null                       ; end the loop when a null is found 
      ! CMP al, $39                                 ; Ascii 57  
      ! JA l_non_numeric_found                      ; Jump if above ascii 57 (i.e. higher than number 9)
      ! CMP  al, $30                                ; Ascii 48 
      ! JB l_non_numeric_found                      ; Jump if below ascii 48 (i.e. lower than number 0)
      ! JMP l_is_numeric_loop                       ; ok try next byte
   non_numeric_found:                          ; if we find a non-numeric char!
      ! SUB ESI, [v_is_non_numeric_in_this_string]  ; take original mem location away from current location to get
      ; position within string where the non-numeric char was found
      ! MOV [v_non_numeric_pointer], ESI            ; then slap it in the return value
   is_numeric_null:                            ; program end
      
      ProcedureReturn non_numeric_pointer         ; ok output where (if at all!) we found a non-numeric char
      
    EndProcedure
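
For comparison, the position-returning behaviour described above can also be sketched in plain PureBasic. This is only a rough equivalent, not the author's code: the names FirstDigitPos and FirstNonDigitPos are made up for this example, and it walks the string through a Character pointer on the assumption that the compiler provides the Character structure and the .i type (newer PureBasic versions do).

```purebasic
; FirstDigitPos / FirstNonDigitPos are hypothetical names for this sketch.

; 1-based position of the first digit 0-9 in String$, or 0 if there is none.
Procedure.i FirstDigitPos(String$)
  Protected *p.Character = @String$
  Protected pos.i = 0
  If *p = 0                          ; an empty string may have a null address
    ProcedureReturn 0
  EndIf
  While *p\c <> 0
    pos + 1
    If *p\c >= '0' And *p\c <= '9'
      ProcedureReturn pos
    EndIf
    *p + SizeOf(Character)           ; step one character, whatever its size
  Wend
  ProcedureReturn 0
EndProcedure

; 1-based position of the first character that is not 0-9, or 0 if all are digits.
Procedure.i FirstNonDigitPos(String$)
  Protected *p.Character = @String$
  Protected pos.i = 0
  If *p = 0
    ProcedureReturn 0
  EndIf
  While *p\c <> 0
    pos + 1
    If *p\c < '0' Or *p\c > '9'
      ProcedureReturn pos
    EndIf
    *p + SizeOf(Character)
  Wend
  ProcedureReturn 0
EndProcedure

Debug FirstDigitPos("abc1")          ; 4
Debug FirstDigitPos("abc")           ; 0
Debug FirstNonDigitPos("1234a")      ; 5
Debug FirstNonDigitPos("0123456789") ; 0
```

As with the ASM routines, an empty string returns 0 from both procedures, so callers that need to reject empty input have to check the length themselves.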
Danilo
Addict
Posts: 3036
Joined: Sat Apr 26, 2003 8:26 am
Location: Planet Earth

Post by Danilo »

And PB-Style (for platform-independent use):

Code:

Procedure.l IsAlpha(String$)
  ; check if only a-z and A-Z is used in the string
  ; returns 1 (true)  for alphastrings,
  ; returns 0 (false) if other chars found
  If String$
    *p.BYTE = @String$
    Repeat
      char = *p\b & $FF
      If (char >= 'a' And char <= 'z') Or (char >= 'A' And char <= 'Z') Or char=0
        *p+1
      Else
        ProcedureReturn 0
      EndIf
    Until char = 0
    ProcedureReturn 1
  EndIf
EndProcedure 

Procedure.l IsNumeric(String$) 
  ; check if only numbers 0-9 are used in the string
  ; returns 1 (true)  for number-string,
  ; returns 0 (false) if other chars are found
  If String$
    *p.BYTE = @String$
    Repeat
      char = *p\b & $FF
      If (char >= '0' And char <= '9') Or char=0
        *p+1
      Else
        ProcedureReturn 0
      EndIf
    Until char = 0
    ProcedureReturn 1
  EndIf
EndProcedure 

Debug IsAlpha("")
Debug IsAlpha("aBcDeF")
Debug IsAlpha("abc1")
Debug "---"
Debug IsNumeric("")
Debug IsNumeric("0123456789")
Debug IsNumeric("1234a")
cya,
...Danilo
...:-=< http://codedan.net/work >=-:...
-= FaceBook.com/DaniloKrahn =-
PB
PureBasic Expert
Posts: 7581
Joined: Fri Apr 25, 2003 5:24 pm

Post by PB »

Shouldn't Debug IsNumeric("3.141") return true? ;)
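
One way a check like this could be widened to accept decimal values such as "3.141" is sketched below. It is an illustration only: IsDecimalText is a hypothetical name, and the rules it applies (optional leading sign, at most one decimal point, at least one digit, no exponent or thousands separators) are assumptions, not something agreed in this thread.

```purebasic
; IsDecimalText is a hypothetical helper, not part of the original posts.
; Accepts an optional leading + or -, digits 0-9 and at most one '.'.
; At least one digit is required, so "", "-" and "." are rejected.
Procedure.i IsDecimalText(String$)
  Protected *p.Character = @String$
  Protected digits.i = 0, dots.i = 0
  If *p = 0                          ; an empty string may have a null address
    ProcedureReturn #False
  EndIf
  If *p\c = '+' Or *p\c = '-'        ; optional sign, first character only
    *p + SizeOf(Character)
  EndIf
  While *p\c <> 0
    If *p\c >= '0' And *p\c <= '9'
      digits + 1
    ElseIf *p\c = '.' And dots = 0   ; allow a single decimal point
      dots + 1
    Else
      ProcedureReturn #False
    EndIf
    *p + SizeOf(Character)
  Wend
  If digits > 0
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure

Debug IsDecimalText("3.141")   ; 1
Debug IsDecimalText("-42")     ; 1
Debug IsDecimalText("1.2.3")   ; 0
Debug IsDecimalText("1234a")   ; 0
Debug IsDecimalText(".")       ; 0
```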
Iria
User
Posts: 43
Joined: Sat Nov 29, 2003 8:49 pm

ASM version should be platform independent

Post by Iria »

Correct me if I'm wrong 8O but my ASM version should work on any OS that FASM works on? Amiga/Linux/Windoze?

You're right, a small mod is needed to deal with floats. TBH it's not what I needed, but I'm happy to make the mods if they're needed/requested 8)

The reason for the ASM is speed; it was quite fast when dealing with 64k string buffers. It would be interesting to benchmark them both and see which is faster :)

Cheers
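
A rough timing harness for that kind of comparison might look like the sketch below. Everything in it is an assumption made for illustration: IsDigitsOnly is a stand-in routine written for this example (swap in whichever procedure is being measured), the 65535-character buffer and the 1000 passes are arbitrary, and the figures are only meaningful with the debugger switched off.

```purebasic
; Timing sketch. IsDigitsOnly() is a stand-in written for this example.
Procedure.i IsDigitsOnly(String$)
  Protected *p.Character = @String$
  If *p = 0 : ProcedureReturn #False : EndIf     ; null address = empty string
  If *p\c = 0 : ProcedureReturn #False : EndIf   ; empty string
  While *p\c <> 0
    If *p\c < '0' Or *p\c > '9'
      ProcedureReturn #False
    EndIf
    *p + SizeOf(Character)
  Wend
  ProcedureReturn #True
EndProcedure

Define buffer$ = RSet("", 65535, "7")   ; roughly a 64k string of digits
Define i.i, hits.i = 0
Define start.q = ElapsedMilliseconds()

For i = 1 To 1000
  hits + IsDigitsOnly(buffer$)          ; the string is copied on every call
Next

Define elapsed.q = ElapsedMilliseconds() - start

If OpenConsole()
  PrintN(Str(hits) + " successful passes in " + Str(elapsed) + " ms")
EndIf
```

Because the string is passed by value, each call also pays for copying the buffer, so a fair comparison should pass the data to both routines in the same way.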
Danilo
Addict
Posts: 3036
Joined: Sat Apr 26, 2003 8:26 am
Location: Planet Earth

Re: ASM version should be platform independent

Post by Danilo »

Iria wrote: Correct me if I'm wrong 8O but my ASM version should work on any OS that FASM works on? Amiga/Linux/Windoze?
Assembly = machine language. It depends 1. on the processor you code for and 2. on the operating system's rules (you overwrite ESI, for example; I don't know if a PB procedure preserves this on both Win+Linux).
Maybe it works on x86 Linux, but it certainly doesn't work on Amiga, because the Amiga has a 680x0 processor instead of an 80x86.
ASM isn't a high-level language; it's directly the language of the processor (made readable for humans), so it's a different language on different processors.
Iria wrote: Be interesting to benchmark them both and see which was faster
Speed is nothing if you code multi-platform. That's the reason I posted these procedures, so users can choose for themselves what to use -> the speedy procedure or the more compatible procedure.

It's nice to have both ways, IMO.

Another thing: you and I can read/modify your code, but at least 90% of PB users can't do anything with ASM. They can read/code PureBasic, though - it's the language of their choice. ;)
For these users, a version they can understand is sometimes better than a procedure they don't understand a bit.
cya,
...Danilo
Wayne Diamond
User
Posts: 38
Joined: Tue Dec 30, 2003 1:37 pm
Location: Australia

Post by Wayne Diamond »

Danilo wrote: For these users, a version they can understand is sometimes better than a procedure they don't understand a bit.
True, but as long as they understand how to CALL an assembly function and know what it RETURNS, then knowledge of the internal workings isn't really required - they just need to get the job done, they don't necessarily need to know how the job got done :)
TronDoc
Enthusiast
Posts: 310
Joined: Wed Apr 30, 2003 3:50 am
Location: 3DoorsDown

Post by TronDoc »

Thank you for both solutions.
I am leaning towards non-dependent code
these days...
--jb
peace
[pI 166Mhz 32Mb w95]
[pII 350Mhz 256Mb atir3RagePro WinDoze '98 FE & 2k]
[Athlon 1.3Ghz 160Mb XPHome & RedHat9]
Danilo
Addict
Posts: 3036
Joined: Sat Apr 26, 2003 8:26 am
Location: Planet Earth

Post by Danilo »

Wayne Diamond wrote: True, but as long as they understand how to CALL an assembly function and know what it RETURNS, then knowledge of the internal workings isn't really required - they just need to get the job done, they don't necessarily need to know how the job got done :)
It has happened often enough that old PB code doesn't run anymore after some versions because PB syntax or internals changed.

People who don't understand ASM are lost with such code then, but they can modify the PB version themselves.
The BASIC guys mostly don't want to learn ASM... they don't care about it... (or just don't get it ;))

The assembler used changed some time ago from NASM to FASM, so some code needs *very small* modifications (for example: EXTERN -> extrn).
People don't know that and only think "the code doesn't work and gives a weird error, xyz can't code - he wrote non-working code" - and then write their own procedure...

Anyway... like I said: it's nice to have both ways, IMO.
cya,
...Danilo
naw
Enthusiast
Posts: 573
Joined: Fri Apr 25, 2003 4:57 pm

Post by naw »

If PB supported Unix-style regular expressions, then lots of things that are very difficult today would become possible:

[A-Z] matches one capital alpha
[A-Za-z] matches any one alpha (note that [A-z] would also match the punctuation between 'Z' and 'a' in ASCII)
[A-Z]* matches any number of capital alphas
[0-9,.]* matches numbers
[Hh]ello [Ww]orld matches 'Hello World', 'hello world', 'Hello world' or 'hello World'

- there must be a library out there that could be bolted into PB by one of the PB *Professors* (Danilo?)
Ta - N
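
Later PureBasic releases include a RegularExpression library based on PCRE, so checks like the ones earlier in this thread can be written as patterns. The sketch below assumes such a release; MatchesPattern is a hypothetical wrapper written for this example, and the two patterns are illustrations rather than anything posted here. Note that PCRE uses [Hh] rather than [H|h] for "either case", and plain {n,m} rather than \{n,m\}.

```purebasic
; MatchesPattern is a hypothetical wrapper around the built-in
; CreateRegularExpression / MatchRegularExpression / FreeRegularExpression.
; Returns #True if Text$ matches Pattern$, #False otherwise
; (including when the pattern itself fails to compile).
Procedure.i MatchesPattern(Pattern$, Text$)
  Protected result.i = #False
  Protected re.i = CreateRegularExpression(#PB_Any, Pattern$)
  If re
    If MatchRegularExpression(re, Text$)
      result = #True
    EndIf
    FreeRegularExpression(re)
  EndIf
  ProcedureReturn result
EndProcedure

; ^ and $ anchor the pattern so the whole string has to match.
Debug MatchesPattern("^[A-Za-z]+$", "aBcDeF")             ; 1
Debug MatchesPattern("^[A-Za-z]+$", "abc1")               ; 0
Debug MatchesPattern("^[0-9]+(\.[0-9]+)?$", "3.141")      ; 1
Debug MatchesPattern("^[0-9]+(\.[0-9]+)?$", "1234a")      ; 0
Debug MatchesPattern("^[Hh]ello [Ww]orld$", "hello World") ; 1
```

Compiling the pattern on every call keeps the example short; code that tests many strings against the same pattern would normally create the expression once and reuse it.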
Kale
PureBasic Expert
Posts: 3000
Joined: Fri Apr 25, 2003 6:03 pm
Location: Lincoln, UK
Contact:

Post by Kale »

naw wrote: - there must be a library out there that could be bolted into PB by one of the PB *Professors* (Danilo?)
Check out http://www.reelmediaproductions.com/pb and look in the ASM libs section.
--Kale

blueznl
PureBasic Expert
Posts: 6166
Joined: Sat May 17, 2003 11:31 am
Contact:

Post by blueznl »

naw, perhaps I'll rework my x_matchpattern function to support those Unix variations, if there is any demand for it... can you give me some more examples of the syntax?
( PB6.00 LTS Win11 x64 Asrock AB350 Pro4 Ryzen 5 3600 32GB GTX1060 6GB)
( The path to enlightenment and the PureBasic Survival Guide right here... )
naw
Enthusiast
Posts: 573
Joined: Fri Apr 25, 2003 4:57 pm

Post by naw »

In fact, here is a free C RE library :-)

www.cs.umd.edu/~daveho/software/software.htm

- totally beyond me to turn this into a PB Library, but I'm sure someone out there could...
Ta - N
naw
Enthusiast
Posts: 573
Joined: Fri Apr 25, 2003 4:57 pm

Post by naw »

- I just copied this from an AWK man page - I have to assume that the library above will support the same REs (I would expect it to...).

REs are a little tricky to learn, but incredibly useful, and can replace huge chunks of code with a single statement. Of course the Unix & Linux communities should be very familiar with these expressions through using 'vi', 'sed' & 'awk' etc...


syntax

. Matches any single character except newline. In awk, dot can match newline also.

* Matches any number (including zero) of the single character (including a character specified by a regular expression) that immediately precedes it.

[...] Matches any one of the class of characters enclosed between the brackets. A circumflex (^) as first character inside brackets reverses the match to all characters except newline and those listed in the class. In awk, newline will also match. A hyphen (-) is used to indicate a range of characters. The close bracket (]) as the first character in class is a member of the class. All other metacharacters lose their meaning when specified as members of a class.

^ First character of regular expression, matches the beginning of the line. Matches the beginning of a string in awk, even if the string contains embedded newlines.

$ As last character of regular expression, matches the end of the line. Matches the end of a string in awk, even if the string contains embedded newlines.

\{n,m\} Matches a range of occurrences of the single character (including a character specified by a regular expression) that immediately precedes it. \{n\} will match exactly n occurrences, \{n,\} will match at least n occurrences, and \{n,m\} will match any number of occurrences between n and m. (sed and grep only, may not be in some very old versions.)

\ Escapes the special character that follows.


Characters Usage

+ Matches one or more occurrences of the preceding regular expression.

? Matches zero or one occurrences of the preceding regular expression.

| Specifies that either the preceding or following regular expression can be matched (alternation).

() Groups regular expressions.

{n,m} Matches a range of occurrences of the single character (including a character specified by a regular expression) that immediately precedes it. {n} will match exactly n occurrences, {n,} will match at least n occurrences, and {n,m} will match any number of occurrences between n and m.


examples

Postal Abbreviation for State       [A-Z][A-Z]
City, State                         ^.*,[A-Z][A-Z]
City, State, Zip (POSIX egrep)      ^.*,[A-Z][A-Z][0-9]{5}(-[0-9]{4})?
Month, Day, Year                    [A-Z][a-z]\{3,9\}[0-9]\{1,2\},[0-9]\{4\}
U.S. Social Security Number         [0-9]\{3\}-[0-9]\{2\}-[0-9]\{4\}
North-American Local Telephone      [0-9]\{3\}-[0-9]\{4\}
Formatted Dollar Amounts            \$[0-9]*\.[0-9][0-9]
troff In-line Font Requests         \\f[(BIRP]C*[BW]*
troff Requests                      ^\.[a-z]\{2\}
troff Macros                        ^\.[A-Z12].
troff Macro with arguments          ^\.[A-Z12].".*"
HTML In-line Codes                  <[^>]*>
Ventura Publisher Style Codes       ^@.*=.*
Match blank lines                   ^$
Match entire line                   ^.*$
Match one or more spaces            a space followed by " *" (i.e. two spaces, then *)
Ta - N
FloHimself
Enthusiast
Posts: 229
Joined: Wed May 14, 2003 3:38 pm
Location: Lüneburg - Germany

Post by FloHimself »

naw wrote: In fact, here is a free C RE library :-)

www.cs.umd.edu/~daveho/software/software.htm

- totally beyond me to turn this into a PB Library, but I'm sure someone out there could...
The C++ regular expression library by David Hovemeyer that you mention here is based on Henry Spencer's classic regular expression library. I've already compiled 2 RegExp libraries for PureBasic (PBRegExp and PBRegExpEx), also based on Henry Spencer's classic regular expression library. Both can be found at http://www.reelmediaproductions.com/pb.

regards,
Flo
Kale
PureBasic Expert
Posts: 3000
Joined: Fri Apr 25, 2003 6:03 pm
Location: Lincoln, UK
Contact:

Post by Kale »

FloHimself wrote: I've already compiled 2 RegExp libraries for PureBasic (PBRegExp and PBRegExpEx), also based on Henry Spencer's classic regular expression library. Both can be found at http://www.reelmediaproductions.com/pb.
That's exactly what I was pointing at earlier in this thread. 8)
--Kale
