Some simple regular expression examples
Posted: Sat Aug 27, 2011 5:03 pm
This is not really a "Tricks 'n' Tips" post, but I thought this was a better place for it than "General Discussion".
Someone asked for some easy examples, so I put a few together. I'm not a regex expert, so I hope this is not a pile of *beep*; I made it with good intentions.
Just don't ask me for more complicated patterns!
Some more or less simple regexes:
RegExMatch()
RegExMatch examples
ExtractRegExMatch()
ExtractRegExMatch examples
RegExMatch()
Procedure.i RegExMatch (text$, regex$ = "")
; [DESC]
; Checks whether the regular expression matches the passed string.
;
; [INPUT]
; text$ : The string to be checked.
; regex$ : The regular expression to be used.
;
; [RETURN]
; 1 if there is a match, 0 if not, -1 if the regular expression is invalid.
;
; [NOTES]
; You can omit the regular expression on subsequent calls if the regex doesn't change.
;
; Here is a list of some of the more useful commands for pattern matching, and remember:
; the regex engine is eager, and ?, *, + are greedy.
;
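; ^ = start of the string (start of any line in multiline mode (?m)), when used at the beginning of the expression
; $ = end of the string or before a final newline (end of any line in multiline mode (?m))
; \A = start of the string, regardless of multiline mode
; \Z = end of the string or before a final newline, regardless of multiline mode
; | = OR
; [^...] = NOT: a '^' as the first char inside [ ] negates the character class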
;
; . = any char that is not a newline (which chars count as a newline depends on the PCRE newline setting)
; ? = the preceding char/token is optional (also used to switch from greedy to lazy, e.g.: *?)
; \s = any whitespace such as [ \t\r\n]
; \S = anything that is not whitespace
; \d = any decimal digit [0-9]
; \D = anything that is not a decimal digit
; \w = any "word" character [A-Za-z0-9_]
; \W = anything that is not a "word" character
; \b = a boundary between a word char and a non-word char (or the start/end of the string)
; \B = any position that is not a word boundary
; \Q \E = interpret all the characters between \Q and \E as literal characters
;
; * = 0 or more times, same as {0,}
; + = 1 or more times, same as {1,} (also used to switch from greedy to possessive, e.g.: *+)
; ( = start a subexpression (capturing group)
; ) = end a subexpression
; {n} = repeated exactly n times
; {n,m} = repeated at least n times, but no more than m times
; {n,} = repeated at least n times, no maximum
; [ = start a character class
; ] = end a character class
; (?i) = case-insensitive matching
; (?-i) = case-sensitive matching
;
; (?:...) = non-capturing group
; (?!...) = negative lookahead, usually to match something not followed by something else
; example: a(?!b) = a not followed by b
;
; \ = the escape character
; \e = escape ($1B)
; \n = newline ($0A)
; \r = carriage return ($0D)
; \t = tab ($09)
; \xhh = character with hex code hh
;
; Character classes, for example: [[:alpha:]]
;
; alnum letters and digits
; alpha letters
; ascii character codes 0 - 127
; blank space or tab only
; cntrl control characters
; digit decimal digits (same as \d)
; graph printing characters, excluding space
; lower lower case letters
; print printing characters, including space
; punct printing characters, excluding letters and digits and space
; space white space (not quite the same as \s)
; upper upper case letters
; word "word" characters (same as \w)
; xdigit hexadecimal digits
;
; refer to PCRESYNTAX(3) at http://www.pcre.org/pcre.txt for more info about the syntax
  Static iRegEx                    ; compiled regex, kept between calls
  Protected iRetVal = -1           ; regex error
  
  If regex$                        ; a new pattern was passed: recompile it
    If iRegEx > 0
      FreeRegularExpression(iRegEx)
    EndIf
    iRegEx = CreateRegularExpression(#PB_Any, regex$)
  EndIf
  
  If iRegEx                        ; 0 if the last pattern failed to compile
    iRetVal = MatchRegularExpression(iRegEx, text$)
  EndIf
  
  ProcedureReturn iRetVal
EndProcedure
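RegExMatch() only returns -1 when the pattern is invalid, without saying why. If you want the reason, PureBasic's RegularExpressionError() returns the message of the last failed compilation. A minimal sketch; CompileOrReport() is a hypothetical helper name made up for this example, not part of PureBasic:

```purebasic
; Hypothetical helper (not a PureBasic command): compile a pattern
; and print the PCRE error message when compilation fails.
Procedure.i CompileOrReport(regex$)
  Protected iRegEx = CreateRegularExpression(#PB_Any, regex$)
  If iRegEx = 0
    Debug "Invalid regex '" + regex$ + "': " + RegularExpressionError()
  EndIf
  ProcedureReturn iRegEx
EndProcedure

Define re = CompileOrReport("colou?r")
If re
  Debug MatchRegularExpression(re, "color") ; nonzero: the "u" is optional
  FreeRegularExpression(re)
EndIf

Define bad = CompileOrReport("(unclosed") ; missing ')': prints the error and returns 0
```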
RegExMatch examples
Define r$
r$ = "(?i)insensitive(?-i:sensitive)insensitive" ; mixing case-sensitive/insensitive modifiers
Debug RegExMatch("INSENSITIVEsensitiveINSENSITIVE", r$) ; 1
Debug RegExMatch("INSENSITIVEsEnSiTiVeINSENSITIVE") ; 0
r$ = "colou?r" ; one optional char
Debug RegExMatch("English says colour.", r$) ; 1
Debug RegExMatch("The others say color.") ; 1
r$ = "test" ; "test" somewhere
Debug RegExMatch("This is a test string.", r$) ; 1
Debug RegExMatch("This is a TEST string.") ; 0
r$ = "test|strong" ; "test" or "strong" somewhere
Debug RegExMatch("This is a test string.", r$) ; 1
Debug RegExMatch("This is a strong string.") ; 1
r$ = "^test" ; "test" at the beginning
Debug RegExMatch("This is a test string.", r$) ; 0
r$ = "^This" ; "This" at the beginning
Debug RegExMatch("This is a test string.", r$) ; 1
r$ = "string\.$" ; "string." at the end (the dot is escaped)
Debug RegExMatch("This is a test string.", r$) ; 1
r$ = "^num[0-9]$" ; "num" followed by exactly one digit
Debug RegExMatch("num1", r$) ; 1
Debug RegExMatch("numa") ; 0
Debug RegExMatch("num12") ; 0
r$ = "^num[0-9]+$" ; "num" followed by one or more digits
Debug RegExMatch("num1", r$) ; 1
Debug RegExMatch("numa") ; 0
Debug RegExMatch("num12") ; 1
r$ = ".*\r\n$" ; any string ending with #CRLF$
Debug RegExMatch("This one ends with CR+LF" + #CRLF$, r$) ; 1
Debug RegExMatch("This one does not" + #CRLF$ + ".") ; 0
r$ = "^B.*" ; "B" at the beginning, followed by any number of chars (even zero)
Debug RegExMatch("BB DDD", r$) ; 1
Debug RegExMatch("B") ; 1
Debug RegExMatch("A") ; 0
r$ = "\b(cat|dog)\b" ; "cat" or "dog" as whole words
Debug RegExMatch("I like my dog.", r$) ; 1
Debug RegExMatch("You are a copycat.") ; 0
r$ = "^\d{3}$" ; exactly three digits
Debug RegExMatch("123", r$) ; 1
Debug RegExMatch("1234", r$) ; 0
Debug RegExMatch("a23", r$) ; 0
r$ = "^Match th.. Text!" ; exactly two arbitrary chars after "th"
Debug RegExMatch("Match this Text!", r$) ; 1
Debug RegExMatch("Match the Text!") ; 0
r$ = "^Match th.? Text!" ; zero or one arbitrary char after "th"
Debug RegExMatch("Match the Text!", r$) ; 1
r$ = "^ERROR: \d{3}$" ; three digits at the end
Debug RegExMatch("ERROR: 404", r$) ; 1
r$ = "^Star: \* ABC\d$" ; one escaped '*', one digit at the end
Debug RegExMatch("Star: * ABC3", r$) ; 1
r$ = "(?m)X$" ; enable multiline mode, match X at the end of a line
Debug RegExMatch("At the end of this line there is one X" + #LF$ + "and nothing here", r$) ; 1
Debug RegExMatch("No X here" + #LF$ + "this one ends with X") ; 1
r$ = "(?m)X\Z" ; multiline mode is on, but \Z still matches X only at the end of the string
Debug RegExMatch("At the end of this line there is one X" + #LF$ + "and nothing here", r$) ; 0
Debug RegExMatch("No X here" + #LF$ + "this one ends with X") ; 1
r$ = "^AAA[[:lower:]]{3}CCC$" ; three lowercase chars
Debug RegExMatch("AAAbbbCCC", r$) ; 1
Debug RegExMatch("AAABBBCCC") ; 0
r$ = "\Q[]\^$.|?*+()\E" ; match the problematic string "[]\^$.|?*+()" using \Q \E
Debug RegExMatch("abc []\^$.|?*+() def", r$) ; 1
r$ = "^((?!beer).)*$" ; match the string if it does not contain "beer", using a negative lookahead
Debug RegExMatch("bread and water match", r$) ; 1
Debug RegExMatch("bread and beer doesn't") ; 0
r$ = "^\w+\d{3}\.(?i)(jpg|bmp)$" ; one or more "word" chars, then 3 digits, a dot, and "jpg" or "bmp" (case-insensitive)
Debug RegExMatch("Image_001.bmp", r$) ; 1
Debug RegExMatch("TEST123.BMP") ; 1
Debug RegExMatch("123.bmp") ; 0
r$ = "^\w+\\\w+\.(?i)txt$" ; one or more "word" chars, a '\', one or more "word" chars, '.' and "txt" (case-insensitive)
Debug RegExMatch("Example\File.txt", r$) ; 1
Debug RegExMatch("\File.txt") ; 0
Debug RegExMatch("c:\File.txt") ; 0
r$ = "^(?i)[a-z]:\\([^/:*?\x22.<>]+\\)*[^/:*?\x22<>]*$" ; fully qualified path to a Windows file or folder
Debug RegExMatch("c:\example\test\file.txt", r$) ; 1
Debug RegExMatch("c:\f\ile.txt", r$) ; 1
Debug RegExMatch("\test\file.txt") ; 0
Debug RegExMatch("c:\test\fil*.txt") ; 0
Debug RegExMatch("C:\test\", r$) ; 1
r$ = "^(?i)[A-Z0-9+_.-]+@[A-Z0-9.-]+$" ; not too strict email address validator
Debug RegExMatch("president@whitehouse.org", r$) ; 1
Debug RegExMatch("president") ; 0
Debug RegExMatch("@whitehouse.org") ; 0
Debug RegExMatch("pre$ident@whitehouse.org") ; 0
Debug RegExMatch("user@host") ; 1
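The examples above switch modes with inline modifiers such as (?i) and (?m). As far as I know, CreateRegularExpression() also accepts flags that apply a mode to the whole pattern. This sketch assumes your PureBasic version provides #PB_RegularExpression_NoCase and #PB_RegularExpression_MultiLine (check the help file, since the available flags depend on the version):

```purebasic
; Same effect as writing "(?i)(?m)^x$", but set through compile flags
Define re = CreateRegularExpression(#PB_Any, "^x$", #PB_RegularExpression_NoCase | #PB_RegularExpression_MultiLine)
If re
  ; nonzero: "X" sits alone on the second line, and case is ignored
  Debug MatchRegularExpression(re, "abc" + #LF$ + "X")
  FreeRegularExpression(re)
Else
  Debug RegularExpressionError()
EndIf
```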
ExtractRegExMatch()
Procedure ExtractRegExMatch (text$, Array result$(1), regex$ = "")
; [DESC]
; Extracts all the matching strings and copies them to the result$() array.
;
; [INPUT]
; text$ : The string to be checked.
; regex$ : The regular expression to be used.
;
; [OUTPUT]
; result$() : An array of strings (e.g. Dim result$(0)); ExtractRegularExpression() resizes it to hold all the matching strings.
;
; [RETURN]
; Return the number of matches or -1 if the regular expression is invalid.
;
; [NOTES]
; You can omit the regular expression on subsequent calls if the regex doesn't change.
; See RegExMatch() for help on the PCRE syntax.
  Static iRegEx                    ; compiled regex, kept between calls
  Protected iRetVal = -1           ; regex error
  
  If regex$                        ; a new pattern was passed: recompile it
    If iRegEx > 0
      FreeRegularExpression(iRegEx)
    EndIf
    iRegEx = CreateRegularExpression(#PB_Any, regex$)
  EndIf
  
  If iRegEx                        ; 0 if the last pattern failed to compile
    iRetVal = ExtractRegularExpression(iRegEx, text$, result$())
  EndIf
  
  ProcedureReturn iRetVal
EndProcedure
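ExtractRegExMatch() only gives you the matched text. If you also need where each match is, or just one captured group, newer PureBasic versions have ExamineRegularExpression() and NextRegularExpressionMatch(). This sketch assumes your version includes them (they may not exist in the version this post was written with):

```purebasic
; Walk through the matches one by one instead of extracting them all at once
Define matches = 0
Define re = CreateRegularExpression(#PB_Any, "\$([A-Fa-f0-9]+)") ; group 1 = hex digits only
If re
  If ExamineRegularExpression(re, "Escape is $1B, Return is $0d.")
    While NextRegularExpressionMatch(re)
      matches = matches + 1
      ; whole match, its position in the string, and the first capture group
      Debug RegularExpressionMatchString(re) + " at " + Str(RegularExpressionMatchPosition(re)) + ", digits: " + RegularExpressionGroup(re, 1)
    Wend
  EndIf
  FreeRegularExpression(re)
EndIf
Debug matches ; 2
```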
ExtractRegExMatch examples
Define r$
Dim result$(0)
r$ = "\b\d{3}$" ; three digits at the end, preceded by a word boundary.
Debug ExtractRegExMatch ("ERROR: 404", result$(), r$) ; 1
Debug result$(0) ; "404" extracted
r$ = "\$[A-Fa-f0-9]+" ; a valid PB hex number
Debug ExtractRegExMatch ("Escape in hex is $1B, Return is $0d and 4096 is $1000.", result$(), r$) ; 3
Debug result$(0) ; "$1B" extracted
Debug result$(1) ; "$0d" extracted
Debug result$(2) ; "$1000" extracted
r$="\d\d[- /.]\d\d[- /.]\d\d" ; simple date match extractor with various separators
Debug ExtractRegExMatch ("Today is 10/08/12 and it's raining.", result$(), r$) ; 1
Debug result$(0) ; "10/08/12" extracted
r$ = "\b([0-9]|[1-9][0-9]|1[0-9][0-9]|2[0-4][0-9]|25[0-5])\b" ; extract only whole numbers between 0 and 255
Debug ExtractRegExMatch ("111,256,0,1,50,100,777,255,1000", result$(), r$) ; 6
Debug result$(0) ; "111" extracted
Debug result$(1) ; "0" extracted
Debug result$(2) ; "1" extracted
Debug result$(3) ; "50" extracted
Debug result$(4) ; "100" extracted
Debug result$(5) ; "255" extracted
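Since ExtractRegExMatch() returns the number of matches, you don't have to hard-code the indexes as above. A small sketch that loops over the results; it calls ExtractRegularExpression() directly so it runs on its own:

```purebasic
Dim result$(0)
Define count = 0, i
Define re = CreateRegularExpression(#PB_Any, "\$[A-Fa-f0-9]+") ; a PB hex number
If re
  count = ExtractRegularExpression(re, "Escape in hex is $1B, Return is $0d and 4096 is $1000.", result$())
  For i = 0 To count - 1
    Debug result$(i) ; $1B, then $0d, then $1000
  Next
  FreeRegularExpression(re)
EndIf
```

If the pattern fails to compile, re is 0 and nothing is printed. With ExtractRegExMatch(), the -1 error code also makes such a loop run zero times.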