Wildcard Pattern Matching
Posted: Thu Aug 23, 2012 11:24 pm
by Shield
Nothing too exciting or new, just a small pattern matching algorithm.
Maybe someone can make use of it where regular expressions are not required.
Unlike most other simple wildcard functions, this one also accepts
'#' as a digit placeholder and '\' as the escape character.
Code: Select all
; Checks if the text matches the specified pattern.
; Supported Wildcards:
; -> * Match everything.
; -> ? Match exactly one character.
; -> # Match a digit.
; -> \ Used to escape '*', '?', '#' and '\'.
; Returns '1' if the text matches the pattern, '0' if not.
; -1 will be returned if the pattern is invalid.
Procedure.i IsWildcardMatch(text.s, pattern.s)
  Protected *text.Character
  Protected *pattern.Character
  Protected *match.Character
  Protected *current.Character
  *text = @text
  *pattern = @pattern
  While *text\c <> #Null
    Select *pattern\c
      Case '\'
        *pattern + SizeOf(Character)
        Select *pattern\c
          Case '*', '?', '#', '\'
            If *pattern\c <> *text\c
              ProcedureReturn #False
            Else
              *pattern + SizeOf(Character)
              *text + SizeOf(Character)
            EndIf
          Default
            ProcedureReturn -1
        EndSelect
      Case '*'
        *pattern + SizeOf(Character)
        If *pattern\c = #Null
          ProcedureReturn #True
        EndIf
        *match = *pattern
        *current = *text + SizeOf(Character)
      Case '#'
        If *text\c < '0' Or *text\c > '9'
          ProcedureReturn #False
        Else
          *text + SizeOf(Character)
          *pattern + SizeOf(Character)
        EndIf
      Case '?', *text\c
        *text + SizeOf(Character)
        *pattern + SizeOf(Character)
      Default
        If *current = #Null
          ProcedureReturn #False
        Else
          *pattern = *match
          *text = *current
          *current + SizeOf(Character)
        EndIf
    EndSelect
  Wend
  While *pattern\c = '*'
    *pattern + SizeOf(Character)
  Wend
  If *pattern\c = #Null
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure
Debug IsWildcardMatch("343", "###")
Debug IsWildcardMatch("Match this Text!", "Match th?? Text!")
Debug IsWildcardMatch("Match that Text!", "Match th?? Text!")
Debug IsWildcardMatch("ERROR: 404", "ERROR: ###")
Debug IsWildcardMatch("Star: * ABC3", "Star: \* ABC#")
Debug IsWildcardMatch("BBB DDD", "B*")
Debug "----------------------------------------------------------------------"
Debug IsWildcardMatch("No Match!", "No Match\?")
Debug IsWildcardMatch("AAABBBCCC", "AAA???DDD")
Debug IsWildcardMatch("BBB DDD", "#")
Re: Wildcard Pattern Matching
Posted: Thu Aug 23, 2012 11:26 pm
by IdeasVacuum
Now that is nice Shield, thanks for sharing it.
I really struggle to get my head around regex.
Re: Wildcard Pattern Matching
Posted: Thu Aug 23, 2012 11:53 pm
by luis
Very nice and very easy to modify/extend (I added '@' to match alpha chars), good job!
Thanks!
Re: Wildcard Pattern Matching
Posted: Thu Aug 23, 2012 11:55 pm
by idle
nice tip.
Re: Wildcard Pattern Matching
Posted: Fri Aug 24, 2012 12:16 am
by STARGÅTE
Here are two bugs:
Code: Select all
Debug IsWildcardMatch("Image_001.jpg", "*###.jpg")
Debug IsWildcardMatch("Example\File.txt", "*\*.txt")
I think you have to create a list for *match and *current, because going "back" only once is not always enough.
I have written something like this myself, so I know that problem ^^
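As an alternative to a full list, it may already be enough to keep the single restart point and send every mismatch through the backtracking branch instead of returning early from the '#' and '\' cases. A minimal sketch of that idea, assuming only '*', '?', '#' and '\' need to be supported; the name IsWildcardMatchBT and the helper variables are made up for this example:

```purebasic
; Sketch only: IsWildcardMatchBT is a hypothetical name, not code from this thread.
; Every mismatch retries from the most recent '*' instead of returning at once.
; Like the original, it assumes the strings are not empty.
Procedure.i IsWildcardMatchBT(text.s, pattern.s)
  Protected *text.Character = @text
  Protected *pattern.Character = @pattern
  Protected *match.Character = #Null   ; pattern position right after the last '*'
  Protected *current.Character = #Null ; text position the last '*' resumes from
  Protected *literal.Character
  Protected matched.i

  While *text\c <> #Null
    If *pattern\c = '*'
      ; Remember the restart point; the '*' matches nothing at first.
      *pattern + SizeOf(Character)
      If *pattern\c = #Null
        ProcedureReturn #True
      EndIf
      *match = *pattern
      *current = *text + SizeOf(Character)
      Continue
    EndIf

    matched = #False
    Select *pattern\c
      Case '\'
        *literal = *pattern + SizeOf(Character)
        Select *literal\c
          Case '*', '?', '#', '\'
            If *literal\c = *text\c
              *pattern = *literal
              matched = #True
            EndIf
          Default
            ProcedureReturn -1 ; invalid escape sequence
        EndSelect
      Case '#'
        If *text\c >= '0' And *text\c <= '9'
          matched = #True
        EndIf
      Case '?'
        matched = #True
      Default
        If *pattern\c = *text\c
          matched = #True
        EndIf
    EndSelect

    If matched
      *text + SizeOf(Character)
      *pattern + SizeOf(Character)
    ElseIf *current <> #Null
      ; Let the last '*' swallow one more character and try again.
      *pattern = *match
      *text = *current
      *current + SizeOf(Character)
    Else
      ProcedureReturn #False
    EndIf
  Wend

  While *pattern\c = '*'
    *pattern + SizeOf(Character)
  Wend
  If *pattern\c = #Null
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure

Debug IsWildcardMatchBT("Image_001.jpg", "*###.jpg")   ; 1
Debug IsWildcardMatchBT("Example\*.txt", "*\*.txt")    ; 1
Debug IsWildcardMatchBT("Example\File.txt", "*\*.txt") ; 0, the text has no literal '*'
```

Restarting from only the most recent '*' should be sufficient here because every other placeholder consumes exactly one character; multi-character placeholders would need more bookkeeping.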
Re: Wildcard Pattern Matching
Posted: Fri Aug 24, 2012 1:06 am
by Shield
Interesting find.
However, I don't have the time to fix that at the moment;
maybe somebody else wants to take a look at solving that problem.

Re: Wildcard Pattern Matching
Posted: Fri Aug 24, 2012 3:27 pm
by jassing
Without doing a ton of testing, this looks like it does what you want:
Code: Select all
; Checks if the text matches the specified pattern.
; Supported Wildcards:
; -> * Match everything.
; -> ? Match exactly one character.
; -> # Match a digit.
; -> \ Used to escape '*', '?', '#' and '\'.
; Returns '1' if the text matches the pattern, '0' if not.
; -1 will be returned if the pattern is invalid.
#Error = -1
Procedure IsWildcardMatch(text.s, pattern.s)
  Protected bMatch = #False
  Protected r
  ; Handle any real periods
  pattern = ReplaceString(pattern, ".", "[.]")
  ; Convert unescaped '*' to regex
  r = CreateRegularExpression(#PB_Any, "(?<!\\)[*]")
  pattern = ReplaceRegularExpression(r, pattern, ".*")
  FreeRegularExpression(r)
  ; Convert unescaped '?' to regex
  r = CreateRegularExpression(#PB_Any, "(?<!\\)[?]")
  pattern = ReplaceRegularExpression(r, pattern, ".") ; {1} not needed
  FreeRegularExpression(r)
  ; Convert unescaped '#' to regex
  r = CreateRegularExpression(#PB_Any, "(?<!\\)#")
  pattern = ReplaceRegularExpression(r, pattern, "[0-9]")
  FreeRegularExpression(r)
  ; Anchor the pattern so the whole text has to match, not just a part of it
  r = CreateRegularExpression(#PB_Any, "^" + pattern + "$")
  If r
    If MatchRegularExpression(r, text)
      bMatch = #True
    EndIf
    FreeRegularExpression(r)
  Else
    bMatch = #Error
  EndIf
  ProcedureReturn bMatch
EndProcedure
Debug IsWildcardMatch("343", "###")
Debug IsWildcardMatch("Match this Text!", "Match th?? Text!")
Debug IsWildcardMatch("Match that Text!", "Match th?? Text!")
Debug IsWildcardMatch("ERROR: 404", "ERROR: ###")
Debug IsWildcardMatch("Star: * ABC3", "Star: \* ABC#")
Debug IsWildcardMatch("BBB DDD", "B*")
Debug "----------------------------------------------------------------------"
Debug IsWildcardMatch("No Match!", "No Match\?")
Debug IsWildcardMatch("AAABBBCCC", "AAA???DDD")
Debug IsWildcardMatch("BBB DDD", "#")
Debug "----------------------------------------------------------------------"
Debug "Should match:"
Debug IsWildcardMatch("Image_001.jpg", "*###.jpg")
Debug "Should Fail (escaped \* to be literal '*')"
Debug IsWildcardMatch("Example\File.txt", "*\*.txt")
Debug "Should match, using escaped '*'"
Debug IsWildcardMatch("Example\*.txt", "*\*.txt")
Debug "Should match (Properly escaped backslash)"
Debug IsWildcardMatch("Example\File.txt", "*\\*.txt")
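One caveat with the replace-based conversion is that other regex metacharacters in the pattern, such as '(' or '+', reach the regex engine unescaped. A character-by-character translation could avoid that. The following is only a sketch under that assumption; WildcardToRegex is a hypothetical helper and not part of the code above:

```purebasic
; Sketch only: WildcardToRegex is a hypothetical helper, not code from this thread.
; Returns an anchored regex string, or "" if the pattern contains an invalid escape.
Procedure.s WildcardToRegex(pattern.s)
  Protected result.s = "^"
  Protected meta.s = ".^$|()[]{}+*?\" ; characters that need escaping in a regex
  Protected i.i
  Protected ch.s

  i = 1
  While i <= Len(pattern)
    ch = Mid(pattern, i, 1)
    Select Asc(ch)
      Case '*'
        result + ".*"
      Case '?'
        result + "."
      Case '#'
        result + "[0-9]"
      Case '\'
        ; The next character must be one of the escapable wildcards.
        i + 1
        ch = Mid(pattern, i, 1)
        Select Asc(ch)
          Case '*', '?', '\'
            result + "\" + ch
          Case '#'
            result + "#"
          Default
            ProcedureReturn ""
        EndSelect
      Default
        ; Escape literal characters that are regex metacharacters.
        If FindString(meta, ch, 1)
          result + "\"
        EndIf
        result + ch
    EndSelect
    i + 1
  Wend
  ProcedureReturn result + "$"
EndProcedure

Define regex$ = WildcardToRegex("*(1).txt")
Debug regex$ ; ^.*\(1\)\.txt$
Define regex.i = CreateRegularExpression(#PB_Any, regex$)
If regex
  Debug MatchRegularExpression(regex, "Report (1).txt") ; expected to match
  FreeRegularExpression(regex)
EndIf
```

Callers would need to check for the empty return value before creating the regex, since an empty regex matches any text.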
Re: Wildcard Pattern Matching
Posted: Fri Aug 24, 2012 3:28 pm
by Danilo
Added '@' for alphabetic characters, '_' and space.
Added '$' for alphanumeric characters: digits, alphabetic characters, '_' and space.
'#+' = multiple digits
'@+' = multiple alphabetic characters, '_' and spaces
'$+' = multiple digits, alphabetic characters, '_' and spaces
Changed '*' to eat everything to the end. '?' is used to match any one character anyway.
Code: Select all
;
; Wildcard Pattern Matching
;
; by Shield, slightly modified by Danilo
;
; http://www.purebasic.fr/english/viewtopic.php?f=12&t=51048
;
Procedure.i IsNum(char.c)
  If char < '0' Or char > '9'
    ProcedureReturn #False
  EndIf
  ProcedureReturn #True
EndProcedure
Procedure.i IsAlpha(char.c)
  If (char >= 'A' And char <= 'Z') Or (char >= 'a' And char <= 'z') Or (char = '_') Or (char = ' ')
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure
Procedure.i IsAlphaNum(char.c)
  If IsNum(char) Or IsAlpha(char)
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure
; Checks if the text matches the specified pattern.
; Supported Wildcards:
; -> * Match everything to the end.
; -> ? Match exactly one character.
; -> # Match a digit.
; -> @ Match an alphabetic character, '_' or space.
; -> $ Match a digit, an alphabetic character, '_' or space.
; -> #+ Match multiple digits.
; -> @+ Match multiple alphabetic characters, '_' or spaces.
; -> $+ Match multiple digits, alphabetic characters, '_' or spaces.
; -> \ Used to escape '*', '?', '#', '@', '$' and '\'.
; Returns '1' if the text matches the pattern, '0' if not.
; -1 will be returned if the pattern is invalid.
Procedure.i IsWildcardMatch(text.s, pattern.s)
  Protected *text.Character
  Protected *pattern.Character
  Protected *match.Character
  Protected *current.Character
  *text = @text
  *pattern = @pattern
  While *text\c <> #Null
    Select *pattern\c
      Case '\'
        *pattern + SizeOf(Character)
        Select *pattern\c
          Case '*', '?', '#', '\', '@', '$'
            If *pattern\c <> *text\c
              ProcedureReturn #False
            Else
              *pattern + SizeOf(Character)
              *text + SizeOf(Character)
            EndIf
          Default
            ProcedureReturn -1
        EndSelect
      Case '*'
        *pattern + SizeOf(Character)
        If *pattern\c = #Null
          ProcedureReturn #True
        EndIf
        While *text\c
          *text + SizeOf(Character)
        Wend
      Case '@'
        *pattern + SizeOf(Character)
        If *pattern\c = '+'
          *pattern + SizeOf(Character) ; multi alpha: @+
          If IsAlpha(*text\c)
            While IsAlpha(*text\c)
              *text + SizeOf(Character)
            Wend
          Else
            ProcedureReturn #False
          EndIf
        ElseIf IsAlpha(*text\c) ; single alpha: @
          *text + SizeOf(Character)
        Else
          ProcedureReturn #False
        EndIf
      Case '#'
        *pattern + SizeOf(Character)
        If *pattern\c = '+'
          *pattern + SizeOf(Character) ; multi number: #+
          If IsNum(*text\c)
            While IsNum(*text\c)
              *text + SizeOf(Character)
            Wend
          Else
            ProcedureReturn #False
          EndIf
        ElseIf IsNum(*text\c) ; single number: #
          *text + SizeOf(Character)
        Else
          ProcedureReturn #False
        EndIf
      Case '$'
        *pattern + SizeOf(Character)
        If *pattern\c = '+'
          *pattern + SizeOf(Character) ; multi AlphaNum: $+
          If IsAlphaNum(*text\c)
            While IsAlphaNum(*text\c)
              *text + SizeOf(Character)
            Wend
          Else
            ProcedureReturn #False
          EndIf
        ElseIf IsAlphaNum(*text\c) ; single AlphaNum: $
          *text + SizeOf(Character)
        Else
          ProcedureReturn #False
        EndIf
      Case '?', *text\c
        *text + SizeOf(Character)
        *pattern + SizeOf(Character)
      Default
        If *current = #Null
          ProcedureReturn #False
        Else
          *pattern = *match
          *text = *current
          *current + SizeOf(Character)
        EndIf
    EndSelect
  Wend
  While *pattern\c = '*'
    *pattern + SizeOf(Character)
  Wend
  If *pattern\c = #Null
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure
Debug IsWildcardMatch("343", "###")
Debug IsWildcardMatch("Match this Text!", "Match th?? Text!")
Debug IsWildcardMatch("Match that Text!", "Match th?? Text!")
Debug IsWildcardMatch("ERROR: 404", "ERROR: ###")
Debug IsWildcardMatch("Star: * ABC3", "Star: \* ABC#")
Debug IsWildcardMatch("BBB DDD", "B*")
Debug IsWildcardMatch("Image_001.jpg", "*")
Debug IsWildcardMatch("Example\File.txt", "*")
Debug "----------------------------------------------------------------------"
Debug IsWildcardMatch("No Match!", "No Match\?")
Debug IsWildcardMatch("AAABBBCCC", "AAA???DDD")
Debug IsWildcardMatch("BBB DDD", "#")
Debug IsWildcardMatch("Image_001.jpg", "*###.jpg")
Debug IsWildcardMatch("Example\File.txt", "*\*.txt")
Debug "--[ STARGÅTE ]--------------------------------------------------------"
Debug IsWildcardMatch("Image_001.jpg", "@+###.jpg")
Debug IsWildcardMatch("Example\File.txt", "@+\\@+.txt")
Debug IsWildcardMatch("Image_001.jpg", "$+.jpg")
Debug IsWildcardMatch("Example 03\File_003a.txt", "$+\\$+.txt")
Re: Wildcard Pattern Matching
Posted: Fri Aug 24, 2012 7:21 pm
by Shield
Thanks, good additions.

Re: Wildcard Pattern Matching
Posted: Fri Aug 24, 2012 7:47 pm
by STARGÅTE
@Danilo
changed '*' to eat everything to the end
Why?
Code: Select all
IsWildcardMatch("File.txt", "*.txt") ; should return 1!
So he, you, or someone else has to include backtracking.
Re: Wildcard Pattern Matching
Posted: Sat Aug 25, 2012 4:08 am
by Danilo
STARGÅTE wrote: @Danilo
changed '*' to eat everything to the end
Why?
Code: Select all
IsWildcardMatch("File.txt", "*.txt") ; should return 1!
So he, you, or someone else has to include backtracking.
OK, here is a simple recursive version for the '*' wildcard.
Please see the comment in Case '*'; you can change how '*' behaves:
* = 1 or more characters
* = 0 or more characters
Code: Select all
;
; Wildcard Pattern Matching
;
; by Shield, slightly modified by Danilo
;
; http://www.purebasic.fr/english/viewtopic.php?f=12&t=51048
;
;EnableExplicit
Procedure.i IsNum(char.c)
  If char < '0' Or char > '9'
    ProcedureReturn #False
  EndIf
  ProcedureReturn #True
EndProcedure
Procedure.i IsAlpha(char.c)
  If (char >= 'A' And char <= 'Z') Or (char >= 'a' And char <= 'z') Or (char = '_') Or (char = ' ')
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure
Procedure.i IsAlphaNum(char.c)
  If IsNum(char) Or IsAlpha(char)
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure
; Checks if the text matches the specified pattern.
; Supported Wildcards:
; -> * Match everything.
; -> ? Match exactly one character.
; -> # Match a digit.
; -> @ Match an alphabetic character, '_' or space.
; -> $ Match a digit, an alphabetic character, '_' or space.
; -> #+ Match multiple digits.
; -> @+ Match multiple alphabetic characters, '_' or spaces.
; -> $+ Match multiple digits, alphabetic characters, '_' or spaces.
; -> \ Used to escape '*', '?', '#', '@', '$' and '\'.
; Returns '1' if the text matches the pattern, '0' if not.
; -1 will be returned if the pattern is invalid.
Procedure.i IsWildcardMatch(text.s, pattern.s)
  Protected *text.Character
  Protected *pattern.Character
  Protected new_pattern.s
  ;Protected *match.Character
  ;Protected *current.Character
  *text = @text
  *pattern = @pattern
  While *text\c <> #Null
    Select *pattern\c
      Case '\'
        *pattern + SizeOf(Character)
        Select *pattern\c
          Case '*', '?', '#', '\', '@', '$'
            If *pattern\c <> *text\c
              ProcedureReturn #False
            Else
              *pattern + SizeOf(Character)
              *text + SizeOf(Character)
            EndIf
          Default
            ProcedureReturn -1
        EndSelect
      Case '*'
        *pattern + SizeOf(Character)
        If *pattern\c = #Null
          ProcedureReturn #True
        EndIf
        ; with the following line:    IsWildcardMatch(".txt", "*.txt") = #False ('*' = 1 or more characters)
        ; without the following line: IsWildcardMatch(".txt", "*.txt") = #True  ('*' = 0 or more characters)
        ;*text + SizeOf(Character)
        new_pattern = PeekS(*pattern)
        ; Try the rest of the pattern at every remaining text position
        While *text\c
          If IsWildcardMatch( PeekS(*text), new_pattern )=#True
            ProcedureReturn #True
          EndIf
          *text + SizeOf(Character)
        Wend
        ProcedureReturn #False
      Case '@'
        *pattern + SizeOf(Character)
        If *pattern\c = '+'
          *pattern + SizeOf(Character) ; multi alpha: @+
          If IsAlpha(*text\c)
            While IsAlpha(*text\c)
              *text + SizeOf(Character)
            Wend
          Else
            ProcedureReturn #False
          EndIf
        ElseIf IsAlpha(*text\c) ; single alpha: @
          *text + SizeOf(Character)
        Else
          ProcedureReturn #False
        EndIf
      Case '#'
        *pattern + SizeOf(Character)
        If *pattern\c = '+'
          *pattern + SizeOf(Character) ; multi number: #+
          If IsNum(*text\c)
            While IsNum(*text\c)
              *text + SizeOf(Character)
            Wend
          Else
            ProcedureReturn #False
          EndIf
        ElseIf IsNum(*text\c) ; single number: #
          *text + SizeOf(Character)
        Else
          ProcedureReturn #False
        EndIf
      Case '$'
        *pattern + SizeOf(Character)
        If *pattern\c = '+'
          *pattern + SizeOf(Character) ; multi AlphaNum: $+
          If IsAlphaNum(*text\c)
            While IsAlphaNum(*text\c)
              *text + SizeOf(Character)
            Wend
          Else
            ProcedureReturn #False
          EndIf
        ElseIf IsAlphaNum(*text\c) ; single AlphaNum: $
          *text + SizeOf(Character)
        Else
          ProcedureReturn #False
        EndIf
      Case '?', *text\c
        *text + SizeOf(Character)
        *pattern + SizeOf(Character)
      Default
        ;If *current = #Null
        ProcedureReturn #False
        ;Else
        ; *pattern = *match
        ; *text = *current
        ; *current + SizeOf(Character)
        ;EndIf
    EndSelect
  Wend
  While *pattern\c = '*'
    *pattern + SizeOf(Character)
  Wend
  If *pattern\c = #Null
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure
Debug "--[ Match ]-----------------------------------------------------------"
Debug IsWildcardMatch("343", "###")
Debug IsWildcardMatch("Match this Text!", "Match th?? Text!")
Debug IsWildcardMatch("Match that Text!", "Match th?? Text!")
Debug IsWildcardMatch("ERROR: 404", "ERROR: ###")
Debug IsWildcardMatch("Star: * ABC3", "Star: \* ABC#")
Debug IsWildcardMatch("BBB DDD", "B*")
Debug "--[ no Match ]--------------------------------------------------------"
Debug IsWildcardMatch("No Match!", "No Match\?")
Debug IsWildcardMatch("AAABBBCCC", "AAA???DDD")
Debug IsWildcardMatch("BBB DDD", "#")
Debug "--[ STARGÅTE ]--------------------------------------------------------"
Debug IsWildcardMatch("Image_001.jpg", "@+###.jpg")
Debug IsWildcardMatch("Example\File.txt", "@+\\@+.txt")
Debug IsWildcardMatch("Image_001.jpg", "$+.jpg")
Debug IsWildcardMatch("Example 03\File_003a.txt", "$+\\$+.txt")
Debug "--[ * Wildcard ]------------------------------------------------------"
Debug IsWildcardMatch("AAABBBCCC", "AAA*CCC")
Debug IsWildcardMatch("AAABBBCCC", "A*A*C*C")
Debug IsWildcardMatch("Image_001.jpg", "*")
Debug IsWildcardMatch("Example\File.txt", "*")
Debug IsWildcardMatch("Image_001.jpg", "*###.jpg")
Debug IsWildcardMatch("Example\File.txt", "*\\*.txt")
Debug IsWildcardMatch("File_01.txt", "*.txt")
Debug IsWildcardMatch(".txt", "*.txt")
Is it OK?
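One thing that could still be tightened: each recursive call copies both strings through PeekS. A pointer-based helper might avoid those copies. The following is only a sketch of that idea, reduced to '*', '?', '#' and literal characters for brevity; MatchFrom and IsWildcardMatchPtr are made-up names, and escapes as well as '@' and '$' are left out:

```purebasic
; Sketch only: MatchFrom and IsWildcardMatchPtr are hypothetical names.
; The recursion works on pointers into the original strings, so nothing is copied.
; Like the versions above, it assumes the strings are not empty.
Procedure.i MatchFrom(*text.Character, *pattern.Character)
  While *text\c <> #Null
    Select *pattern\c
      Case '*'
        *pattern + SizeOf(Character)
        If *pattern\c = #Null
          ProcedureReturn #True
        EndIf
        ; Try the rest of the pattern at every remaining text position
        ; ('*' = 0 or more characters).
        While *text\c <> #Null
          If MatchFrom(*text, *pattern)
            ProcedureReturn #True
          EndIf
          *text + SizeOf(Character)
        Wend
        ProcedureReturn #False
      Case '#'
        If *text\c < '0' Or *text\c > '9'
          ProcedureReturn #False
        EndIf
        *text + SizeOf(Character)
        *pattern + SizeOf(Character)
      Case '?', *text\c
        *text + SizeOf(Character)
        *pattern + SizeOf(Character)
      Default
        ProcedureReturn #False
    EndSelect
  Wend
  ; Text is used up; only trailing '*' may remain in the pattern.
  While *pattern\c = '*'
    *pattern + SizeOf(Character)
  Wend
  If *pattern\c = #Null
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure

Procedure.i IsWildcardMatchPtr(text.s, pattern.s)
  ProcedureReturn MatchFrom(@text, @pattern)
EndProcedure

Debug IsWildcardMatchPtr("File_01.txt", "*.txt")   ; 1
Debug IsWildcardMatchPtr("AAABBBCCC", "A*A*C*C")   ; 1
Debug IsWildcardMatchPtr("AAABBBCCC", "AAA???DDD") ; 0
```

The recursion depth grows with the number of '*' in the pattern rather than with the text length, so typical file name patterns should stay shallow.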