
Wildcard Pattern Matching

Posted: Thu Aug 23, 2012 11:24 pm
by Shield
Nothing too exciting or new, just a small pattern-matching algorithm. ;)
Maybe someone can make use of it where regular expressions would be overkill.

Unlike most other simple wildcard functions, this one also accepts
'#' as a digit placeholder and '\' as an escape character.

Code:

; Checks if the text matches the specified pattern.
; Supported Wildcards:
; -> *	Match any sequence of characters, including none.
; -> ?	Match exactly one character.
; -> #	Match a digit.
; -> \	Used to escape '*', '?', '#' and '\'.
; Returns '1' if the text matches the pattern, '0' if not.
; -1 will be returned if the pattern is invalid.
Procedure.i IsWildcardMatch(text.s, pattern.s)
	Protected *text.Character
	Protected *pattern.Character
	Protected *match.Character
	Protected *current.Character
	
	
	*text = @text
	*pattern = @pattern
	While *text\c <> #Null
		Select *pattern\c
			Case '\'
				*pattern + SizeOf(Character)
				Select *pattern\c
					Case '*', '?', '#', '\'				
						If *pattern\c <> *text\c
							ProcedureReturn #False
						Else
							*pattern + SizeOf(Character)
							*text + SizeOf(Character)
						EndIf
					Default
						ProcedureReturn -1
				EndSelect
				
			Case '*'
				*pattern + SizeOf(Character)
				If *pattern\c = #Null 
					ProcedureReturn #True
				EndIf
				
				*match = *pattern
				*current = *text + SizeOf(Character)
				
			Case '#'
				If *text\c < '0' Or *text\c > '9'
					ProcedureReturn #False
				Else
					*text + SizeOf(Character)
					*pattern + SizeOf(Character)
				EndIf
				
			Case '?', *text\c
				*text + SizeOf(Character)
				*pattern + SizeOf(Character)
				
			Default
				If *current = #Null
					ProcedureReturn #False
				Else
					*pattern = *match
					*text = *current
					*current + SizeOf(Character)
				EndIf
		EndSelect
	Wend
	
	
	While *pattern\c = '*'
		*pattern + SizeOf(Character)
	Wend
	
	If *pattern\c = #Null
		ProcedureReturn #True
	EndIf
	
	ProcedureReturn #False
EndProcedure


Debug IsWildcardMatch("343", "###")
Debug IsWildcardMatch("Match this Text!", "Match th?? Text!")
Debug IsWildcardMatch("Match that Text!", "Match th?? Text!")
Debug IsWildcardMatch("ERROR: 404", "ERROR: ###")
Debug IsWildcardMatch("Star: * ABC3", "Star: \* ABC#")
Debug IsWildcardMatch("BBB DDD", "B*")
Debug "----------------------------------------------------------------------"
Debug IsWildcardMatch("No Match!", "No Match\?")
Debug IsWildcardMatch("AAABBBCCC", "AAA???DDD")
Debug IsWildcardMatch("BBB DDD", "#")

Re: Wildcard Pattern Matching

Posted: Thu Aug 23, 2012 11:26 pm
by IdeasVacuum
Now that is nice, Shield, thanks for sharing it. 8)
I really struggle to get my head around regex.

Re: Wildcard Pattern Matching

Posted: Thu Aug 23, 2012 11:53 pm
by luis
Very nice and very easy to modify/extend (I added '@' to match alpha chars), good job!
Thanks!

Re: Wildcard Pattern Matching

Posted: Thu Aug 23, 2012 11:55 pm
by idle
Nice tip.

Re: Wildcard Pattern Matching

Posted: Fri Aug 24, 2012 12:16 am
by STARGÅTE
Here are two bugs:

Code:

Debug IsWildcardMatch("Image_001.jpg", "*###.jpg")
Debug IsWildcardMatch("Example\File.txt", "*\*.txt")
I think you need a list for *match and *current, because backtracking only once is not always enough.
I have written one of these myself, so I know that problem ^^

Re: Wildcard Pattern Matching

Posted: Fri Aug 24, 2012 1:06 am
by Shield
Interesting find. :)
However, I don't have time to fix it at the moment;
maybe somebody else wants to take a look at solving that problem. :)
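One possible fix, sketched under the assumption that the wildcards are exactly those of the original post ('*', '?', '#' and '\'): in the procedure above, the '#' and escape branches return #False on a mismatch instead of falling back to the last '*' the way the Default branch does. Because every token other than '*' consumes exactly one character, remembering only the most recent '*' should be enough, so a list of backtrack points may not be necessary. IsWildcardMatchFixed() is a hypothetical name chosen to avoid clashing with the original; treat it as a sketch rather than a tested drop-in replacement.

```purebasic
EnableExplicit

; Sketch: same wildcards as the original post (* ? # \), but ANY mismatch
; falls back to the most recent '*' instead of returning #False at once.
Procedure.i IsWildcardMatchFixed(text.s, pattern.s)
  Protected *text.Character, *pattern.Character, *literal.Character
  Protected *match.Character   ; pattern position just after the last '*'
  Protected *current.Character ; text position where that '*' would resume
  Protected ok.i

  *text = @text
  *pattern = @pattern
  While *text\c <> #Null
    ok = #False
    Select *pattern\c
      Case '*'
        *pattern + SizeOf(Character)
        If *pattern\c = #Null
          ProcedureReturn #True ; a trailing '*' matches the rest of the text
        EndIf
        *match = *pattern       ; first let '*' match nothing...
        *current = *text
        Continue
      Case '\'
        *literal = *pattern + SizeOf(Character)
        Select *literal\c
          Case '*', '?', '#', '\'
            If *literal\c = *text\c
              ok = #True
              *pattern = *literal ; skip the backslash itself
            EndIf
          Default
            ProcedureReturn -1    ; invalid escape sequence
        EndSelect
      Case '#'
        If *text\c >= '0' And *text\c <= '9'
          ok = #True
        EndIf
      Case '?'
        ok = #True
      Case #Null
        ok = #False ; pattern used up while text remains
      Default
        If *pattern\c = *text\c
          ok = #True
        EndIf
    EndSelect

    If ok
      *pattern + SizeOf(Character)
      *text + SizeOf(Character)
    ElseIf *match <> #Null
      ; ...then let it swallow one more character and retry from there.
      *current + SizeOf(Character)
      *pattern = *match
      *text = *current
    Else
      ProcedureReturn #False
    EndIf
  Wend

  ; The text is used up: whatever is left of the pattern must be '*' only.
  While *pattern\c = '*'
    *pattern + SizeOf(Character)
  Wend
  If *pattern\c = #Null
    ProcedureReturn #True
  EndIf
  ProcedureReturn #False
EndProcedure

Debug IsWildcardMatchFixed("Image_001.jpg", "*###.jpg")     ; 1
Debug IsWildcardMatchFixed("Example\File.txt", "*\*.txt")   ; 0
Debug IsWildcardMatchFixed("Example\File.txt", "*\\*.txt")  ; 1
```

Traced by hand, STARGÅTE's "*###.jpg" example now returns 1, while "Example\File.txt" against "*\*.txt" still returns 0, since \* asks for a literal '*' that the text does not contain.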

Re: Wildcard Pattern Matching

Posted: Fri Aug 24, 2012 3:27 pm
by jassing
Without doing a ton of testing, this seems to do what you want:

Code:

; Checks if the text matches the specified pattern.
; Supported Wildcards:
; -> *   Match everything.
; -> ?   Match exactly one character.
; -> #   Match a digit.
; -> \   Used to escape '*', '?', '#' and '\'.
; Returns '1' if the text matches the pattern, '0' if not.
; -1 will be returned if the pattern is invalid.
#Error = -1
Procedure IsWildcardMatch(text.s, pattern.s)
	Protected bMatch = #False
	Protected r
	
	; Escape literal periods so the regex engine does not treat them as "any character"
	pattern = ReplaceString(pattern,".","[.]")
	
	; Convert unescaped '*' to regex
	r=CreateRegularExpression(#PB_Any,"(?<!\\)[*]")
	pattern = ReplaceRegularExpression(r,pattern,".*")
	FreeRegularExpression(r)
	
	; Convert unescaped '?' to regex
	r=CreateRegularExpression(#PB_Any,"(?<!\\)[?]")
	pattern = ReplaceRegularExpression(r,pattern,".") ; {1} not needed..
	FreeRegularExpression(r)
	
	; Convert unescaped '#' to regex
	r=CreateRegularExpression(#PB_Any,"(?<!\\)#")
	pattern = ReplaceRegularExpression(r,pattern,"[0-9]")
	FreeRegularExpression(r)
	
	; Anchor the expression so the whole text has to match, not just part of it,
	; and free the temporary regex again when done
	r=CreateRegularExpression(#PB_Any,"^" + pattern + "$")
	If r
		If MatchRegularExpression(r,text)
			bMatch = #True
		EndIf
		FreeRegularExpression(r)
	Else
		bMatch = #Error
	EndIf
	ProcedureReturn bMatch
EndProcedure


Debug IsWildcardMatch("343", "###")
Debug IsWildcardMatch("Match this Text!", "Match th?? Text!")
Debug IsWildcardMatch("Match that Text!", "Match th?? Text!")
Debug IsWildcardMatch("ERROR: 404", "ERROR: ###")
Debug IsWildcardMatch("Star: * ABC3", "Star: \* ABC#")
Debug IsWildcardMatch("BBB DDD", "B*")
Debug "----------------------------------------------------------------------"
Debug IsWildcardMatch("No Match!", "No Match\?")
Debug IsWildcardMatch("AAABBBCCC", "AAA???DDD")
Debug IsWildcardMatch("BBB DDD", "#")

Debug "----------------------------------------------------------------------"

Debug "Should match:"
Debug IsWildcardMatch("Image_001.jpg", "*###.jpg")

Debug "Should Fail (escaped \* to be literal '*')"
Debug IsWildcardMatch("Example\File.txt", "*\*.txt")

Debug "Should match, using escaped '*'"
Debug IsWildcardMatch("Example\*.txt", "*\*.txt")


Debug "Should match (Properly escaped backslash)"
Debug IsWildcardMatch("Example\File.txt", "*\\*.txt")
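If you prefer the regex route, converting the wildcard pattern in a single pass avoids some pitfalls of chaining several ReplaceRegularExpression() calls: characters such as '(', '+' or '[' get escaped instead of reaching the regex engine as metacharacters, and an escaped backslash followed by '*' ("\\*") is no longer confused with an escaped star, which a lookbehind like (?<!\\) cannot tell apart. WildcardToRegex() and IsWildcardMatchRegex() are hypothetical helpers, not part of PureBasic, and the metacharacter list is an assumption based on standard PCRE syntax; this is a sketch, not a drop-in replacement.

```purebasic
EnableExplicit

; Hypothetical helper: converts a wildcard pattern (* ? # \) into an anchored
; PCRE pattern. Returns "" if the pattern contains an invalid escape.
Procedure.s WildcardToRegex(pattern.s)
  Protected result.s, c.s
  Protected i.i, length.i

  result = "^"
  length = Len(pattern)
  i = 1
  While i <= length
    c = Mid(pattern, i, 1)
    Select c
      Case "*"
        result + ".*"
      Case "?"
        result + "."
      Case "#"
        result + "[0-9]"
      Case "\"
        ; '\' must be followed by one of * ? # \
        i + 1
        c = Mid(pattern, i, 1)
        If c = "*" Or c = "?" Or c = "#" Or c = "\"
          result + "\" + c
        Else
          ProcedureReturn ""
        EndIf
      Default
        ; Escape regex metacharacters so they are matched literally.
        If FindString(".^$|()[]{}+", c, 1)
          result + "\" + c
        Else
          result + c
        EndIf
    EndSelect
    i + 1
  Wend
  ProcedureReturn result + "$"
EndProcedure

; Hypothetical wrapper with the same return values as IsWildcardMatch().
Procedure.i IsWildcardMatchRegex(text.s, pattern.s)
  Protected regex.s, r.i, result.i

  regex = WildcardToRegex(pattern)
  If regex = ""
    ProcedureReturn -1 ; invalid escape in the wildcard pattern
  EndIf
  r = CreateRegularExpression(#PB_Any, regex)
  If r = 0
    ProcedureReturn -1
  EndIf
  If MatchRegularExpression(r, text)
    result = #True
  EndIf
  FreeRegularExpression(r)
  ProcedureReturn result
EndProcedure

Debug IsWildcardMatchRegex("Image_001.jpg", "*###.jpg")    ; 1
Debug IsWildcardMatchRegex("Example\File.txt", "*\\*.txt") ; 1
Debug IsWildcardMatchRegex("a+b(c)", "a+b(?)")             ; 1
```

The ^ and $ anchors make the whole text match, like the pointer-based versions in this thread, and an invalid escape yields -1 as in the original.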

Re: Wildcard Pattern Matching

Posted: Fri Aug 24, 2012 3:28 pm
by Danilo
Added '@' for alphabetic characters, '_' and space.
Added '$' for alphanumeric characters: digits, alphabetic characters, '_' and space.

'#+' = one or more digits
'@+' = one or more alphabetic characters, '_' or spaces
'$+' = one or more digits, alphabetic characters, '_' or spaces

Changed '*' to consume everything up to the end of the text; '?' already matches any single character anyway.

Code:

;
; Wildcard Pattern Matching
;
; by Shield, slightly modified by Danilo
;
; http://www.purebasic.fr/english/viewtopic.php?f=12&t=51048
;
Procedure.i IsNum(char.c)
    If char < '0' Or char > '9'
        ProcedureReturn #False
    EndIf
    ProcedureReturn #True
EndProcedure

Procedure.i IsAlpha(char.c)
    If (char >= 'A' And char <= 'Z') Or (char >= 'a' And char <= 'z') Or (char = '_') Or (char = ' ')
        ProcedureReturn #True
    EndIf
    ProcedureReturn #False
EndProcedure

Procedure.i IsAlphaNum(char.c)
    If IsNum(char) Or IsAlpha(char)
        ProcedureReturn #True
    EndIf
    ProcedureReturn #False
EndProcedure


; Checks if the text matches the specified pattern.
; Supported Wildcards:
; -> *   Match everything to the end
; -> ?   Match exactly one character.
; -> #   Match a digit.
; -> @   Match an alphabetic character, '_' or space.
; -> $   Match a digit, an alphabetic character, '_' or space.
; -> #+  Match one or more digits.
; -> @+  Match one or more alphabetic characters, '_' or spaces.
; -> $+  Match one or more digits, alphabetic characters, '_' or spaces.
; -> \   Used to escape '*', '?', '#', '@', '$' and '\'.
; Returns '1' if the text matches the pattern, '0' if not.
; -1 will be returned if the pattern is invalid.
Procedure.i IsWildcardMatch(text.s, pattern.s)
   Protected *text.Character
   Protected *pattern.Character
   Protected *match.Character
   Protected *current.Character
   
   
   *text = @text
   *pattern = @pattern
   While *text\c <> #Null
      Select *pattern\c
         Case '\'
            *pattern + SizeOf(Character)
            Select *pattern\c
               Case '*', '?', '#', '\', '@', '$'  
                  If *pattern\c <> *text\c
                     ProcedureReturn #False
                  Else
                     *pattern + SizeOf(Character)
                     *text + SizeOf(Character)
                  EndIf
               Default
                  ProcedureReturn -1
            EndSelect
            
         Case '*'
            *pattern + SizeOf(Character)
            If *pattern\c = #Null 
               ProcedureReturn #True
            EndIf
            
            While *text\c
                *text + SizeOf(Character)
            Wend

         Case '@'
            *pattern + SizeOf(Character)
            If *pattern\c = '+'
                *pattern + SizeOf(Character)   ; multi  alpha: @+
                If IsAlpha(*text\c)
                   While IsAlpha(*text\c)
                       *text + SizeOf(Character)
                   Wend
                Else
                   ProcedureReturn #False
                EndIf
            ElseIf IsAlpha(*text\c)            ; single alpha: @
               *text + SizeOf(Character)
            Else
               ProcedureReturn #False
            EndIf
            
         Case '#'
            *pattern + SizeOf(Character)
            If *pattern\c = '+'
                *pattern + SizeOf(Character)  ; multi  number: #+
                If IsNum(*text\c)
                   While IsNum(*text\c)
                       *text + SizeOf(Character)
                   Wend
                Else
                   ProcedureReturn #False
                EndIf
            ElseIf IsNum(*text\c)             ; single number: #
               *text + SizeOf(Character)
            Else
               ProcedureReturn #False
            EndIf

         Case '$'
            *pattern + SizeOf(Character)
            If *pattern\c = '+'
                *pattern + SizeOf(Character)  ; multi  AlphaNum: $+
                If IsAlphaNum(*text\c)
                   While IsAlphaNum(*text\c)
                       *text + SizeOf(Character)
                   Wend
                Else
                   ProcedureReturn #False
                EndIf
            ElseIf IsAlphaNum(*text\c)        ; single AlphaNum: $
               *text + SizeOf(Character)
            Else
               ProcedureReturn #False
            EndIf
            
         Case '?', *text\c
            *text + SizeOf(Character)
            *pattern + SizeOf(Character)
            
         Default
            If *current = #Null
               ProcedureReturn #False
            Else
               *pattern = *match
               *text = *current
               *current + SizeOf(Character)
            EndIf
      EndSelect
   Wend
   
   
   While *pattern\c = '*'
      *pattern + SizeOf(Character)
   Wend
   
   If *pattern\c = #Null
      ProcedureReturn #True
   EndIf
   
   ProcedureReturn #False
EndProcedure


Debug IsWildcardMatch("343", "###")
Debug IsWildcardMatch("Match this Text!", "Match th?? Text!")
Debug IsWildcardMatch("Match that Text!", "Match th?? Text!")
Debug IsWildcardMatch("ERROR: 404", "ERROR: ###")
Debug IsWildcardMatch("Star: * ABC3", "Star: \* ABC#")
Debug IsWildcardMatch("BBB DDD", "B*")
Debug IsWildcardMatch("Image_001.jpg", "*")
Debug IsWildcardMatch("Example\File.txt", "*")

Debug "----------------------------------------------------------------------"

Debug IsWildcardMatch("No Match!", "No Match\?")
Debug IsWildcardMatch("AAABBBCCC", "AAA???DDD")
Debug IsWildcardMatch("BBB DDD", "#")

Debug IsWildcardMatch("Image_001.jpg", "*###.jpg")
Debug IsWildcardMatch("Example\File.txt", "*\*.txt")

Debug "--[ STARGÅTE ]--------------------------------------------------------"

Debug IsWildcardMatch("Image_001.jpg", "@+###.jpg")
Debug IsWildcardMatch("Example\File.txt", "@+\\@+.txt")

Debug IsWildcardMatch("Image_001.jpg", "$+.jpg")
Debug IsWildcardMatch("Example 03\File_003a.txt", "$+\\$+.txt")
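A caveat about '#+', '@+' and '$+' in this version: they consume as many matching characters as they can and never give any back, which PCRE calls a possessive quantifier. Since '@' also accepts '_' and space, a pattern such as "@+_" can never match. The difference is easy to see with PureBasic's regular expressions, where '++' is PCRE's possessive form of '+'. RegexMatches() is a hypothetical helper used only for this demonstration; it does not call Danilo's procedure.

```purebasic
EnableExplicit

; Hypothetical helper: returns #True if the regex matches the text.
Procedure.i RegexMatches(regex.s, text.s)
  Protected r.i, result.i
  r = CreateRegularExpression(#PB_Any, regex)
  If r
    If MatchRegularExpression(r, text)
      result = #True
    EndIf
    FreeRegularExpression(r)
  EndIf
  ProcedureReturn result
EndProcedure

; [A-Za-z_ ] is the same character set as IsAlpha() above.
; Possessive '++' behaves like '@+': it also eats the final '_',
; so nothing is left for the literal '_' and the match fails.
Debug RegexMatches("^[A-Za-z_ ]++_$", "Ab_")   ; 0
; An ordinary '+' backtracks one character, so the trailing '_' can match.
Debug RegexMatches("^[A-Za-z_ ]+_$", "Ab_")    ; 1
```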

Re: Wildcard Pattern Matching

Posted: Fri Aug 24, 2012 7:21 pm
by Shield
Thanks, good additions. :)

Re: Wildcard Pattern Matching

Posted: Fri Aug 24, 2012 7:47 pm
by STARGÅTE
@Danilo, you wrote: "changed '*' to eat everything to the end"
Why?

Code:

IsWildcardMatch("File.txt", "*.txt") ; should return 1!
So either he, you, or someone else has to add backtracking.

Re: Wildcard Pattern Matching

Posted: Sat Aug 25, 2012 4:08 am
by Danilo
STARGÅTE wrote: @Danilo, you wrote: "changed '*' to eat everything to the end". Why?
IsWildcardMatch("File.txt", "*.txt") ; should return 1!
So either he, you, or someone else has to add backtracking.
OK, here is a simple recursive version that handles the '*' wildcard.

See the comment in Case '*'; you can change how '*' behaves:
* = one or more characters
* = zero or more characters (as posted)

Code:

;
; Wildcard Pattern Matching
;
; by Shield, slightly modified by Danilo
;
; http://www.purebasic.fr/english/viewtopic.php?f=12&t=51048
;
;EnableExplicit

Procedure.i IsNum(char.c)
    If char < '0' Or char > '9'
        ProcedureReturn #False
    EndIf
    ProcedureReturn #True
EndProcedure

Procedure.i IsAlpha(char.c)
    If (char >= 'A' And char <= 'Z') Or (char >= 'a' And char <= 'z') Or (char = '_') Or (char = ' ')
        ProcedureReturn #True
    EndIf
    ProcedureReturn #False
EndProcedure

Procedure.i IsAlphaNum(char.c)
    If IsNum(char) Or IsAlpha(char)
        ProcedureReturn #True
    EndIf
    ProcedureReturn #False
EndProcedure


; Checks if the text matches the specified pattern.
; Supported Wildcards:
; -> *   Match any sequence of characters (zero or more as posted; see Case '*').
; -> ?   Match exactly one character.
; -> #   Match a digit.
; -> @   Match an alphabetic character, '_' or space.
; -> $   Match a digit, an alphabetic character, '_' or space.
; -> #+  Match one or more digits.
; -> @+  Match one or more alphabetic characters, '_' or spaces.
; -> $+  Match one or more digits, alphabetic characters, '_' or spaces.
; -> \   Used to escape '*', '?', '#', '@', '$' and '\'.
; Returns '1' if the text matches the pattern, '0' if not.
; -1 will be returned if the pattern is invalid.
Procedure.i IsWildcardMatch(text.s, pattern.s)
   Protected *text.Character
   Protected *pattern.Character
   Protected new_pattern.s
   ;Protected *match.Character
   ;Protected *current.Character
   
   
   *text = @text
   *pattern = @pattern
   While *text\c <> #Null
      Select *pattern\c
         Case '\'
            *pattern + SizeOf(Character)
            Select *pattern\c
               Case '*', '?', '#', '\', '@', '$'  
                  If *pattern\c <> *text\c
                     ProcedureReturn #False
                  Else
                     *pattern + SizeOf(Character)
                     *text + SizeOf(Character)
                  EndIf
               Default
                  ProcedureReturn -1
            EndSelect
            
         Case '*'
            *pattern + SizeOf(Character)
            If *pattern\c = #Null 
               ProcedureReturn #True
            EndIf
            
            ; With the following line enabled:   IsWildcardMatch(".txt", "*.txt") = #False (* = one or more characters)
            ; With it commented out (as posted): IsWildcardMatch(".txt", "*.txt") = #True  (* = zero or more characters)
            ;*text + SizeOf(Character)
            
            new_pattern = PeekS(*pattern)
            While *text\c
                If IsWildcardMatch( PeekS(*text), new_pattern )=#True
                    ProcedureReturn #True
                EndIf
                *text + SizeOf(Character)
            Wend
            ProcedureReturn #False

         Case '@'
            *pattern + SizeOf(Character)
            If *pattern\c = '+'
                *pattern + SizeOf(Character)   ; multi  alpha: @+
                If IsAlpha(*text\c)
                   While IsAlpha(*text\c)
                       *text + SizeOf(Character)
                   Wend
                Else
                   ProcedureReturn #False
                EndIf
            ElseIf IsAlpha(*text\c)            ; single alpha: @
               *text + SizeOf(Character)
            Else
               ProcedureReturn #False
            EndIf
            
         Case '#'
            *pattern + SizeOf(Character)
            If *pattern\c = '+'
                *pattern + SizeOf(Character)  ; multi  number: #+
                If IsNum(*text\c)
                   While IsNum(*text\c)
                       *text + SizeOf(Character)
                   Wend
                Else
                   ProcedureReturn #False
                EndIf
            ElseIf IsNum(*text\c)             ; single number: #
               *text + SizeOf(Character)
            Else
               ProcedureReturn #False
            EndIf

         Case '$'
            *pattern + SizeOf(Character)
            If *pattern\c = '+'
                *pattern + SizeOf(Character)  ; multi  AlphaNum: $+
                If IsAlphaNum(*text\c)
                   While IsAlphaNum(*text\c)
                       *text + SizeOf(Character)
                   Wend
                Else
                   ProcedureReturn #False
                EndIf
            ElseIf IsAlphaNum(*text\c)        ; single AlphaNum: $
               *text + SizeOf(Character)
            Else
               ProcedureReturn #False
            EndIf
            
         Case '?', *text\c
            *text + SizeOf(Character)
            *pattern + SizeOf(Character)
            
         Default
            ;If *current = #Null
               ProcedureReturn #False
            ;Else
            ;   *pattern = *match
            ;   *text = *current
            ;   *current + SizeOf(Character)
            ;EndIf
      EndSelect
   Wend
   
   
   While *pattern\c = '*'
      *pattern + SizeOf(Character)
   Wend
   
   If *pattern\c = #Null
      ProcedureReturn #True
   EndIf
   
   ProcedureReturn #False
EndProcedure


Debug "--[ Match ]-----------------------------------------------------------"

Debug IsWildcardMatch("343", "###")
Debug IsWildcardMatch("Match this Text!", "Match th?? Text!")
Debug IsWildcardMatch("Match that Text!", "Match th?? Text!")
Debug IsWildcardMatch("ERROR: 404", "ERROR: ###")
Debug IsWildcardMatch("Star: * ABC3", "Star: \* ABC#")
Debug IsWildcardMatch("BBB DDD", "B*")

Debug "--[ no Match ]--------------------------------------------------------"

Debug IsWildcardMatch("No Match!", "No Match\?")
Debug IsWildcardMatch("AAABBBCCC", "AAA???DDD")
Debug IsWildcardMatch("BBB DDD", "#")

Debug "--[ STARGÅTE ]--------------------------------------------------------"

Debug IsWildcardMatch("Image_001.jpg", "@+###.jpg")
Debug IsWildcardMatch("Example\File.txt", "@+\\@+.txt")

Debug IsWildcardMatch("Image_001.jpg", "$+.jpg")
Debug IsWildcardMatch("Example 03\File_003a.txt", "$+\\$+.txt")

Debug "--[ * Wildcard ]------------------------------------------------------"

Debug IsWildcardMatch("AAABBBCCC", "AAA*CCC")
Debug IsWildcardMatch("AAABBBCCC", "A*A*C*C")

Debug IsWildcardMatch("Image_001.jpg", "*")
Debug IsWildcardMatch("Example\File.txt", "*")

Debug IsWildcardMatch("Image_001.jpg", "*###.jpg")
Debug IsWildcardMatch("Example\File.txt", "*\\*.txt")

Debug IsWildcardMatch("File_01.txt", "*.txt")
Debug IsWildcardMatch(".txt", "*.txt")
Is it OK?