RegularExpression

Posted: Mon Mar 06, 2023 10:35 am
by Splunk
Hi guys,

I am in the process of extending an older PB program. PB has supported "RegularExpression" for some time now, and I think the two routines below can be replaced with it. I have never dealt with "RegularExpression" before, and the sample code in the PB reference is more than poor. I know there is external literature about it, but I have neither the inclination nor the time to go that deep into it. Can someone rewrite the code below for me using RegularExpression?

Code: Select all

Procedure ValidChar(STRING$, VALIDCHAR$)
	; Returns #True if STRING$ contains only characters from VALIDCHAR$

	For a = 1 To Len(STRING$)
		For b = 1 To Len(VALIDCHAR$)
			If Mid(STRING$,a,1) = Mid(VALIDCHAR$,b,1)
				Break	
			EndIf
		Next
		If Mid(STRING$,a,1) <> Mid(VALIDCHAR$,b,1)
			ProcedureReturn #False
		EndIf
	Next
	ProcedureReturn #True
EndProcedure

Procedure InValidChar(STRING$, INVALIDCHAR$)
	; Returns #True if STRING$ contains a character from INVALIDCHAR$

	For a = 1 To Len(STRING$)
		For b = 1 To Len(INVALIDCHAR$)
			If Mid(STRING$,a,1) = Mid(INVALIDCHAR$,b,1)
				ProcedureReturn #True	
			EndIf
		Next
	Next
	ProcedureReturn #False
EndProcedure
			
		
		
ValidChars$ =" ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
Debug ValidChar("test1", ValidChars$)
Debug ValidChar("test1_", ValidChars$)

InValidChars$ =";,:._-()/&%"
Debug InValidChar("test1", InValidChars$)
Debug InValidChar("test1_", InValidChars$)

Re: RegularExpression

Posted: Mon Mar 06, 2023 10:49 am
by STARGÅTE
Here is equivalent code using regular expressions.
However, this code is not optimized, because meta characters have to be escaped first and a regular expression has to be created each time the procedure is called.
An optimized version would be a "static" procedure for a fixed character set that uses character classes such as "\w" (alphanumeric characters and the underscore) directly.

For documentation of regular expressions themselves, you can read several pages on the internet; it is not part of the PureBasic documentation.

Code: Select all

Procedure ValidChar(String.s, ValidCharacters.s)
	
	Protected Regex.i
	
	; Mask meta characters inside character class
	ValidCharacters = ReplaceString(ValidCharacters, "\", "\\")
	ValidCharacters = ReplaceString(ValidCharacters, "]", "\]")
	ValidCharacters = ReplaceString(ValidCharacters, "-", "\-")
	ValidCharacters = ReplaceString(ValidCharacters, "^", "\^")
	
	Regex.i = CreateRegularExpression(#PB_Any, "^["+ValidCharacters+"]*$")
	If MatchRegularExpression(Regex, String)
		FreeRegularExpression(Regex)
		ProcedureReturn #True
	Else
		FreeRegularExpression(Regex)
		ProcedureReturn #False
	EndIf
	
EndProcedure


Procedure InvalidChar(String.s, InvalidCharacters.s)
	
	Protected Regex.i
	
	; Mask meta characters inside character class
	InvalidCharacters = ReplaceString(InvalidCharacters, "\", "\\")
	InvalidCharacters = ReplaceString(InvalidCharacters, "]", "\]")
	InvalidCharacters = ReplaceString(InvalidCharacters, "-", "\-")
	InvalidCharacters = ReplaceString(InvalidCharacters, "^", "\^")
	
	Regex.i = CreateRegularExpression(#PB_Any, "["+InvalidCharacters+"]")
	If MatchRegularExpression(Regex, String)
		FreeRegularExpression(Regex)
		ProcedureReturn #True
	Else
		FreeRegularExpression(Regex)
		ProcedureReturn #False
	EndIf
	
EndProcedure

ValidChars$ =" ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
Debug ValidChar("test1", ValidChars$)
Debug ValidChar("test1_", ValidChars$)

InValidChars$ =";,:._-()/&%"
Debug InValidChar("test1", InValidChars$)
Debug InValidChar("test1_", InValidChars$)
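
The optimized variant described at the top of this post (a fixed character set, compiled once) might look roughly like the sketch below. The names ValidRegex and ValidCharFixed are made up for illustration, and the class [ A-Za-z0-9] is an assumption chosen to mirror ValidChars$. "\w" is not used here because it would also accept the underscore.

```purebasic
EnableExplicit

; Hypothetical sketch: compile the pattern once at startup and reuse it.
; The class [ A-Za-z0-9] is an assumption that mirrors ValidChars$ above.
Global ValidRegex.i = CreateRegularExpression(#PB_Any, "^[ A-Za-z0-9]*$")

Procedure ValidCharFixed(String.s)
	; Returns #True if String contains only characters from the fixed set
	If ValidRegex
		If MatchRegularExpression(ValidRegex, String)
			ProcedureReturn #True
		EndIf
	EndIf
	ProcedureReturn #False
EndProcedure

Debug ValidCharFixed("test1")  ; 1
Debug ValidCharFixed("test1_") ; 0
```

Because the character set is written directly into the pattern, the ReplaceString() escaping and the per-call CreateRegularExpression()/FreeRegularExpression() pair are no longer needed.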

Re: RegularExpression

Posted: Mon Mar 06, 2023 11:38 am
by Splunk
Thanks @Stargate:
At first glance, the replacement code looks very elaborate. But you are of course right that it can be optimized considerably.
STARGÅTE wrote: Mon Mar 06, 2023 10:49 am
For a documentation of regular expression itself you can read several pages in the internet...
This is exactly what I wanted to avoid :mrgreen:

Re: RegularExpression

Posted: Mon Mar 06, 2023 10:14 pm
by AZJIO
Using regular expressions embeds the regular expression engine in the program, which increases the size of the executable file by 150-200 KB. If you can avoid regular expressions entirely, it is better to use string functions instead.

I've optimized it a bit, as the Mid() function will be slow in this case.

Code: Select all

EnableExplicit
DisableDebugger

Procedure InValidChar(*c.Character, *jc.Character)
    Protected *jc0

    If Not *jc\c
        ProcedureReturn #False
    EndIf

    *jc0 = *jc

    While *c\c
        *jc = *jc0

        While *jc\c
            If *c\c = *jc\c
                ProcedureReturn #True
            EndIf
            *jc + SizeOf(Character)
        Wend
        *c + SizeOf(Character)
    Wend

    ProcedureReturn #False
EndProcedure

Procedure ValidChar(*c.Character, *jc.Character)
    Protected *jc0, NotFound

    If Not *jc\c
        ProcedureReturn #True
    EndIf

    *jc0 = *jc

    While *c\c
        *jc = *jc0
        NotFound = #True

        ; Scan the valid set for the current character
        While *jc\c
            If *c\c = *jc\c
                NotFound = #False
                Break
            EndIf
            *jc + SizeOf(Character)
        Wend
        If NotFound
            ProcedureReturn #False
        EndIf
        *c + SizeOf(Character)
    Wend

    ProcedureReturn #True
EndProcedure





Define ValidChars$=" ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
Define InValidChars$ =";,:._-()/&%"


Define t, i, c = 100000
t = ElapsedMilliseconds()
For i=0 To c
	ValidChar(@"test1", @ValidChars$)
	ValidChar(@"test1_", @ValidChars$)
Next
EnableDebugger 
Debug Str(ElapsedMilliseconds()-t)


DisableDebugger
t = ElapsedMilliseconds()
For i=0 To c
	InValidChar(@"test1", @InValidChars$)
	InValidChar(@"test1_", @InValidChars$)
Next
EnableDebugger 
Debug Str(ElapsedMilliseconds()-t)


Debug ValidChar(@"test1", @ValidChars$)
Debug ValidChar(@"test1_", @ValidChars$)

Debug InValidChar(@"test1", @InValidChars$)
Debug InValidChar(@"test1_", @InValidChars$)

Re: RegularExpression

Posted: Tue Mar 07, 2023 7:49 pm
by Splunk
Thanks, @AZJIO...very interesting and certainly much faster than using Mid().

However, I avoid this pointer stuff unless absolutely necessary, because readable code is more important to me. When you come back to change the code after some time, you have to think too much about what you did in the first place, and most of the time I don't have that time.

Re: RegularExpression

Posted: Wed Mar 08, 2023 12:04 am
by BarryG
Yeah, sometimes PureBasic is not very BASIC at all. It's more like an easier version of C to me (that's how I think of it).

Re: RegularExpression

Posted: Wed Mar 08, 2023 1:09 am
by AZJIO
Splunk
People are afraid of what they don't understand. In fact, this code is light and readable; it's just that you've never used this method. I could offer you another option that is better than yours but worse than what I proposed. To begin with, the code above just walks through the letters in a loop, one after another, whereas your approach runs to the store for a single character each time.

This method turns the string into an array, then does the same as your code, but iterates over the letters in the array.

Code: Select all

EnableExplicit

Procedure StrToArrLetter(Array Arr.s{1}(1), String$)
	Protected LenStr = Len(String$)
	If LenStr
		ReDim Arr(LenStr - 1)
		PokeS(Arr(), String$, -1, #PB_String_NoZero)
	EndIf
; 	ProcedureReturn LenStr
EndProcedure

Procedure InValidChar(String$, InValidChars$)
	; Returns #True if String$ contains a character from InValidChars$
	Protected i, j
	Protected Dim aString.s{1}(0)
	Protected Dim aInValidChars.s{1}(0)

	If Not Asc(String$) Or Not Asc(InValidChars$)
		ProcedureReturn #False
	EndIf

	StrToArrLetter(aString(), String$)
	StrToArrLetter(aInValidChars(), InValidChars$)

	For i = 0 To ArraySize(aString())
		For j = 0 To ArraySize(aInValidChars())
			If aString(i) = aInValidChars(j)
				ProcedureReturn #True
			EndIf
		Next
	Next

	ProcedureReturn #False
EndProcedure

Procedure ValidChar(String$, ValidChars$)
	; Returns #True if String$ contains only characters from ValidChars$
	Protected i, j, aSizeVC
	Protected Dim aString.s{1}(0)
	Protected Dim aValidChars.s{1}(0)

	If Not Asc(String$) Or Not Asc(ValidChars$)
		ProcedureReturn #True
	EndIf

	StrToArrLetter(aString(), String$)
	StrToArrLetter(aValidChars(), ValidChars$)
	aSizeVC = ArraySize(aValidChars())

	For i = 0 To ArraySize(aString())
		For j = 0 To aSizeVC
			If aString(i) = aValidChars(j)
				Break
			EndIf
		Next
		; The loop ran to the end without a Break: character not in the valid set
		If j > aSizeVC
			ProcedureReturn #False
		EndIf
	Next

	ProcedureReturn #True
EndProcedure


Define ValidChars$ = " ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
Debug ValidChar("test1", ValidChars$)
Debug ValidChar("test1_", ValidChars$)

Define InValidChars$ = ";,:._-()/&%"
Debug InValidChar("test1", InValidChars$)
Debug InValidChar("test1_", InValidChars$)

Re: RegularExpression

Posted: Wed Mar 08, 2023 1:29 am
by BarryG
Short and sweet (no need to loop through the valid char string):

Code: Select all

Procedure MatchChars(string$,valid$,allow)
  ok=1
  s=Len(string$)
  For c=1 To s
    f=FindString(valid$,Mid(string$,c,1))
    If allow=1
      If f=0 : ok=0 : Break : EndIf
    Else
      If f<>0 : ok=0 : Break : EndIf
    EndIf
  Next
  If allow=0
    ok=1-ok
  EndIf
  ProcedureReturn ok
EndProcedure

ValidChars$=" ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"

; Returns 1 if the string contains only characters from ValidChars$
Debug MatchChars("test1", ValidChars$, 1) ; 1
Debug MatchChars("test1_", ValidChars$, 1) ; 0

; Returns 1 if contains a character from INVALIDCHAR$
InValidChars$ =";,:._-()/&%"
Debug MatchChars("test1", InValidChars$, 0) ; 0
Debug MatchChars("test1_", InValidChars$, 0) ; 1

Re: RegularExpression

Posted: Wed Mar 08, 2023 2:55 am
by AZJIO
BarryG
From yesterday's tests, I found out that FindString() is 13 times faster than "While *c\c"
I made it 5% faster since the Mid() function is slower than "While *c\c"

Code: Select all

EnableExplicit
DisableDebugger

Procedure ValidChar(*c.Character, Validchar$)
    If Not Asc(Validchar$)
        ProcedureReturn #True
    EndIf

    While *c\c
        If Not FindString(Validchar$, Chr(*c\c))
			ProcedureReturn #False
        EndIf
        *c + SizeOf(Character)
    Wend

    ProcedureReturn #True
EndProcedure

Procedure InValidChar(*c.Character, InValidChars$)
    If Not Asc(InValidChars$)
        ProcedureReturn #False
    EndIf

    While *c\c
        If FindString(InValidChars$, Chr(*c\c))
        	ProcedureReturn #True
        EndIf
        *c + SizeOf(Character)
    Wend

    ProcedureReturn #False
EndProcedure


Define ValidChars$=" ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
Define InValidChars$ =";,:._-()/&%"


Define t, i, c = 100000
t = ElapsedMilliseconds()
For i=0 To c
	ValidChar(@"test1", ValidChars$)
	ValidChar(@"test1_", ValidChars$)
Next
EnableDebugger 
Debug Str(ElapsedMilliseconds()-t)


DisableDebugger
t = ElapsedMilliseconds()
For i=0 To c
	InValidChar(@"test1", InValidChars$)
	InValidChar(@"test1_", InValidChars$)
Next
EnableDebugger 
Debug Str(ElapsedMilliseconds()-t)


Debug ValidChar(@"test1", ValidChars$)
Debug ValidChar(@"test1_", ValidChars$)

Debug InValidChar(@"test1", InValidChars$)
Debug InValidChar(@"test1_", InValidChars$)
Two in one

Code: Select all

EnableExplicit
DisableDebugger

Procedure ValidChar(*c.Character, Char$, Allow)
    If Not Asc(Char$)
        ProcedureReturn Allow
    EndIf
    
	While *c\c
		If Bool(FindString(Char$, Chr(*c\c))) <> Allow
			ProcedureReturn Bool(Not Allow)
		EndIf
		*c + SizeOf(Character)
	Wend

    ProcedureReturn Allow
EndProcedure


Define ValidChars$=" ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
Define InValidChars$ =";,:._-()/&%"


Define t, i, c = 100000
t = ElapsedMilliseconds()
For i=0 To c
	ValidChar(@"test1", ValidChars$, 1)
	ValidChar(@"test1_", ValidChars$, 1)
Next
EnableDebugger 
Debug Str(ElapsedMilliseconds()-t)


DisableDebugger
t = ElapsedMilliseconds()
For i=0 To c
	ValidChar(@"test1", InValidChars$, 0)
	ValidChar(@"test1_", InValidChars$, 0)
Next
EnableDebugger 
Debug Str(ElapsedMilliseconds()-t)


Debug ValidChar(@"test1", ValidChars$, 1)
Debug ValidChar(@"test1_", ValidChars$, 1)

Debug ValidChar(@"test1", InValidChars$, 0)
Debug ValidChar(@"test1_", InValidChars$, 0)
Mid() version, an improvement on BarryG's code

Code: Select all

EnableExplicit

Procedure MatchChars(String$, Char$, Allow)
	Protected Length, i

	If Not Asc(Char$)
		ProcedureReturn Allow
	EndIf

	Length = Len(String$)
	For i = 1 To Length
		If Bool(FindString(Char$, Mid(String$, i, 1))) <> Allow
			ProcedureReturn Bool(Not Allow)
		EndIf
	Next

	ProcedureReturn Allow
EndProcedure

Define ValidChars$ = " ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
Define InValidChars$ = ";,:._-()/&%"

Debug MatchChars("test1", ValidChars$, 1)
Debug MatchChars("test1_", ValidChars$, 1)

Debug MatchChars("test1", InValidChars$, 0)
Debug MatchChars("test1_", InValidChars$, 0)
Speed measurement results (ms)

(For + Len + Mid) * 2 (Splunk): 4399 / 500
RegularExpression (STARGÅTE): 859 / 438
While+While (AZJIO) fixed for variable pointers: 143 / 30
StrToArrLetter (AZJIO): 1652 / 523
FindString + Mid (BarryG): 133 / 79
While+FindString (AZJIO) fixed for variable pointers: 109 / 52

Re: RegularExpression

Posted: Wed Mar 08, 2023 9:28 am
by idle
If you intend to call it many times at run time, you'd be better off using a lookup table.
Also, always pass strings by address when you can.

Code: Select all

Global Dim arValidchars.u($ffff)

Procedure InitValidChars(*ValidChars)
  Protected *in.unicode 
  *in = *ValidChars 
  While *in\u 
    arValidchars(*in\u) = *in\u
    *in+2 
  Wend   
EndProcedure 

Procedure ValidChar(*STRING)
	; Returns #True if the string at *STRING contains only characters stored by InitValidChars()
  Protected *in.Unicode
  *in = *STRING 
  
  While *in\u  
    If arValidchars(*in\u) <> *in\u  
      ProcedureReturn #False 
    EndIf 
    *in+2 
  Wend   
    
	ProcedureReturn #True
EndProcedure
		
ValidChars$ =" ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
InitValidChars(@ValidChars$) 
Debug ValidChar(@"test1")
Debug ValidChar(@"test1_")
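
The same lookup-table idea can probably cover the InValidChar case from the original question as well. The following is only a rough sketch: the names arInvalidChars, InitInvalidChars and InValidCharLUT are hypothetical, and it uses one flag byte per character code instead of storing the character itself.

```purebasic
EnableExplicit

; Hypothetical sketch: one flag byte for every possible 16-bit character code
Global Dim arInvalidChars.a($FFFF)

Procedure InitInvalidChars(*InvalidChars.Character)
	; Mark every character of the invalid set in the table
	While *InvalidChars\c
		arInvalidChars(*InvalidChars\c) = #True
		*InvalidChars + SizeOf(Character)
	Wend
EndProcedure

Procedure InValidCharLUT(*String.Character)
	; Returns #True as soon as one character of the string is marked as invalid
	While *String\c
		If arInvalidChars(*String\c)
			ProcedureReturn #True
		EndIf
		*String + SizeOf(Character)
	Wend
	ProcedureReturn #False
EndProcedure

Define InValidChars$ = ";,:._-()/&%"
InitInvalidChars(@InValidChars$)
Debug InValidCharLUT(@"test1")  ; 0
Debug InValidCharLUT(@"test1_") ; 1
```

Each character of the input costs a single array access, so the run time no longer depends on how long the character set is. The table has to be filled once before the first check.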