Regex and $1 parameter

Posted: Sun Nov 28, 2021 8:47 am
by BarryG
Hi, back again with another regular expression question. I was interested in converting camel case to title case, and found the following example at StackOverflow. It has 442 upvotes, so it must be correct, hehe. But I can't make it work with PureBasic (I've only tried putting spaces before each capital in my code below). Please help. Thanks.

https://stackoverflow.com/a/4149393/7908170

Code: Select all

text$="thisStringIsGood"

r=CreateRegularExpression(#PB_Any,"/([A-Z])/g")
If r
  If MatchRegularExpression(r,text$)
    text$=ReplaceRegularExpression(r,text$," $1")
  EndIf
  FreeRegularExpression(r)
  Debug text$ ; Want "This String Is Good"
EndIf

Re: Regex and $1 parameter

Posted: Sun Nov 28, 2021 9:34 am
by Marc56us
ReplaceRegularExpression()
...
Remarks

Back references (usually described as \1, \2, etc.) are not supported. ExtractRegularExpression() combined with ReplaceString() should achieve the requested behaviour.
:wink:

Re: Regex and $1 parameter

Posted: Sun Nov 28, 2021 9:41 am
by #NULL
I didn't see the doc talking about ExtractRegularExpression() as Marc56us posted, but this seems to work:

Code: Select all

s.s = "thisStringIsGood"

If CreateRegularExpression(0, "([A-Z])")
  If ExamineRegularExpression(0, s)
    While NextRegularExpressionMatch(0)
      s = ReplaceString(s,
                    RegularExpressionGroup(0, 1),
                    " " + RegularExpressionGroup(0, 1),
                    #PB_String_CaseSensitive,
                    RegularExpressionGroupPosition(0, 1),
                    1)
    Wend
  EndIf
Else
  Debug RegularExpressionError()
EndIf
Debug s

If CreateRegularExpression(0, "(^.)")
  If ExamineRegularExpression(0, s)
    While NextRegularExpressionMatch(0)
      s = ReplaceString(s,
                    RegularExpressionGroup(0, 1),
                    UCase(RegularExpressionGroup(0, 1)),
                    #PB_String_CaseSensitive,
                    RegularExpressionGroupPosition(0, 1),
                    1)
    Wend
  EndIf
Else
  Debug RegularExpressionError()
EndIf
Debug s

Re: Regex and $1 parameter

Posted: Sun Nov 28, 2021 9:46 am
by BarryG
Oh crap, so "$1" is what PureBasic doesn't support? Darn. My app was to offer regex for its users to specify the regex text that they need, but obviously they can't now. So I'll have to not offer that feature, which is a real shame. I can't use entire replacements like #NULL's example for the reasons I just explained. This is very disappointing. Unless there's some other unofficial way to support regex fully and ignore PureBasic's version?

Re: Regex and $1 parameter

Posted: Sun Nov 28, 2021 9:46 am
by Marc56us
A quick and dirty solution without regex

Code: Select all

text$="thisStringIsGood"

For i = 1 To Len(text$)
  Char$ = Mid(text$, i, 1)
  If Char$ = UCase(Char$)
    Char$ = " " + Char$
  EndIf
  Full$ + Char$
Next

Debug Full$
(need to add a line for first char)
:wink:
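
A minimal sketch of the loop above with the first-character line added, assuming plain ASCII input. It swaps the UCase() comparison for an explicit 'A' to 'Z' range check (a substitution, not the original test), because digits and punctuation are also equal to their own UCase() and would otherwise get a space in front of them too.

```purebasic
text$ = "thisStringIsGood"
Full$ = ""

For i = 1 To Len(text$)
  Char$ = Mid(text$, i, 1)
  c = Asc(Char$)                  ; character code of the current character
  If i = 1
    Char$ = UCase(Char$)          ; capitalise the first character, no space added
  ElseIf c >= 'A' And c <= 'Z'
    Char$ = " " + Char$           ; space before every later ASCII capital
  EndIf
  Full$ + Char$
Next

Debug Full$ ; This String Is Good
```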

Re: Regex and $1 parameter

Posted: Sun Nov 28, 2021 9:48 am
by BarryG
No good Marc56us - see my post above yours for why. PureBasic doesn't support drop-in regex statements obtained from the web, so it can't be used.

Re: [Ignore] Regex and $1 parameter

Posted: Sun Nov 28, 2021 9:56 am
by Marc56us
Unless there's some other unofficial way to support regex fully and ignore PureBasic's version?
Use RunProgram() to call an external tool like SED or AWK (yes, these Unix tools exist for Windows too).
:wink:

Re: Regex and $1 parameter

Posted: Sun Nov 28, 2021 1:12 pm
by BarryG
Hi #NULL, your code works great for your example text ("thisStringIsGood") but if I change it to something just slightly different ("thisIsCamelCase") then it fails (has extra spaces, and "CamelCase" doesn't separate like "IsGood" does). I can't work out why that would be. Any ideas?

Here's what I'm testing with:

Code: Select all

new$=" "
text$="thisStringIsGood" ; Works.
text$="thisIsCamelCase" ; Fails.
r=CreateRegularExpression(#PB_Any,"([A-Z])")
If r
  If ExamineRegularExpression(r,text$)
    While NextRegularExpressionMatch(r)
      text$=ReplaceString(text$,RegularExpressionGroup(r,1),new$+RegularExpressionGroup(r,1),#PB_String_CaseSensitive,RegularExpressionGroupPosition(r,1),1)
    Wend
  EndIf
  FreeRegularExpression(r)
EndIf
Debug text$

Re: Regex and $1 parameter

Posted: Sun Nov 28, 2021 1:26 pm
by #NULL
The StartPosition Parameter for ReplaceString needs to be changed from RegularExpressionGroupPosition(r,1) to RegularExpressionMatchPosition(r). Group position isn't correct there so the first C gets replaced twice. :)
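
A sketch of the test code with that one change applied, in case it helps to see it in place. Everything except the StartPosition argument is taken from the code being discussed. The reasoning, as far as I understand it: the match position refers to the original string, and the spaces inserted so far can only push later capitals to the right, so a search that starts at the original match position still lands on the intended occurrence.

```purebasic
text$ = "thisIsCamelCase"

r = CreateRegularExpression(#PB_Any, "([A-Z])")
If r
  If ExamineRegularExpression(r, text$)
    While NextRegularExpressionMatch(r)
      ; Start the search at the position of the whole match in the examined string,
      ; not at the group position.
      text$ = ReplaceString(text$, RegularExpressionGroup(r, 1), " " + RegularExpressionGroup(r, 1), #PB_String_CaseSensitive, RegularExpressionMatchPosition(r), 1)
    Wend
  EndIf
  FreeRegularExpression(r)
EndIf

Debug text$
```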

Re: Regex and $1 parameter

Posted: Sun Nov 28, 2021 1:41 pm
by BarryG
You da man! Haha. That works, but it's bedtime now so I'll test more extensively tomorrow. Thanks!

Re: Regex and $1 parameter

Posted: Sun Nov 28, 2021 4:04 pm
by AZJIO

Re: Regex and $1 parameter

Posted: Mon Nov 29, 2021 10:41 am
by BarryG
AZJIO, I took a look but can't see how that helps with my question here. It outputs the original string. Granted, I'm not great with regexes, so I'm probably doing something wrong. The aim is to make the regex work exactly the same way as the StackOverflow version at the start of this thread, since that's what my users will be providing.

Here's your code and what I tried:

Code: Select all

#RegExp = 0

Procedure.s RegexReplace2(RgEx, *Result.string, Replace0$)
  Protected i, CountGr, Pos, Offset = 1
  Protected Result$, Replace$
  Protected NewList item.s()
  Protected LenT, *Point
  CountGr = CountRegularExpressionGroups(RgEx)
  If CountGr > 9
    CountGr = 9
  EndIf
  If ExamineRegularExpression(RgEx, *Result\s)
    While NextRegularExpressionMatch(RgEx)
      Pos = RegularExpressionMatchPosition(RgEx)
      Replace$ = ReplaceString(Replace0$,"\0", RegularExpressionMatchString(RgEx))
      For i = 1 To CountGr
        Replace$ = ReplaceString(Replace$, "\"+Str(i), RegularExpressionGroup(RgEx, i))
      Next
      If AddElement(item())
        item() = Mid(*Result\s, Offset, Pos - Offset) + Replace$
      EndIf
      Offset = Pos + RegularExpressionMatchLength(RgEx)
    Wend
    If AddElement(item())
      item() = Mid(*Result\s, Offset)
    EndIf
    LenT = 0
    ForEach item()
      LenT + Len(item())
    Next
    *Result\s = Space(LenT)
    *Point = @*Result\s
    ForEach item()
      CopyMemoryString(item(), @*Point)
    Next
    FreeList(item())
  EndIf
EndProcedure


#RegExp = 0
Define Text.string

Text\s = "thisStringIsGood"
CreateRegularExpression(#RegExp , "/([A-Z])/g" )
RegexReplace2(#RegExp, @Text, " \1" )
FreeRegularExpression(#RegExp)
Debug Text\s ; thisStringIsGood

Re: Regex and $1 parameter

Posted: Mon Nov 29, 2021 2:17 pm
by AZJIO
I'm busy right now, so just two hints:
1. The problem of groups is solved here.
2. You need to strip the leading "/" and the trailing "/[gim]+" from the pattern, but don't just discard the flags: use them to enable the corresponding modes.

Re: Regex and $1 parameter

Posted: Mon Nov 29, 2021 2:53 pm
by Marc56us
(Just for fun, following up on my suggestion https://www.purebasic.fr/english/viewto ... 74#p577574)

Quick and dirty code that takes the user input and passes it as is to SED (so it uses SED's regex syntax).
(SED uses \1 instead of $1)

Code: Select all

;  Regex And $1 parameter
;  Post by BarryG » Sun Nov 28, 2021 8:47 am
;  https://www.purebasic.fr/english/viewtopic.php?p=577567#p577567
;  Marc56 - 2021-11-23

EnableExplicit

Enumeration 
    #RegExp
EndEnumeration

Procedure RegexReplaceNew(RegEx$, Text$, Replace$)
    Debug "Regex source   : " + Regex$
    RegEx$ = ReplaceString(RegEx$, "(", "\(")
    RegEx$ = ReplaceString(RegEx$, ")", "\)")
    RegEx$ = RTrim(RegEx$, "g")
    Debug "Regex with esc : " + Regex$ 
    
    Protected Arg$  = "sed 's" + RegEx$ + Replace$ + "/g' Tmp_File.in > Tmp_File.out"
    Debug "SED command line: " + Arg$
    
    Protected Run = RunProgram("wsl", Arg$, GetTemporaryDirectory(), #PB_Program_Wait)
    
    Protected Tmp_File$ = GetTemporaryDirectory() + "Tmp_File.out"
    If FileSize(Tmp_File$) > 0
        ReadFile(1, Tmp_File$)
        Protected New_Line$ = ReadString(1)
        CloseFile(1)
        Debug "---"
        Debug Text$
        Debug New_Line$
        Debug UCase(Left(New_Line$, 1)) + Right(New_Line$, Len(New_Line$) -1)
    Else
        Debug "No file"
    EndIf
EndProcedure

Global Text$ = "thisStringIsGood"
If OpenFile(0, GetTemporaryDirectory() + "Tmp_File.in")
    WriteString(0, "thisStringIsGood")
    CloseFile(0)
    Global RegEx$ = "/([A-Z])/g"
    RegexReplaceNew(RegEx$ ,Text$, " \1")
Else
    Debug "Can't create Temp file"
    End
EndIf

DeleteFile(GetTemporaryDirectory() + "Tmp_File.in")
DeleteFile(GetTemporaryDirectory() + "Tmp_File.out")

End
(Using SED of WSL 1. If you don't have it installed, download SED from Unix Tools for Windows instead)

Code: Select all

Regex source   : /([A-Z])/g
Regex with esc : /\([A-Z]\)/
SED command line: sed 's/\([A-Z]\)/ \1/g' Tmp_File.in > Tmp_File.out
---
thisStringIsGood
this String Is Good
This String Is Good
:mrgreen:

But the simplest solution would obviously be to parse the user input (strip the // delimiters and the trailing flags) and use PB's regular expression functions. But if you want to create your own regular expression filter that covers every case, I hope you have lots of coffee and time :wink:

Re: Regex and $1 parameter

Posted: Tue Nov 30, 2021 5:29 am
by AZJIO

Code: Select all

EnableExplicit

Procedure.s RegexReplace2(RgEx, *Result.string, Replace0$, Once = 0)
	Protected i, CountGr, Pos, Offset = 1
	Protected Result$, Replace$
	Protected NewList item.s()
	Protected LenT, *Point
	CountGr = CountRegularExpressionGroups(RgEx)
	If CountGr > 9
		CountGr = 9
	EndIf
	If ExamineRegularExpression(RgEx, *Result\s)
		While NextRegularExpressionMatch(RgEx)
			Pos = RegularExpressionMatchPosition(RgEx)
			Replace$ = ReplaceString(Replace0$,"\0", RegularExpressionMatchString(RgEx))
			For i = 1 To CountGr
				Replace$ = ReplaceString(Replace$, "\"+Str(i), RegularExpressionGroup(RgEx, i))
			Next
			If AddElement(item())
				item() = Mid(*Result\s, Offset, Pos - Offset) + Replace$
			EndIf
			Offset = Pos + RegularExpressionMatchLength(RgEx)
			If Once
				Break
			EndIf
		Wend
		If AddElement(item())
			item() = Mid(*Result\s, Offset)
		EndIf
		LenT = 0
		ForEach item()
			LenT + Len(item())
		Next
		*Result\s = Space(LenT)
		*Point = @*Result\s
		ForEach item()
			CopyMemoryString(item(), @*Point)
		Next
		FreeList(item())
	EndIf
EndProcedure


Define reSource$, reFlag$, User_entered$, re, re2, CreFlags = 0, Once = 0

User_entered$ = "/([A-Z])/g"

re=CreateRegularExpression(#PB_Any,"/(.+?)/([gim]*)")
If re
	If ExamineRegularExpression(re, User_entered$)
		If NextRegularExpressionMatch(re)
			reSource$ = RegularExpressionGroup(re, 1)
			reFlag$ = RegularExpressionGroup(re, 2)
		EndIf
	EndIf
	FreeRegularExpression(re)
EndIf

If Not Asc(reSource$)
	Debug "User, you're wrong. Empty regular expression"
	Debug "The regular expression should be in the following format: /anything/gim"
	End
EndIf

If FindString(reFlag$, "i")
	CreFlags | #PB_RegularExpression_NoCase
EndIf
If FindString(reFlag$, "m")
	CreFlags | #PB_RegularExpression_MultiLine
EndIf
If Not FindString(reFlag$, "g")
	Once = 1
EndIf

Define Text.string

Text\s = "thisStringIsGood"
re2 = CreateRegularExpression(#PB_Any, reSource$, CreFlags)
If re2
	RegexReplace2(re2, @Text, " \1", Once)
	FreeRegularExpression(re2)
	Debug Text\s ; this String Is Good
Else
	Debug "User, you're wrong:"
	Debug RegularExpressionError()
	End
EndIf
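
The procedure above expects \1-style placeholders, while the StackOverflow answer uses $1. A small helper could translate one form to the other before the replacement string is passed in. DollarToBackslash is a hypothetical name, not part of PureBasic or of the code above, and this is only a sketch: it handles $0 to $9 and ignores escapes such as $$.

```purebasic
; Hypothetical helper: turn "$0".."$9" placeholders into "\0".."\9".
Procedure.s DollarToBackslash(Replace$)
  Protected i
  For i = 0 To 9
    Replace$ = ReplaceString(Replace$, "$" + Str(i), "\" + Str(i))
  Next
  ProcedureReturn Replace$
EndProcedure

Debug DollarToBackslash(" $1") ; " \1"
```

The result could then be handed to RegexReplace2() in place of the hard-coded " \1".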