
[Solved] RegEx question

Posted: Fri Jan 08, 2021 7:26 am
by BarryG
This is hard for me to work out, but should be easy for regular expression experts here. Hehe. See this string:

Code: Select all

{}{-first}1{-two}2{-three}3{-hello}{-last}{keep}
I just want to remove anything starting with "{-" and ending with "}", and including those curly braces, so the above would become:

Code: Select all

{}123{keep}
I have no idea how to code this in regex. I've looked here but it's still too confusing because it wants to remove everything between, instead of only if starting with "{-":

https://stackoverflow.com/questions/239 ... o-brackets

Appreciate any help.

Re: RegEx question

Posted: Fri Jan 08, 2021 9:23 am
by STARGÅTE

Code: Select all

If CreateRegularExpression(0, "\{-.*?\}")
	Define Result.s = ReplaceRegularExpression(0, "{}{-first}1{-two}2{-three}3{-hello}{-last}{keep}", "")
	Debug Result
Else
	Debug RegularExpressionError()
EndIf
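
The `?` after `.*` is what makes the expression above stop at the first closing brace. As a minimal sketch (the IDs 0 and 1 and the variable name Text$ are only illustrative), the lazy and greedy forms can be compared side by side on the sample string:

```purebasic
Define Text$ = "{}{-first}1{-two}2{-three}3{-hello}{-last}{keep}"

; Lazy ".*?" stops at the first "}" after each "{-".
If CreateRegularExpression(0, "\{-.*?\}")
  Debug ReplaceRegularExpression(0, Text$, "") ; {}123{keep}
  FreeRegularExpression(0)
EndIf

; Greedy ".*" runs on to the last "}" in the string,
; so everything from "{-first}" to the end is removed.
If CreateRegularExpression(1, "\{-.*\}")
  Debug ReplaceRegularExpression(1, Text$, "") ; {}
  FreeRegularExpression(1)
EndIf
```

With the greedy form the trailing "{keep}" is swallowed as well, which is why the lazy quantifier is needed here.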

Re: RegEx question

Posted: Fri Jan 08, 2021 9:41 am
by BarryG
Thank you, Stargate.

I put your expression into https://regex101.com/ to learn how it works, and this came up:

[Image: regex101 breakdown of the expression]

Now I have a better understanding of it. I really appreciate your pointing the way. Thank you again!

BTW, the manual says CreateRegularExpression() returns 0 if the expression was not created successfully, but how likely is this in reality? It would only be if our expression syntax was wrong, yes? Since the expression here is 100% valid, I wouldn't need to do an If/EndIf for it?
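
For illustration only, here is a small sketch of the failure case the manual describes. The broken pattern is made up for this example (it has an opening parenthesis that is never closed); it only shows that a syntax error makes CreateRegularExpression() return 0, and does not claim that syntax errors are the only possible cause of failure.

```purebasic
; Hypothetical broken pattern: the "(" group is never closed,
; so the expression cannot be compiled.
If CreateRegularExpression(0, "\{-(.*?\}")
  Debug "Expression created"
  FreeRegularExpression(0)
Else
  ; RegularExpressionError() returns a description of the problem.
  Debug "Failed: " + RegularExpressionError()
EndIf
```

Keeping the If/EndIf costs little and catches typos if the pattern is edited later.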

Re: [Solved] RegEx question

Posted: Fri Jan 08, 2021 9:47 am
by infratec
If you also accept a solution without regex:

Code: Select all

Procedure.s Remove(*TextPtr.Character)
  
  Protected Result$, Help$, State.i
  
  While *TextPtr\c
    Help$ + Chr(*TextPtr\c)
    Select State
      Case 0
        If *TextPtr\c = '{'
          State = 1
        Else
          Result$ + Help$
          Help$ = ""
        EndIf
      Case 1
        If *TextPtr\c = '-'
          State = 2
        ElseIf *TextPtr\c = '}'
          Result$ + Help$
          Help$ = ""
          State = 0
        EndIf
      Case 2
        If *TextPtr\c = '}'
          Help$ = ""
          State = 0
        EndIf
    EndSelect
    
    *TextPtr + SizeOf(Character)
  Wend
  
  ProcedureReturn Result$
  
EndProcedure


Text$ = "{}{-first}1{-two}2{-three}3{-hello}{-last}{keep}"
Debug Remove(@Text$)
Finding what you don't want is easy: {-\w*}
But creating the negative lookahead ...

Re: [Solved] RegEx question

Posted: Fri Jan 08, 2021 9:49 am
by loulou2522
Hi BarryG,
Try Rexman; it shows you how to build and test a regexp:
viewtopic.php?f=27&t=37212&hilit=rexman

Re: [Solved] RegEx question

Posted: Fri Jan 08, 2021 9:51 am
by infratec
Ah ....

replace the matches with "" :idea:

But my solution saves a lot of kb for the exe :wink:

Re: [Solved] RegEx question

Posted: Fri Jan 08, 2021 9:56 am
by BarryG
Thanks for the alternative, infratec! It saves the roughly 124 KB that the regex version adds to my exe. But at least I learned a bit about regex formats today. @loulou2522, I'll download Rexman and have a play.

Re: [Solved] RegEx question

Posted: Fri Jan 08, 2021 10:26 am
by Marc56us
Hi,

It can be simplified: there is no need to escape { and } when they are not used as a quantifier.

Code: Select all

If CreateRegularExpression(0, "{-.*?}")
   Define Result.s = ReplaceRegularExpression(0, "{}{-first}1{-two}2{-three}3{-hello}{-last}{keep}", "")
   Debug Result
Else
   Debug RegularExpressionError()
EndIf

Code: Select all

{}123{keep}
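
To show when the escaping does matter, here is a small sketch. The helper name TryMatch and the test strings are invented for this example; it compares a pattern where the braces form a quantifier with one where they are escaped.

```purebasic
; Hypothetical helper: reports whether Text$ matches Pattern$.
Procedure.s TryMatch(Pattern$, Text$)
  Protected Result$ = "invalid pattern"
  If CreateRegularExpression(0, Pattern$)
    If MatchRegularExpression(0, Text$)
      Result$ = "match"
    Else
      Result$ = "no match"
    EndIf
    FreeRegularExpression(0)
  EndIf
  ProcedureReturn Result$
EndProcedure

; Unescaped: {2} is a quantifier meaning "two times".
Debug TryMatch("x{2}", "xx")     ; match
Debug TryMatch("x{2}", "x{2}")   ; no match
; Escaped: the braces are literal characters.
Debug TryMatch("x\{2\}", "x{2}") ; match
Debug TryMatch("x\{2\}", "xx")   ; no match
```

In "{-.*?}" the text after "{" is not a valid quantifier, so the brace is taken literally and the backslashes can be dropped.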
And another way for those (like me) who do not understand pointers: a FindString() version.

Code: Select all

; -------12345678901234567890123456789012345678901234567890
Text$ = "{}{-first}1{-two}2{-three}3{-hello}{-last}{keep}"
; Expected
; {}123{keep}

Debug "Text: " + Text$
Nb = CountString(Text$, "{-")
Debug "Found: {-*} " + Nb + " time(s)"

For i = 1 To Nb
    Debug #CRLF$ + "#" + i
    STX = FindString(Text$, "{-")
    If STX > 0
        Debug "  STX: " + STX
        ETX = FindString(Text$, "}", STX)
        Debug "  ETX: " + ETX   
        Text$ = RemoveString(Text$, Mid(Text$, STX, ETX-STX+1))
        STX = 0
    EndIf
Next

Debug #CRLF$ + "Result: " + Text$
It would probably be slow with large text, but PB functions are fast.

:wink:

Re: [Solved] RegEx question

Posted: Fri Jan 08, 2021 5:20 pm
by ebs
Marc56us's FindString() code can be simplified and made somewhat faster by 1) not counting the number of "{-" occurrences and 2) starting the next FindString() at STX, not the beginning of the string:

Code: Select all

Procedure.s Remove4(Text$)
  STX = 1
  Repeat
    STX = FindString(Text$, "{-", STX)
    If STX
      ETX = FindString(Text$, "}", STX)
      If ETX = 0
        Break ; unmatched "{-": stop, nothing more to remove
      EndIf
      Text$ = RemoveString(Text$, Mid(Text$, STX, ETX-STX+1))
    Else
      Break
    EndIf
  ForEver
  
  ProcedureReturn Text$
EndProcedure
It also works if you omit the initialization of "STX = 1" and start at zero instead, but that would be wrong ;-)

Re: [Solved] RegEx question

Posted: Sat Dec 31, 2022 11:18 am
by BarryG
Hi all, back on this topic. This time, I want to convert abc {?913} --- {?9745} xyz to abc 913 --- 9745 xyz. As you can see, I just want to remove any leading "{?" and the closing "}" after it. These markers will be present multiple times in the string. My tired brain can't work it out. Any tips? Thanks.

I can do it this way, but it's slow and inefficient. Posting it just to prove I tried before blindly asking for help. <Wink>.

Code: Select all

text$="abc {?913} --- {?9745} xyz"

While FindString(text$,"{?")
  startPos = FindString(text$, "{?")
  endPos = FindString(text$, "}", startPos) ; look for the closing brace after the marker
  text$ = Left(text$, startPos - 1) + Mid(text$, startPos + 2, endPos - startPos - 2) + Mid(text$, endPos + 1)
Wend

Debug text$ ; abc 913 --- 9745 xyz
I think a modification of infratec's code would be best, but I'm dumb tonight. Here's a template to start:

Code: Select all

Procedure.s RemoveMarkers(*TextPtr.Character)
  
  Protected Result$, Help$, State.i
  
  While *TextPtr\c
    Help$ + Chr(*TextPtr\c)
    Select State
      Case 0
        If *TextPtr\c = '{'
          State = 1
        Else
          Result$ + Help$
          Help$ = ""
        EndIf
      Case 1
        If *TextPtr\c = '?'
          State = 2
        ElseIf *TextPtr\c = '}'
          Result$ + Help$
          Help$ = ""
          State = 0
        EndIf
      Case 2
        If *TextPtr\c = '}'
          Help$ = ""
          State = 0
        EndIf
    EndSelect
    
    *TextPtr + SizeOf(Character)
  Wend
  
  ProcedureReturn Result$
  
EndProcedure


Text$="abc {?913} --- {?9745} xyz"
Debug RemoveMarkers(@Text$)

Re: [Solved] RegEx question

Posted: Sat Dec 31, 2022 11:45 am
by Marc56us
BarryG wrote: "I want to convert abc {?913} --- {?9745} xyz to abc 913 --- 9745 xyz. As you can see, I just want to remove any leading "{?" and the closing "}" after it. These markers will be present multiple times in the string."

Code: Select all

EnableExplicit

Define Txt$ = "abc {?913} --- {?9745} xyz"

Debug Txt$
Txt$ = ReplaceString(Txt$, "{?", "")
Txt$ = ReplaceString(Txt$, "}",  "")
Debug Txt$

End

Code: Select all

abc {?913} --- {?9745} xyz
abc 913 --- 9745 xyz

Re: RegEx question

Posted: Sat Dec 31, 2022 11:57 am
by BarryG
Not quite, Marc56us. I'm not that dumb. Hehe. Your code removes all "{?" and "}", which is not the goal. Only if "{?" is first followed by a "}" should those two sets be removed. Your code fails here, for example:

Code: Select all

Txt$ = "abc} {?913} --- {?9745} } } } {?xyz"
Txt$ = ReplaceString(Txt$, "{?", "")
Txt$ = ReplaceString(Txt$, "}",  "")
; Next line debugs the wrong answer.
Debug Txt$ ; abc 913 --- 9745    xyz

Re: RegEx question

Posted: Sat Dec 31, 2022 1:12 pm
by infratec

Code: Select all

Txt$ = "abc} {?913} --- {?9745} } } } {?xyz"


Pos1 = FindString(Txt$, "{?")
While Pos1
  Pos2 = FindString(Txt$, "}", Pos1)
  If Pos2
    Txt$ = Left(Txt$, Pos1 - 1) + Mid(Txt$, Pos2 + 1)
  Else
    Break
  EndIf
  Pos1 = FindString(Txt$, "{?", Pos1)
Wend

Debug Txt$

Re: RegEx question

Posted: Sat Dec 31, 2022 1:18 pm
by infratec
Extended version:

Code: Select all


Procedure.s CutOut(String$, CutStart$, CutEnd$)
  
  Protected.i Pos1, Pos2, CutEndLen
  
  
  CutEndLen = Len(CutEnd$)
  Pos1 = FindString(String$, CutStart$)
  While Pos1
    Pos2 = FindString(String$, CutEnd$, Pos1)
    If Pos2
      String$ = Left(String$, Pos1 - 1) + Mid(String$, Pos2 + CutEndLen)
    Else
      Break
    EndIf
    Pos1 = FindString(String$, CutStart$, Pos1)
  Wend
  
  ProcedureReturn String$
  
EndProcedure


Txt$ = "abc} {?913} --- {?9745} } } } {?xyz"
Debug CutOut(Txt$, "{?", "}")

Txt$ = "{}{-first}1{-two}2{-three}3{-hello}{-last}{keep}"
Debug CutOut(Txt$, "{-", "}")
You can speed it up by using pointers and CopyMemory :wink:

Re: RegEx question

Posted: Sat Dec 31, 2022 1:46 pm
by BarryG
Sorry infratec, but both your examples delete the 913 and 9745, which are to be kept. See my original post with the green text of what's to be left. Only the leading marker of "{?" and the closing marker of "}" are to be removed, and nothing else. It's a pain to code, as you can see.
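
One possible way to meet that requirement, sketched as a variation on the CutOut() idea above. The procedure name StripMarkers is made up for this example and is not from the thread. It assumes the end marker does not occur inside the start marker, and it leaves an unmatched "{?" and any stray "}" untouched.

```purebasic
EnableExplicit

; Hypothetical helper: removes StartMark$ and the first EndMark$ that
; follows it, but keeps the text between the two markers.
Procedure.s StripMarkers(String$, StartMark$, EndMark$)
  Protected.i Pos1, Pos2, StartLen, EndLen
  
  StartLen = Len(StartMark$)
  EndLen   = Len(EndMark$)
  
  Pos1 = FindString(String$, StartMark$)
  While Pos1
    Pos2 = FindString(String$, EndMark$, Pos1)
    If Pos2 = 0
      Break ; start marker without end marker: leave the rest as it is
    EndIf
    ; text before the marker + text between the markers + text after the end marker
    String$ = Left(String$, Pos1 - 1) + Mid(String$, Pos1 + StartLen, Pos2 - Pos1 - StartLen) + Mid(String$, Pos2 + EndLen)
    Pos1 = FindString(String$, StartMark$, Pos1)
  Wend
  
  ProcedureReturn String$
EndProcedure

Debug StripMarkers("abc {?913} --- {?9745} xyz", "{?", "}")          ; abc 913 --- 9745 xyz
Debug StripMarkers("abc} {?913} --- {?9745} } } } {?xyz", "{?", "}") ; abc} 913 --- 9745 } } } {?xyz
```

Like CutOut(), this rebuilds the string on every hit, so it is a readability-first sketch rather than a fast solution for large texts.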