[Solved] RegEx question

BarryG
Addict
Posts: 4148
Joined: Thu Apr 18, 2019 8:17 am

[Solved] RegEx question

Post by BarryG »

This is hard for me to work out, but should be easy for regular expression experts here. Hehe. See this string:

Code: Select all

{}{-first}1{-two}2{-three}3{-hello}{-last}{keep}
I just want to remove anything starting with "{-" and ending with "}", and including those curly braces, so the above would become:

Code: Select all

{}123{keep}
I have no idea how to code this in regex. I've looked here, but it's still too confusing because that answer removes everything between any pair of brackets, instead of only the parts starting with "{-":

https://stackoverflow.com/questions/239 ... o-brackets

Appreciate any help.
Last edited by BarryG on Mon Jan 02, 2023 2:25 am, edited 8 times in total.
STARGÅTE
Addict
Posts: 2227
Joined: Thu Jan 10, 2008 1:30 pm
Location: Germany, Glienicke

Re: RegEx question

Post by STARGÅTE »

Code: Select all

If CreateRegularExpression(0, "\{-.*?\}")
	Define Result.s = ReplaceRegularExpression(0, "{}{-first}1{-two}2{-three}3{-hello}{-last}{keep}", "")
	Debug Result
Else
	Debug RegularExpressionError()
EndIf
BarryG
Addict
Posts: 4148
Joined: Thu Apr 18, 2019 8:17 am

Re: RegEx question

Post by BarryG »

Thank you, Stargate.

I put your expression into https://regex101.com/ to learn how it works, and this came up:

[Image: regex101 explanation of the expression]

Now I have a better understanding of it. I really appreciate your pointing the way. Thank you again!

BTW, the manual says CreateRegularExpression() returns 0 if the expression was not created successfully, but how likely is this in reality? It would only be if our expression syntax was wrong, yes? Since the expression here is 100% valid, I wouldn't need to do an If/EndIf for it?
Last edited by BarryG on Fri Jan 08, 2021 9:47 am, edited 1 time in total.
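
On the If/EndIf question above: a minimal sketch, assuming the pattern may not always stay a fixed literal (for example, if it is later edited or read from a settings file). RemoveMatches() is a hypothetical helper name, not a PureBasic built-in. It only illustrates that CreateRegularExpression() returns 0 when a pattern fails to compile, and that keeping the check costs very little.

```purebasic
; RemoveMatches() is a hypothetical helper, not part of PureBasic.
; It deletes every match of Pattern$ from Text$, or returns Text$
; unchanged if the pattern cannot be compiled.
Procedure.s RemoveMatches(Pattern$, Text$)
  
  Protected Result$, Regex.i
  
  Result$ = Text$
  Regex = CreateRegularExpression(#PB_Any, Pattern$)
  If Regex
    Result$ = ReplaceRegularExpression(Regex, Text$, "")
    FreeRegularExpression(Regex)
  Else
    ; Report why the pattern was rejected
    Debug "Bad pattern: " + RegularExpressionError()
  EndIf
  
  ProcedureReturn Result$
  
EndProcedure

; Valid pattern: the "{-...}" parts are removed
Debug RemoveMatches("\{-.*?\}", "{}{-first}1{-two}2{keep}") ; {}12{keep}

; Unbalanced "(" in the pattern: compilation fails, the text comes back unchanged
Debug RemoveMatches("\{-(.*?\}", "{}{-first}1{-two}2{keep}")
```

With a hard-coded pattern that has been tested once, the Else branch should never run, so whether to keep the check is mostly a matter of taste.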
infratec
Always Here
Posts: 7591
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: [Solved] RegEx question

Post by infratec »

If you also accept a solution without regex:

Code: Select all

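Procedure.s Remove(*TextPtr.Character)
  
  Protected Result$, Help$, State.i
  
  While *TextPtr\c
    Help$ + Chr(*TextPtr\c)
    Select State
      Case 0 ; outside a marker
        If *TextPtr\c = '{'
          State = 1
        Else
          Result$ + Help$
          Help$ = ""
        EndIf
      Case 1 ; directly after a '{'
        If *TextPtr\c = '-'
          State = 2
        ElseIf *TextPtr\c = '{'
          ; keep the earlier '{' and treat this one as a possible new start
          Result$ + "{"
          Help$ = "{"
        Else
          ; not a "{-" marker, so keep what was collected
          Result$ + Help$
          Help$ = ""
          State = 0
        EndIf
      Case 2 ; inside "{-...": discard everything up to the '}'
        If *TextPtr\c = '}'
          Help$ = ""
          State = 0
        EndIf
    EndSelect
    
    *TextPtr + SizeOf(Character) ; advance by one character, whatever its size
  Wend
  
  ProcedureReturn Result$ + Help$ ; keep an unterminated tail such as "{-abc"
  
EndProcedure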


Text$ = "{}{-first}1{-two}2{-three}3{-hello}{-last}{keep}"
Debug Remove(@Text$)
Finding what you don't want is easy: {-\w*}
But building the negative lookahead ...
loulou2522
Enthusiast
Posts: 545
Joined: Tue Oct 14, 2014 12:09 pm

Re: [Solved] RegEx question

Post by loulou2522 »

Hi BarryG,
Try Rexman; it shows you how to build and test a regexp:
viewtopic.php?f=27&t=37212&hilit=rexman
infratec
Always Here
Posts: 7591
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: [Solved] RegEx question

Post by infratec »

Ah ....

replace the matches with "" :idea:

But my solution saves a lot of kb for the exe :wink:
BarryG
Addict
Posts: 4148
Joined: Thu Apr 18, 2019 8:17 am

Re: [Solved] RegEx question

Post by BarryG »

Thanks for the alternative, infratec! It saves the roughly 124 KB that the regex version adds to my exe. But at least I learned a bit about regex formats today. @loulou2522, I'll download Rexman and have a play.
Marc56us
Addict
Posts: 1600
Joined: Sat Feb 08, 2014 3:26 pm

Re: [Solved] RegEx question

Post by Marc56us »

Hi,

This can be simplified: there is no need to escape { and } when they are not used as a quantifier.

Code: Select all

If CreateRegularExpression(0, "{-.*?}")
   Define Result.s = ReplaceRegularExpression(0, "{}{-first}1{-two}2{-three}3{-hello}{-last}{keep}", "")
   Debug Result
Else
   Debug RegularExpressionError()
EndIf

Code: Select all

{}123{keep}
And another way for those (like me) who do not understand pointers: a FindString() version.

Code: Select all

; -------12345678901234567890123456789012345678901234567890
Text$ = "{}{-first}1{-two}2{-three}3{-hello}{-last}{keep}"
; Expected
; {}123{keep}

Debug "Text: " + Text$
Nb = CountString(Text$, "{-")
Debug "Found: {-*} " + Nb + " time(s)"

For i = 1 To Nb
    Debug #CRLF$ + "#" + i
    STX = FindString(Text$, "{-")
    If STX > 0
        Debug "  STX: " + STX
        ETX = FindString(Text$, "}", STX)
        Debug "  ETX: " + ETX
        If ETX > 0 ; skip a "{-" that has no closing "}"
            Text$ = RemoveString(Text$, Mid(Text$, STX, ETX-STX+1))
        EndIf
    EndIf
Next

Debug #CRLF$ + "Result: " + Text$
It would probably be slow with large text, but PB functions are fast.

:wink:
ebs
Enthusiast
Posts: 558
Joined: Fri Apr 25, 2003 11:08 pm

Re: [Solved] RegEx question

Post by ebs »

Marc56us's FindString() code can be simplified and made somewhat faster by 1) not counting the number of "{-" occurrences and 2) starting the next FindString() at STX, not the beginning of the string:

Code: Select all

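Procedure.s Remove4(Text$)
  Protected STX.i, ETX.i
  STX = 1
  Repeat
    STX = FindString(Text$, "{-", STX)
    If STX = 0
      Break ; no more "{-" markers
    EndIf
    ETX = FindString(Text$, "}", STX)
    If ETX = 0
      Break ; unterminated marker: leave the rest untouched instead of looping forever
    EndIf
    ; cut out exactly this marker, from "{-" up to and including "}"
    Text$ = Left(Text$, STX - 1) + Mid(Text$, ETX + 1)
  ForEver
  
  ProcedureReturn Text$
EndProcedure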
It also works if you omit the initialization of "STX = 1" and start at zero instead, but that would be wrong ;-)
BarryG
Addict
Posts: 4148
Joined: Thu Apr 18, 2019 8:17 am

Re: [Solved] RegEx question

Post by BarryG »

Hi all, back on this topic. This time, I want to convert abc {?913} --- {?9745} xyz to abc 913 --- 9745 xyz. As you can see, I just want to remove any leading "{?" and the closing "}" after it. These markers will be present multiple times in the string. My tired brain can't work it out. Any tips? Thanks.

I can do it this way, but it's slow and inefficient. Posting it just to prove I tried before blindly asking for help. <Wink>.

Code: Select all

text$="abc {?913} --- {?9745} xyz"

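While FindString(text$,"{?")
  startPos = FindString(text$, "{?")
  endPos = FindString(text$, "}", startPos) ; look for "}" only after the "{?"
  If endPos = 0 : Break : EndIf ; unterminated "{?": stop instead of looping forever
  text$ = Left(text$, startPos - 1) + Mid(text$, startPos + 2, endPos - startPos - 2) + Mid(text$, endPos + 1)
Wend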

Debug text$ ; abc 913 --- 9745 xyz
I think a modification of infratec's code would be best, but I'm dumb tonight. Here's a template to start:

Code: Select all

Procedure.s RemoveMarkers(*TextPtr.Character)
  
  Protected Result$, Help$, State.i
  
  While *TextPtr\c
    Help$ + Chr(*TextPtr\c)
    Select State
      Case 0
        If *TextPtr\c = '{'
          State = 1
        Else
          Result$ + Help$
          Help$ = ""
        EndIf
      Case 1
        If *TextPtr\c = '?'
          State = 2
        ElseIf *TextPtr\c = '}'
          Result$ + Help$
          Help$ = ""
          State = 0
        EndIf
      Case 2
        If *TextPtr\c = '}'
          Help$ = ""
          State = 0
        EndIf
    EndSelect
    
    *TextPtr + SizeOf(Character)
  Wend
  
  ProcedureReturn Result$
  
EndProcedure


Text$="abc {?913} --- {?9745} xyz"
Debug RemoveMarkers(@Text$)
Marc56us
Addict
Posts: 1600
Joined: Sat Feb 08, 2014 3:26 pm

Re: [Solved] RegEx question

Post by Marc56us »

I want to convert abc {?913} --- {?9745} xyz to abc 913 --- 9745 xyz. As you can see, I just want to remove any leading "{?" and the closing "}" after it. These markers will be present multiple times in the string.

Code: Select all

EnableExplicit

Define Txt$ = "abc {?913} --- {?9745} xyz"

Debug Txt$
Txt$ = ReplaceString(Txt$, "{?", "")
Txt$ = ReplaceString(Txt$, "}",  "")
Debug Txt$

End

Code: Select all

abc {?913} --- {?9745} xyz
abc 913 --- 9745 xyz
BarryG
Addict
Posts: 4148
Joined: Thu Apr 18, 2019 8:17 am

Re: RegEx question

Post by BarryG »

Not quite, Marc56us. I'm not that dumb. Hehe. Your code removes all "{?" and "}", which is not the goal. Only if "{?" is first followed by a "}" should those two sets be removed. Your code fails here, for example:

Code: Select all

Txt$ = "abc} {?913} --- {?9745} } } } {?xyz"
Txt$ = ReplaceString(Txt$, "{?", "")
Txt$ = ReplaceString(Txt$, "}",  "")
; Next line debugs the wrong answer.
Debug Txt$ ; abc 913 --- 9745    xyz
infratec
Always Here
Posts: 7591
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: RegEx question

Post by infratec »

Code: Select all

Txt$ = "abc} {?913} --- {?9745} } } } {?xyz"


Pos1 = FindString(Txt$, "{?")
While Pos1
  Pos2 = FindString(Txt$, "}", Pos1)
  If Pos2
    Txt$ = Left(Txt$, Pos1 - 1) + Mid(Txt$, Pos2 + 1)
  Else
    Break
  EndIf
  Pos1 = FindString(Txt$, "{?", Pos1)
Wend

Debug Txt$
infratec
Always Here
Posts: 7591
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: RegEx question

Post by infratec »

Extended version:

Code: Select all


Procedure.s CutOut(String$, CutStart$, CutEnd$)
  
  Protected.i Pos1, Pos2, CutEndLen
  
  
  CutEndLen = Len(CutEnd$)
  Pos1 = FindString(String$, CutStart$)
  While Pos1
    Pos2 = FindString(String$, CutEnd$, Pos1)
    If Pos2
      String$ = Left(String$, Pos1 - 1) + Mid(String$, Pos2 + CutEndLen)
    Else
      Break
    EndIf
    Pos1 = FindString(String$, CutStart$, Pos1)
  Wend
  
  ProcedureReturn String$
  
EndProcedure


Txt$ = "abc} {?913} --- {?9745} } } } {?xyz"
Debug CutOut(Txt$, "{?", "}")

Txt$ = "{}{-first}1{-two}2{-three}3{-hello}{-last}{keep}"
Debug CutOut(Txt$, "{-", "}")
You can speed it up by using pointers and CopyMemory :wink:
BarryG
Addict
Posts: 4148
Joined: Thu Apr 18, 2019 8:17 am

Re: RegEx question

Post by BarryG »

Sorry infratec, but both your examples delete the 913 and 9745, which are to be kept. See my original post with the green text of what's to be left. Only the leading marker "{?" and the closing marker "}" are to be removed, and nothing else. It's a pain to code, as you can see.
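
One possible way to keep the text between the markers: a minimal sketch, assuming a marker only counts when its opening "{?" has a closing "}" somewhere after it, and that an unmatched "{?" or a stray "}" is left untouched. StripMarkers() and its parameter names are hypothetical, not PureBasic built-ins; only FindString(), Mid() and Len() are used.

```purebasic
; StripMarkers() is a hypothetical helper, not part of PureBasic.
; It removes each Open$ ... Close$ pair but keeps the text between them.
Procedure.s StripMarkers(Text$, Open$, Close$)
  
  Protected Result$
  Protected.i Pos1, Pos2, CopyFrom, OpenLen, CloseLen
  
  OpenLen  = Len(Open$)
  CloseLen = Len(Close$)
  CopyFrom = 1 ; first character not yet copied to Result$
  
  Repeat
    Pos1 = FindString(Text$, Open$, CopyFrom)
    If Pos1 = 0 : Break : EndIf ; no further opening marker
    
    Pos2 = FindString(Text$, Close$, Pos1 + OpenLen)
    If Pos2 = 0 : Break : EndIf ; opening marker without a close: keep it as-is
    
    ; text in front of the opening marker
    If Pos1 > CopyFrom
      Result$ + Mid(Text$, CopyFrom, Pos1 - CopyFrom)
    EndIf
    
    ; text between the two markers (the part to keep)
    If Pos2 > Pos1 + OpenLen
      Result$ + Mid(Text$, Pos1 + OpenLen, Pos2 - Pos1 - OpenLen)
    EndIf
    
    CopyFrom = Pos2 + CloseLen ; continue behind the closing marker
  ForEver
  
  ; whatever is left after the last complete marker pair
  ProcedureReturn Result$ + Mid(Text$, CopyFrom)
  
EndProcedure

Debug StripMarkers("abc {?913} --- {?9745} xyz", "{?", "}")          ; abc 913 --- 9745 xyz
Debug StripMarkers("abc} {?913} --- {?9745} } } } {?xyz", "{?", "}") ; abc} 913 --- 9745 } } } {?xyz
```

The scan only moves forward and copies each piece of the input once, so it does not search the string again from the start after every replacement.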