[Solved] RegEx question

infratec
Always Here
Posts: 7591
Joined: Sun Sep 07, 2008 12:45 pm
Location: Germany

Re: RegEx question

Post by infratec »

I was still on the removing trip :wink:

Code: Select all


Procedure.s CutOut(String$, CutStart$, CutEnd$)
  
  Protected.i Pos1, Pos2, CutEndLen
  
  
  CutEndLen = Len(CutEnd$)
  Pos1 = FindString(String$, CutStart$)
  While Pos1
    Pos2 = FindString(String$, CutEnd$, Pos1)
    If Pos2
      String$ = Left(String$, Pos1 - 1) + Mid(String$, Pos2 + CutEndLen)
    Else
      Break
    EndIf
    Pos1 = FindString(String$, CutStart$, Pos1)
  Wend
  
  ProcedureReturn String$
  
EndProcedure


Procedure.s RemoveMarks(String$, CutStart$, CutEnd$)
  
  Protected.i Pos1, Pos2, CutStartLen, CutEndLen
  
  
  CutStartLen = Len(CutStart$)
  CutEndLen = Len(CutEnd$)
  Pos1 = FindString(String$, CutStart$)
  While Pos1
    Pos2 = FindString(String$, CutEnd$, Pos1)
    If Pos2
      String$ = Left(String$, Pos2 - 1) + Mid(String$, Pos2 + CutEndLen)
      String$ = Left(String$, Pos1 - 1) + Mid(String$, Pos1 + CutStartLen)
    Else
      Break
    EndIf
    Pos1 = FindString(String$, CutStart$, Pos1)
  Wend
  
  ProcedureReturn String$
  
EndProcedure


Txt$ = "abc} {?913} --- {?9745} } } } {?xyz"
Debug RemoveMarks(Txt$, "{?", "}")

Txt$ = "{}{-first}1{-two}2{-three}3{-hello}{-last}{keep}"
Debug CutOut(Txt$, "{-", "}")
Marc56us
Addict
Posts: 1600
Joined: Sat Feb 08, 2014 3:26 pm

Re: RegEx question

Post by Marc56us »

Another way

Code: Select all

EnableExplicit

#REGEX = "({\?(.+?)})"

Define Txt$ = "abc} {?913} --- {?9745} } } } {?xyz"

If Not CreateRegularExpression(0, #REGEX)
    Debug "Bad RegEx" : End
EndIf

If Not ExamineRegularExpression(0, Txt$)
    Debug "No Match" : End
EndIf

Debug Txt$

While NextRegularExpressionMatch(0)
    Define Org$ = RegularExpressionGroup(0, 1)
    Define New$ = RegularExpressionGroup(0, 2)
    Txt$ = ReplaceString(Txt$, Org$, New$) 
Wend

Debug Txt$

End
Yes, RegEx and ReplaceString in the same code, it's an end-of-year gift! :mrgreen:

Is this the expected result?

Code: Select all

abc} {?913} --- {?9745} } } } {?xyz
abc} 913 --- 9745 } } } {?xyz
If not, please provide an example of what is before and what we want after (It is always easier to understand than an explanation)

:!: For this to work, every sequence must be complete: {? ... }
An opening like {? { (a second brace before the closing one) is not covered.

So it would also work with FindString and ReplaceString:
Find: {?
Then: } (searching from the previous position)
Append the content to a new string
(Pointer professionals can do that easily, I think.)
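
The FindString steps above could look roughly like the following. This is only a minimal sketch, not code from any poster: the procedure name StripMarks and its parameter names are made up for illustration, and it assumes that an incomplete sequence (an opening mark without a closing one) should simply be left in the text unchanged.

```purebasic
; Sketch only: StripMarks() is a hypothetical name, not from this thread.
Procedure.s StripMarks(String$, MarkStart$, MarkEnd$)
  
  Protected.i Pos1, Pos2, LastPos, StartLen, EndLen
  Protected Result$
  
  StartLen = Len(MarkStart$)
  EndLen = Len(MarkEnd$)
  LastPos = 1
  
  Pos1 = FindString(String$, MarkStart$, LastPos)
  While Pos1
    ; Search for the closing mark behind the opening mark
    Pos2 = FindString(String$, MarkEnd$, Pos1 + StartLen)
    If Pos2 = 0
      Break ; incomplete sequence: the rest is kept unchanged
    EndIf
    ; Copy the text in front of the opening mark
    If Pos1 > LastPos
      Result$ + Mid(String$, LastPos, Pos1 - LastPos)
    EndIf
    ; Copy the content between the two marks
    If Pos2 > Pos1 + StartLen
      Result$ + Mid(String$, Pos1 + StartLen, Pos2 - Pos1 - StartLen)
    EndIf
    LastPos = Pos2 + EndLen
    Pos1 = FindString(String$, MarkStart$, LastPos)
  Wend
  
  ; Append whatever follows the last complete sequence
  ProcedureReturn Result$ + Mid(String$, LastPos)
  
EndProcedure

Define Txt$ = "abc} {?913} --- {?9745} } } } {?xyz"
Debug StripMarks(Txt$, "{?", "}")
```

The idea is to walk through the source once and only append to the new string, instead of rebuilding the source string after every match. Whether that is actually faster would have to be measured.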

:wink:
BarryG
Addict
Posts: 4148
Joined: Thu Apr 18, 2019 8:17 am

Re: RegEx question

Post by BarryG »

@infratec: Thanks, but your code is basically the same as what I'm currently using here -> viewtopic.php?p=593082#p593082

I was hoping maybe there's a faster way like you did with your "While *TextPtr\c" code for my original question in this topic.
Marc56us wrote: Sat Dec 31, 2022 4:16 pm
please provide an example of what is before and what we want after (It is always easier to understand than an explanation)

But I did, Marc56us. I even colored it and explained the criteria after the colors:

BarryG wrote: Sat Dec 31, 2022 11:18 am
I want to convert abc {?913} --- {?9745} xyz to abc 913 --- 9745 xyz. As you can see, I just want to remove any leading "{?" and the closing "}" after it.

Thanks for your RegEx code, though. It works, and is a good fountain of knowledge for me! I think I'll go with that, if infratec doesn't post anything further.

Thanks to both of you!
Sicro
Enthusiast
Posts: 559
Joined: Wed Jun 25, 2014 5:25 pm
Location: Germany

Re: RegEx question

Post by Sicro »

@BarryG

If speed is important, you could make a quick lexer with my RegEx module, which you then use to cut out the unwanted parts of the text.

This code creates the needed DFA file for the lexer on your desktop:

Code: Select all

IncludeFile "RegEx-Engine/RegExEngine.pbi"

Enumeration TokenTypes
  #TokenType_NormalText
  #TokenType_TextInBraces
EndEnumeration

Define regEx = RegEx::Init()
If regEx
  Debug RegEx::AddNfa(regEx, "{\?[^}]*}", #TokenType_TextInBraces)
  Debug RegEx::AddNfa(regEx, "[^{]+|{\?[^}]*", #TokenType_NormalText)
  Debug RegEx::CreateDfa(regEx)
  Debug RegEx::ExportDfa(regEx, GetUserDirectory(#PB_Directory_Desktop) + "dfa.dat")
  RegEx::Free(regEx)
EndIf
Then you can use this code, which removes the unwanted parts from your text:

Code: Select all

IncludeFile "RegEx-Engine/DfaMatcher.pbi"

Enumeration TokenTypes
  #TokenType_NormalText
  #TokenType_TextInBraces
EndEnumeration

Define.Character *text
Define text$, newText$
Define tokenType, length

text$ = "abc} {?913} --- {?9745} } } } {?xyz"

*text = @text$
While *text\c
  length = DfaMatcher::Match(?dfaTable, *text, @tokenType)
  Select tokenType
    Case #TokenType_TextInBraces
      ; Remove "{?" at the beginning and "}" at the end
      newText$ + DfaMatcher::GetString(*text + SizeOf(Character) * 2, length - SizeOf(Character) * 3)
    Case #TokenType_NormalText
      newText$ + DfaMatcher::GetString(*text, length)
  EndSelect
  *text + length
Wend

Debug newText$

DataSection
  dfaTable:
  IncludeBinary "dfa.dat"
EndDataSection

Re: RegEx question

Post by BarryG »

Thanks, Sicro. Noted in case I need it later.

Re: RegEx question

Post by infratec »

Version with pointer:

Code: Select all

Procedure.s RemoveMarks(*TextPtr.Character)
  
  Protected.i Result$, Help$, State.i
  
  
  While *TextPtr\c
    
    Select State
      Case 0
        If *TextPtr\c = '{'
          State = 1
        Else
          Result$ + Chr(*TextPtr\c)
        EndIf
      Case 1
        If *TextPtr\c = '?'
          State = 2
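        ElseIf *TextPtr\c = '{'
          ; Another "{": emit the previous one and keep waiting for "?"
          Result$ + "{"
        Else
          Result$ + "{" + Chr(*TextPtr\c)
          State = 0
        EndIf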
      Case 2
        If *TextPtr\c = '}'
          Result$ + Help$
          Help$ = ""
          State = 0
        Else
          Help$ + Chr(*TextPtr\c)
        EndIf
        
    EndSelect
    
    *TextPtr + 2
  Wend
  
  ; Restore an unfinished mark at the end of the text
  If State = 1
    Result$ + "{"
  ElseIf State = 2
    Result$ + "{?" + Help$
  EndIf
  
  ProcedureReturn Result$
  
EndProcedure



Txt$ = "abc} {?913} --- {?9745} } } } {?xyz"
Debug RemoveMarks(@Txt$)

Re: RegEx question

Post by BarryG »

Thanks, infratec!

Re: RegEx question

Post by infratec »

Maybe this is faster:

Code: Select all

Procedure.s RemoveMarks(*TextPtr.Character)
  
  Protected.i Result$, State.i, *Ptr
  
  
  *Ptr = *TextPtr
  While *TextPtr\c
    
    Select State
      Case 0
        If *TextPtr\c = '{'
          State = 1
        EndIf
      Case 1
        If *TextPtr\c = '?'
          State = 2
          Result$ + PeekS(*Ptr, (*TextPtr - *Ptr - 2) >> 1)
          *Ptr = *TextPtr + 2
        ElseIf *TextPtr\c <> '{'
          ; A repeated "{" may still start a mark, so only reset on other characters
          State = 0
        EndIf
      Case 2
        If *TextPtr\c = '}'
          Result$ + PeekS(*Ptr, (*TextPtr - *Ptr) >> 1)
          *Ptr = *TextPtr + 2
          State = 0
        EndIf
        
    EndSelect
    
    *TextPtr + 2
  Wend
  
  If State = 2
    Result$ + "{?"
  EndIf
  
  Result$ + PeekS(*Ptr, (*TextPtr - *Ptr) >> 1)
  
  
  ProcedureReturn Result$
  
EndProcedure



Txt$ = "abc} {?913} --- {?9745} } } } {?xyz"
Debug RemoveMarks(@Txt$)