Re: RegEx question

Posted: Sat Dec 31, 2022 2:12 pm
by infratec
I was still on the removing trip :wink:

Code:


Procedure.s CutOut(String$, CutStart$, CutEnd$)
  
  Protected.i Pos1, Pos2, CutEndLen
  
  
  CutEndLen = Len(CutEnd$)
  Pos1 = FindString(String$, CutStart$)
  While Pos1
    Pos2 = FindString(String$, CutEnd$, Pos1)
    If Pos2
      String$ = Left(String$, Pos1 - 1) + Mid(String$, Pos2 + CutEndLen)
    Else
      Break
    EndIf
    Pos1 = FindString(String$, CutStart$, Pos1)
  Wend
  
  ProcedureReturn String$
  
EndProcedure


Procedure.s RemoveMarks(String$, CutStart$, CutEnd$)
  
  Protected.i Pos1, Pos2, CutStartLen, CutEndLen
  
  
  CutStartLen = Len(CutStart$)
  CutEndLen = Len(CutEnd$)
  Pos1 = FindString(String$, CutStart$)
  While Pos1
    Pos2 = FindString(String$, CutEnd$, Pos1)
    If Pos2
      String$ = Left(String$, Pos2 - 1) + Mid(String$, Pos2 + CutEndLen)
      String$ = Left(String$, Pos1 - 1) + Mid(String$, Pos1 + CutStartLen)
    Else
      Break
    EndIf
    Pos1 = FindString(String$, CutStart$, Pos1)
  Wend
  
  ProcedureReturn String$
  
EndProcedure


Txt$ = "abc} {?913} --- {?9745} } } } {?xyz"
Debug RemoveMarks(Txt$, "{?", "}")

Txt$ = "{}{-first}1{-two}2{-three}3{-hello}{-last}{keep}"
Debug CutOut(Txt$, "{-", "}")

Re: RegEx question

Posted: Sat Dec 31, 2022 4:16 pm
by Marc56us
Another way:

Code:

EnableExplicit

#REGEX = "({\?(.+?)})"

Define Txt$ = "abc} {?913} --- {?9745} } } } {?xyz"

If Not CreateRegularExpression(0, #REGEX)
    Debug "Bad RegEx" : End
EndIf

If Not ExamineRegularExpression(0, Txt$)
    Debug "No Match" : End
EndIf

Debug Txt$

While NextRegularExpressionMatch(0)
    Define Org$ = RegularExpressionGroup(0, 1)
    Define New$ = RegularExpressionGroup(0, 2)
    Txt$ = ReplaceString(Txt$, Org$, New$) 
Wend

Debug Txt$

End
Yes, RegEx and ReplaceString in the same code: it's an end-of-year gift! :mrgreen:

Is this the expected result?

Code:

abc} {?913} --- {?9745} } } } {?xyz
abc} 913 --- 9745 } } } {?xyz
If not, please provide an example of what is before and what we want after (It is always easier to understand than an explanation)

:!: For this to work, every sequence must be complete: {? ... }
No unclosed {? followed by another {

So it will also work with FindString and ReplaceString:
Find: {?
Then: } *from the previous position*
Add the content to a new string
(pointer professionals can do that easily, I think)
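
The steps above could be sketched roughly like this. It is only a minimal illustration, not tested against large inputs: the procedure name StripMarks and its variable names are made up for this example, and it assumes an incomplete sequence (a {? without a closing }) should be left in the text unchanged.

```purebasic
; Sketch of the FindString approach: copy everything except the marks into a new string.
; StripMarks is a hypothetical name, not an existing PureBasic function.
Procedure.s StripMarks(String$, MarkStart$, MarkEnd$)
  
  Protected Result$
  Protected.i Pos, Pos1, Pos2, StartLen, EndLen
  
  StartLen = Len(MarkStart$)
  EndLen = Len(MarkEnd$)
  Pos = 1
  
  Repeat
    ; Find: {?
    Pos1 = FindString(String$, MarkStart$, Pos)
    If Pos1 = 0
      Break
    EndIf
    
    ; Then: } searched from behind the opening mark
    Pos2 = FindString(String$, MarkEnd$, Pos1 + StartLen)
    If Pos2 = 0
      Break ; incomplete sequence: the rest is copied unchanged below
    EndIf
    
    ; Text in front of the opening mark
    If Pos1 > Pos
      Result$ + Mid(String$, Pos, Pos1 - Pos)
    EndIf
    
    ; Content between the two marks
    If Pos2 > Pos1 + StartLen
      Result$ + Mid(String$, Pos1 + StartLen, Pos2 - Pos1 - StartLen)
    EndIf
    
    Pos = Pos2 + EndLen
  ForEver
  
  ; Whatever is left after the last complete sequence
  ProcedureReturn Result$ + Mid(String$, Pos)
  
EndProcedure

Debug StripMarks("abc} {?913} --- {?9745} } } } {?xyz", "{?", "}")
```

Because the result is built in a new string, the source string is never modified while it is being searched, so the search positions stay valid.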

:wink:

Re: RegEx question

Posted: Sun Jan 01, 2023 4:17 am
by BarryG
@infratec: Thanks, but your code is basically the same as what I'm currently using here -> viewtopic.php?p=593082#p593082

I was hoping maybe there's a faster way like you did with your "While *TextPtr\c" code for my original question in this topic.
Marc56us wrote: Sat Dec 31, 2022 4:16 pm
please provide an example of what is before and what we want after (It is always easier to understand than an explanation)
But I did, Marc56us. I even colored it and explained the criteria after the colors:
BarryG wrote: Sat Dec 31, 2022 11:18 am
I want to convert abc {?913} --- {?9745} xyz to abc 913 --- 9745 xyz. As you can see, I just want to remove any leading "{?" and the closing "}" after it.
Thanks for your RegEx code, though. It works, and is a good fountain of knowledge for me! I think I'll go with that, if infratec doesn't post anything further.

Thanks to both of you!

Re: RegEx question

Posted: Sun Jan 01, 2023 7:49 pm
by Sicro
@BarryG

If speed is important, you could make a quick lexer with my RegEx module, which you then use to cut out the unwanted parts of the text.

This code creates the needed DFA file for the lexer on your desktop:

Code:

IncludeFile "RegEx-Engine/RegExEngine.pbi"

Enumeration TokenTypes
  #TokenType_NormalText
  #TokenType_TextInBraces
EndEnumeration

Define regEx = RegEx::Init()
If regEx
  Debug RegEx::AddNfa(regEx, "{\?[^}]*}", #TokenType_TextInBraces)
  Debug RegEx::AddNfa(regEx, "[^{]+|{\?[^}]*", #TokenType_NormalText)
  Debug RegEx::CreateDfa(regEx)
  Debug RegEx::ExportDfa(regEx, GetUserDirectory(#PB_Directory_Desktop) + "dfa.dat")
  RegEx::Free(regEx)
EndIf
Then you can use this code, which removes the unwanted parts from your text (the dfa.dat file must be placed where IncludeBinary can find it, e.g. next to the source file):

Code:

IncludeFile "RegEx-Engine/DfaMatcher.pbi"

Enumeration TokenTypes
  #TokenType_NormalText
  #TokenType_TextInBraces
EndEnumeration

Define.Character *text
Define text$, newText$
Define tokenType, length

text$ = "abc} {?913} --- {?9745} } } } {?xyz"

*text = @text$
While *text\c
  length = DfaMatcher::Match(?dfaTable, *text, @tokenType)
  Select tokenType
    Case #TokenType_TextInBraces
      ; Remove "{?" at the beginning and "}" at the end
      newText$ + DfaMatcher::GetString(*text + SizeOf(Character) * 2, length - SizeOf(Character) * 3)
    Case #TokenType_NormalText
      newText$ + DfaMatcher::GetString(*text, length)
  EndSelect
  *text + length
Wend

Debug newText$

DataSection
  dfaTable:
  IncludeBinary "dfa.dat"
EndDataSection

Re: RegEx question

Posted: Sun Jan 01, 2023 8:47 pm
by BarryG
Thanks, Sicro. Noted in case I need it later.

Re: RegEx question

Posted: Sun Jan 01, 2023 9:14 pm
by infratec
Version with pointer:

Code:

Procedure.s RemoveMarks(*TextPtr.Character)
  
  Protected Result$, Help$, State.i
  
  
  While *TextPtr\c
    
    Select State
      Case 0
        If *TextPtr\c = '{'
          State = 1
        Else
          Result$ + Chr(*TextPtr\c)
        EndIf
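      Case 1
        If *TextPtr\c = '?'
          State = 2
        ElseIf *TextPtr\c = '{'
          ; "{{": output the first brace, the second one may still start "{?"
          Result$ + "{"
        Else
          Result$ + "{" + Chr(*TextPtr\c)
          State = 0
        EndIf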
      Case 2
        If *TextPtr\c = '}'
          Result$ + Help$
          Help$ = ""
          State = 0
        Else
          Help$ + Chr(*TextPtr\c)
        EndIf
        
    EndSelect
    
    *TextPtr + 2
  Wend
  
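  ; Put back an opening mark that was never completed,
  ; also when the text ends directly after "{" or "{?"
  Select State
    Case 1
      Result$ + "{"
    Case 2
      Result$ + "{?" + Help$
  EndSelect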
  
  ProcedureReturn Result$
  
EndProcedure



Txt$ = "abc} {?913} --- {?9745} } } } {?xyz"
Debug RemoveMarks(@Txt$)

Re: RegEx question

Posted: Sun Jan 01, 2023 10:49 pm
by BarryG
Thanks, infratec!

Re: RegEx question

Posted: Sun Jan 01, 2023 11:35 pm
by infratec
Maybe this is faster:

Code:

Procedure.s RemoveMarks(*TextPtr.Character)
  
  Protected Result$, State.i, *Ptr
  
  
  *Ptr = *TextPtr
  While *TextPtr\c
    
    Select State
      Case 0
        If *TextPtr\c = '{'
          State = 1
        EndIf
      Case 1
        If *TextPtr\c = '?'
          State = 2
          Result$ + PeekS(*Ptr, (*TextPtr - *Ptr - 2) >> 1)
          *Ptr = *TextPtr + 2
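        ElseIf *TextPtr\c <> '{'
          ; Not "{?": go back to scanning (on "{{" stay in state 1, so "{{?" is still found)
          State = 0
        EndIf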
      Case 2
        If *TextPtr\c = '}'
          Result$ + PeekS(*Ptr, (*TextPtr - *Ptr) >> 1)
          *Ptr = *TextPtr + 2
          State = 0
        EndIf
        
    EndSelect
    
    *TextPtr + 2
  Wend
  
  If State = 2
    Result$ + "{?"
  EndIf
  
  Result$ + PeekS(*Ptr, (*TextPtr - *Ptr) >> 1)
  
  
  ProcedureReturn Result$
  
EndProcedure



Txt$ = "abc} {?913} --- {?9745} } } } {?xyz"
Debug RemoveMarks(@Txt$)