Re: RegEx question
Posted: Sat Dec 31, 2022 2:12 pm
by infratec
I was still working on the removal approach:
Code:
; Removes every CutStart$ ... CutEnd$ span, including the marks themselves.
Procedure.s CutOut(String$, CutStart$, CutEnd$)
  Protected.i Pos1, Pos2, CutEndLen

  CutEndLen = Len(CutEnd$)

  Pos1 = FindString(String$, CutStart$)
  While Pos1
    Pos2 = FindString(String$, CutEnd$, Pos1)
    If Pos2
      String$ = Left(String$, Pos1 - 1) + Mid(String$, Pos2 + CutEndLen)
    Else
      Break ; no closing mark left
    EndIf
    Pos1 = FindString(String$, CutStart$, Pos1)
  Wend

  ProcedureReturn String$
EndProcedure

; Removes only the marks and keeps the text between them.
Procedure.s RemoveMarks(String$, CutStart$, CutEnd$)
  Protected.i Pos1, Pos2, CutStartLen, CutEndLen

  CutStartLen = Len(CutStart$)
  CutEndLen = Len(CutEnd$)

  Pos1 = FindString(String$, CutStart$)
  While Pos1
    Pos2 = FindString(String$, CutEnd$, Pos1)
    If Pos2
      ; Remove the closing mark first so that Pos1 stays valid
      String$ = Left(String$, Pos2 - 1) + Mid(String$, Pos2 + CutEndLen)
      String$ = Left(String$, Pos1 - 1) + Mid(String$, Pos1 + CutStartLen)
    Else
      Break ; opening mark without a closing mark: leave it as it is
    EndIf
    Pos1 = FindString(String$, CutStart$, Pos1)
  Wend

  ProcedureReturn String$
EndProcedure

Txt$ = "abc} {?913} --- {?9745} } } } {?xyz"
Debug RemoveMarks(Txt$, "{?", "}")

Txt$ = "{}{-first}1{-two}2{-three}3{-hello}{-last}{keep}"
Debug CutOut(Txt$, "{-", "}")
Re: RegEx question
Posted: Sat Dec 31, 2022 4:16 pm
by Marc56us
Another way
Code:
EnableExplicit

; Group 1 = the whole mark "{?...}", group 2 = the content between "{?" and "}"
#REGEX = "({\?(.+?)})"

Define Txt$ = "abc} {?913} --- {?9745} } } } {?xyz"

If Not CreateRegularExpression(0, #REGEX)
  Debug "Bad RegEx" : End
EndIf

If Not ExamineRegularExpression(0, Txt$)
  Debug "No Match" : End
EndIf

Debug Txt$
While NextRegularExpressionMatch(0)
  Define Org$ = RegularExpressionGroup(0, 1)
  Define New$ = RegularExpressionGroup(0, 2)
  ; Replace the whole mark with its content
  Txt$ = ReplaceString(Txt$, Org$, New$)
Wend
Debug Txt$

End
Yes, RegEx and ReplaceString in the same code; it's an end-of-year gift!
Is this the expected result?
Code:
abc} {?913} --- {?9745} } } } {?xyz
abc} 913 --- 9745 } } } {?xyz
If not, please provide an example of what is before and what we want after (It is always easier to understand than an explanation)

For this to work, every sequence must be complete: {? ... }
There must be no "{" inside a sequence (no {? { ).
So it will also work with FindString and ReplaceString:
Find: {?
Then: } (searching from the previous position)
Add the content to a new string
(pointer professionals can do that easily, I think)
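The FindString steps above could be sketched roughly like this. It is only an illustration: UnwrapMarks and its variable names are made up for this example and do not come from any post in this topic, and it assumes that an opening mark without a closing mark should simply be left in the text.

```purebasic
; Sketch only: UnwrapMarks is a hypothetical name, not code from this topic.
Procedure.s UnwrapMarks(String$, MarkStart$, MarkEnd$)
  Protected Result$
  Protected.i Pos = 1, Pos1, Pos2
  Protected.i StartLen = Len(MarkStart$), EndLen = Len(MarkEnd$)

  Repeat
    ; Find: the opening mark, searching from the previous position
    Pos1 = FindString(String$, MarkStart$, Pos)
    If Pos1 = 0
      Break
    EndIf

    ; Then: the closing mark, searching after the opening mark
    Pos2 = FindString(String$, MarkEnd$, Pos1 + StartLen)
    If Pos2 = 0
      Break ; incomplete sequence: the rest is copied unchanged below
    EndIf

    ; Add the text in front of the mark to the new string
    If Pos1 > Pos
      Result$ + Mid(String$, Pos, Pos1 - Pos)
    EndIf

    ; Add the content between the marks to the new string
    If Pos2 > Pos1 + StartLen
      Result$ + Mid(String$, Pos1 + StartLen, Pos2 - Pos1 - StartLen)
    EndIf

    Pos = Pos2 + EndLen
  ForEver

  ; Copy whatever is left after the last complete sequence
  ProcedureReturn Result$ + Mid(String$, Pos)
EndProcedure

Debug UnwrapMarks("abc} {?913} --- {?9745} } } } {?xyz", "{?", "}")
```

With the test string from this topic it should show the same second line as the RegEx version. Because the result is built in a new string, the source string is never shortened inside the loop, so each position found stays valid.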

Re: RegEx question
Posted: Sun Jan 01, 2023 4:17 am
by BarryG
@infratec: Thanks, but your code is basically the same as what I'm currently using here ->
viewtopic.php?p=593082#p593082
I was hoping maybe there's a faster way like you did with your "While *TextPtr\c" code for my original question in this topic.
Marc56us wrote: Sat Dec 31, 2022 4:16 pm
please provide an example of what is before and what we want after (It is always easier to understand than an explanation)
But I did, Marc56us. I even colored it and explained the criteria after the colors:
BarryG wrote: Sat Dec 31, 2022 11:18 am
I want to convert abc {?913} --- {?9745} xyz to abc 913 --- 9745 xyz. As you can see, I just want to remove any leading "{?" and the closing "}" after it.
Thanks for your RegEx code, though. It works, and is a good fountain of knowledge for me! I think I'll go with that, if infratec doesn't post anything further.
Thanks to both of you!
Re: RegEx question
Posted: Sun Jan 01, 2023 7:49 pm
by Sicro
@BarryG
If speed is important, you could make a quick lexer with my RegEx module, which you then use to cut out the unwanted parts of the text.
This code creates the needed DFA file for the lexer on your desktop:
Code:
IncludeFile "RegEx-Engine/RegExEngine.pbi"

Enumeration TokenTypes
  #TokenType_NormalText
  #TokenType_TextInBraces
EndEnumeration

Define regEx = RegEx::Init()
If regEx
  Debug RegEx::AddNfa(regEx, "{\?[^}]*}", #TokenType_TextInBraces)
  Debug RegEx::AddNfa(regEx, "[^{]+|{\?[^}]*", #TokenType_NormalText)
  Debug RegEx::CreateDfa(regEx)
  Debug RegEx::ExportDfa(regEx, GetUserDirectory(#PB_Directory_Desktop) + "dfa.dat")
  RegEx::Free(regEx)
EndIf
Then you can use this code, which removes the unwanted parts from your text:
Code:
IncludeFile "RegEx-Engine/DfaMatcher.pbi"

Enumeration TokenTypes
  #TokenType_NormalText
  #TokenType_TextInBraces
EndEnumeration

Define.Character *text
Define text$, newText$
Define tokenType, length

text$ = "abc} {?913} --- {?9745} } } } {?xyz"
*text = @text$

While *text\c
  length = DfaMatcher::Match(?dfaTable, *text, @tokenType)
  Select tokenType
    Case #TokenType_TextInBraces
      ; Remove "{?" at the beginning and "}" at the end
      newText$ + DfaMatcher::GetString(*text + SizeOf(Character) * 2, length - SizeOf(Character) * 3)
    Case #TokenType_NormalText
      newText$ + DfaMatcher::GetString(*text, length)
  EndSelect
  *text + length
Wend

Debug newText$

DataSection
  dfaTable:
  IncludeBinary "dfa.dat"
EndDataSection
Re: RegEx question
Posted: Sun Jan 01, 2023 8:47 pm
by BarryG
Thanks, Sicro. Noted in case I need it later.
Re: RegEx question
Posted: Sun Jan 01, 2023 9:14 pm
by infratec
Version with pointer:
Code:
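Procedure.s RemoveMarks(*TextPtr.Character)
  Protected Result$, Help$
  Protected.i State

  While *TextPtr\c
    Select State
      Case 0 ; normal text
        If *TextPtr\c = '{'
          State = 1
        Else
          Result$ + Chr(*TextPtr\c)
        EndIf
      Case 1 ; "{" seen, waiting for "?"
        If *TextPtr\c = '?'
          State = 2
        ElseIf *TextPtr\c = '{'
          Result$ + "{" ; the previous "{" was literal, this one may start a mark
        Else
          Result$ + "{" + Chr(*TextPtr\c)
          State = 0
        EndIf
      Case 2 ; inside "{? ... }", collecting the content
        If *TextPtr\c = '}'
          Result$ + Help$
          Help$ = ""
          State = 0
        Else
          Help$ + Chr(*TextPtr\c)
        EndIf
    EndSelect
    *TextPtr + SizeOf(Character)
  Wend

  ; Put back whatever an unfinished mark at the end of the text swallowed
  If State = 1
    Result$ + "{"
  ElseIf State = 2
    Result$ + "{?" + Help$
  EndIf

  ProcedureReturn Result$
EndProcedure

Txt$ = "abc} {?913} --- {?9745} } } } {?xyz"
Debug RemoveMarks(@Txt$)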
Re: RegEx question
Posted: Sun Jan 01, 2023 10:49 pm
by BarryG
Thanks, infratec!
Re: RegEx question
Posted: Sun Jan 01, 2023 11:35 pm
by infratec
Maybe this is faster:
Code:
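Procedure.s RemoveMarks(*TextPtr.Character)
  Protected Result$
  Protected.i State
  Protected *Ptr

  *Ptr = *TextPtr ; start of the text that has not been copied to Result$ yet
  While *TextPtr\c
    Select State
      Case 0 ; normal text
        If *TextPtr\c = '{'
          State = 1
        EndIf
      Case 1 ; "{" seen, waiting for "?"
        If *TextPtr\c = '?'
          State = 2
          ; Copy everything up to, but not including, the "{" (2 bytes per character)
          Result$ + PeekS(*Ptr, (*TextPtr - *Ptr - 2) >> 1)
          *Ptr = *TextPtr + 2
        ElseIf *TextPtr\c <> '{'
          ; A second "{" keeps State 1, so "{{?" is still recognised
          State = 0
        EndIf
      Case 2 ; inside "{? ... }"
        If *TextPtr\c = '}'
          ; Copy the content between the marks
          Result$ + PeekS(*Ptr, (*TextPtr - *Ptr) >> 1)
          *Ptr = *TextPtr + 2
          State = 0
        EndIf
    EndSelect
    *TextPtr + 2
  Wend

  ; An opening mark without a closing mark is put back unchanged
  If State = 2
    Result$ + "{?"
  EndIf
  Result$ + PeekS(*Ptr, (*TextPtr - *Ptr) >> 1)

  ProcedureReturn Result$
EndProcedure

Txt$ = "abc} {?913} --- {?9745} } } } {?xyz"
Debug RemoveMarks(@Txt$)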