Search/replace phrases in the same case
Got a tricky one for you.
I need to search/replace some text, BUT in the same case as the original; and only for whole words; and as fast as possible.
Fast speed is vital because I will be scanning large blocks of text (500 KB or more). I'm hoping for less than a second for the job.
Here's an example of "alot" to be replaced with "a lot" in the manner I require:
Original: The zealot drank alot. Quite aLoT in fact.
Modded: The zealot drank a lot. Quite a LoT in fact.
See how the "alot" inside "zealot" (shown in blue in the original post) is left intact, since it's part of another word? That's vital, too.
So... anyone feel up to the challenge?
Re: Search/replace phrases in the same case
First replace alot, then aLoT.
While replacing, activate the options "case sensitive" and "whole words only".
Re: Search/replace phrases in the same case
Hi Wolfram,
This is for my app; not the IDE. ReplaceString() doesn't have a "whole words" parameter.
And you're missing the point: "alot" can appear several different ways: alot, ALOT, alOT, ALot, etc. The replacement has to match that case, so you can't really use ReplaceString() at all because of this.
See, it's not so easy.
[Edit] Actually, Microsoft Word can sort-of do it with its "Find all word forms" feature:
Re: Search/replace phrases in the same case
Regular expressions?
You only need to adapt the second expression for different forms, e.g.
Code: Select all
"\baL[oO][tT]\b"
Testing it with a 516 KB UTF-8 file filled with only the sentence:
The zealot drank alot. Quite aLoT in fact.
it takes 64 ms...
Code: Select all
Define.i startTime, pattern1, pattern2, hSrcFile, hDstFile
Define.b encoding
Define.s file = "R:\a.txt", tmpFile, content
startTime = ElapsedMilliseconds()
pattern1 = CreateRegularExpression(#PB_Any, "\balot\b")
pattern2 = CreateRegularExpression(#PB_Any, "\baLoT\b")
If pattern1 And pattern2
  hSrcFile = ReadFile(#PB_Any, file, #PB_File_SharedRead|#PB_UTF8)
  If hSrcFile
    encoding = ReadStringFormat(hSrcFile)
    content = ReadString(hSrcFile, #PB_UTF8|#PB_File_IgnoreEOL)
    CloseFile(hSrcFile)
  EndIf
  ; Replace...
  If MatchRegularExpression(pattern1, content) Or MatchRegularExpression(pattern2, content)
    content = ReplaceRegularExpression(pattern1, content, "a lot")
    ; The replacement repeats the case of the form that pattern2 matches
    content = ReplaceRegularExpression(pattern2, content, "a LoT")
    ; Write the changes to a new file
    tmpFile = file + ".tmp"
    hDstFile = CreateFile(#PB_Any, tmpFile, #PB_File_SharedWrite|#PB_File_NoBuffering|#PB_UTF8)
    If hDstFile
      If encoding = #PB_UTF8
        WriteStringFormat(hDstFile, #PB_UTF8)
      EndIf
      WriteString(hDstFile, content, #PB_UTF8)
      CloseFile(hDstFile)
    EndIf
    ; Delete the src file and rename the dst file to the src file again
    ;DeleteFile(file, #PB_FileSystem_Force)
    ;RenameFile(tmpFile, file)
  EndIf
  FreeRegularExpression(pattern1)
  FreeRegularExpression(pattern2)
EndIf
Debug "Time passed (ms): " + Str((ElapsedMilliseconds() - startTime))
Last edited by oO0XX0Oo on Mon Feb 19, 2018 12:38 pm, edited 2 times in total.
Re: Search/replace phrases in the same case
oO0XX0Oo wrote: Regular expressions?
I was considering those, but I read they're too slow for multiple string operations. However, your timing seems to imply otherwise, so I'll play with your example and see how I go. Thanks.
But the other issue is the number of ways "alot" can be written (alot, Alot, ALot, ALOt, ALOT, aLot, aLOt, ...). I would need to create many patterns to match on, so this may not be practical.
Here's what I'm testing with anyway:
Code: Select all
pattern1 = CreateRegularExpression(#PB_Any, "\balot\b")
pattern2 = CreateRegularExpression(#PB_Any, "\baLoT\b")
content.s = "The zealot drank alot. Quite aLoT in fact."
If MatchRegularExpression(pattern1, content) Or MatchRegularExpression(pattern2, content)
  content = ReplaceRegularExpression(pattern1, content, "a lot")
  ; The replacement repeats the case of the form that pattern2 matches
  content = ReplaceRegularExpression(pattern2, content, "a LoT")
EndIf
Debug content
FreeRegularExpression(pattern1)
FreeRegularExpression(pattern2)
Re: Search/replace phrases in the same case
Actually, can a regular expression show where in the string the match starts? So I could just search for any combo of "alot" and then just manually throw a space in between the "a" and "lot". Would make life 100% easier for this problem!
[Edit] Just discovered RegularExpressionMatchPosition()! I'm getting there!
Last edited by Dude on Mon Feb 19, 2018 1:04 pm, edited 1 time in total.
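The idea above (find each whole-word match, then put a space between the "a" and the "lot" by hand) could look roughly like the sketch below. It is only an illustration, not code from anyone in this thread: the procedure name FixAlot is made up, it assumes the fix only inserts a character so the original letters can be copied through unchanged, and it has not been timed on a 500 KB input (building the result by string concatenation may be too slow there).

```purebasic
EnableExplicit

; Hypothetical helper: rewrites every whole-word "alot" (any case mix)
; as "a lot" while keeping the case of each original letter.
Procedure.s FixAlot(content.s)
  Protected regex.i, result.s, match.s
  Protected pos.i, lastPos.i = 1

  regex = CreateRegularExpression(#PB_Any, "\balot\b", #PB_RegularExpression_NoCase)
  If regex = 0
    ProcedureReturn content            ; pattern failed to compile: leave the text alone
  EndIf

  If ExamineRegularExpression(regex, content)
    While NextRegularExpressionMatch(regex)
      pos   = RegularExpressionMatchPosition(regex)   ; 1-based start of the match
      match = RegularExpressionMatchString(regex)
      If pos > lastPos
        result + Mid(content, lastPos, pos - lastPos) ; text between the matches
      EndIf
      result + Left(match, 1) + " " + Mid(match, 2)   ; original letters plus a space
      lastPos = pos + RegularExpressionMatchLength(regex)
    Wend
  EndIf

  result + Mid(content, lastPos)       ; remainder after the last match
  FreeRegularExpression(regex)
  ProcedureReturn result
EndProcedure

Debug FixAlot("The zealot drank alot. Quite aLoT in fact.")
; The zealot drank a lot. Quite a LoT in fact.
```

Because the matched letters are copied rather than replaced, no per-case pattern is needed; the \b anchors are what keep "zealot" untouched.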
Re: Search/replace phrases in the same case
Hi
OR you can use fr.FINDREPLACE - FindText_(fr) - ReplaceText_(fr)
Code: Select all
text$ = "The zealot drank alot. Quite aLoT in fact."
Restore StringData
For i = 1 To 6
  Read.s orgData$
  Read.s modData$
  text$ = ReplaceString(text$, orgData$, modData$, #PB_String_CaseSensitive)
Next
Debug text$
DataSection
  StringData:
  Data.s " alot", " a lot", " Alot", " A lot", " ALot", " A Lot", " ALOt", " A LOt", " ALOT", " A LOT", " aLoT", " a LoT"
EndDataSection
Re: Search/replace phrases in the same case
Dude wrote: But the other issue is the number of ways "alot" can be written (alot, Alot, ALot, ALOt, ALOT, aLot, aLOt, ...). I would need to create many patterns to match on, so this may not be practical
No you don't. You just need to adapt the pattern and use ranges. The following pattern matches all of these forms:
ALot, ALOt, ALOT, aLot, aLOt
Code: Select all
"\b[aA]L[oO][tT]\b"
and if "alot" can have multiple variants, do the same with it...
Code: Select all
"\b[aA]l[oO][tT]\b"
Re: Search/replace phrases in the same case
Thanks Rashad, but I don't want to hard-code all possibilities because there'll be too many. "alot" is just one example that I was using for testing and research. I want to search and fix other longer strings too, like "all right" (alright) and "bar-b-cue" (barbecue).
I think I'm this close thanks to oO0XX0Oo's help.
Re: Search/replace phrases in the same case
oO0XX0Oo wrote: You just need to adapt the pattern and use ranges
ALot, ALOt, ALOT, aLot, aLOt
Code: Select all
"\b[aA]L[oO][tT]\b"
matches everything in quotes above
Okay, but why does this fail, then?
Code: Select all
pattern = CreateRegularExpression(#PB_Any, "\b[aA]L[oO][tT]\b")
content.s = "The zealot drank alot. Quite ALOT in fact."
If MatchRegularExpression(pattern, content)
  content = ReplaceRegularExpression(pattern, content, "a lot")
EndIf
Debug content ; The zealot drank alot. Quite a lot in fact. <-- Second match not capitalized. :(
FreeRegularExpression(pattern)
Re: Search/replace phrases in the same case
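Because you are replacing the match with
Code: Select all
"a lot"
^^
The replacement string is a literal, so it is inserted exactly as written, whatever the case of the text that matched.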
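One way around the literal replacement is to copy the case of the matched letters onto the replacement before inserting it. The sketch below is an assumption about how that could be done, not something posted in this thread: ApplyCase is a made-up name, it pairs letters by order (skipping spaces and hyphens), and when the replacement has more letters than the match it simply reuses the case of the last matched letter.

```purebasic
EnableExplicit

; Hypothetical helper: returns replacement with the upper/lower case
; pattern of match applied letter by letter.
Procedure.s ApplyCase(match.s, replacement.s)
  Protected result.s, c.s, m.s
  Protected i.i, j.i = 1, upper.i = #False

  For i = 1 To Len(replacement)
    c = Mid(replacement, i, 1)
    If LCase(c) <> UCase(c)              ; c is a letter
      ; Take the case from the next letter of the match, if one is left;
      ; otherwise keep the case of the last letter seen
      While j <= Len(match)
        m = Mid(match, j, 1)
        j + 1
        If LCase(m) <> UCase(m)
          upper = Bool(m = UCase(m))
          Break
        EndIf
      Wend
      If upper
        c = UCase(c)
      Else
        c = LCase(c)
      EndIf
    EndIf
    result + c                           ; non-letters are copied as they are
  Next

  ProcedureReturn result
EndProcedure

Debug ApplyCase("aLoT", "a lot")         ; a LoT
Debug ApplyCase("ALOT", "a lot")         ; A LOT
Debug ApplyCase("Alright", "all right")  ; All right
```

The result of ApplyCase would then be used in place of the literal string for each individual match, which means walking the matches one by one rather than calling ReplaceRegularExpression() once for the whole text.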
Re: Search/replace phrases in the same case
I'm off to bed now, because I can't concentrate any longer. But this is almost working now:
Code: Select all
pattern = CreateRegularExpression(#PB_Any, "\b[aA]L[oO][tT]\b",#PB_RegularExpression_NoCase)
content.s = "The zealot drank alot. Quite ALOT in fact."
If MatchRegularExpression(pattern, content)
  content = ReplaceRegularExpression(pattern, content, "a lot")
EndIf
Debug content ; The zealot drank a lot. Quite a lot in fact.
FreeRegularExpression(pattern)
Re: Search/replace phrases in the same case
This won't work if Alot is at the beginning of a sentence...
Re: Search/replace phrases in the same case
Hi Dude, try this:
Code: Select all
(?<=\b)(a|A)(l|L)(o|O)(t|T)(?=\b)
1.
Procedure to gen and match for any word if it helps
Code: Select all
sString.s = "alot"
Procedure.s GenWordRegEX(sString.s) ; Generates a regular expression matching any lower/upper case mix of a word
  Protected iRegex.i, iNbrMatch.i, iIndex.i, sStmp.s
  Protected Dim t$(0)
  iRegex.i = CreateRegularExpression(#PB_Any, "\w")
  iNbrMatch.i = ExtractRegularExpression(iRegex, sString, t$())
  For iIndex = 0 To (iNbrMatch-1)
    If iIndex <= (iNbrMatch - 1) : sORd.s = Chr(41) : Else : sORd = "" : EndIf
    ; One group per character: (lower|UPPER)
    sStmp.s + Chr(40) + LCase(t$(iIndex)) + Chr(124) + UCase(t$(iIndex)) + sORd
  Next
  sStmp + Chr(41) : sStmp = "(?<=\b)" + Left(sStmp + ")", (Len(sStmp) - 1)) + "(?=\b)"
  FreeRegularExpression(iRegex)
  FreeArray(t$())
  ProcedureReturn sStmp
EndProcedure
Debug GenWordRegEX(sString)
2.
It's the kind of thing I gravitate towards. This test extracts all the words from a string/file into a map, with the words used as the keys (to reduce the search list), and generates a regex for each word in the map.
Results:
alot aLot of oF OF TrObLE To Too Do DO iT IT it By BY by HaND hand Etc
TOO
1. (?<=\b)(t|T)(o|O)(o|O)(?=\b)
TROBLE
2. (?<=\b)(t|T)(r|R)(o|O)(b|B)(l|L)(e|E)(?=\b)
IT
3. (?<=\b)(i|I)(t|T)(?=\b)
ALOT
4. (?<=\b)(a|A)(l|L)(o|O)(t|T)(?=\b)
BY
5. (?<=\b)(b|B)(y|Y)(?=\b)
ETC
6. (?<=\b)(e|E)(t|T)(c|C)(?=\b)
HAND
7. (?<=\b)(h|H)(a|A)(n|N)(d|D)(?=\b)
TO
8. (?<=\b)(t|T)(o|O)(?=\b)
DO
9. (?<=\b)(d|D)(o|O)(?=\b)
OF
10. (?<=\b)(o|O)(f|F)(?=\b)
Code: Select all
sString.s = "alot aLot of oF OF TrObLE To Too Do DO iT IT it By BY by HaND hand Etc"
NewMap _m_Words.s()
Procedure GenWordList(sFile.s, Map _m_Words.s())
  Protected iRegEx.i = CreateRegularExpression(#PB_Any, "\w+")
  Protected iNbrMatchs.i, iIndex.i, Dim t$(0)
  If MatchRegularExpression(iRegEx, sFile)
    iNbrMatchs = ExtractRegularExpression(iRegEx, sFile, t$())
    For iIndex = 0 To (iNbrMatchs - 1)
      AddMapElement(_m_Words(), UCase(t$(iIndex)))
    Next
  EndIf
  FreeRegularExpression(iRegEx)
  FreeArray(t$())
EndProcedure
Procedure.s GenWordRegEX(sString.s) ; Generates a regular expression matching any lower/upper case mix of a word
  Protected iRegex.i, iNbrMatch.i, iIndex.i, sStmp.s
  Protected Dim t$(0)
  iRegex.i = CreateRegularExpression(#PB_Any, "\w")
  iNbrMatch.i = ExtractRegularExpression(iRegex, sString, t$())
  For iIndex = 0 To (iNbrMatch-1)
    If iIndex <= (iNbrMatch - 1) : sORd.s = Chr(41) : Else : sORd = "" : EndIf
    sStmp.s + Chr(40) + LCase(t$(iIndex)) + Chr(124) + UCase(t$(iIndex)) + sORd
  Next
  sStmp + Chr(41) : sStmp = "(?<=\b)" + Left(sStmp + ")", (Len(sStmp) - 1)) + "(?=\b)"
  FreeRegularExpression(iRegex)
  FreeArray(t$())
  ProcedureReturn sStmp
EndProcedure
GenWordList(sString, _m_Words())
Define iCounter.i
Debug sString
ForEach _m_Words()
  iCounter + 1
  Debug MapKey(_m_Words())
  Debug Str(iCounter) + ". " + GenWordRegEX(MapKey(_m_Words()))
  Debug ""
Next
CallDebugger
Re: Search/replace phrases in the same case
Thanks, Zebuddi123! Testing your examples now.