Search/replace phrases in the same case
Posted: Mon Feb 19, 2018 9:27 am
by Dude
Got a tricky one for you.
I need to search/replace some text, BUT in the same case as the original; and only for whole words; and as fast as possible.
Speed is vital because I will be scanning large blocks of text (500 KB or more). I'm hoping for less than a second for the job.
Here's an example of "alot" to be replaced with "a lot" in the manner I require:
Original: The zealot drank alot. Quite aLoT in fact.
Modded: The zealot drank a lot. Quite a LoT in fact.
See how the "alot" inside "zealot" (blue in the original post) is left intact, since it's part of another word? That's vital, too.
So... anyone feel up to the challenge?

Re: Search/replace phrases in the same case
Posted: Mon Feb 19, 2018 11:23 am
by Wolfram
First replace alot, then aLoT.
While replacing, activate the options "case sensitive" and "whole words only".
Re: Search/replace phrases in the same case
Posted: Mon Feb 19, 2018 11:43 am
by Dude
Hi Wolfram,
This is for my app; not the IDE. ReplaceString() doesn't have a "whole words" parameter.
And you're missing the point: "alot" can appear several different ways: alot, ALOT, alOT, ALot, etc. The replacement has to match that case, so you can't really use ReplaceString() at all because of this.
See, it's not so easy.
[Edit] Actually, Microsoft Word can sort-of do it with its "Find all word forms" feature:

Re: Search/replace phrases in the same case
Posted: Mon Feb 19, 2018 12:19 pm
by oO0XX0Oo
Regular expressions?
You only need to adapt the second expression for different forms, e.g.
Testing it with a 516 KB UTF-8 file filled with only the sentence:
The zealot drank alot. Quite aLoT in fact.
it takes 64 ms...
Code: Select all
Define.i startTime, pattern1, pattern2, hSrcFile, hDstFile
Define.b encoding
Define.s file = "R:\a.txt", tmpFile, content
startTime = ElapsedMilliseconds()
pattern1 = CreateRegularExpression(#PB_Any, "\balot\b")
pattern2 = CreateRegularExpression(#PB_Any, "\baLoT\b")
If pattern1 And pattern2
  ; Read the whole file into one string
  hSrcFile = ReadFile(#PB_Any, file, #PB_File_SharedRead|#PB_UTF8)
  If hSrcFile
    encoding = ReadStringFormat(hSrcFile)
    content = ReadString(hSrcFile, #PB_UTF8|#PB_File_IgnoreEOL)
    CloseFile(hSrcFile)
  EndIf
  ; Replace...
  If MatchRegularExpression(pattern1, content) Or MatchRegularExpression(pattern2, content)
    content = ReplaceRegularExpression(pattern1, content, "a lot")
    content = ReplaceRegularExpression(pattern2, content, "a LoT")
    ; Write the changes to a new file
    tmpFile = file + ".tmp"
    hDstFile = CreateFile(#PB_Any, tmpFile, #PB_File_SharedWrite|#PB_File_NoBuffering|#PB_UTF8)
    If hDstFile
      If encoding = #PB_UTF8
        WriteStringFormat(hDstFile, #PB_UTF8)
      EndIf
      WriteString(hDstFile, content, #PB_UTF8)
      CloseFile(hDstFile)
    EndIf
    ; Delete the src file and rename the dst file to the src file again
    ;DeleteFile(file, #PB_FileSystem_Force)
    ;RenameFile(tmpFile, file)
  EndIf
  FreeRegularExpression(pattern1)
  FreeRegularExpression(pattern2)
EndIf
Debug "Time passed (ms): " + Str((ElapsedMilliseconds() - startTime))
Re: Search/replace phrases in the same case
Posted: Mon Feb 19, 2018 12:33 pm
by Dude
oO0XX0Oo wrote: Regular expressions?
I was considering those, but I read they're too slow for multiple string operations. However, your timing seems to imply otherwise, so I'll play with your example and see how I go. Thanks.
But the other issue is the number of ways "alot" can be written (alot, Alot, ALot, ALOt, ALOT, aLot, aLOt, ...). I would need to create many patterns to match on, so this may not be practical.
Here's what I'm testing with anyway:
Code: Select all
pattern1 = CreateRegularExpression(#PB_Any, "\balot\b")
pattern2 = CreateRegularExpression(#PB_Any, "\baLoT\b")
content.s = "The zealot drank alot. Quite aLoT in fact."
If MatchRegularExpression(pattern1, content) Or MatchRegularExpression(pattern2, content)
  content = ReplaceRegularExpression(pattern1, content, "a lot")
  content = ReplaceRegularExpression(pattern2, content, "a LoT")
EndIf
Debug content
FreeRegularExpression(pattern1)
FreeRegularExpression(pattern2)
Re: Search/replace phrases in the same case
Posted: Mon Feb 19, 2018 12:50 pm
by Dude
Actually, can a regular expression show where in the string the match starts? So I could just search for any combo of "alot" and then just manually throw a space in between the "a" and "lot". Would make life 100% easier for this problem!
[Edit] Just discovered RegularExpressionMatchPosition()! I'm getting there!
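A minimal sketch of that idea, for illustration only: search case-insensitively for whole words, then use the match position to splice in a replacement whose letters take their case from the matched text. The helper names CopyCase() and ReplaceKeepCase() are hypothetical, and the sketch assumes the search word contains no regex metacharacters. The aim is the "Modded" line from the first post.

```purebasic
EnableExplicit

; Hypothetical helper: copy the letter case of match$ onto repl$, letter by letter.
; Non-letters (spaces, hyphens) are skipped on both sides.
Procedure.s CopyCase(match$, repl$)
  Protected result$, c$, i, m = 1, upper
  For i = 1 To Len(repl$)
    c$ = Mid(repl$, i, 1)
    If LCase(c$) <> UCase(c$) ; only letters carry case
      ; Advance to the next letter in the match
      While m <= Len(match$) And LCase(Mid(match$, m, 1)) = UCase(Mid(match$, m, 1))
        m + 1
      Wend
      If m <= Len(match$)
        upper = Bool(Mid(match$, m, 1) = UCase(Mid(match$, m, 1)))
        m + 1
      EndIf ; if the match has run out of letters, reuse the last case seen
      If upper
        c$ = UCase(c$)
      Else
        c$ = LCase(c$)
      EndIf
    EndIf
    result$ + c$
  Next
  ProcedureReturn result$
EndProcedure

; Hypothetical helper: whole-word, case-insensitive search; each hit is replaced
; by repl$ recased to match the hit. Assumes word$ has no regex metacharacters.
Procedure.s ReplaceKeepCase(text$, word$, repl$)
  Protected regex, result$, pos, copyFrom = 1
  regex = CreateRegularExpression(#PB_Any, "\b" + word$ + "\b", #PB_RegularExpression_NoCase)
  If regex = 0
    ProcedureReturn text$
  EndIf
  If ExamineRegularExpression(regex, text$)
    While NextRegularExpressionMatch(regex)
      pos = RegularExpressionMatchPosition(regex)
      If pos > copyFrom
        result$ + Mid(text$, copyFrom, pos - copyFrom) ; untouched text before the hit
      EndIf
      result$ + CopyCase(RegularExpressionMatchString(regex), repl$)
      copyFrom = pos + RegularExpressionMatchLength(regex)
    Wend
  EndIf
  FreeRegularExpression(regex)
  ProcedureReturn result$ + Mid(text$, copyFrom) ; remainder after the last hit
EndProcedure

Debug ReplaceKeepCase("The zealot drank alot. Quite aLoT in fact.", "alot", "a lot")
```

The letter-by-letter mapping fits pairs like "alot" and "a lot", where both sides have the same letters. For pairs whose letter counts differ, the mapping is only a rough guess and would need a different rule. Appending to a string in a loop may also be slow on very large inputs; that has not been measured here.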

Re: Search/replace phrases in the same case
Posted: Mon Feb 19, 2018 1:02 pm
by RASHAD
Hi
Code: Select all
text$ = "The zealot drank alot. Quite aLoT in fact."
Restore StringData
For i = 1 To 6
  Read.s orgData$
  Read.s modData$
  text$ = ReplaceString(text$, orgData$, modData$, #PB_String_CaseSensitive)
Next
Debug text$
DataSection
  StringData:
  Data.s " alot", " a lot", " Alot", " A lot", " ALot", " A Lot", " ALOt", " A LOt", " ALOT", " A LOT", " aLoT", " a LoT"
EndDataSection
OR you can use fr.FINDREPLACE - FindText_(fr) - ReplaceText_(fr)
Re: Search/replace phrases in the same case
Posted: Mon Feb 19, 2018 1:11 pm
by oO0XX0Oo
But the other issue is the number of ways "alot" can be written (alot, Alot, ALot, ALOt, ALOT, aLot, aLOt, ...). I would need to create many patterns to match on, so this may not be practical
No, you don't. You just need to adapt the pattern and use ranges
ALot, ALOt, ALOT, aLot, aLOt
matches everything in quotes above
and if "alot" can have multiple variants, do the same with it...
Only difference, upper- or lowercase "l"
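The patterns from this post did not survive in the text above. As a hedged illustration of what "using ranges" can look like, the sketch below uses one character class per letter. The pattern and the test sentence are assumptions, not necessarily what was originally posted.

```purebasic
EnableExplicit

; Assumed pattern: each letter may be lower or upper case, whole words only.
Define regex = CreateRegularExpression(#PB_Any, "\b[aA][lL][oO][tT]\b")
If regex
  If ExamineRegularExpression(regex, "Alot of zealots say ALOT, aLoT and alot.")
    While NextRegularExpressionMatch(regex)
      ; List every whole-word hit together with its 1-based position
      Debug RegularExpressionMatchString(regex) + " at position " + Str(RegularExpressionMatchPosition(regex))
    Wend
  EndIf
  FreeRegularExpression(regex)
EndIf
```

Because of the \b anchors, the "alot" inside "zealots" is not reported. Fixing a letter to one case (for example a plain L instead of [lL]) narrows the pattern to the variants with that exact letter.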
Re: Search/replace phrases in the same case
Posted: Mon Feb 19, 2018 1:14 pm
by Dude
Thanks Rashad, but I don't want to hard-code all possibilities because there'll be too many. "alot" is just one example that I was using for testing and research.
I want to search and fix other longer strings too, like "alright" (all right) and "bar-b-cue" (barbecue).
I think I'm this close thanks to oO0XX0Oo's help.
Re: Search/replace phrases in the same case
Posted: Mon Feb 19, 2018 1:18 pm
by Dude
oO0XX0Oo wrote: You just need to adapt the pattern and use ranges
ALot, ALOt, ALOT, aLot, aLOt
matches everything in quotes above
Okay, but why does this fail, then?
Code: Select all
pattern = CreateRegularExpression(#PB_Any, "\b[aA]L[oO][tT]\b")
content.s = "The zealot drank alot. Quite ALOT in fact."
If MatchRegularExpression(pattern, content)
  content = ReplaceRegularExpression(pattern, content, "a lot")
EndIf
Debug content ; The zealot drank alot. Quite a lot in fact. <-- Second match not capitalized. :(
FreeRegularExpression(pattern)
Re: Search/replace phrases in the same case
Posted: Mon Feb 19, 2018 1:37 pm
by oO0XX0Oo
Because you are replacing the match with the literal string "a lot" ^^
Re: Search/replace phrases in the same case
Posted: Mon Feb 19, 2018 1:40 pm
by Dude
I'm off to bed now, because I can't concentrate any longer. But this is almost working now:
Code: Select all
pattern = CreateRegularExpression(#PB_Any, "\b[aA]L[oO][tT]\b", #PB_RegularExpression_NoCase)
content.s = "The zealot drank alot. Quite ALOT in fact."
If MatchRegularExpression(pattern, content)
  content = ReplaceRegularExpression(pattern, content, "a lot")
EndIf
Debug content ; The zealot drank a lot. Quite a lot in fact.
FreeRegularExpression(pattern)
Re: Search/replace phrases in the same case
Posted: Mon Feb 19, 2018 2:02 pm
by oO0XX0Oo
This won't work if Alot is at the beginning of a sentence...
Re: Search/replace phrases in the same case
Posted: Mon Feb 19, 2018 3:19 pm
by Zebuddi123
Hi Dude, try this.
1.
A procedure to generate a matching regex for any word, if it helps:
Code: Select all
sString.s = "alot"
Procedure.s GenWordRegEX(sString.s) ; Generates a regular expression that matches any upper/lower-case mix of a word
  Protected iRegex.i, iNbrMatch.i, iIndex.i, sStmp.s
  Protected Dim t$(0)
  iRegex = CreateRegularExpression(#PB_Any, "\w") ; one match per word character
  iNbrMatch = ExtractRegularExpression(iRegex, sString, t$())
  For iIndex = 0 To (iNbrMatch - 1)
    ; Build one group per character, e.g. (a|A)
    sStmp + Chr(40) + LCase(t$(iIndex)) + Chr(124) + UCase(t$(iIndex)) + Chr(41)
  Next
  sStmp = "(?<=\b)" + sStmp + "(?=\b)" ; wrap in word-boundary assertions
  FreeRegularExpression(iRegex)
  FreeArray(t$())
  ProcedureReturn sStmp
EndProcedure
Debug GenWordRegEX(sString)
2.
It's the kind of thing I gravitate towards.

This test extracts all the words from a string/file into a map, with the words used as the keys (to reduce the search list), and generates the regex for each word in the map.
Results:
alot aLot of oF OF TrObLE To Too Do DO iT IT it By BY by HaND hand Etc
TOO
1. (?<=\b)(t|T)(o|O)(o|O)(?=\b)
TROBLE
2. (?<=\b)(t|T)(r|R)(o|O)(b|B)(l|L)(e|E)(?=\b)
IT
3. (?<=\b)(i|I)(t|T)(?=\b)
ALOT
4. (?<=\b)(a|A)(l|L)(o|O)(t|T)(?=\b)
BY
5. (?<=\b)(b|B)(y|Y)(?=\b)
ETC
6. (?<=\b)(e|E)(t|T)(c|C)(?=\b)
HAND
7. (?<=\b)(h|H)(a|A)(n|N)(d|D)(?=\b)
TO
8. (?<=\b)(t|T)(o|O)(?=\b)
DO
9. (?<=\b)(d|D)(o|O)(?=\b)
OF
10. (?<=\b)(o|O)(f|F)(?=\b)
Code: Select all
sString.s = "alot aLot of oF OF TrObLE To Too Do DO iT IT it By BY by HaND hand Etc"
NewMap _m_Words.s()
Procedure GenWordList(sFile.s, Map _m_Words.s()) ; Collects every distinct word (upper-cased) as a map key
  Protected iRegEx.i = CreateRegularExpression(#PB_Any, "\w+")
  Protected iNbrMatchs.i, iIndex.i
  Protected Dim t$(0)
  If MatchRegularExpression(iRegEx, sFile)
    iNbrMatchs = ExtractRegularExpression(iRegEx, sFile, t$())
    For iIndex = 0 To (iNbrMatchs - 1)
      AddMapElement(_m_Words(), UCase(t$(iIndex)))
    Next
  EndIf
  FreeRegularExpression(iRegEx)
  FreeArray(t$())
EndProcedure
Procedure.s GenWordRegEX(sString.s) ; Generates a regular expression that matches any upper/lower-case mix of a word
  Protected iRegex.i, iNbrMatch.i, iIndex.i, sStmp.s
  Protected Dim t$(0)
  iRegex = CreateRegularExpression(#PB_Any, "\w") ; one match per word character
  iNbrMatch = ExtractRegularExpression(iRegex, sString, t$())
  For iIndex = 0 To (iNbrMatch - 1)
    ; Build one group per character, e.g. (a|A)
    sStmp + Chr(40) + LCase(t$(iIndex)) + Chr(124) + UCase(t$(iIndex)) + Chr(41)
  Next
  sStmp = "(?<=\b)" + sStmp + "(?=\b)" ; wrap in word-boundary assertions
  FreeRegularExpression(iRegex)
  FreeArray(t$())
  ProcedureReturn sStmp
EndProcedure
GenWordList(sString, _m_Words())
Define iCounter.i
Debug sString
ForEach _m_Words()
  iCounter + 1
  Debug MapKey(_m_Words())
  Debug Str(iCounter) + ". " + GenWordRegEX(MapKey(_m_Words()))
  Debug ""
Next
CallDebugger
Re: Search/replace phrases in the same case
Posted: Wed Feb 21, 2018 1:15 pm
by Dude
Thanks, Zebuddi123! Testing your examples now.
