
Search/replace phrases in the same case

Posted: Mon Feb 19, 2018 9:27 am
by Dude
Got a tricky one for you. :twisted:

I need to search and replace some text, BUT keep the replacement in the same case as the original, match whole words only, and do it as fast as possible.

Speed is vital because I will be scanning large blocks of text (500 KB or more). I'm hoping the whole job takes less than a second.

Here's an example of "alot" to be replaced with "a lot" in the manner I require:

Original: The zealot drank alot. Quite aLoT in fact.
Modded: The zealot drank a lot. Quite a LoT in fact.

See how the "alot" inside "zealot" is left intact, since it's part of another word? That's vital, too.

So... anyone feel up to the challenge? :mrgreen:

Re: Search/replace phrases in the same case

Posted: Mon Feb 19, 2018 11:23 am
by Wolfram
First replace alot, then aLoT.
While replacing activate the options "case sensitive" and "whole words only".

Re: Search/replace phrases in the same case

Posted: Mon Feb 19, 2018 11:43 am
by Dude
Hi Wolfram,

This is for my app, not the IDE. ReplaceString() doesn't have a "whole words" parameter.

And you're missing the point: "alot" can appear several different ways: alot, ALOT, alOT, ALot, etc. The replacement has to match that case, so you can't really use ReplaceString() at all because of this.

See, it's not so easy. ;)
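Since ReplaceString() cannot do this on its own, one way to look at the problem is as two steps: find the word whatever its case, then copy the case of the found text onto the replacement. A minimal sketch of the second step, assuming a letter-by-letter mapping (when the replacement has more letters than the original, the last letter's case is reused). CopyCase() is a hypothetical helper, not a PureBasic function:

```purebasic
; Hypothetical helper (not a PureBasic built-in): copies the upper/lower case
; pattern of source$ onto target$, letter by letter. Non-letters in target$
; (spaces, hyphens) are copied unchanged and do not consume a source letter.
; Assumed heuristic: extra letters in target$ reuse the last case seen.
Procedure.s CopyCase(source$, target$)
  Protected i, upper, c$, t$, result$
  Protected j = 1
  Protected srcLen = Len(source$)

  For i = 1 To Len(target$)
    t$ = Mid(target$, i, 1)
    If LCase(t$) <> UCase(t$)
      ; skip non-letters in source$ (e.g. the hyphens in "bar-b-cue")
      While j <= srcLen And LCase(Mid(source$, j, 1)) = UCase(Mid(source$, j, 1))
        j + 1
      Wend
      If j <= srcLen
        c$ = Mid(source$, j, 1)
        upper = Bool(c$ <> LCase(c$))
        j + 1
      EndIf
      ; once source$ runs out of letters, the last case seen is reused
      If upper
        result$ + UCase(t$)
      Else
        result$ + LCase(t$)
      EndIf
    Else
      result$ + t$
    EndIf
  Next

  ProcedureReturn result$
EndProcedure

Debug CopyCase("aLoT", "a lot")        ; a LoT
Debug CopyCase("ALOT", "a lot")        ; A LOT
Debug CopyCase("Alright", "all right") ; All right
```

The mapping is only a guess at what "the same case" should mean when the lengths differ, but for mixed-case input such as "aLoT" it reproduces the "a LoT" result from the first post.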

[Edit] Actually, Microsoft Word can sort of do it with its "Find all word forms" feature.

Re: Search/replace phrases in the same case

Posted: Mon Feb 19, 2018 12:19 pm
by oO0XX0Oo
Regular expressions?

You only need to adapt the second expression for different forms, e.g.

Code:

"\baL[oO][tT]\b"
Testing it with a 516 KB UTF-8 file filled with nothing but the sentence:
The zealot drank alot. Quite aLoT in fact.

it takes 64 ms...

Code:

Define.i startTime, pattern1, pattern2, hSrcFile, hDstFile
Define.b encoding
Define.s file = "R:\a.txt", tmpFile, content

startTime = ElapsedMilliseconds()
pattern1 = CreateRegularExpression(#PB_Any, "\balot\b")
pattern2 = CreateRegularExpression(#PB_Any, "\baLoT\b")

If pattern1 And pattern2
  hSrcFile = ReadFile(#PB_Any, file, #PB_File_SharedRead|#PB_UTF8)
  If hSrcFile
    encoding = ReadStringFormat(hSrcFile)
    content = ReadString(hSrcFile, #PB_UTF8|#PB_File_IgnoreEOL)
    CloseFile(hSrcFile)
  EndIf

  ; Replace...
  If MatchRegularExpression(pattern1, content) Or MatchRegularExpression(pattern2, content)
    content = ReplaceRegularExpression(pattern1, content, "a lot")
    content = ReplaceRegularExpression(pattern2, content, "a Lot")

    ; Write the changes to a new file
    tmpFile = file + ".tmp"
    hDstFile = CreateFile(#PB_Any, tmpFile, #PB_File_SharedWrite|#PB_File_NoBuffering|#PB_UTF8)
    If hDstFile
      If encoding = #PB_UTF8
        WriteStringFormat(hDstFile, #PB_UTF8)
      EndIf
      WriteString(hDstFile, content, #PB_UTF8)
      CloseFile(hDstFile)
    EndIf

    ; Delete the src file and rename the dst file to the src file again
    ;DeleteFile(file, #PB_FileSystem_Force)
    ;RenameFile(tmpFile, file)
  EndIf
  FreeRegularExpression(pattern1)
  FreeRegularExpression(pattern2)
EndIf
Debug "Time passed (ms): " + Str((ElapsedMilliseconds() - startTime))

Re: Search/replace phrases in the same case

Posted: Mon Feb 19, 2018 12:33 pm
by Dude
oO0XX0Oo wrote: Regular expressions?
I was considering those, but I read they're too slow for multiple string operations. However, your timing seems to imply otherwise, so I'll play with your example and see how I go. Thanks. :)

But the other issue is the number of ways "alot" can be written (alot, Alot, ALot, ALOt, ALOT, aLot, aLOt, ...). I would need to create many patterns to match on, so this may not be practical.

Here's what I'm testing with anyway:

Code:

pattern1 = CreateRegularExpression(#PB_Any, "\balot\b")
pattern2 = CreateRegularExpression(#PB_Any, "\baLoT\b")

content.s = "The zealot drank alot. Quite aLoT in fact."

If MatchRegularExpression(pattern1, content) Or MatchRegularExpression(pattern2, content)
  content = ReplaceRegularExpression(pattern1, content, "a lot")
  content = ReplaceRegularExpression(pattern2, content, "a Lot")
EndIf

Debug content

FreeRegularExpression(pattern1)
FreeRegularExpression(pattern2)

Re: Search/replace phrases in the same case

Posted: Mon Feb 19, 2018 12:50 pm
by Dude
Actually, can a regular expression show where in the string the match starts? Then I could just search for any case combination of "alot" and manually insert a space between the "a" and the "lot". Would make life 100% easier for this problem!

[Edit] Just discovered RegularExpressionMatchPosition()! I'm getting there! :)
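RegularExpressionMatchPosition() works together with ExamineRegularExpression() and NextRegularExpressionMatch(). A minimal sketch of the idea above, assuming a case-insensitive whole-word pattern: the text is rebuilt match by match, and because the matched letters are kept and only a space is added, the original case survives (including "Alot" at the start of a sentence). SplitAlot() is a hypothetical name:

```purebasic
; Sketch: split every whole-word "alot" (any case) into "a lot" while keeping
; the original letters. SplitAlot() is a hypothetical helper.
Procedure.s SplitAlot(text$)
  Protected regex, pos, matchLen, match$, result$
  Protected lastEnd = 1

  ; \b...\b = whole words only; NoCase = alot, ALOT, aLoT, ... all match
  regex = CreateRegularExpression(#PB_Any, "\balot\b", #PB_RegularExpression_NoCase)
  If regex = 0
    ProcedureReturn text$
  EndIf

  If ExamineRegularExpression(regex, text$)
    While NextRegularExpressionMatch(regex)
      pos      = RegularExpressionMatchPosition(regex) ; 1-based, like Mid()
      matchLen = RegularExpressionMatchLength(regex)
      match$   = RegularExpressionMatchString(regex)

      ; copy the untouched text between the previous match and this one
      If pos > lastEnd
        result$ + Mid(text$, lastEnd, pos - lastEnd)
      EndIf

      ; keep the original letters and just add the space: "aLoT" -> "a LoT"
      result$ + Left(match$, 1) + " " + Mid(match$, 2)
      lastEnd = pos + matchLen
    Wend
  EndIf

  ; copy whatever follows the last match
  If lastEnd <= Len(text$)
    result$ + Mid(text$, lastEnd)
  EndIf

  FreeRegularExpression(regex)
  ProcedureReturn result$
EndProcedure

Debug SplitAlot("The zealot drank alot. Quite aLoT in fact.") ; The zealot drank a lot. Quite a LoT in fact.
Debug SplitAlot("Alot of people agree.")                       ; A lot of people agree.
```

If this runs many times, creating the pattern once outside the procedure would avoid recompiling it on every call.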

Re: Search/replace phrases in the same case

Posted: Mon Feb 19, 2018 1:02 pm
by RASHAD
Hi

Code:

text$ = "The zealot drank alot. Quite aLoT in fact."

Restore StringData
For i = 1 To 6
  Read.s orgData$
  Read.s modData$
  text$ = ReplaceString(text$, orgData$, modData$, #PB_String_CaseSensitive)
Next

Debug text$

DataSection
  StringData:
  Data.s " alot", " a lot", " Alot"," A lot", " ALot", " A Lot" ," ALOt"," A LOt", " ALOT"," A LOT" ," aLoT" ," a LoT"
EndDataSection
Or you can use the Windows API: fr.FINDREPLACE with FindText_(fr) and ReplaceText_(fr).
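The table above relies on the leading space to mark the start of a word, so a word at the very start of the text is missed, and a longer word that merely begins with those letters would still be changed. A minimal sketch of a whole-word variant of ReplaceString(), assuming "word characters" means letters, digits and "_" (roughly what the regex \b uses). ReplaceWholeWord() and IsWordChar() are hypothetical helper names:

```purebasic
; Sketch: a whole-word variant of ReplaceString(). ReplaceWholeWord() and
; IsWordChar() are hypothetical helpers, not PureBasic built-ins.
Procedure IsWordChar(c$)
  ; letters, digits and "_" count as word characters (roughly like regex \w)
  If c$ = ""
    ProcedureReturn #False
  EndIf
  ProcedureReturn Bool(LCase(c$) <> UCase(c$) Or FindString("0123456789_", c$) > 0)
EndProcedure

Procedure.s ReplaceWholeWord(text$, find$, replace$, mode = #PB_String_CaseSensitive)
  Protected hit, before$, after$, result$
  Protected pos = 1
  Protected findLen = Len(find$)

  If findLen = 0
    ProcedureReturn text$
  EndIf

  Repeat
    hit = FindString(text$, find$, pos, mode)
    If hit = 0
      Break
    EndIf

    ; the characters just outside the hit ("" at either end of the text)
    before$ = ""
    after$  = ""
    If hit > 1
      before$ = Mid(text$, hit - 1, 1)
    EndIf
    If hit + findLen <= Len(text$)
      after$ = Mid(text$, hit + findLen, 1)
    EndIf

    ; copy the text before the hit
    If hit > pos
      result$ + Mid(text$, pos, hit - pos)
    EndIf

    If IsWordChar(before$) Or IsWordChar(after$)
      ; part of a longer word (e.g. the "alot" in "zealot"): keep one char, move on
      result$ + Mid(text$, hit, 1)
      pos = hit + 1
    Else
      result$ + replace$
      pos = hit + findLen
    EndIf
  ForEver

  ; copy the rest of the text after the last hit
  If pos <= Len(text$)
    result$ + Mid(text$, pos)
  EndIf
  ProcedureReturn result$
EndProcedure

text$ = "The zealot drank alot. Quite aLoT in fact."
text$ = ReplaceWholeWord(text$, "alot", "a lot")
text$ = ReplaceWholeWord(text$, "aLoT", "a LoT")
Debug text$ ; The zealot drank a lot. Quite a LoT in fact.
```

With this, the data table could drop its leading spaces. It still needs one entry per case variant and one scan per entry, and it has not been timed here against a 500 KB input.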

Re: Search/replace phrases in the same case

Posted: Mon Feb 19, 2018 1:11 pm
by oO0XX0Oo
Dude wrote: But the other issue is the number of ways "alot" can be written (alot, Alot, ALot, ALOt, ALOT, aLot, aLOt, ...). I would need to create many patterns to match on, so this may not be practical.
No, you don't. You just need to adapt the pattern and use ranges, e.g. for:
ALot, ALOt, ALOT, aLot, aLOt

Code:

"\b[aA]L[oO][tT]\b"
matches every form listed above

and if "alot" can have multiple variants, do the same with it...

Code:

"\b[aA]l[oO][tT]\b"
The only difference is an uppercase vs. lowercase "l".
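Following that idea, the remaining difference can be folded in with [lL], or the pattern can be written once in lowercase with the #PB_RegularExpression_NoCase flag. A small sketch that counts matches with ExtractRegularExpression() to check that both patterns catch every variant in a made-up test string:

```purebasic
; Sketch: two ways to make one pattern cover every case variant of "alot".
Define classes, nocase, text$
Dim found$(0)

; 1) one character class per letter, with [lL] added
classes = CreateRegularExpression(#PB_Any, "\b[aA][lL][oO][tT]\b")
; 2) a plain lowercase pattern plus the NoCase flag
nocase = CreateRegularExpression(#PB_Any, "\balot\b", #PB_RegularExpression_NoCase)

text$ = "alot Alot ALot aLOt ALOT zealot"

If classes And nocase
  Debug ExtractRegularExpression(classes, text$, found$()) ; 5 ("zealot" is not a whole-word match)
  Debug ExtractRegularExpression(nocase, text$, found$())  ; 5
EndIf

If classes : FreeRegularExpression(classes) : EndIf
If nocase : FreeRegularExpression(nocase) : EndIf
```

Either pattern finds the words; keeping their case is a separate step, since a literal replacement such as "a lot" is inserted as-is for every match.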

Re: Search/replace phrases in the same case

Posted: Mon Feb 19, 2018 1:14 pm
by Dude
Thanks Rashad, but I don't want to hard-code all possibilities because there'll be too many. "alot" is just one example that I was using for testing and research. :) I want to search for and fix other, longer strings too, like "alright" (to "all right") and "bar-b-cue" (to "barbecue").

I think I'm this close thanks to oO0XX0Oo's help.
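Putting the pieces together for phrases other than "alot": find each phrase case-insensitively as a whole word, then give the replacement the case of the text it replaces. This sketch reads the examples as "alright" to "all right" and "bar-b-cue" to "barbecue". CopyCase() and ReplaceWordKeepCase() are hypothetical helpers, the case mapping is an assumed letter-by-letter heuristic, and the search phrase is inserted into the pattern unescaped:

```purebasic
; Hypothetical helper: copy the case pattern of source$ onto target$ letter by
; letter (non-letters copied as-is; extra letters reuse the last case seen).
Procedure.s CopyCase(source$, target$)
  Protected i, upper, c$, t$, result$
  Protected j = 1
  Protected srcLen = Len(source$)
  For i = 1 To Len(target$)
    t$ = Mid(target$, i, 1)
    If LCase(t$) <> UCase(t$)
      While j <= srcLen And LCase(Mid(source$, j, 1)) = UCase(Mid(source$, j, 1))
        j + 1 ; skip non-letters in source$
      Wend
      If j <= srcLen
        c$ = Mid(source$, j, 1)
        upper = Bool(c$ <> LCase(c$))
        j + 1
      EndIf
      If upper
        result$ + UCase(t$)
      Else
        result$ + LCase(t$)
      EndIf
    Else
      result$ + t$
    EndIf
  Next
  ProcedureReturn result$
EndProcedure

; Hypothetical helper: whole-word, case-insensitive replace where each
; replacement takes on the case of the text it replaces.
; NOTE: find$ goes into the pattern unescaped, so it must not contain regex
; metacharacters such as . ? * + ( ) [ ] (a hyphen outside [] is fine).
Procedure.s ReplaceWordKeepCase(text$, find$, replace$)
  Protected regex, pos, matchLen, match$, result$
  Protected lastEnd = 1

  regex = CreateRegularExpression(#PB_Any, "\b" + find$ + "\b", #PB_RegularExpression_NoCase)
  If regex = 0
    ProcedureReturn text$
  EndIf

  If ExamineRegularExpression(regex, text$)
    While NextRegularExpressionMatch(regex)
      pos      = RegularExpressionMatchPosition(regex)
      matchLen = RegularExpressionMatchLength(regex)
      match$   = RegularExpressionMatchString(regex)
      If pos > lastEnd
        result$ + Mid(text$, lastEnd, pos - lastEnd) ; text before the match
      EndIf
      result$ + CopyCase(match$, replace$)
      lastEnd = pos + matchLen
    Wend
  EndIf

  If lastEnd <= Len(text$)
    result$ + Mid(text$, lastEnd) ; text after the last match
  EndIf
  FreeRegularExpression(regex)
  ProcedureReturn result$
EndProcedure

Define text$ = "Alright, the BAR-B-CUE was alright. Quite aLoT, in fact."
text$ = ReplaceWordKeepCase(text$, "alright", "all right")
text$ = ReplaceWordKeepCase(text$, "bar-b-cue", "barbecue")
text$ = ReplaceWordKeepCase(text$, "alot", "a lot")
Debug text$ ; All right, the BARBECUE was all right. Quite a LoT, in fact.
```

The pattern is compiled on every call; for many phrases over 500 KB of text it may be worth compiling each pattern once and timing the result with ElapsedMilliseconds(), as in the earlier test.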

Re: Search/replace phrases in the same case

Posted: Mon Feb 19, 2018 1:18 pm
by Dude
oO0XX0Oo wrote: You just need to adapt the pattern and use ranges
ALot, ALOt, ALOT, aLot, aLOt

Code:

"\b[aA]L[oO][tT]\b"
matches every form listed above
Okay, but why does this fail, then?

Code:

pattern = CreateRegularExpression(#PB_Any, "\b[aA]L[oO][tT]\b")

content.s = "The zealot drank alot. Quite ALOT in fact."

If MatchRegularExpression(pattern, content)
  content = ReplaceRegularExpression(pattern, content, "a lot")
EndIf

Debug content ; The zealot drank alot. Quite a lot in fact. <-- Second match not capitalized. :(

FreeRegularExpression(pattern)

Re: Search/replace phrases in the same case

Posted: Mon Feb 19, 2018 1:37 pm
by oO0XX0Oo
Because you are replacing every match with the same literal, lowercase string:

Code:

"a lot"
^^

Re: Search/replace phrases in the same case

Posted: Mon Feb 19, 2018 1:40 pm
by Dude
I'm off to bed now, because I can't concentrate any longer. But this is almost working now:

Code:

pattern = CreateRegularExpression(#PB_Any, "\b[aA]L[oO][tT]\b",#PB_RegularExpression_NoCase)

content.s = "The zealot drank alot. Quite ALOT in fact."

If MatchRegularExpression(pattern, content)
  content = ReplaceRegularExpression(pattern, content, "a lot")
EndIf

Debug content ; The zealot drank a lot. Quite a lot in fact.

FreeRegularExpression(pattern)

Re: Search/replace phrases in the same case

Posted: Mon Feb 19, 2018 2:02 pm
by oO0XX0Oo
This won't work if "Alot" is at the beginning of a sentence: it will be replaced with a lowercase "a lot"...

Re: Search/replace phrases in the same case

Posted: Mon Feb 19, 2018 3:19 pm
by Zebuddi123
Hi Dude, try this:

Code:

(?<=\b)(a|A)(l|L)(o|O)(t|T)(?=\b)

1.
Here's a procedure to generate a matching regex for any word, if it helps:

Code:

sString.s = "alot"

Procedure.s GenWordRegEX(sString.s) ; Generates a matching regularexpression of any lower|upper case mixed description of a word
	Protected iRegex.i, iNbrMatch.i, iIndex.i, sStmp.s
	Protected Dim t$(0)
	iRegex.i 		= CreateRegularExpression(#PB_Any, "\w");
	iNbrMatch.i  	= ExtractRegularExpression(iRegex, sString, t$())
	
	For iIndex = 0 To (iNbrMatch - 1)
		; Each character c becomes the group (c|C), e.g. "a" -> "(a|A)"
		; LCase() makes this work for uppercase input too ("A" -> "(a|A)")
		sStmp + "(" + LCase(t$(iIndex)) + "|" + UCase(t$(iIndex)) + ")"
	Next
	
	; Wrap the groups in word-boundary assertions
	sStmp = "(?<=\b)" + sStmp + "(?=\b)"
	FreeRegularExpression(iRegex)
	FreeArray(t$())
	ProcedureReturn sStmp
EndProcedure

Debug GenWordRegEX(sString)
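Because \b is already a zero-width assertion, wrapping it in (?<=\b) and (?=\b) does not change what matches, and a character class such as [aA] matches the same text as the group (a|A). A hedged sketch of a generator in that shorter style, which also escapes spaces and hyphens so that phrases like "bar-b-cue" can be used. GenClassRegEx() is a hypothetical name:

```purebasic
; Sketch: build a whole-word pattern from a word or phrase using one
; character class per letter, e.g. "alot" -> \b[aA][lL][oO][tT]\b.
; GenClassRegEx() is a hypothetical helper, not part of PureBasic.
Procedure.s GenClassRegEx(word$)
  Protected i, c$, pattern$

  For i = 1 To Len(word$)
    c$ = Mid(word$, i, 1)
    If LCase(c$) <> UCase(c$)
      ; a cased letter: accept either case
      pattern$ + "[" + LCase(c$) + UCase(c$) + "]"
    ElseIf FindString("0123456789_", c$)
      ; digits and "_" are already literal
      pattern$ + c$
    Else
      ; space, hyphen, etc.: in PCRE a backslash before a
      ; non-alphanumeric character always makes it literal
      pattern$ + "\" + c$
    EndIf
  Next

  ; \b is zero-width, so no (?<=\b) / (?=\b) wrapper is needed
  ProcedureReturn "\b" + pattern$ + "\b"
EndProcedure

Debug GenClassRegEx("alot")      ; \b[aA][lL][oO][tT]\b
Debug GenClassRegEx("bar-b-cue") ; \b[bB][aA][rR]\-[bB]\-[cC][uU][eE]\b
```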
2.
It's the kind of thing I gravitate towards. :) This test extracts all the words from a string/file into a map, using the uppercased words as keys (which removes duplicates and reduces the search list), and then generates a regex for each word in the map.

Results:

alot aLot of oF OF TrObLE To Too Do DO iT IT it By BY by HaND hand Etc
TOO
1. (?<=\b)(t|T)(o|O)(o|O)(?=\b)

TROBLE
2. (?<=\b)(t|T)(r|R)(o|O)(b|B)(l|L)(e|E)(?=\b)

IT
3. (?<=\b)(i|I)(t|T)(?=\b)

ALOT
4. (?<=\b)(a|A)(l|L)(o|O)(t|T)(?=\b)

BY
5. (?<=\b)(b|B)(y|Y)(?=\b)

ETC
6. (?<=\b)(e|E)(t|T)(c|C)(?=\b)

HAND
7. (?<=\b)(h|H)(a|A)(n|N)(d|D)(?=\b)

TO
8. (?<=\b)(t|T)(o|O)(?=\b)

DO
9. (?<=\b)(d|D)(o|O)(?=\b)

OF
10. (?<=\b)(o|O)(f|F)(?=\b)


Code:

sString.s = "alot aLot of oF OF TrObLE To Too Do DO iT IT it By BY by HaND hand Etc"

NewMap _m_Words.s()

Procedure GenWordList(sFile.s, Map _m_Words.s())
	Protected iRegEx.i = CreateRegularExpression(#PB_Any, "\w+")
	Protected iNbrMatchs.i, iIndex.i, Dim t$(0)
	If MatchRegularExpression(iRegEx, sFile)
		iNbrMatchs = ExtractRegularExpression(iRegEx, sFile, t$())
		
		For iIndex = 0 To (iNbrMatchs - 1)
			AddMapElement(_m_Words(), UCase(t$(iIndex)))
		Next	
	EndIf	
	FreeRegularExpression(iRegEx)
	FreeArray(t$())
EndProcedure


Procedure.s GenWordRegEX(sString.s) ; Generates a matching regularexpression of any lower|upper case mixed description of a word
	Protected iRegex.i, iNbrMatch.i, iIndex.i, sStmp.s
	Protected Dim t$(0)
	iRegex.i 		= CreateRegularExpression(#PB_Any, "\w");
	iNbrMatch.i  	= ExtractRegularExpression(iRegex, sString, t$())
	
	For iIndex = 0 To (iNbrMatch - 1)
		; Each character c becomes the group (c|C), e.g. "a" -> "(a|A)"
		sStmp + "(" + LCase(t$(iIndex)) + "|" + UCase(t$(iIndex)) + ")"
	Next
	
	; Wrap the groups in word-boundary assertions
	sStmp = "(?<=\b)" + sStmp + "(?=\b)"
	FreeRegularExpression(iRegex)
	FreeArray(t$())
	ProcedureReturn sStmp
EndProcedure

GenWordList(sString, _m_Words())


Define iCounter.i

Debug sString
ForEach _m_Words()
	iCounter + 1
	Debug MapKey(_m_Words())
	Debug Str(iCounter) + ". " + GenWordRegEX(MapKey(_m_Words()))
	Debug ""
Next	
CallDebugger

Re: Search/replace phrases in the same case

Posted: Wed Feb 21, 2018 1:15 pm
by Dude
Thanks, Zebuddi123! Testing your examples now. :)