Search/replace phrases in the same case

Dude
Addict
Posts: 1907
Joined: Mon Feb 16, 2015 2:49 pm


Post by Dude »

Got a tricky one for you. :twisted:

I need to search and replace some text, BUT keep the same case as the original, match whole words only, and do it as fast as possible.

Speed is vital because I will be scanning large blocks of text (500 KB or more). I'm hoping the whole job takes less than a second.

Here's an example of "alot" to be replaced with "a lot" in the manner I require:

Original: The zealot drank alot. Quite aLoT in fact.
Modded: The zealot drank a lot. Quite a LoT in fact.

See how the "alot" inside "zealot" is left intact, since it's part of another word? That's vital, too.

So... anyone feel up to the challenge? :mrgreen:
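A minimal PureBasic sketch of the "same case" rule in the example: copy the case of each letter in the original match onto the corresponding letter of the replacement, and let non-letters such as the inserted space pass through untouched. MatchCase() is a hypothetical helper name, not a built-in, and it assumes the letters line up one-to-one; any extra letters in the replacement simply keep the case they already have.

```purebasic
; Hypothetical helper: copies the case of each letter in Original$ onto the
; matching letter of Replacement$. Non-letters in Replacement$ (such as the
; inserted space) are copied unchanged and do not use up a letter of Original$.
Procedure.s MatchCase(Original$, Replacement$)
  Protected Result$, c$, o$
  Protected i, j
  j = 1
  For i = 1 To Len(Replacement$)
    c$ = Mid(Replacement$, i, 1)
    If LCase(c$) <> UCase(c$)                 ; c$ is a letter
      ; Skip any non-letters in Original$ so the letters stay aligned
      While j <= Len(Original$) And LCase(Mid(Original$, j, 1)) = UCase(Mid(Original$, j, 1))
        j + 1
      Wend
      If j <= Len(Original$)                  ; extra letters keep their own case
        o$ = Mid(Original$, j, 1)
        If o$ = UCase(o$)
          c$ = UCase(c$)
        Else
          c$ = LCase(c$)
        EndIf
        j + 1
      EndIf
    EndIf
    Result$ + c$
  Next
  ProcedureReturn Result$
EndProcedure

Debug MatchCase("alot", "a lot")   ; a lot
Debug MatchCase("aLoT", "a lot")   ; a LoT
Debug MatchCase("ALOT", "a lot")   ; A LOT
```

When the replacement has more letters than the original (for example "alright" to "all right"), the alignment is only approximate, so treat this as a starting point rather than a complete rule.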
Wolfram
Enthusiast
Posts: 568
Joined: Thu May 30, 2013 4:39 pm

Re: Search/replace phrases in the same case

Post by Wolfram »

First replace alot, then aLoT.
When replacing, activate the options "case sensitive" and "whole words only".
Dude
Addict
Posts: 1907
Joined: Mon Feb 16, 2015 2:49 pm

Re: Search/replace phrases in the same case

Post by Dude »

Hi Wolfram,

This is for my app; not the IDE. ReplaceString() doesn't have a "whole words" parameter.

And you're missing the point: "alot" can appear several different ways: alot, ALOT, alOT, ALot, etc. The replacement has to match that case, so you can't really use ReplaceString() at all because of this.

See, it's not so easy. ;)

[Edit] Actually, Microsoft Word can sort of do it with its "Find all word forms" feature.
oO0XX0Oo
User
Posts: 78
Joined: Thu Aug 10, 2017 7:35 am

Re: Search/replace phrases in the same case

Post by oO0XX0Oo »

Regular expressions?

You only need to adapt the second expression for different forms, e.g.

Code:

"\baL[oO][tT]\b"
Testing it with a 516 KB UTF-8 file filled with nothing but the sentence:
The zealot drank alot. Quite aLoT in fact.

it takes 64 ms...

Code:

Define.i startTime, pattern1, pattern2, hSrcFile, hDstFile
Define.b encoding
Define.s file = "R:\a.txt", tmpFile, content

startTime = ElapsedMilliseconds()
pattern1 = CreateRegularExpression(#PB_Any, "\balot\b")
pattern2 = CreateRegularExpression(#PB_Any, "\baLoT\b")

If pattern1 And pattern2
  hSrcFile = ReadFile(#PB_Any, file, #PB_File_SharedRead|#PB_UTF8)
  If hSrcFile
    encoding = ReadStringFormat(hSrcFile)
    content = ReadString(hSrcFile, #PB_UTF8|#PB_File_IgnoreEOL)
    CloseFile(hSrcFile)
  EndIf

  ; Replace...
  If MatchRegularExpression(pattern1, content) Or MatchRegularExpression(pattern2, content)
    content = ReplaceRegularExpression(pattern1, content, "a lot")
    content = ReplaceRegularExpression(pattern2, content, "a LoT") ; keep the original's case: aLoT -> a LoT

    ; Write the changes to a new file
    tmpFile = file + ".tmp"
    hDstFile = CreateFile(#PB_Any, tmpFile, #PB_File_SharedWrite|#PB_File_NoBuffering|#PB_UTF8)
    If hDstFile
      If encoding = #PB_UTF8
        WriteStringFormat(hDstFile, #PB_UTF8)
      EndIf
      WriteString(hDstFile, content, #PB_UTF8)
      CloseFile(hDstFile)
    EndIf

    ; Delete the src file and rename the dst file to the src file again
    ;DeleteFile(file, #PB_FileSystem_Force)
    ;RenameFile(tmpFile, file)
  EndIf
  FreeRegularExpression(pattern1)
  FreeRegularExpression(pattern2)
EndIf
Debug "Time passed (ms): " + Str((ElapsedMilliseconds() - startTime))
Dude
Addict
Posts: 1907
Joined: Mon Feb 16, 2015 2:49 pm

Re: Search/replace phrases in the same case

Post by Dude »

oO0XX0Oo wrote: Regular expressions?
I was considering those, but I read they're too slow for multiple string operations. However, your timing seems to imply otherwise, so I'll play with your example and see how I go. Thanks. :)

But the other issue is the number of ways "alot" can be written (alot, Alot, ALot, ALOt, ALOT, aLot, aLOt, ...). I would need to create many patterns to match on, so this may not be practical.

Here's what I'm testing with anyway:

Code:

pattern1 = CreateRegularExpression(#PB_Any, "\balot\b")
pattern2 = CreateRegularExpression(#PB_Any, "\baLoT\b")

content.s = "The zealot drank alot. Quite aLoT in fact."

If MatchRegularExpression(pattern1, content) Or MatchRegularExpression(pattern2, content)
  content = ReplaceRegularExpression(pattern1, content, "a lot")
  content = ReplaceRegularExpression(pattern2, content, "a LoT") ; keep the original's case: aLoT -> a LoT
EndIf

Debug content

FreeRegularExpression(pattern1)
FreeRegularExpression(pattern2)
Dude
Addict
Posts: 1907
Joined: Mon Feb 16, 2015 2:49 pm

Re: Search/replace phrases in the same case

Post by Dude »

Actually, can a regular expression show where in the string the match starts? Then I could just search for any combo of "alot" and manually throw a space in between the "a" and "lot". That would make life 100% easier for this problem!

[Edit] Just discovered RegularExpressionMatchPosition()! I'm getting there! :)
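Here is a sketch of that idea, using PureBasic's ExamineRegularExpression() / NextRegularExpressionMatch() loop together with RegularExpressionMatchPosition(), RegularExpressionMatchString() and RegularExpressionMatchLength(). One case-insensitive, whole-word pattern finds every spelling of "alot", and the output is rebuilt with a space after the first letter of each match. Because the letters are copied straight from the original text, their case is preserved without any lookup table. The sample sentence is made up for illustration.

```purebasic
; Sketch: find every whole-word "alot" in any case, then rebuild the string,
; inserting a space after the first character of each match. The letters are
; copied from the original text, so their case is kept automatically.
Define Content$ = "The zealot drank alot. Quite aLoT in fact. Alot!"
Define Result$, LastPos, Pos
Define Regex = CreateRegularExpression(#PB_Any, "\balot\b", #PB_RegularExpression_NoCase)

If Regex
  LastPos = 1
  If ExamineRegularExpression(Regex, Content$)
    While NextRegularExpressionMatch(Regex)
      Pos = RegularExpressionMatchPosition(Regex)       ; 1-based start of the match
      Result$ + Mid(Content$, LastPos, Pos - LastPos)    ; unchanged text before the match
      ; First letter, a space, then the rest of the match ("aLoT" -> "a LoT")
      Result$ + Left(RegularExpressionMatchString(Regex), 1) + " " + Mid(RegularExpressionMatchString(Regex), 2)
      LastPos = Pos + RegularExpressionMatchLength(Regex)
    Wend
  EndIf
  Result$ + Mid(Content$, LastPos)                      ; text after the last match
  FreeRegularExpression(Regex)
  Debug Result$ ; The zealot drank a lot. Quite a LoT in fact. A lot!
EndIf
```

The "alot" inside "zealot" is skipped because \b requires a word boundary before the "a".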
RASHAD
PureBasic Expert
Posts: 4660
Joined: Sun Apr 12, 2009 6:27 am

Re: Search/replace phrases in the same case

Post by RASHAD »

Hi

Code:

text$ = "The zealot drank alot. Quite aLoT in fact."

Restore StringData
For i = 1 To 6
  Read.s orgData$
  Read.s modData$
text$ = ReplaceString(text$,orgData$,modData$,#PB_String_CaseSensitive)
Next

Debug text$

DataSection
  StringData:
  Data.s " alot", " a lot", " Alot"," A lot", " ALot", " A Lot" ," ALOt"," A LOt", " ALOT"," A LOT" ," aLoT" ," a LoT"
EndDataSection
Or you can use the Windows API: fr.FINDREPLACE with FindText_(fr) and ReplaceText_(fr).
oO0XX0Oo
User
Posts: 78
Joined: Thu Aug 10, 2017 7:35 am

Re: Search/replace phrases in the same case

Post by oO0XX0Oo »

Dude wrote: But the other issue is the number of ways "alot" can be written (alot, Alot, ALot, ALOt, ALOT, aLot, aLOt, ...). I would need to create many patterns to match on, so this may not be practical.
No, you don't. You just need to adapt the pattern and use character classes (ranges). Take these variants:
ALot, ALOt, ALOT, aLot, aLOt

Code:

"\b[aA]L[oO][tT]\b"
matches all of the variants listed above.

And if "alot" (with a lowercase "l") can have multiple variants, do the same with it:

Code:

"\b[aA]l[oO][tT]\b"
The only difference is the uppercase or lowercase "l".
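The character classes can also be generated instead of typed by hand. In the sketch below, CaseClassPattern() is a hypothetical name: it turns every letter of a word into a two-case class and escapes other characters with a backslash so hyphens and spaces match literally (digits are left alone, since a backslash before a digit would be a backreference). It assumes the phrase starts and ends with a letter or digit, because \b is placed at both ends. Such a pattern only finds the matches; the replacement still has to take its case from each match separately, and passing #PB_RegularExpression_NoCase to CreateRegularExpression() gives the same matching behaviour with the plain word.

```purebasic
; Hypothetical helper: builds a case-insensitive pattern by hand, e.g.
; "alot" -> "\b[aA][lL][oO][tT]\b". Assumes Word$ starts and ends with a letter or digit.
Procedure.s CaseClassPattern(Word$)
  Protected Pattern$, c$, i
  For i = 1 To Len(Word$)
    c$ = Mid(Word$, i, 1)
    If LCase(c$) <> UCase(c$)
      Pattern$ + "[" + LCase(c$) + UCase(c$) + "]"   ; letter: accept either case
    ElseIf FindString("0123456789", c$)
      Pattern$ + c$                                   ; digit: keep as-is
    Else
      Pattern$ + "\" + c$                             ; punctuation or space: match literally
    EndIf
  Next
  ProcedureReturn "\b" + Pattern$ + "\b"
EndProcedure

Debug CaseClassPattern("alot")        ; \b[aA][lL][oO][tT]\b
Debug CaseClassPattern("bar-b-cue")   ; \b[bB][aA][rR]\-[bB]\-[cC][uU][eE]\b
```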
Dude
Addict
Posts: 1907
Joined: Mon Feb 16, 2015 2:49 pm

Re: Search/replace phrases in the same case

Post by Dude »

Thanks, RASHAD, but I don't want to hard-code all possibilities because there'll be too many. "alot" is just one example that I was using for testing and research. :) I want to search and fix other longer strings too, like "all right" (alright) and "bar-b-cue" (barbecue).

I think I'm this close thanks to oO0XX0Oo's help.
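A quick count shows why hard-coding every spelling doesn't scale: if each letter can independently be upper or lower case, a phrase with n letters has 2^n spellings (spaces and hyphens add none), so the table grows exponentially with phrase length.

```latex
N = 2^{n}: \quad
N(\text{alot}) = 2^{4} = 16, \quad
N(\text{barbecue}) = 2^{8} = 256, \quad
N(\text{all right}) = 2^{8} = 256
```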
Dude
Addict
Posts: 1907
Joined: Mon Feb 16, 2015 2:49 pm

Re: Search/replace phrases in the same case

Post by Dude »

oO0XX0Oo wrote: You just need to adapt the pattern and use ranges
ALot, ALOt, ALOT, aLot, aLOt

Code:

"\b[aA]L[oO][tT]\b"
matches everything in quotes above
Okay, but why does this fail, then?

Code:

pattern = CreateRegularExpression(#PB_Any, "\b[aA]L[oO][tT]\b")

content.s = "The zealot drank alot. Quite ALOT in fact."

If MatchRegularExpression(pattern, content)
  content = ReplaceRegularExpression(pattern, content, "a lot")
EndIf

Debug content ; The zealot drank alot. Quite a lot in fact. <-- Second match not capitalized. :(

FreeRegularExpression(pattern)
oO0XX0Oo
User
Posts: 78
Joined: Thu Aug 10, 2017 7:35 am

Re: Search/replace phrases in the same case

Post by oO0XX0Oo »

Because you are replacing every match with the same fixed string:

Code:

"a lot"
^^
Dude
Addict
Posts: 1907
Joined: Mon Feb 16, 2015 2:49 pm

Re: Search/replace phrases in the same case

Post by Dude »

I'm off to bed now, because I can't concentrate any longer. But this is almost working now:

Code:

pattern = CreateRegularExpression(#PB_Any, "\b[aA]L[oO][tT]\b",#PB_RegularExpression_NoCase)

content.s = "The zealot drank alot. Quite ALOT in fact."

If MatchRegularExpression(pattern, content)
  content = ReplaceRegularExpression(pattern, content, "a lot")
EndIf

Debug content ; The zealot drank a lot. Quite a lot in fact.

FreeRegularExpression(pattern)
oO0XX0Oo
User
Posts: 78
Joined: Thu Aug 10, 2017 7:35 am

Re: Search/replace phrases in the same case

Post by oO0XX0Oo »

This won't work if Alot is at the beginning of a sentence...
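One way around the sentence-start problem is to keep the case-insensitive match but build each replacement from the text it replaces. The sketch below assumes only three shapes matter (lowercase, Capitalized, ALL CAPS), which is a simplification: a mixed form like aLoT comes out as plain "a lot". ReplaceWordKeepCase() and ApplyWordCase() are hypothetical names, and Word$ is assumed to contain no regex metacharacters.

```purebasic
; Hypothetical helper: applies the case style of Match$ to Replacement$.
; Only ALL CAPS and Capitalized are recognised; anything else is returned as-is.
Procedure.s ApplyWordCase(Match$, Replacement$)
  If Match$ = UCase(Match$)
    ProcedureReturn UCase(Replacement$)                                 ; ALOT -> A LOT
  ElseIf Left(Match$, 1) = UCase(Left(Match$, 1))
    ProcedureReturn UCase(Left(Replacement$, 1)) + Mid(Replacement$, 2) ; Alot -> A lot
  EndIf
  ProcedureReturn Replacement$                                          ; alot -> a lot
EndProcedure

; Hypothetical helper: replaces every whole-word, case-insensitive Word$ in Text$.
Procedure.s ReplaceWordKeepCase(Text$, Word$, Replacement$)
  Protected Regex, Result$, LastPos, Pos
  Regex = CreateRegularExpression(#PB_Any, "\b" + Word$ + "\b", #PB_RegularExpression_NoCase)
  If Regex = 0
    ProcedureReturn Text$                                ; invalid pattern: leave text unchanged
  EndIf
  LastPos = 1
  If ExamineRegularExpression(Regex, Text$)
    While NextRegularExpressionMatch(Regex)
      Pos = RegularExpressionMatchPosition(Regex)        ; 1-based start of the match
      Result$ + Mid(Text$, LastPos, Pos - LastPos)        ; unchanged text before the match
      Result$ + ApplyWordCase(RegularExpressionMatchString(Regex), Replacement$)
      LastPos = Pos + RegularExpressionMatchLength(Regex)
    Wend
  EndIf
  FreeRegularExpression(Regex)
  ProcedureReturn Result$ + Mid(Text$, LastPos)          ; text after the last match
EndProcedure

Debug ReplaceWordKeepCase("Alot of zealots drank alot. ALOT!", "alot", "a lot")
; A lot of zealots drank a lot. A LOT!
```

Since the regex is compiled once per call and each match is visited only once, this stays a single pass over the text, which should suit large inputs.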
Zebuddi123
Enthusiast
Posts: 794
Joined: Wed Feb 01, 2012 3:30 pm
Location: Nottinghamshire UK

Re: Search/replace phrases in the same case

Post by Zebuddi123 »

Hi Dude, try this:

Code:

(?<=\b)(a|A)(l|L)(o|O)(t|T)(?=\b)

1. A procedure to generate a matching pattern for any word, if it helps:

Code:

sString.s = "alot"

Procedure.s GenWordRegEX(sString.s) ; Generates a regex that matches any lower/upper case mix of a word
	Protected iRegex.i, iNbrMatch.i, iIndex.i, sStmp.s
	Protected Dim t$(0)
	iRegex = CreateRegularExpression(#PB_Any, "\w")             ; one match per word character
	iNbrMatch = ExtractRegularExpression(iRegex, sString, t$())
	
	For iIndex = 0 To (iNbrMatch - 1)
		; Each character becomes a group such as "(a|A)"
		sStmp + "(" + LCase(t$(iIndex)) + "|" + UCase(t$(iIndex)) + ")"
	Next
	
	; Wrap the groups in word-boundary assertions
	sStmp = "(?<=\b)" + sStmp + "(?=\b)"
	FreeRegularExpression(iRegex)
	FreeArray(t$())
	ProcedureReturn sStmp
EndProcedure

Debug GenWordRegEX(sString)
2. It's the kind of thing I gravitate towards :) This test extracts all the words from a string/file into a map, using the words as keys (which reduces the search list), and generates a regex for each word in the map.

Results:

alot aLot of oF OF TrObLE To Too Do DO iT IT it By BY by HaND hand Etc
TOO
1. (?<=\b)(t|T)(o|O)(o|O)(?=\b)

TROBLE
2. (?<=\b)(t|T)(r|R)(o|O)(b|B)(l|L)(e|E)(?=\b)

IT
3. (?<=\b)(i|I)(t|T)(?=\b)

ALOT
4. (?<=\b)(a|A)(l|L)(o|O)(t|T)(?=\b)

BY
5. (?<=\b)(b|B)(y|Y)(?=\b)

ETC
6. (?<=\b)(e|E)(t|T)(c|C)(?=\b)

HAND
7. (?<=\b)(h|H)(a|A)(n|N)(d|D)(?=\b)

TO
8. (?<=\b)(t|T)(o|O)(?=\b)

DO
9. (?<=\b)(d|D)(o|O)(?=\b)

OF
10. (?<=\b)(o|O)(f|F)(?=\b)


Code:

sString.s = "alot aLot of oF OF TrObLE To Too Do DO iT IT it By BY by HaND hand Etc"

NewMap _m_Words.s()

Procedure GenWordList(sFile.s, Map _m_Words.s())
	Protected iRegEx.i = CreateRegularExpression(#PB_Any, "\w+")
	Protected iNbrMatchs.i, iIndex.i, Dim t$(0)
	If MatchRegularExpression(iRegEx, sFile)
		iNbrMatchs = ExtractRegularExpression(iRegEx, sFile, t$())
		
		For iIndex = 0 To (iNbrMatchs - 1)
			AddMapElement(_m_Words(), UCase(t$(iIndex)))
		Next	
	EndIf	
	FreeRegularExpression(iRegEx)
	FreeArray(t$())
EndProcedure


Procedure.s GenWordRegEX(sString.s) ; Generates a regex that matches any lower/upper case mix of a word
	Protected iRegex.i, iNbrMatch.i, iIndex.i, sStmp.s
	Protected Dim t$(0)
	iRegex = CreateRegularExpression(#PB_Any, "\w")             ; one match per word character
	iNbrMatch = ExtractRegularExpression(iRegex, sString, t$())
	
	For iIndex = 0 To (iNbrMatch - 1)
		; Each character becomes a group such as "(a|A)"
		sStmp + "(" + LCase(t$(iIndex)) + "|" + UCase(t$(iIndex)) + ")"
	Next
	
	; Wrap the groups in word-boundary assertions
	sStmp = "(?<=\b)" + sStmp + "(?=\b)"
	FreeRegularExpression(iRegex)
	FreeArray(t$())
	ProcedureReturn sStmp
EndProcedure

GenWordList(sString, _m_Words())


Define iCounter.i

Debug sString
ForEach _m_Words()
	iCounter + 1
	Debug MapKey(_m_Words())
	Debug Str(iCounter) + ". " + GenWordRegEX(MapKey(_m_Words()))
	Debug ""
Next	
CallDebugger
Dude
Addict
Posts: 1907
Joined: Mon Feb 16, 2015 2:49 pm

Re: Search/replace phrases in the same case

Post by Dude »

Thanks, Zebuddi123! Testing your examples now. :)