				Search/replace phrases in the same case
				Posted: Mon Feb 19, 2018 9:27 am
				by Dude
				Got a tricky one for you.  
 
I need to search/replace some text, BUT in the same case as the original; and only for whole words; and as fast as possible.
Speed is vital because I will be scanning large blocks of text (500 KB or more). I'm hoping for less than a second for the job.
Here's an example of "alot" to be replaced with "a lot" in the manner I require:
Original: The zealot drank alot. Quite aLoT in fact.
Modded: The zealot drank a lot. Quite a LoT in fact.
See how the "alot" inside "zealot" is left intact, since it's part of another word? That's vital, too.
So... anyone feel up to the challenge?  

 
			 
			
					
				Re: Search/replace phrases in the same case
				Posted: Mon Feb 19, 2018 11:23 am
				by Wolfram
				First replace alot, then aLoT.
While replacing, activate the options "case sensitive" and "whole words only".
			 
			
					
				Re: Search/replace phrases in the same case
				Posted: Mon Feb 19, 2018 11:43 am
				by Dude
				Hi Wolfram,
This is for my app; not the IDE. ReplaceString() doesn't have a "whole words" parameter.
And you're missing the point: "alot" can appear several different ways: alot, ALOT, alOT, ALot, etc. The replacement has to match that case, so you can't really use ReplaceString() at all because of this.
See, it's not so easy. 
[Edit] Actually, Microsoft Word can sort-of do it with its "Find all word forms" feature.

 
			 
			
					
				Re: Search/replace phrases in the same case
				Posted: Mon Feb 19, 2018 12:19 pm
				by oO0XX0Oo
				Regular expressions?
You only need to adapt the second expression for different forms, e.g.
Testing it with a 516 KB UTF-8 file filled with only the sentence:
The zealot drank alot. Quite aLoT in fact.
it takes 64 ms...
Code: Select all
Define.i startTime, pattern1, pattern2, hSrcFile, hDstFile
Define.b encoding
Define.s file = "R:\a.txt", tmpFile, content
startTime = ElapsedMilliseconds()
pattern1 = CreateRegularExpression(#PB_Any, "\balot\b")
pattern2 = CreateRegularExpression(#PB_Any, "\baLoT\b")
If pattern1 And pattern2
  hSrcFile = ReadFile(#PB_Any, file, #PB_File_SharedRead|#PB_UTF8)
  If hSrcFile
    encoding = ReadStringFormat(hSrcFile)
    content = ReadString(hSrcFile, #PB_UTF8|#PB_File_IgnoreEOL)
    CloseFile(hSrcFile)
  EndIf
  ; Replace...
  If MatchRegularExpression(pattern1, content) Or MatchRegularExpression(pattern2, content)
    content = ReplaceRegularExpression(pattern1, content, "a lot")
    content = ReplaceRegularExpression(pattern2, content, "a LoT") ; keep the original case of each letter
    ; Write the changes to a new file
    tmpFile = file + ".tmp"
    hDstFile = CreateFile(#PB_Any, tmpFile, #PB_File_SharedWrite|#PB_File_NoBuffering|#PB_UTF8)
    If hDstFile
      If encoding = #PB_UTF8
        WriteStringFormat(hDstFile, #PB_UTF8)
      EndIf
      WriteString(hDstFile, content, #PB_UTF8)
      CloseFile(hDstFile)
    EndIf
    ; Delete the src file and rename the dst file to the src file again
    ;DeleteFile(file, #PB_FileSystem_Force)
    ;RenameFile(tmpFile, file)
  EndIf
  FreeRegularExpression(pattern1)
  FreeRegularExpression(pattern2)
EndIf
Debug "Time passed (ms): " + Str((ElapsedMilliseconds() - startTime))
 
			 
			
					
				Re: Search/replace phrases in the same case
				Posted: Mon Feb 19, 2018 12:33 pm
				by Dude
				oO0XX0Oo wrote: Regular expressions?
I was considering those, but I read they're too slow for multiple string operations. However, your timing seems to imply otherwise, so I'll play with your example and see how I go. Thanks. 
But the other issue is the number of ways "alot" can be written (alot, Alot, ALot, ALOt, ALOT, aLot, aLOt, ...). I would need to create many patterns to match on, so this may not be practical.
Here's what I'm testing with anyway:
Code: Select all
pattern1 = CreateRegularExpression(#PB_Any, "\balot\b")
pattern2 = CreateRegularExpression(#PB_Any, "\baLoT\b")
content.s = "The zealot drank alot. Quite aLoT in fact."
If MatchRegularExpression(pattern1, content) Or MatchRegularExpression(pattern2, content)
  content = ReplaceRegularExpression(pattern1, content, "a lot")
  content = ReplaceRegularExpression(pattern2, content, "a LoT") ; keep the original case of each letter
EndIf
Debug content
FreeRegularExpression(pattern1)
FreeRegularExpression(pattern2)
 
			 
			
					
				Re: Search/replace phrases in the same case
				Posted: Mon Feb 19, 2018 12:50 pm
				by Dude
				Actually, can a regular expression show where in the string the match starts? So I could just search for any combo of "alot" and then just manually throw a space in between the "a" and "lot". Would make life 100% easier for this problem!
[Edit] Just discovered RegularExpressionMatchPosition()! I'm getting there! 
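A minimal sketch of that idea, not a tuned implementation: match "alot" as a whole word regardless of case, then use the match position to insert a space after the first letter, leaving every original character untouched. The procedure name SplitAlot is made up for illustration. The repeated string concatenation is an assumption of convenience and may need replacing with a preallocated buffer for large inputs.

```purebasic
; Hypothetical helper: put a space after the first letter of every
; whole-word "alot", whatever its case. Original letters are never changed.
Procedure.s SplitAlot(text.s)
  Protected regex.i, result.s, pos.i, lastPos.i = 1
  regex = CreateRegularExpression(#PB_Any, "\balot\b", #PB_RegularExpression_NoCase)
  If regex = 0
    ProcedureReturn text ; pattern failed to compile, return input unchanged
  EndIf
  If ExamineRegularExpression(regex, text)
    While NextRegularExpressionMatch(regex)
      pos = RegularExpressionMatchPosition(regex) ; position of the "a" (first character = 1)
      ; Copy everything up to and including the "a", then add the space
      result + Mid(text, lastPos, pos - lastPos + 1) + " "
      lastPos = pos + 1
    Wend
  EndIf
  result + Mid(text, lastPos) ; rest of the text after the last match
  FreeRegularExpression(regex)
  ProcedureReturn result
EndProcedure

Debug SplitAlot("The zealot drank alot. Quite aLoT in fact.")
; The zealot drank a lot. Quite a LoT in fact.
```

Because the fix only inserts a character, the case question never comes up here. Replacements that change letters (such as "bar-b-cue" to "barbecue") need a rule for carrying the case over.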

 
			 
			
					
				Re: Search/replace phrases in the same case
				Posted: Mon Feb 19, 2018 1:02 pm
				by RASHAD
				Hi
Code: Select all
text$ = "The zealot drank alot. Quite aLoT in fact."
Restore StringData
For i = 1 To 6
  Read.s orgData$
  Read.s modData$
  text$ = ReplaceString(text$,orgData$,modData$,#PB_String_CaseSensitive)
Next
Debug text$
DataSection
  StringData:
  Data.s " alot", " a lot", " Alot"," A lot", " ALot", " A Lot" ," ALOt"," A LOt", " ALOT"," A LOT" ," aLoT" ," a LoT"
EndDataSection
OR you can use fr.FINDREPLACE - FindText_(fr) - ReplaceText_(fr)
 
			 
			
					
				Re: Search/replace phrases in the same case
				Posted: Mon Feb 19, 2018 1:11 pm
				by oO0XX0Oo
				Dude wrote: But the other issue is the number of ways "alot" can be written (alot, Alot, ALot, ALOt, ALOT, aLot, aLOt, ...). I would need to create many patterns to match on, so this may not be practical
No, you don't. You just need to adapt the pattern and use ranges
ALot, ALOt, ALOT, aLot, aLOt
 matches everything in quotes above
and if "alot" can have multiple variants, do the same with it...
Only difference, upper- or lowercase "l"
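The pattern itself did not survive in this post. The one Dude tests a little later in the thread is "\b[aA]L[oO][tT]\b", so the sketch below assumes that pattern and simply checks it against the listed spellings. The test words and loop are only for illustration.

```purebasic
; Assumed pattern: one character class per letter that may vary,
; with a fixed uppercase "L" in second position.
Define regex.i, word.s, i.i
regex = CreateRegularExpression(#PB_Any, "\b[aA]L[oO][tT]\b")
If regex
  For i = 1 To 6
    word = StringField("ALot,ALOt,ALOT,aLot,aLOt,alot", i, ",")
    If MatchRegularExpression(regex, word)
      Debug word + " matches"
    Else
      Debug word + " does not match"
    EndIf
  Next
  FreeRegularExpression(regex)
EndIf
```

The five spellings with an uppercase "L" match; plain "alot" does not, which is why a second pattern with a lowercase "l" would be needed for it.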
 
			 
			
					
				Re: Search/replace phrases in the same case
				Posted: Mon Feb 19, 2018 1:14 pm
				by Dude
				Thanks Rashad, but I don't want to hard-code all possibilities because there'll be too many. "alot" is just one example that I was using for testing and research.

I want to search and fix other longer strings too, like "alright" (to "all right") and "bar-b-cue" (to "barbecue").
I think I'm this close thanks to oO0XX0Oo's help.
 
			 
			
					
				Re: Search/replace phrases in the same case
				Posted: Mon Feb 19, 2018 1:18 pm
				by Dude
				oO0XX0Oo wrote: You just need to adapt the pattern and use ranges
ALot, ALOt, ALOT, aLot, aLOt
 matches everything in quotes above
 
Okay, but why does this fail, then?
Code: Select all
pattern = CreateRegularExpression(#PB_Any, "\b[aA]L[oO][tT]\b")
content.s = "The zealot drank alot. Quite ALOT in fact."
If MatchRegularExpression(pattern, content)
  content = ReplaceRegularExpression(pattern, content, "a lot")
EndIf
Debug content ; The zealot drank alot. Quite a lot in fact. <-- Second match not capitalized. :(
FreeRegularExpression(pattern)
 
			 
			
					
				Re: Search/replace phrases in the same case
				Posted: Mon Feb 19, 2018 1:37 pm
				by oO0XX0Oo
				Because you are replacing the match with the fixed string "a lot". ^^
 
			 
			
					
				Re: Search/replace phrases in the same case
				Posted: Mon Feb 19, 2018 1:40 pm
				by Dude
				I'm off to bed now, because I can't concentrate any longer. But this is almost working now:
Code: Select all
pattern = CreateRegularExpression(#PB_Any, "\b[aA]L[oO][tT]\b",#PB_RegularExpression_NoCase)
content.s = "The zealot drank alot. Quite ALOT in fact."
If MatchRegularExpression(pattern, content)
  content = ReplaceRegularExpression(pattern, content, "a lot")
EndIf
Debug content ; The zealot drank a lot. Quite a lot in fact.
FreeRegularExpression(pattern)
 
			 
			
					
				Re: Search/replace phrases in the same case
				Posted: Mon Feb 19, 2018 2:02 pm
				by oO0XX0Oo
				This won't work if Alot is at the beginning of a sentence...
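One possible way around that, sketched under assumptions: walk the matches one by one and give each replacement the overall case style of the text it replaces (all caps, capitalized, or lower case). The names ApplyCaseStyle and ReplaceKeepCase are hypothetical, the search word is assumed to contain no regex metacharacters, and mixed spellings such as "aLoT" are not preserved letter by letter here.

```purebasic
; Hypothetical helper: give replacement the overall case style of match.
Procedure.s ApplyCaseStyle(match.s, replacement.s)
  If match = UCase(match)
    ProcedureReturn UCase(replacement)                               ; ALOT -> A LOT
  ElseIf Left(match, 1) = UCase(Left(match, 1))
    ProcedureReturn UCase(Left(replacement, 1)) + Mid(replacement, 2) ; Alot -> A lot
  EndIf
  ProcedureReturn replacement                                        ; alot -> a lot
EndProcedure

; Hypothetical helper: whole-word, case-insensitive search; case-styled replace.
Procedure.s ReplaceKeepCase(text.s, word.s, replacement.s)
  Protected regex.i, result.s, pos.i, lastPos.i = 1
  regex = CreateRegularExpression(#PB_Any, "\b" + word + "\b", #PB_RegularExpression_NoCase)
  If regex = 0
    ProcedureReturn text
  EndIf
  If ExamineRegularExpression(regex, text)
    While NextRegularExpressionMatch(regex)
      pos = RegularExpressionMatchPosition(regex)
      If pos > lastPos
        result + Mid(text, lastPos, pos - lastPos) ; text between matches
      EndIf
      result + ApplyCaseStyle(RegularExpressionMatchString(regex), replacement)
      lastPos = pos + RegularExpressionMatchLength(regex)
    Wend
  EndIf
  result + Mid(text, lastPos) ; rest of the text after the last match
  FreeRegularExpression(regex)
  ProcedureReturn result
EndProcedure

Debug ReplaceKeepCase("Alot of rain. ALOT. alot.", "alot", "a lot")
; A lot of rain. A LOT. a lot.
```

Classifying the match into three styles is a design shortcut: it handles sentence starts and shouting, and it works for replacements whose length differs from the original, where a letter-by-letter mapping has no obvious definition.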
			 
			
					
				Re: Search/replace phrases in the same case
				Posted: Mon Feb 19, 2018 3:19 pm
				by Zebuddi123
				Hi Dude, try this.
1.
A procedure to generate a matching regex for any word, if it helps:
Code: Select all
sString.s = "alot"
Procedure.s GenWordRegEX(sString.s) ; Generates a regular expression matching any upper/lower-case mix of a word
	Protected iRegex.i, iNbrMatch.i, iIndex.i, sStmp.s
	Protected Dim t$(0)
	iRegex = CreateRegularExpression(#PB_Any, "\w")
	If iRegex = 0 : ProcedureReturn "" : EndIf
	iNbrMatch = ExtractRegularExpression(iRegex, sString, t$()) ; one array entry per word character
	
	For iIndex = 0 To (iNbrMatch - 1)
		; Build "(x|X)" for each character, whatever case it was given in
		sStmp + "(" + LCase(t$(iIndex)) + "|" + UCase(t$(iIndex)) + ")"
	Next
	
	FreeRegularExpression(iRegex)
	FreeArray(t$())
	ProcedureReturn "(?<=\b)" + sStmp + "(?=\b)" ; only match whole words
EndProcedure
Debug GenWordRegEX(sString)
2.
It's the kind of thing I gravitate towards.

This test extracts all the words from a string/file into a map, with the words used as the keys (to reduce the search list), and generates a regex for each word in the map.
Results:
alot aLot of oF OF TrObLE To Too Do DO iT IT it By BY by HaND hand Etc
TOO
1. (?<=\b)(t|T)(o|O)(o|O)(?=\b)
TROBLE
2. (?<=\b)(t|T)(r|R)(o|O)(b|B)(l|L)(e|E)(?=\b)
IT
3. (?<=\b)(i|I)(t|T)(?=\b)
ALOT
4. (?<=\b)(a|A)(l|L)(o|O)(t|T)(?=\b)
BY
5. (?<=\b)(b|B)(y|Y)(?=\b)
ETC
6. (?<=\b)(e|E)(t|T)(c|C)(?=\b)
HAND
7. (?<=\b)(h|H)(a|A)(n|N)(d|D)(?=\b)
TO
8. (?<=\b)(t|T)(o|O)(?=\b)
DO
9. (?<=\b)(d|D)(o|O)(?=\b)
OF
10. (?<=\b)(o|O)(f|F)(?=\b)
Code: Select all
sString.s = "alot aLot of oF OF TrObLE To Too Do DO iT IT it By BY by HaND hand Etc"
NewMap _m_Words.s()
Procedure GenWordList(sFile.s, Map _m_Words.s())
	Protected iRegEx.i = CreateRegularExpression(#PB_Any, "\w+")
	Protected iNbrMatchs.i, iIndex.i, Dim t$(0)
	If MatchRegularExpression(iRegEx, sFile)
		iNbrMatchs = ExtractRegularExpression(iRegEx, sFile, t$())
		
		For iIndex = 0 To (iNbrMatchs - 1)
			AddMapElement(_m_Words(), UCase(t$(iIndex)))
		Next	
	EndIf	
	FreeRegularExpression(iRegEx)
	FreeArray(t$())
EndProcedure
Procedure.s GenWordRegEX(sString.s) ; Generates a regular expression matching any upper/lower-case mix of a word
	Protected iRegex.i, iNbrMatch.i, iIndex.i, sStmp.s
	Protected Dim t$(0)
	iRegex = CreateRegularExpression(#PB_Any, "\w")
	If iRegex = 0 : ProcedureReturn "" : EndIf
	iNbrMatch = ExtractRegularExpression(iRegex, sString, t$()) ; one array entry per word character
	
	For iIndex = 0 To (iNbrMatch - 1)
		; Build "(x|X)" for each character, whatever case it was given in
		sStmp + "(" + LCase(t$(iIndex)) + "|" + UCase(t$(iIndex)) + ")"
	Next
	
	FreeRegularExpression(iRegex)
	FreeArray(t$())
	ProcedureReturn "(?<=\b)" + sStmp + "(?=\b)" ; only match whole words
EndProcedure
GenWordList(sString, _m_Words())
Define iCounter.i
Debug sString
ForEach _m_Words()
	iCounter + 1
	Debug MapKey(_m_Words())
	Debug Str(iCounter) + ". " + GenWordRegEX(MapKey(_m_Words()))
	Debug ""
Next	
CallDebugger
 
			 
			
					
				Re: Search/replace phrases in the same case
				Posted: Wed Feb 21, 2018 1:15 pm
				by Dude
				Thanks, Zebuddi123! Testing your examples now. 
