FindString to support whole words
Posted: Wed May 14, 2025 12:47 pm
by BarryG
Can FindString maybe be updated with a flag to only find whole words? Like this:
Code: Select all
Debug FindString("text1$ text2$ text3$","t2$") ; Want 0 returned, not 11.
Re: FindString to support whole words
Posted: Wed May 14, 2025 2:20 pm
by AZJIO
Takes into account the word found at the beginning and at the end of the line.
Code: Select all
EnableExplicit
Procedure FindString2(Source.s, SearchString.s)
  Protected pos = 1
  Protected length = Len(SearchString)
  Protected Result, charStart, charEnd
  Repeat
    pos = FindString(Source, SearchString, pos)
    If pos
      charStart = Asc(Mid(Source, pos - 1, 1))
      charEnd = Asc(Mid(Source, pos + length, 1))
      ; Debug Chr(charStart)
      ; Debug Chr(charEnd)
      ; Debug pos
      ; Word characters: 65-90 = A-Z, 97-122 = a-z, 48-57 = 0-9, 36 = $
      ; If Not ((charStart >= 65 And charStart <= 90) Or (charStart >= 97 And charStart <= 122) Or (charStart >= 48 And charStart <= 57 Or charStart = 36) Or (charEnd >= 65 And charEnd <= 90) Or (charEnd >= 97 And charEnd <= 122) Or (charEnd >= 48 And charEnd <= 57) Or charEnd = 36)
      If (Not ((charStart >= 65 And charStart <= 90) Or (charStart >= 97 And charStart <= 122) Or (charStart >= 48 And charStart <= 57 Or charStart = 36)) Or pos = 1) And Not ((charEnd >= 65 And charEnd <= 90) Or (charEnd >= 97 And charEnd <= 122) Or (charEnd >= 48 And charEnd <= 57) Or charEnd = 36)
        Result = pos
        Break
      EndIf
      pos + 1
    Else
      Break
    EndIf
  ForEver
  ProcedureReturn Result
EndProcedure
Debug FindString2("text1$ text2$ ext text3$","ext")
Debug FindString2("t2$ text1$ text2$ text3$ t2$","t2$")
A second version with a configurable list of separator characters. It also handles a word found at the beginning or at the end of the line.
Code: Select all
EnableExplicit
Procedure FindString2(Source.s, SearchString.s, separator.s = #TAB$ + " .,:<>()[]{}!")
  Protected pos = 1
  Protected length = Len(SearchString)
  Protected Result, charStart.s, charEnd.s
  Repeat
    pos = FindString(Source, SearchString, pos)
    If pos
      charStart = Mid(Source, pos - 1, 1)
      charEnd = Mid(Source, pos + length, 1)
      ; Debug charStart
      ; Debug charEnd
      ; Debug pos
      ; If FindString(separator, charStart) And FindString(separator, charEnd)
      If (pos = 1 Or FindString(separator, charStart)) And (FindString(separator, charEnd) Or charEnd = "")
        Result = pos
        Break
      EndIf
      pos + 1
    Else
      Break
    EndIf
  ForEver
  ProcedureReturn Result
EndProcedure
Debug FindString2("text1$ text2$ text3$","t2$")
Debug FindString2("text1$ text2$ text3$ t2$","t2$")
Code: Select all
EnableExplicit
Procedure FindString2(Source.s, SearchString.s)
  Protected pos = 1
  Protected length = Len(SearchString)
  Protected Result, charStart, charEnd
  Repeat
    pos = FindString(Source, SearchString, pos)
    If pos
      charStart = Asc(Mid(Source, pos - 1, 1))
      charEnd = Asc(Mid(Source, pos + length, 1))
      If pos = 1
        charStart = 1
      Else
        Select charStart
          Case 65 To 90, 97 To 122, 48 To 57, 36
            pos + 1
            Continue
        EndSelect
      EndIf
      Select charEnd
        Case 65 To 90, 97 To 122, 48 To 57, 36
          pos + 1
          Continue
      EndSelect
      Result = pos
      Break
    Else
      Break
    EndIf
  ForEver
  ProcedureReturn Result
EndProcedure
Debug FindString2("text1$ text2$ ext text3$","ext")
Debug FindString2("t2$ text1$ text2$ text3$ t2$","t2$")
Re: FindString to support whole words
Posted: Wed May 14, 2025 2:22 pm
by ebs
Not official, but how about
Code: Select all
Debug FindString("text1$ text2$ text3$"," t2$") ; returns 0
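The leading-space trick misses a word at the very start of the string and does not check the character after the match. A minimal sketch of one way to cover both cases, assuming spaces are the only separators: pad the source and the search word with a space on each side. `FindSpaceDelimited` is a made-up name for this illustration, not a PureBasic function.

```purebasic
; Hypothetical helper: whole-word search where only spaces separate words.
; Padding both strings with spaces lets a single FindString() call cover
; a word at the start, in the middle, or at the end of Source.
; The extra leading space shifts every position by one, so a match at the
; padding space of the padded string equals the word position in Source.
Procedure FindSpaceDelimited(Source.s, Word.s)
  ProcedureReturn FindString(" " + Source + " ", " " + Word + " ")
EndProcedure

Debug FindSpaceDelimited("text1$ text2$ text3$", "t2$") ; 0
Debug FindSpaceDelimited("t2$ text1$", "t2$")           ; 1
Debug FindSpaceDelimited("text1$ t2$", "t2$")           ; 8
```

Punctuation next to the word is still not treated as a boundary here.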
Re: FindString to support whole words
Posted: Wed May 14, 2025 3:36 pm
by NicTheQuick
Just use Regular Expressions. It's way easier with them. With a `\b` you can match word boundaries.
Re: FindString to support whole words
Posted: Wed May 14, 2025 3:44 pm
by Caronte3D
Code: Select all
Procedure FindStringWhole(textString$, toFind$)
  Protected count = Len(toFind$)
  Protected count2 = count + 1
  Protected pos = 0
  If textString$ = toFind$ Or Left(textString$, count2) = toFind$ + " "
    pos = 1
  ElseIf Right(textString$, count2) = " " + toFind$
    pos = Len(textString$) - count + 1
  Else
    pos = FindString(textString$, " " + toFind$ + " ")
    If pos > 0
      pos + 1
    EndIf
  EndIf
  ProcedureReturn pos
EndProcedure
Debug FindStringWhole("text1$ text2$ text3$","t2$")
Re: FindString to support whole words
Posted: Wed May 14, 2025 4:08 pm
by Kiffi
@Caronte3D:
Code: Select all
Debug FindStringWhole("Hello World!","World") ; returns 0, because the "!" after the word is not a space

Re: FindString to support whole words
Posted: Wed May 14, 2025 4:50 pm
by NicTheQuick
This is the simplest way of doing it with Regular Expressions:
Code: Select all
Procedure FindWholeWord(Source.s, SearchString.s, Mode.i = 0)
  Protected regExFlags.i = #PB_RegularExpression_MultiLine
  If Mode = #PB_String_NoCase
    regExFlags | #PB_RegularExpression_NoCase
  EndIf
  Protected hRegEx.i = CreateRegularExpression(#PB_Any, "\b" + SearchString + "\b", regExFlags)
  If Not hRegEx
    ProcedureReturn -1
  EndIf
  If Not ExamineRegularExpression(hRegEx, Source)
    FreeRegularExpression(hRegEx) ; release the RegEx before the early return
    ProcedureReturn -2
  EndIf
  Protected position.i
  If NextRegularExpressionMatch(hRegEx)
    position = RegularExpressionMatchPosition(hRegEx)
  EndIf
  FreeRegularExpression(hRegEx)
  ProcedureReturn position
EndProcedure
Debug FindWholeWord("text1$ text2$ text3$", "t2$")
Debug FindWholeWord("Hello World!", "world", #PB_String_NoCase)
But this is not a complete procedure. You also have to escape all characters that have a special meaning in a RegEx before searching for it.
Also this version creates a lot of overhead because the whole RegEx engine must be loaded, the pattern must be parsed and finally the matching has to be done.
You can improve that in case you always want to find the same pattern. Then just create one RegEx out of it and use `ExamineRegularExpression()` on multiple strings.
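A rough sketch of that reuse, assuming a fixed search word: the pattern is compiled once and `ExamineRegularExpression()` is then called for each string. The array and its contents are made up for illustration.

```purebasic
; Compile the pattern once, outside the loop.
Define regEx = CreateRegularExpression(#PB_Any, "\bWorld\b")
Define i

; Illustrative input strings.
Dim lines.s(2)
lines(0) = "Hello World!"
lines(1) = "Worldwide"
lines(2) = "Brave new World"

If regEx
  For i = 0 To 2
    ; Reuse the same compiled RegEx for every string.
    If ExamineRegularExpression(regEx, lines(i)) And NextRegularExpressionMatch(regEx)
      Debug lines(i) + " -> " + Str(RegularExpressionMatchPosition(regEx))
    Else
      Debug lines(i) + " -> 0"
    EndIf
  Next
  FreeRegularExpression(regEx) ; free it once, after all strings are done
EndIf
```

This only saves the repeated pattern parsing. The matching itself still runs once per string.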
Re: FindString to support whole words
Posted: Wed May 14, 2025 6:38 pm
by AZJIO
\b works only with Latin letters, digits, and _; it is not suitable for $.
The string may contain metacharacters, and they must be escaped.
Importing the regex library adds 150-200 KB to the file size.
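A minimal sketch of one way around the first two points, assuming the RegEx library accepts PCRE-style lookarounds: the search string is escaped character by character, and `\b` is replaced by lookarounds that also treat `$` as part of a word. `EscapeRegEx` and `FindWholeWordEx` are hypothetical names for this illustration, not PureBasic functions.

```purebasic
; Hypothetical helper (not a PureBasic built-in): puts a backslash in front
; of every RegEx metacharacter so the text is matched literally.
Procedure.s EscapeRegEx(Text.s)
  Protected result.s, char.s, i
  For i = 1 To Len(Text)
    char = Mid(Text, i, 1)
    If FindString("\^$.|?*+()[]{}", char)
      result + "\"
    EndIf
    result + char
  Next
  ProcedureReturn result
EndProcedure

; Hypothetical whole-word search that counts "$" as part of a word.
; Returns the 1-based position of the first match, or 0 if there is none.
Procedure FindWholeWordEx(Source.s, SearchString.s)
  Protected position = 0
  ; (?<![\w$]) = no word character or "$" directly before the match
  ; (?![\w$])  = no word character or "$" directly after the match
  Protected pattern.s = "(?<![\w$])" + EscapeRegEx(SearchString) + "(?![\w$])"
  Protected regEx = CreateRegularExpression(#PB_Any, pattern)
  If regEx
    If ExamineRegularExpression(regEx, Source)
      If NextRegularExpressionMatch(regEx)
        position = RegularExpressionMatchPosition(regEx)
      EndIf
    EndIf
    FreeRegularExpression(regEx)
  EndIf
  ProcedureReturn position
EndProcedure

Debug FindWholeWordEx("text1$ text2$ text3$", "t2$")     ; 0
Debug FindWholeWordEx("text1$ text2$ text3$ t2$", "t2$") ; 22
```

The set of word characters is still whatever `\w` covers in the engine, so the restriction to Latin letters stays. The file-size cost of the regex library also stays.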
Re: FindString to support whole words
Posted: Wed May 14, 2025 11:23 pm
by AZJIO
BarryG wrote: Wed May 14, 2025 12:47 pm
Can FindString maybe be updated with a flag to only find whole words? Like this:
Adding parameters can slow down the function.
If only the compiler counted the number of parameters used in the calls and embedded a more optimized variant. For example, if the calls use at most 4 parameters, a FindString4 variant is embedded, and if some call uses 5 parameters, a FindString5 variant is embedded instead. The function name in the source remains FindString. Then increasing the number of parameters would not harm those who want a fast function with fewer parameters. If there are many calls with different parameter counts, only the one variant with the largest number of parameters is embedded.
Re: FindString to support whole words
Posted: Thu May 15, 2025 12:11 am
by Quin
AZJIO wrote: Wed May 14, 2025 11:23 pm
Adding parameters can slow down the function.
If only the compiler counted the number of parameters used in the calls and embedded a more optimized variant. For example, if the calls use at most 4 parameters, a FindString4 variant is embedded, and if some call uses 5 parameters, a FindString5 variant is embedded instead. The function name in the source remains FindString. Then increasing the number of parameters would not harm those who want a fast function with fewer parameters. If there are many calls with different parameter counts, only the one variant with the largest number of parameters is embedded.
This sounds false to me.
Sure, the more parameters you add, the larger the paramsize and as a result the stack frame, and copying all the values from the parameters will take longer if you have more of them, but it shouldn't actually slow down the operation of the function.
Re: FindString to support whole words
Posted: Thu May 15, 2025 2:16 pm
by NicTheQuick
AZJIO wrote: Wed May 14, 2025 6:38 pm
\b works only with Latin letters, digits, and _; it is not suitable for $.
Oh, then PureBasic's RegEx engine seems to be configured badly. Usually it works with UTF-8 characters and not just Latin characters.
AZJIO wrote: Wed May 14, 2025 6:38 pm
The string may contain metacharacters, and they must be escaped.
Exactly. That's what I said.
AZJIO wrote: Wed May 14, 2025 6:38 pm
Importing the regex library adds 150-200 KB to the file size.
I don't think a lot of people care about the executable size today.
Re: FindString to support whole words
Posted: Thu May 15, 2025 2:40 pm
by Quin
NicTheQuick wrote: Thu May 15, 2025 2:16 pm
AZJIO wrote: Wed May 14, 2025 6:38 pm
Importing the regex library adds 150-200 KB to the file size.
I don't think a lot of people care about the executable size today.
I care a lot about executable size and try to make my binaries as small as they can reasonably be.
And that's why I use PureBasic. Statically linking SQLite doesn't even bring my binary to a meg!
So, even if you care about binary size, PB is the best choice. Ever tried using regular expressions in Python or .NET and then seeing all the stuff it has to add to your dist?
Re: FindString to support whole words
Posted: Thu May 15, 2025 2:54 pm
by miso
Quin wrote: Thu May 15, 2025 2:40 pm
Ever tried using regular expressions in Python or .NET and then seeing all the stuff it has to add to your dist?
I've seen a monster executable created by Python that was over a gig... :S
Was not made by me.
Re: FindString to support whole words
Posted: Thu May 15, 2025 3:02 pm
by Quin
miso wrote: Thu May 15, 2025 2:54 pm
Quin wrote: Thu May 15, 2025 2:40 pm
Ever tried using regular expressions in Python or .NET and then seeing all the stuff it has to add to your dist?
I've seen a monster executable created by Python that was over a gig... :S
Was not made by me.
Sheesh!
Python isn't a bad language, but it is noooot for desktop apps. Web scrapers? Sure. Random scripts? Yeah. But please don't write your desktop apps in Python...
Re: FindString to support whole words
Posted: Thu May 15, 2025 3:06 pm
by NicTheQuick
Quin wrote: Thu May 15, 2025 2:40 pm
So, even if you care about binary size, PB is the best choice. Ever tried using regular expressions in Python or .NET and then seeing all the stuff it has to add to your dist?
Well, you cannot compare a language that gets compiled into an executable with a scripting language that needs a VM. Also, regular expressions are one of the included libraries of Python, so whether you use them or not, the size is always the same. And I don't know anybody who compiles Python into an executable. Maybe some Windows guys do weird things like that. On Linux I have never seen that. It compiles itself into bytecode, but that's all.