FindString to support whole words

Post by BarryG »

Could FindString be given an optional flag so that it only finds whole words? Like this:

Code: Select all

Debug FindString("text1$ text2$ text3$","t2$") ; Want 0 returned, not 11.
Post by AZJIO »

This version treats letters, digits and `$` as word characters and also handles a match at the very beginning or end of the string.

Code: Select all

EnableExplicit

Procedure FindString2(Source.s, SearchString.s)
	Protected pos = 1
	Protected length = Len(SearchString)
	Protected Result, charStart, charEnd
	Repeat
		pos = FindString(Source, SearchString, pos)
		If pos
			charStart = Asc(Mid(Source, pos - 1, 1))
			charEnd = Asc(Mid(Source, pos + length, 1))
			; Accept the match only if neither neighbour is a word character
			; (A-Z, a-z, 0-9 or "$"). A match at position 1 has no left
			; neighbour, so the left-hand test passes automatically.
			If (Not ((charStart >= 65 And charStart <= 90) Or (charStart >= 97 And charStart <= 122) Or (charStart >= 48 And charStart <= 57) Or charStart = 36) Or pos = 1) And Not ((charEnd >= 65 And charEnd <= 90) Or (charEnd >= 97 And charEnd <= 122) Or (charEnd >= 48 And charEnd <= 57) Or charEnd = 36)
				Result = pos
				Break
			EndIf
			pos + 1
		Else
			Break
		EndIf
	ForEver
	
	ProcedureReturn Result
EndProcedure

Debug FindString2("text1$ text2$ ext text3$","ext")
Debug FindString2("t2$ text1$ text2$ text3$ t2$","t2$")
This variant takes the separator characters as a parameter instead, and also handles a match at the beginning or end of the string.

Code: Select all

EnableExplicit

Procedure FindString2(Source.s, SearchString.s, separator.s = #TAB$ + " .,:<>()[]{}!")
	Protected pos = 1
	Protected length = Len(SearchString)
	Protected Result, charStart.s, charEnd.s
	Repeat
		pos = FindString(Source, SearchString, pos)
		If pos
			charStart = Mid(Source, pos - 1, 1)
			charEnd = Mid(Source, pos + length, 1)
			; Accept the match only if it is delimited on both sides: on the left
			; by a separator or the start of the string, on the right by a
			; separator or the end of the string (where Mid() returns "").
			If (pos = 1 Or FindString(separator, charStart)) And (FindString(separator, charEnd) Or charEnd = "")
				Result = pos
				Break
			EndIf
			pos + 1
		Else
			Break
		EndIf
	ForEver
	
	ProcedureReturn Result
EndProcedure

Debug FindString2("text1$ text2$ text3$","t2$")
Debug FindString2("text1$ text2$ text3$ t2$","t2$")
The same character test as in the first version, restructured with Select:

Code: Select all

EnableExplicit

Procedure FindString2(Source.s, SearchString.s)
	Protected pos = 1
	Protected length = Len(SearchString)
	Protected Result, charStart, charEnd
	Repeat
		pos = FindString(Source, SearchString, pos)
		If pos
			charStart = Asc(Mid(Source, pos - 1, 1))
			charEnd = Asc(Mid(Source, pos + length, 1))
			If pos = 1
				charStart = 1
			Else
				Select charStart
					Case 65 To 90, 97 To 122, 48 To 57, 36
						pos + 1
						Continue
				EndSelect
			EndIf
			Select charEnd
				Case 65 To 90, 97 To 122, 48 To 57, 36
					pos + 1
					Continue
			EndSelect
			Result = pos
			Break
		Else
			Break
		EndIf
	ForEver
	
	ProcedureReturn Result
EndProcedure

Debug FindString2("text1$ text2$ ext text3$","ext")
Debug FindString2("t2$ text1$ text2$ text3$ t2$","t2$")
Last edited by AZJIO on Wed May 14, 2025 4:08 pm, edited 7 times in total.
Post by ebs »

It's not an official solution, but how about putting a space in front of the search string:

Code: Select all

Debug FindString("text1$ text2$ text3$"," t2$") ; returns 0
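The leading space alone misses a word at the very start of the string and still accepts a longer word that merely begins with the search string. One possible way around both, sketched here for space-separated text only, is to pad the source and the search string with a space on each side. `FindSpaceDelimited` is a hypothetical name, not a PureBasic command:

```purebasic
; Hypothetical helper (not part of PureBasic): whole-word search for text
; whose words are separated by single spaces only.
Procedure FindSpaceDelimited(Source.s, Word.s)
	; In the padded source the match starts at the space in front of the
	; word. Because of the extra leading space, that index equals the
	; 1-based position of the word in the unpadded Source (0 if not found).
	ProcedureReturn FindString(" " + Source + " ", " " + Word + " ")
EndProcedure

Debug FindSpaceDelimited("text1$ text2$ text3$", "t2$") ; 0
Debug FindSpaceDelimited("t2$ text1$ text2$", "t2$")    ; 1
Debug FindSpaceDelimited("text1$ text2$ t2$", "t2$")    ; 15
```

Punctuation, tabs and line breaks are not treated as word boundaries here, so this only fits input that is known to be space-delimited.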
Post by NicTheQuick »

Just use Regular Expressions. It's way easier with them. With a `\b` you can match word boundaries.
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.
Post by Caronte3D »

Code: Select all

Procedure FindStringWhole(textString$, toFind$)

  Protected count = Len(toFind$)
  Protected count2 = count + 1
  Protected pos = 0
  
  If textString$ = toFind$ Or Left(textString$, count2) = toFind$ + " "
    pos = 1
  ElseIf Right(textString$, count2) = " " + toFind$
    pos = Len(textString$) - count+1
  Else
    pos = FindString(textString$, " " + toFind$ + " ")
    If pos > 0
      pos + 1
    EndIf
  EndIf
  
  ProcedureReturn pos
EndProcedure

Debug FindStringWhole("text1$ text2$ text3$","t2$")
Post by Kiffi »

@Caronte3D:

Code: Select all

Debug FindStringWhole("Hello World!","World") ; returns 0, because "World" is followed by "!" and not by a space
:wink:
Post by NicTheQuick »

This is the simplest way to do it with regular expressions:

Code: Select all

Procedure FindWholeWord(Source.s, SearchString.s, Mode.i = 0)
	Protected regExFlags.i = #PB_RegularExpression_MultiLine
	If Mode = #PB_String_NoCase
		regExFlags | #PB_RegularExpression_NoCase
	EndIf
	
	Protected hRegEx.i = CreateRegularExpression(#PB_Any, "\b" + SearchString + "\b", regExFlags)
	If Not hRegEx
		ProcedureReturn -1
	EndIf
	
	If Not ExamineRegularExpression(hRegEx, Source)
		FreeRegularExpression(hRegEx) ; do not leak the compiled expression on failure
		ProcedureReturn -2
	EndIf
	Protected position.i
	If NextRegularExpressionMatch(hRegEx)
		position = RegularExpressionMatchPosition(hRegEx)
	EndIf
	FreeRegularExpression(hRegEx)
	ProcedureReturn position
EndProcedure

Debug FindWholeWord("text1$ text2$ text3$", "t2$")
Debug FindWholeWord("Hello World!", "world", #PB_String_NoCase)
But this is not a complete procedure: you also have to escape every character in the search string that has a special meaning in a RegEx.
This version also has a lot of overhead, because the RegEx engine must be loaded, the pattern must be compiled, and only then can the matching be done.
You can reduce that if you always search for the same pattern: create the RegEx once and call `ExamineRegularExpression()` on each string.
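Both points can be sketched roughly as follows. `EscapeRegEx` and `FirstMatch` are hypothetical helper names, not PureBasic commands, and the list of escaped characters is an assumption that covers the usual metacharacters outside character classes:

```purebasic
; Hypothetical helper (not a PureBasic built-in): put a backslash in front
; of every character that has a special meaning in a regular expression.
Procedure.s EscapeRegEx(Text.s)
	Protected result.s, ch.s, i.i
	For i = 1 To Len(Text)
		ch = Mid(Text, i, 1)
		If FindString("\^$.|?*+()[]{}", ch)
			result + "\"
		EndIf
		result + ch
	Next
	ProcedureReturn result
EndProcedure

; Hypothetical helper: position of the first match of an already compiled
; expression in Source, or 0 if there is none.
Procedure FirstMatch(hRegEx.i, Source.s)
	Protected position.i
	If ExamineRegularExpression(hRegEx, Source)
		If NextRegularExpressionMatch(hRegEx)
			position = RegularExpressionMatchPosition(hRegEx)
		EndIf
	EndIf
	ProcedureReturn position
EndProcedure

; Compile once, then reuse the handle for several strings.
Define hRegEx.i = CreateRegularExpression(#PB_Any, "\b" + EscapeRegEx("a.b") + "\b")
If hRegEx
	Debug FirstMatch(hRegEx, "x a.b y") ; 3
	Debug FirstMatch(hRegEx, "x aXb y") ; 0 (an unescaped "." would have matched "X")
	FreeRegularExpression(hRegEx)
EndIf
```

The pattern is parsed only once here; each further string costs just the `ExamineRegularExpression()` call.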
Post by AZJIO »

`\b` only treats Latin letters, digits and `_` as word characters, so it is not suitable for `$`.
The search string may contain metacharacters, and they must be escaped.
Using the regex library adds 150-200 KB to the file size.
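A small sketch of the `$` problem, and of one possible workaround that replaces `\b` with lookarounds over an explicit set of word characters. It assumes the regex engine accepts lookbehind syntax, and the character set `[A-Za-z0-9$]` is only an illustration:

```purebasic
; "\b" version with the "$" escaped: fails even for a standalone "t2$",
; because there is no word boundary between "$" and the following space.
Define reB.i = CreateRegularExpression(#PB_Any, "\bt2\$\b")

; Lookaround version: the match must not be preceded or followed by a
; character from the set, so "$" counts as part of a word.
Define reLook.i = CreateRegularExpression(#PB_Any, "(?<![A-Za-z0-9$])t2\$(?![A-Za-z0-9$])")

If reB And reLook
	Debug MatchRegularExpression(reB, "text1$ t2$ text3$")       ; 0
	Debug MatchRegularExpression(reLook, "text1$ t2$ text3$")    ; non-zero: "t2$" stands alone
	Debug MatchRegularExpression(reLook, "text1$ text2$ text3$") ; 0: "t2$" is only part of "text2$"
	FreeRegularExpression(reB)
	FreeRegularExpression(reLook)
Else
	Debug RegularExpressionError()
EndIf
```

The file-size cost mentioned above applies to this variant as well, since it still pulls in the regex library.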
Post by AZJIO »

BarryG wrote: Wed May 14, 2025 12:47 pm Can FindString maybe be updated with a flag to only find whole words? Like this:
Adding parameters can slow down the function.
It would help if the compiler counted the arguments at each call and linked a correspondingly optimized variant. For example, a call with 4 arguments would embed an internal FindString4 and a call with 5 arguments an internal FindString5, while the name in the source code stays FindString. Adding parameters would then not hurt those who want a fast function with fewer parameters. If a program contains calls with different numbers of arguments, only the variant with the largest number of parameters would be embedded.
Post by Quin »

AZJIO wrote: Wed May 14, 2025 11:23 pm Adding parameters can slow down the function.
This doesn't sound right to me.
Sure, the more parameters you add, the larger the parameter block and therefore the stack frame, and copying the argument values takes longer the more of them there are, but that shouldn't actually slow down the work the function itself does.
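One way to check this for the parameters FindString already has is a rough timing loop like the sketch below. The loop count is arbitrary, and the numbers are only meaningful when the program is compiled without the debugger:

```purebasic
; Rough timing sketch: the same search with two and with four arguments.
#Loops = 5000000

Define haystack.s = "text1$ text2$ text3$"
Define i.i, hits.q, start.q, shortCall.q, longCall.q

start = ElapsedMilliseconds()
For i = 1 To #Loops
	hits + FindString(haystack, "t2$") ; two arguments
Next
shortCall = ElapsedMilliseconds() - start

start = ElapsedMilliseconds()
For i = 1 To #Loops
	hits + FindString(haystack, "t2$", 1, #PB_String_CaseSensitive) ; all four arguments
Next
longCall = ElapsedMilliseconds() - start

; "hits" is displayed so that the results of the calls are actually used.
MessageRequester("FindString timing", "2 arguments: " + Str(shortCall) + " ms" + #LF$ + "4 arguments: " + Str(longCall) + " ms" + #LF$ + "hits: " + Str(hits))
```

This only measures the cost of passing the existing optional arguments; it says nothing about how expensive a whole-word check inside FindString would be.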
Post by NicTheQuick »

AZJIO wrote: Wed May 14, 2025 6:38 pm \b - works only for Latin letters and numbers and _, but is not suitable for $.
Oh, then PureBasic's RegEx engine seems to be badly configured. Usually it works with UTF-8 characters, not just Latin ones.
AZJIO wrote: Wed May 14, 2025 6:38 pm The string may contain meta characters and they must be escaped.
Exactly. That's what I said.
AZJIO wrote: Wed May 14, 2025 6:38 pm Import regex +150-200 kb to the file size.
I don't think a lot of people care about the executable size today.
Post by Quin »

NicTheQuick wrote: Thu May 15, 2025 2:16 pm
AZJIO wrote: Wed May 14, 2025 6:38 pm Import regex +150-200 kb to the file size.
I don't think a lot of people care about the executable size today.
I'm a huge proponent of small executables and try to keep my binaries as small as they can reasonably be.
And that's why I use PureBasic. Statically linking SQLite doesn't even bring my binary to a meg!
So, even if you care about binary size, PB is the best choice. Ever tried using regular expressions in Python or .NET and then seeing all the stuff it has to add to your dist?
Post by miso »

Quin wrote: Thu May 15, 2025 2:40 pm Ever tried using regular expressions in Python or .NET and then seeing all the stuff it has to add to your dist?
I've seen a monster executable created with Python that was over a gigabyte... :S
It wasn't made by me.
Post by Quin »

miso wrote: Thu May 15, 2025 2:54 pm
Quin wrote: Thu May 15, 2025 2:40 pm Ever tried using regular expressions in Python or .NET and then seeing all the stuff it has to add to your dist?
I've seen a monster executable created with Python that was over a gigabyte... :S
It wasn't made by me.
Sheesh!
Python isn't a bad language, but it is noooot for desktop apps. Web scrapers? Sure. Random scripts? Yeah. But please don't write your desktop apps in Python...
Post by NicTheQuick »

Quin wrote: Thu May 15, 2025 2:40 pm So, even if you care about binary size, PB is the best choice. Ever tried using regular expressions in Python or .NET and then seeing all the stuff it has to add to your dist?
Well, you can't compare a language that is compiled into a native executable with a scripting language that needs a VM. Also, regular expressions are part of Python's standard library, so the size is the same whether you use them or not. And I don't know anybody who compiles Python into an executable. Maybe some Windows users do weird things like that; on Linux I have never seen it. Python compiles itself into bytecode, but that's all.