How to get the word_before and word_after a specific keyword from a string.
Re: How to get the word_before and word_after a specific keyword from a string.
I like the pointors. Thank you infratec.
Re: How to get the word_before and word_after a specific keyword from a string.
I think I will post a code using pointors only (no Len(), no FindString(), etc ... ). I have not lots of time. I prepared this code since the start of the subject, but I did not care about the characters which must be excluded (here, '(', '+', etc... ), what it is no ready. I hesitate : to use a specific string which excludes the characters we want to exclude. Why not two strings ? Such a string allows to be as soft as the regex library.
Re: How to get the word_before and word_after a specific keyword from a string.
During searching for the end, you can already scan all words.
So you need only one loop over the string.
So you need only one loop over the string.
Re: How to get the word_before and word_after a specific keyword from a string.
Really nice version, I prefer it by far vs RegEx version.
Re: How to get the word_before and word_after a specific keyword from a string.
In one go, without FindString() and Len();
Code: Select all
EnableExplicit
Structure WordBeforeAfter_Structure
Before$
After$
EndStructure
Procedure.i GetWordBeforeAndAfter(String$, Keyword$, List ResultList.WordBeforeAfter_Structure())
Protected.i WordCounter, Offset
Protected Word$, LastWord$
Protected *String.Character
Protected NewList WordList$()
ClearList(ResultList())
*String = @String$
While *String\c <> 0
If (*String\c >= 'A' And *String\c <= 'Z') Or (*String\c >= '_' And *String\c <= 'z')
Word$ + Chr(*String\c)
Else
If Word$ <> ""
AddElement(WordList$())
WordList$() = Word$
WordCounter + 1
;Debug Word$
If LastWord$ = Keyword$
AddElement(ResultList())
If WordCounter - 3 >= 0
If SelectElement(WordList$(), WordCounter - 3)
ResultList()\Before$ = WordList$()
EndIf
EndIf
If LastElement(WordList$())
ResultList()\After$ = WordList$()
EndIf
EndIf
LastWord$ = Word$
Word$ = ""
EndIf
EndIf
*String + SizeOf(Character)
Wend
If Word$ <> "" Or LastWord$ = Keyword$
AddElement(WordList$())
WordList$() = Word$
WordCounter + 1
;Debug Word$
If LastWord$ = Keyword$ Or Word$ = Keyword$
AddElement(ResultList())
If Word$ <> Keyword$
Offset = 3
Else
Offset = 2
EndIf
If WordCounter - Offset >= 0
If SelectElement(WordList$(), WordCounter - Offset)
ResultList()\Before$ = WordList$()
EndIf
EndIf
If Word$ <> Keyword$
If LastElement(WordList$())
ResultList()\After$ = WordList$()
EndIf
EndIf
EndIf
EndIf
;Debug "-------"
ProcedureReturn ListSize(ResultList())
EndProcedure
NewList WordBeforeAfterList.WordBeforeAfter_Structure()
If GetWordBeforeAndAfter(" blah blah -(word_before keyword word_after!. blah blah ,word_before_ keyword word_after_. blah blah", "keyword", WordBeforeAfterList())
;If GetWordBeforeAndAfter(" blah blah -(word_before keyword word_after", "keyword", WordBeforeAfterList())
;If GetWordBeforeAndAfter("word_before keyword word_after", "keyword", WordBeforeAfterList())
;If GetWordBeforeAndAfter("keyword word_after", "keyword", WordBeforeAfterList())
;If GetWordBeforeAndAfter("keyword", "keyword", WordBeforeAfterList())
;If GetWordBeforeAndAfter("word_before keyword", "keyword", WordBeforeAfterList())
ForEach WordBeforeAfterList()
Debug WordBeforeAfterList()\Before$
Debug WordBeforeAfterList()\After$
Next
EndIf
Re: How to get the word_before and word_after a specific keyword from a string.
not much difference to Infratec's just uses a 3 element list.
Code: Select all
EnableExplicit
Macro AddQueue(lList, word)
FirstElement(llist)
DeleteElement(llist)
LastElement(llist)
AddElement(llist)
llist = Word
EndMacro
Procedure.i GetWordBeforeAndAfter(*String.Character, Keyword$, List wordlist.s())
Protected Word$,found,ct
ClearList(wordlist())
While *String\c <> 0
If *String\c > 32
Word$ + Chr(*String\c)
Else
If Word$ <> ""
If ct > 2
If Wordlist() = Keyword$
found =1
EndIf
AddQueue(WordList(),word$)
If found
word$=""
Break
EndIf
Else
AddElement(WordList())
WordList() = Word$
EndIf
ct+1
word$=""
EndIf
EndIf
*String + SizeOf(Character)
Wend
If Word$ <> ""
If ct > 2
AddQueue(WordList(),word$)
Else
AddElement(WordList())
WordList() = Word$
EndIf
EndIf
ProcedureReturn ListSize(wordlist())
EndProcedure
NewList WordBeforeAfterList.s()
If GetWordBeforeAndAfter(@" blah blah -(word_before keyword word_after!. blah blah ,word_before_ keyword word_after_. blah blah", "keyword", WordBeforeAfterList())
;If GetWordBeforeAndAfter(@" blah blah -(word_before keyword word_after", "keyword", WordBeforeAfterList())
;If GetWordBeforeAndAfter(@"word_before keyword word_after", "keyword", WordBeforeAfterList())
;If GetWordBeforeAndAfter(@"keyword word_after", "keyword", WordBeforeAfterList())
;If GetWordBeforeAndAfter(@"keyword", "keyword", WordBeforeAfterList())
;If GetWordBeforeAndAfter(@"word_before keyword", "keyword", WordBeforeAfterList())
ForEach WordBeforeAfterList()
Debug WordBeforeAfterList()
Next
EndIf
Re: How to get the word_before and word_after a specific keyword from a string.
I'm surprised at the complexity of some of the solutions put forward. I'd be concerned if they needed to be this complex, that the future task of maintaining the code would be difficult, especially as it might be someone else.
As I see it, there would only be five lines of code required, were it not for the fact that we must disregard characters ( ) , I assume also that multiple spaces should be treated as a single space. The power is in StringField(), because we can use it to delimit the word-before and after sections, without unnecessary complexity.
Input string : blah blah blah word_before keyword word_after blah blah blah
Word before : word_before
Word after : word_after
Input string : blah blah blah (word_before keyword word_after, blah blah blah
Word before : word_before
Word after : word_after
As I see it, there would only be five lines of code required, were it not for the fact that we must disregard characters ( ) , I assume also that multiple spaces should be treated as a single space. The power is in StringField(), because we can use it to delimit the word-before and after sections, without unnecessary complexity.
Code: Select all
string1.s = "blah blah blah word_before keyword word_after blah blah blah"
string2.s = " blah blah blah (word_before keyword word_after, blah blah blah "
Procedure process(text.s,keyword.s)
length=Len(text.s)
lastchr.s=" "
For stpos=1 To length ; Go through each character in the input string
chr.s=Mid(text.s,stpos,1) ; Extract each character in turn
If (chr.s <> " " Or lastchr.s <> " ") ; If character is not a space, or the previous character was not a space, include it
If chr.s <> "(" And chr.s <> ")" And chr.s <> "." And chr.s <> "," ; Exclude certain unwanted chars ( ) . ,
newstring.s=newstring.s+chr.s ; Include the character in the 'newstring.s' output
EndIf
EndIf
lastchr.s=chr.s ; Save this character for the next iteration in the loop
Next stpos
before.s = StringField(newstring.s, 1, keyword.s) ; Split the string on the keyword
countspc.i = CountString(before.s, " ") ; Count no. of spaces to obtain the last word before the keyword
before.s = StringField(before.s, countspc.i," ") ; Split the string on the last space
Debug "Input string : " + text.s ; Display the input string (before it is processed)
Debug "Word before : " + before.s ; Display the resulting word before
after.s = Trim(StringField(newstring.s, 2, keyword.s)) ; Split the string on the next space after keyword, remove trailing space
after.s = StringField(after.s, 1, " ") ; Split the string on the first space after keyword
Debug "Word after : " + after.s ; Display resulting word after
EndProcedure
process(string1.s, "keyword")
process(string2.s, "keyword")
Word before : word_before
Word after : word_after
Input string : blah blah blah (word_before keyword word_after, blah blah blah
Word before : word_before
Word after : word_after
Re: How to get the word_before and word_after a specific keyword from a string.
@Oso
you've just taken 2 steps back, each post was an improvement over the others, my post also simplified and improved the runtime complexity of Infratec's last code, he's a PB guru, he knows his stuff.
you've just taken 2 steps back, each post was an improvement over the others, my post also simplified and improved the runtime complexity of Infratec's last code, he's a PB guru, he knows his stuff.
Re: How to get the word_before and word_after a specific keyword from a string.
@Oso
Don't worry about the upcoming compatibility stories :
1) LTS = Long term support
2) if we doubt, we will quickly change what it needs to be changed, just after the battle.
I just notice that I haven't had time to put my head in the ring, that infratec already has my teeth. I'm trying to get up from my K.O.... I know I'll be able to get up, but I'm waiting for my dizziness to stop...
Don't worry about the upcoming compatibility stories :
1) LTS = Long term support
2) if we doubt, we will quickly change what it needs to be changed, just after the battle.
I just notice that I haven't had time to put my head in the ring, that infratec already has my teeth. I'm trying to get up from my K.O.... I know I'll be able to get up, but I'm waiting for my dizziness to stop...
Re: How to get the word_before and word_after a specific keyword from a string.
Also KO but my dizziness stopped
Based on Infratec's code also, my version, which should simplify a bit the code reading. Without using the word list, Offset.
Edit: Added String_Case parameter: #PB_String_CaseSensitive (default) or #PB_String_NoCase
Edit2: Optimize String_Case

Based on Infratec's code also, my version, which should simplify a bit the code reading. Without using the word list, Offset.
Code: Select all
EnableExplicit
Structure WordBeforeAfter_Structure
Before$
After$
EndStructure
Procedure.i GetWordBeforeAndAfter(String$, Keyword$, List ResultList.WordBeforeAfter_Structure(), StringNoCase = #PB_String_CaseSensitive)
Protected Word$, WordCase$, KeywordCase$, LastWord$, KeyFound
Protected *String.Character, *Keyword.Character
Debug "String: " + String$
Debug "Keyword: " + Keyword$
Debug "-->"
If StringNoCase
*Keyword = @Keyword$
While *Keyword\c <> 0
If *Keyword\c <= 'Z'
KeywordCase$ + Chr(*Keyword\c +32)
Else
KeywordCase$ + Chr(*Keyword\c)
EndIf
*Keyword + SizeOf(Character)
Wend
Else
KeywordCase$ = Keyword$
EndIf
ClearList(ResultList())
*String = @String$
Repeat
If (*String\c >= 'A' And *String\c <= 'Z') Or (*String\c >= '_' And *String\c <= 'z')
Word$ + Chr(*String\c)
If StringNoCase And *String\c <= 'Z'
WordCase$ + Chr(*String\c + 32)
Else
WordCase$ + Chr(*String\c)
EndIf
Else
If Word$ <> ""
If KeyFound
ResultList()\After$ = Word$
If WordCase$ = KeywordCase$
AddElement(ResultList())
ResultList()\Before$ = LastWord$
KeyFound = #True
Else
KeyFound = #False
EndIf
Else
If WordCase$ = KeywordCase$
AddElement(ResultList())
ResultList()\Before$ = LastWord$
KeyFound = #True
EndIf
EndIf
LastWord$ = Word$
Word$ = ""
WordCase$ = ""
EndIf
EndIf
If *String\c = 0
Break
EndIf
*String + SizeOf(Character)
ForEver
ProcedureReturn ListSize(ResultList())
EndProcedure
NewList WordBeforeAfterList.WordBeforeAfter_Structure()
Define String$ = "KeyWord blah blah -(word_before keyword word_after!. blah blah ,word_before_ keyword keyword word_after_. blah blah keyword"
;Define String$ = " blah blah -(word_before keyword word_after"
;Define String$ = "word_before keyword word_after"
;Define String$ = "keyword word_after"
;Define String$ = "keyword"
;Define String$ = "word_before keyword"
Define Keyword$ = "keyword"
;Define Keyword$ = "KeyWord"
If GetWordBeforeAndAfter(String$, Keyword$, WordBeforeAfterList())
;If GetWordBeforeAndAfter(String$, Keyword$, WordBeforeAfterList(), #PB_String_NoCase)
Define NbMatch
ForEach WordBeforeAfterList()
NbMatch + 1
Debug "Word_Before " + Str(NbMatch) + ": " + WordBeforeAfterList()\Before$
Debug "Word_After " + Str(NbMatch) + ": " + WordBeforeAfterList()\After$
Next
EndIf
Edit2: Optimize String_Case
Re: How to get the word_before and word_after a specific keyword from a string.
Wiv'out v'e teev' (long term fupport)
( hash("nuts") = 2*'n' + 3*'u' + 5*'t' + 7*'s' )
( hash("nuts") = 2*'n' + 3*'u' + 5*'t' + 7*'s' )
Code: Select all
Macro anotherMid(alpha, beta)
PeekS(alpha\wrd(3, beta), alpha\wrd(2, beta) )
EndMacro
#bpc = SizeOf(character) ; (b)ytes (p)er (c)haracter
#bpi = SizeOf(integer) ; (b)ytes (p)er (i)nteger
;longest word
;x32u; x32a; x64u; x64a
;114; 1469; ?(big); ?(big)
; (for beta reducing)
#greatestUnsignedCharacter = 1 << (8 * #bpc) - 1
#greatestSignedInteger = 1 << ((8 * #bpi) - 1) - 1
#cmLim = 1 << (8 * #bpc) - 1 ; (c)haracter (m)ask array (lim)it
#pvLim = 1 << 16 - 1 ; (p)rime (v)alue array (lim)it
Structure charMask
Array cm.a(#cmLim)
EndStructure
Structure wrd
Array wrd.i(3, 4095)
qty.i
EndStructure
Structure primeValue
Array pv.i(#pvLim)
EndStructure
Procedure cmCreate()
Define *this.charMask = AllocateMemory(SizeOf(charMask) )
InitializeStructure(*this, charMask)
ProcedureReturn *this
EndProcedure
Procedure pvCreate()
Define *this.primeValue = AllocateMemory(SizeOf(primeValue) )
Define i, j, sqrPvLim = Sqr(#pvLim)
InitializeStructure(*this, primeValue)
With *this
; *** 1/3 sieving ******************************************
i = 2
Repeat
If \pv(i) = 0
j = i * i
Repeat
\pv(j) = j
j + i
Until j > #pvLim
EndIf
i + 1
Until i > sqrPvLim
; *** 2/3 compacting ***************************************
j = 1
For i = 2 To #pvLim
If Not \pv(i)
\pv(j) = i
j + 1
EndIf
Next
j - 1
; *** 3/3 alpha reducing *****************************************
\pv(0) = j
ReDim \pv(j)
EndWith
ProcedureReturn *this
EndProcedure
Procedure hash(*a.character, *pv.primeValue)
While *a\c
i + 1
r + *pv\pv(i) * *a\c
*a + SizeOf(character)
Wend
ProcedureReturn r
EndProcedure
Procedure SplitFilterAndHash(*a.character, *cm.charMask, *pv.primeValue)
*c.wrd = AllocateMemory(SizeOf(wrd) ) ; resulting array
InitializeStructure(*c, wrd)
With *c
*a - #bpc
While *a\c
*a + #bpc
j + 1
If *cm\cm(*a\c)
If r
\wrd(0, k) = r
\wrd(2, k) = i
k + 1
i = 0
r = 0
EndIf
Else
If r = 0
\wrd(1, k) = j
\wrd(3, k) = *a
EndIf
i + 1
r + *pv\pv(i) * *a\c
EndIf
Wend
If r
\wrd(0, k) = r
\wrd(2, k) = i
EndIf
\qty = k
ProcedureReturn *c
EndWith
EndProcedure
; Here, we go !
Define *pv.primeValue = pvCreate()
Define *cm.charMask = cmCreate()
; WE EXCLUDE :
*cm\cm(9) = 1 ; tabulation char
*cm\cm(10) = 1 ; line feed char
*cm\cm(13) = 1 ; carriage return char
*cm\cm(32) = 1 ; space char
*cm\cm('(') = 1 ; 1st parenthesis char
*cm\cm('+') = 1 ; 'plus' char...
*cm\cm('e') = 1 ; and 'e' char...
*cm\cm('e') = 0 ; ...finally, nop : no 'e' char exclude...
*cm\cm('♞') = 0 ; We insure ourselves we keep the horse...
a$ = " monday (tuesday wednesday thursday+ friday"
weSearch = hash(@"wednesday", *pv)
Define *c.wrd = SplitFilterAndHash(@a$, *cm, *pv)
Debug a$
For i = 0 To *c\qty
Debug PeekS(*c\wrd(3, i), *c\wrd(2, i) )
If *c\wrd(0, i) = weSearch
Debug "before " + anotherMid(*c, i) + " there is " + anotherMid(*c, i - 1) + " and before again : " + anotherMid(*c, i - 2)
Debug "after " + anotherMid(*c, i) + " there is " + anotherMid(*c, i + 1) + " and after again : " + anotherMid(*c, i + 2)
EndIf
Next
Re: How to get the word_before and word_after a specific keyword from a string.
That's got my vote. Keep the horse 
Re: How to get the word_before and word_after a specific keyword from a string.
I fell off the horse ♞ on monday and friday 
Re: How to get the word_before and word_after a specific keyword from a string.
Although some misunderstood the question in the OP.
or maybe isn't clear enough.
There is no underscore or any other characters between words.
The only exception is ( parentheses in the word_before and , comma in the word_after.
But you all have, interesting concepts to learn from.
Thank you infratec. As always.
Thank you idle.
Thank you Oso.
Thank you olli.
olli you have a witty and dark sense of humor you are the winner here.
the horse bit , it is just another level.

or maybe isn't clear enough.

word_before and word_after as it's written does not exist in English, I wrote that way to convey the meaning of the string.What is the best way to get the word_before and word_after a specific keyword from a string.
That meets these two conditions.
First condition.
1. string.s="blah blah blah word_before keyword word_after blah blah blah"
Second condition.
2. string.s=" blah blah blah (word_before keyword word_after, blah blah blah "
There is no underscore or any other characters between words.
The only exception is ( parentheses in the word_before and , comma in the word_after.
But you all have, interesting concepts to learn from.
Thank you infratec. As always.
Thank you idle.
Thank you Oso.
I agree.Oso wrote: Fri Aug 26, 2022 11:09 pm I'm surprised at the complexity of some of the solutions put forward. I'd be concerned if they needed to be this complex, that the future task of maintaining the code would be difficult, especially as it might be someone else.
Thank you olli.
olli you have a witty and dark sense of humor you are the winner here.
the horse bit , it is just another level.


Last edited by dcr3 on Sat Aug 27, 2022 10:11 pm, edited 1 time in total.
Re: How to get the word_before and word_after a specific keyword from a string.
I've got ya, my apologies for 'throwing the spanner in the works', as they sayidle wrote: Sat Aug 27, 2022 2:27 am @Oso you've just taken 2 steps back, each post was an improvement over the others, my post also simplified and improved the runtime complexity of Infratec's last code, he's a PB guru, he knows his stuff.
