Re: How to get the word_before and word_after a specific keyword from a string.

Posted: Fri Aug 26, 2022 5:37 pm
by Olli
I like the pointers. Thank you, infratec.

Re: How to get the word_before and word_after a specific keyword from a string.

Posted: Fri Aug 26, 2022 8:21 pm
by Olli
I think I will post code that uses pointers only (no Len(), no FindString(), etc.). I don't have much time. I have had this code ready since the start of the thread, but I did not handle the characters that must be excluded (here '(', '+', etc.), so it is not finished. I am hesitating over whether to use a dedicated string that lists the characters we want to exclude. Why not two strings? Such a string would make the code as flexible as the regex library.
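
A minimal sketch of the exclusion-string idea described above, using pointers only. The names IsExcluded and SplitWords and the sample exclusion string are assumptions made for illustration; this is not the code Olli had prepared.

```purebasic
EnableExplicit

; Hypothetical helper: #True if Value occurs in the zero-terminated exclusion string.
Procedure.i IsExcluded(Value.c, *Exclude.Character)
  If *Exclude = 0
    ProcedureReturn #False
  EndIf
  While *Exclude\c <> 0
    If *Exclude\c = Value
      ProcedureReturn #True
    EndIf
    *Exclude + SizeOf(Character)
  Wend
  ProcedureReturn #False
EndProcedure

; Hypothetical helper: splits Text$ into words; every character listed in
; Exclude$ acts as a separator and is dropped.
Procedure.i SplitWords(Text$, Exclude$, List Words.s())
  Protected *Text.Character = @Text$
  Protected Word$

  ClearList(Words())
  If *Text = 0
    ProcedureReturn 0
  EndIf

  Repeat
    If *Text\c <> 0 And IsExcluded(*Text\c, @Exclude$) = #False
      Word$ + Chr(*Text\c)
    ElseIf Word$ <> ""
      ; A separator or the end of the string closes the current word.
      AddElement(Words())
      Words() = Word$
      Word$ = ""
    EndIf
    If *Text\c = 0
      Break
    EndIf
    *Text + SizeOf(Character)
  ForEver

  ProcedureReturn ListSize(Words())
EndProcedure

NewList Words.s()
SplitWords(" blah -(word_before keyword word_after!.", " -(!.,", Words())
ForEach Words()
  Debug Words()   ; blah, word_before, keyword, word_after
Next
```

Changing the set of excluded characters then only means editing the second argument, which is the flexibility the post compares to a regex.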

Re: How to get the word_before and word_after a specific keyword from a string.

Posted: Fri Aug 26, 2022 9:00 pm
by infratec
While searching for the end of the string, you can already scan all the words.
So you need only one loop over the string.

Re: How to get the word_before and word_after a specific keyword from a string.

Posted: Fri Aug 26, 2022 9:52 pm
by ChrisR
infratec wrote: Fri Aug 26, 2022 1:20 pm Ok, my version:
Without RegEx and with pointers.
Really nice version; I prefer it by far over the RegEx version.

Re: How to get the word_before and word_after a specific keyword from a string.

Posted: Fri Aug 26, 2022 9:56 pm
by infratec
In one go, without FindString() and Len():

Code:

EnableExplicit

Structure WordBeforeAfter_Structure
  Before$
  After$
EndStructure



Procedure.i GetWordBeforeAndAfter(String$, Keyword$, List ResultList.WordBeforeAfter_Structure())
  
  Protected.i WordCounter, Offset
  Protected Word$, LastWord$
  Protected *String.Character
  Protected NewList WordList$()
  
  
  ClearList(ResultList())
  
  *String = @String$
  While *String\c <> 0
    
    If (*String\c >= 'A' And *String\c <= 'Z') Or (*String\c >= '_' And *String\c <= 'z')
      Word$ + Chr(*String\c)
    Else
      If Word$ <> ""
        AddElement(WordList$())
        WordList$() = Word$
        WordCounter + 1
        ;Debug Word$
        If LastWord$ = Keyword$
          AddElement(ResultList())
          If WordCounter - 3 >= 0
            If SelectElement(WordList$(), WordCounter - 3)
              ResultList()\Before$ = WordList$()
            EndIf
          EndIf
          If LastElement(WordList$())
            ResultList()\After$ = WordList$()
          EndIf
        EndIf
        LastWord$ = Word$
        Word$ = ""
      EndIf
    EndIf
    
    *String + SizeOf(Character)
  Wend
  
  If Word$ <> "" Or LastWord$ = Keyword$
    AddElement(WordList$())
    WordList$() = Word$
    WordCounter + 1
    ;Debug Word$
    If LastWord$ = Keyword$ Or Word$ = Keyword$
      AddElement(ResultList())
      If Word$ <> Keyword$
        Offset = 3
      Else
        Offset = 2
      EndIf
      If WordCounter - Offset >= 0 
        If SelectElement(WordList$(), WordCounter - Offset)
          ResultList()\Before$ = WordList$()
        EndIf
      EndIf
      If Word$ <> Keyword$
        If LastElement(WordList$())
          ResultList()\After$ = WordList$()
        EndIf
      EndIf
    EndIf
    
  EndIf
  
  ;Debug "-------"
  ProcedureReturn ListSize(ResultList())
  
EndProcedure


NewList WordBeforeAfterList.WordBeforeAfter_Structure()

If GetWordBeforeAndAfter(" blah blah -(word_before keyword word_after!. blah blah ,word_before_ keyword word_after_. blah blah", "keyword", WordBeforeAfterList())
;If GetWordBeforeAndAfter(" blah blah -(word_before keyword word_after", "keyword", WordBeforeAfterList())
;If GetWordBeforeAndAfter("word_before keyword word_after", "keyword", WordBeforeAfterList())
;If GetWordBeforeAndAfter("keyword word_after", "keyword", WordBeforeAfterList())
;If GetWordBeforeAndAfter("keyword", "keyword", WordBeforeAfterList())
;If GetWordBeforeAndAfter("word_before keyword", "keyword", WordBeforeAfterList())
  ForEach WordBeforeAfterList()
    Debug WordBeforeAfterList()\Before$
    Debug WordBeforeAfterList()\After$
  Next
EndIf

Re: How to get the word_before and word_after a specific keyword from a string.

Posted: Fri Aug 26, 2022 10:48 pm
by idle
Not much different from infratec's; it just uses a 3-element list.

Code:

EnableExplicit

Macro AddQueue(lList, word) 
   FirstElement(llist)
   DeleteElement(llist) 
   LastElement(llist) 
   AddElement(llist)
   llist = Word
 EndMacro   

Procedure.i GetWordBeforeAndAfter(*String.Character, Keyword$, List wordlist.s())
  
  Protected Word$,found,ct
    
  ClearList(wordlist())
  
  While *String\c <> 0  
    If *String\c > 32  
      Word$ + Chr(*String\c) 
    Else
      If Word$ <> ""
        If ct > 2  
          If Wordlist() = Keyword$ 
            found =1 
          EndIf   
          AddQueue(WordList(),word$) 
          If found 
            word$=""            
            Break 
          EndIf   
        Else 
          AddElement(WordList())
          WordList() = Word$
        EndIf
        ct+1
        word$=""
      EndIf
    EndIf
    *String + SizeOf(Character)
  Wend 
  
  If Word$ <> ""
    If ct > 2 
      AddQueue(WordList(),word$)
    Else 
      AddElement(WordList())
      WordList() = Word$
    EndIf
  EndIf 
    
  ProcedureReturn ListSize(wordlist())
  
EndProcedure

NewList WordBeforeAfterList.s()

If GetWordBeforeAndAfter(@" blah blah -(word_before keyword word_after!. blah blah ,word_before_ keyword word_after_. blah blah", "keyword", WordBeforeAfterList())
;If GetWordBeforeAndAfter(@" blah blah -(word_before keyword word_after", "keyword", WordBeforeAfterList())
;If GetWordBeforeAndAfter(@"word_before keyword word_after", "keyword", WordBeforeAfterList())
;If GetWordBeforeAndAfter(@"keyword word_after", "keyword", WordBeforeAfterList())
;If GetWordBeforeAndAfter(@"keyword", "keyword", WordBeforeAfterList())
;If GetWordBeforeAndAfter(@"word_before keyword", "keyword", WordBeforeAfterList())
  ForEach WordBeforeAfterList()
    Debug WordBeforeAfterList()
  Next
EndIf



Re: How to get the word_before and word_after a specific keyword from a string.

Posted: Fri Aug 26, 2022 11:09 pm
by Oso
I'm surprised at the complexity of some of the solutions put forward. If they really needed to be this complex, I'd be concerned that maintaining the code in future would be difficult, especially as that task might fall to someone else.

As I see it, only five lines of code would be required, were it not for the fact that we must disregard the characters ( ) and , in the input. I also assume that multiple spaces should be treated as a single space. The power is in StringField(), because we can use it to delimit the word-before and word-after sections without unnecessary complexity.

Code:

string1.s = "blah  blah blah       word_before keyword word_after blah blah blah"
string2.s = " blah blah blah (word_before keyword word_after, blah    blah     blah "

Procedure process(text.s,keyword.s)
  length=Len(text.s)    
  lastchr.s=" "
    
  For stpos=1 To length                                     ; Go through each character in the input string
    chr.s=Mid(text.s,stpos,1)                               ; Extract each character in turn
    If (chr.s <> " " Or lastchr.s <> " ")                   ; If character is not a space, or the previous character was not a space, include it
      If chr.s <> "(" And chr.s <> ")" And chr.s <> "." And chr.s <> ","  ; Exclude certain unwanted chars  ( ) . ,
        newstring.s=newstring.s+chr.s                       ; Include the character in the 'newstring.s' output
      EndIf
    EndIf
    lastchr.s=chr.s                                         ; Save this character for the next iteration in the loop
  Next stpos
  
  before.s =   StringField(newstring.s, 1, keyword.s)       ; Split the string on the keyword
  countspc.i = CountString(before.s, " ")                   ; Count no. of spaces to obtain the last word before the keyword
  before.s = StringField(before.s, countspc.i," ")          ; Split the string on the last space
  
  Debug "Input string     : " + text.s                      ; Display the input string (before it is processed)
  Debug "Word before   : " + before.s                       ; Display the resulting word before
  
  after.s =  Trim(StringField(newstring.s, 2, keyword.s))   ; Take the text after the keyword and trim the surrounding spaces
  after.s = StringField(after.s, 1, " ")                    ; Split the string on the first space after keyword
  Debug "Word after      : " + after.s                      ; Display resulting word after
    
EndProcedure
  
process(string1.s, "keyword")
process(string2.s, "keyword")

Output:
Input string : blah blah blah word_before keyword word_after blah blah blah
Word before : word_before
Word after : word_after
Input string : blah blah blah (word_before keyword word_after, blah blah blah
Word before : word_before
Word after : word_after

Re: How to get the word_before and word_after a specific keyword from a string.

Posted: Sat Aug 27, 2022 2:27 am
by idle
@Oso
You've just taken two steps back. Each post was an improvement over the previous ones; my post also simplified infratec's last code and improved its runtime complexity. He's a PB guru, he knows his stuff.

Re: How to get the word_before and word_after a specific keyword from a string.

Posted: Sat Aug 27, 2022 9:02 am
by Olli
@Oso

Don't worry about future compatibility issues:
1) LTS = Long term support
2) If in doubt, we will quickly change whatever needs to be changed, right after the battle.

I've just noticed that I hadn't even had time to step into the ring before infratec already had my teeth. I'm trying to get up from my K.O.... I know I'll be able to get up, but I'm waiting for my dizziness to stop...

Re: How to get the word_before and word_after a specific keyword from a string.

Posted: Sat Aug 27, 2022 2:15 pm
by ChrisR
Also KO, but my dizziness has stopped :)
Here is my version, also based on infratec's code, which should make the code a bit easier to read. It does not use the word list or Offset.

Code:

EnableExplicit

Structure WordBeforeAfter_Structure
  Before$
  After$
EndStructure

Procedure.i GetWordBeforeAndAfter(String$, Keyword$, List ResultList.WordBeforeAfter_Structure(), StringNoCase = #PB_String_CaseSensitive)
  Protected Word$, WordCase$, KeywordCase$, LastWord$, KeyFound
  Protected *String.Character, *Keyword.Character
  
  Debug "String: "  + String$
  Debug "Keyword: " + Keyword$
  Debug "-->"
  
  If StringNoCase
    *Keyword = @Keyword$
    While *Keyword\c <> 0
      If *Keyword\c >= 'A' And *Keyword\c <= 'Z' ; lower-case only A-Z; digits and punctuation stay unchanged
        KeywordCase$ + Chr(*Keyword\c +32)
      Else
        KeywordCase$ + Chr(*Keyword\c)
      EndIf
      *Keyword + SizeOf(Character)
    Wend
  Else
    KeywordCase$ = Keyword$
  EndIf
  
  ClearList(ResultList())
  
  *String = @String$
  Repeat
    
    If (*String\c >= 'A' And *String\c <= 'Z') Or (*String\c >= '_' And *String\c <= 'z')
      Word$ + Chr(*String\c)
      If StringNoCase And *String\c <= 'Z'
        WordCase$ + Chr(*String\c + 32)
      Else
        WordCase$ + Chr(*String\c)
      EndIf
    Else
      If Word$ <> ""
        If KeyFound
          ResultList()\After$ = Word$
          If WordCase$ = KeywordCase$
            AddElement(ResultList())
            ResultList()\Before$ = LastWord$
            KeyFound = #True
          Else
            KeyFound = #False
          EndIf
        Else
          If WordCase$ = KeywordCase$
            AddElement(ResultList())
            ResultList()\Before$ = LastWord$
            KeyFound = #True
          EndIf
        EndIf
        LastWord$ = Word$
        Word$  = ""
        WordCase$ = ""
      EndIf
    EndIf
    
    If *String\c = 0
      Break
    EndIf
    *String + SizeOf(Character)
  ForEver
  
  ProcedureReturn ListSize(ResultList())
EndProcedure

NewList WordBeforeAfterList.WordBeforeAfter_Structure()

Define String$  = "KeyWord blah blah -(word_before keyword word_after!. blah blah ,word_before_ keyword keyword word_after_. blah blah keyword"
;Define String$  = " blah blah -(word_before keyword word_after"
;Define String$  = "word_before keyword word_after"
;Define String$  = "keyword word_after"
;Define String$  = "keyword"
;Define String$  = "word_before keyword"

Define Keyword$ = "keyword"
;Define Keyword$ = "KeyWord"

If GetWordBeforeAndAfter(String$, Keyword$, WordBeforeAfterList())
;If GetWordBeforeAndAfter(String$, Keyword$, WordBeforeAfterList(), #PB_String_NoCase)
  Define NbMatch
  ForEach WordBeforeAfterList()
    NbMatch + 1
    Debug "Word_Before "   + Str(NbMatch) + ": " + WordBeforeAfterList()\Before$
    Debug "Word_After    " + Str(NbMatch) + ": " + WordBeforeAfterList()\After$
  Next
EndIf
Edit: Added String_Case parameter: #PB_String_CaseSensitive (default) or #PB_String_NoCase
Edit2: Optimize String_Case

Re: How to get the word_before and word_after a specific keyword from a string.

Posted: Sat Aug 27, 2022 6:21 pm
by Olli
Wiv'out v'e teev' (long term fupport)

( hash("nuts") = 2*'n' + 3*'u' + 5*'t' + 7*'s' )
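
The hash in the line above can be worked through by hand. A minimal sketch, assuming a hypothetical helper PrimeWeightedHash with a small hard-coded prime table (the posted code builds its table with a sieve instead):

```purebasic
EnableExplicit

; Hypothetical helper: the n-th character is multiplied by the n-th prime.
; Only the first 8 primes are tabulated here (an assumption made to keep the
; sketch short), so longer words are rejected with -1.
Procedure.i PrimeWeightedHash(Word$)
  Protected Dim Prime.i(7)
  Protected i, Result

  Prime(0) = 2  : Prime(1) = 3  : Prime(2) = 5  : Prime(3) = 7
  Prime(4) = 11 : Prime(5) = 13 : Prime(6) = 17 : Prime(7) = 19

  If Len(Word$) > 8
    ProcedureReturn -1
  EndIf

  For i = 1 To Len(Word$)
    Result + Prime(i - 1) * Asc(Mid(Word$, i, 1))
  Next

  ProcedureReturn Result
EndProcedure

Debug PrimeWeightedHash("nuts")   ; 2*110 + 3*117 + 5*116 + 7*115 = 1956
Debug PrimeWeightedHash("stun")   ; same letters in another order: a different value
```

Because the weights depend on position, anagrams normally hash differently. Two unrelated words can still share a value, so comparing the strings themselves after a hash match would be a prudent extra check.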

Code:

Macro anotherMid(alpha, beta)
    PeekS(alpha\wrd(3, beta), alpha\wrd(2, beta) )
EndMacro

#bpc = SizeOf(character) ; (b)ytes (p)er (c)haracter
#bpi = SizeOf(integer) ; (b)ytes (p)er (i)nteger

;longest word
;x32u; x32a; x64u;   x64a
;114;  1469; ?(big); ?(big)

; (for beta reducing)
#greatestUnsignedCharacter = 1 << (8 * #bpc) - 1 
#greatestSignedInteger = 1 << ((8 * #bpi) - 1) - 1

#cmLim = 1 << (8 * #bpc) - 1 ; (c)haracter (m)ask array (lim)it
#pvLim = 1 << 16 - 1 ; (p)rime (v)alue array (lim)it

Structure charMask
    Array cm.a(#cmLim)
EndStructure

Structure wrd
    Array wrd.i(3, 4095)
    qty.i
EndStructure

Structure primeValue
    Array pv.i(#pvLim)
EndStructure

Procedure cmCreate()
    Define *this.charMask = AllocateMemory(SizeOf(charMask) )
    InitializeStructure(*this, charMask)    
    ProcedureReturn *this
EndProcedure
    
Procedure pvCreate()
    Define *this.primeValue = AllocateMemory(SizeOf(primeValue) )
    Define i, j, sqrPvLim = Sqr(#pvLim)
    InitializeStructure(*this, primeValue)
    With *this
        ; *** 1/3 sieving ******************************************
        i = 2
        Repeat            
            If \pv(i) = 0
                j = i * i
                Repeat
                    \pv(j) = j
                    j + i
                Until j > #pvLim
            EndIf
            i + 1
        Until i > sqrPvLim
        ; *** 2/3 compacting ***************************************
        j = 1
        For i = 2 To #pvLim
            If Not \pv(i)
                \pv(j) = i
                j + 1
            EndIf
        Next
        j - 1
        ; *** 3/3 alpha reducing *****************************************
        \pv(0) = j
        ReDim \pv(j)
    EndWith
    ProcedureReturn *this
EndProcedure

Procedure hash(*a.character, *pv.primeValue)
    While *a\c
        i + 1
        r + *pv\pv(i) * *a\c
        *a + SizeOf(character)      
    Wend
    ProcedureReturn r
EndProcedure

Procedure SplitFilterAndHash(*a.character, *cm.charMask, *pv.primeValue)
    *c.wrd = AllocateMemory(SizeOf(wrd) ) ; resulting array
    InitializeStructure(*c, wrd)
    With *c
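        ; Walk the string from its first character. The earlier form stepped
        ; back one character first and so read memory in front of the string.
        While *a\c
            j + 1
            If *cm\cm(*a\c)
                ; Excluded character: close the word in progress, if any
                If r
                    \wrd(0, k) = r
                    \wrd(2, k) = i
                    k + 1
                    i = 0
                    r = 0
                EndIf
            Else
                ; First character of a new word: remember position and address
                If r = 0
                    \wrd(1, k) = j
                    \wrd(3, k) = *a
                EndIf
                i + 1
                r + *pv\pv(i) * *a\c
            EndIf
            *a + #bpc
        Wend
        ; Close a word that runs up to the end of the string
        If r
            \wrd(0, k) = r
            \wrd(2, k) = i
            k + 1
        EndIf
        \qty = k - 1 ; index of the last word, -1 if no word was found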
        ProcedureReturn *c
    EndWith
EndProcedure









; Here, we go !


Define *pv.primeValue = pvCreate()
Define *cm.charMask = cmCreate()

                   ; WE EXCLUDE :
                   
*cm\cm(9) = 1      ; tabulation char
*cm\cm(10) = 1     ; line feed char
*cm\cm(13) = 1     ; carriage return char
*cm\cm(32) = 1     ; space char
*cm\cm('(') = 1    ; 1st parenthesis char
*cm\cm('+') = 1    ; 'plus' char...

*cm\cm('e') = 1    ; and 'e' char...

*cm\cm('e') = 0     ; ...finally, nop : no 'e' char exclude...

*cm\cm('♞') = 0   ; We insure ourselves we keep the horse...



a$ = "    monday (tuesday wednesday thursday+ friday"
weSearch = hash(@"wednesday", *pv)

Define *c.wrd = SplitFilterAndHash(@a$, *cm, *pv)
Debug a$
For i = 0 To *c\qty
    Debug PeekS(*c\wrd(3, i), *c\wrd(2, i) )
    If *c\wrd(0, i) = weSearch
        Debug "before " + anotherMid(*c, i) + " there is " + anotherMid(*c, i - 1) + " and before again : " + anotherMid(*c, i - 2)
        Debug "after " + anotherMid(*c, i) + " there is " + anotherMid(*c, i + 1) + " and after again : " + anotherMid(*c, i + 2)
    EndIf
Next

Re: How to get the word_before and word_after a specific keyword from a string.

Posted: Sat Aug 27, 2022 8:49 pm
by idle
That's got my vote. Keep the horse 🐎

Re: How to get the word_before and word_after a specific keyword from a string.

Posted: Sat Aug 27, 2022 9:46 pm
by ChrisR
I fell off the horse ♞ on monday and friday 🚑

Re: How to get the word_before and word_after a specific keyword from a string.

Posted: Sat Aug 27, 2022 9:59 pm
by dcr3
It seems some misunderstood the question in the OP,
or maybe it wasn't clear enough. :oops:



What is the best way to get the word_before and word_after a specific keyword from a string.


That meets these two conditions.

First condition.

1. string.s="blah blah blah word_before keyword word_after blah blah blah"


Second condition.

2. string.s=" blah blah blah (word_before keyword word_after, blah blah blah "

word_before and word_after, as written, are not real English words; I wrote them that way to convey the meaning of the string.

There are no underscores or any other characters between words.
The only exceptions are an opening parenthesis ( in front of the word_before and a comma , after the word_after.
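
A minimal sketch that covers just the two conditions above, assuming "(" and "," are the only extra characters and that only the first occurrence of the keyword matters. WordsAround is a hypothetical name, not a procedure from this thread.

```purebasic
EnableExplicit

; Hypothetical helper: prints the word before and the word after the first
; occurrence of Keyword$ and returns #True if the keyword was found.
Procedure.i WordsAround(Text$, Keyword$)
  Protected Clean$, Word$, Previous$
  Protected i, Fields, Found

  ; Assumption: "(" and "," are the only characters to disregard.
  Clean$ = ReplaceString(ReplaceString(Text$, "(", " "), ",", " ")
  Fields = CountString(Clean$, " ") + 1

  For i = 1 To Fields
    Word$ = StringField(Clean$, i, " ")
    If Word$ <> ""                     ; empty fields come from repeated spaces
      If Found
        Debug "Word after : " + Word$
        Break
      ElseIf Word$ = Keyword$
        Debug "Word before: " + Previous$
        Found = #True
      Else
        Previous$ = Word$
      EndIf
    EndIf
  Next

  ProcedureReturn Found
EndProcedure

WordsAround("blah blah blah word_before keyword word_after blah blah blah", "keyword")
WordsAround(" blah blah blah (word_before keyword word_after, blah blah blah ", "keyword")
```

Both calls should report word_before and word_after. StringField() rescans the string on every call, so for long inputs the single-pass pointer versions earlier in the thread are the better choice.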


But you all have interesting concepts to learn from.

Thank you infratec. As always.

Thank you idle.

Thank you Oso.
Oso wrote: Fri Aug 26, 2022 11:09 pm I'm surprised at the complexity of some of the solutions put forward. If they really needed to be this complex, I'd be concerned that maintaining the code in future would be difficult, especially as that task might fall to someone else.
I agree.

Thank you olli.

olli, you have a witty and dark sense of humor; you are the winner here.
The horse bit is just on another level. :lol: :lol:

Re: How to get the word_before and word_after a specific keyword from a string.

Posted: Sat Aug 27, 2022 10:03 pm
by Oso
idle wrote: Sat Aug 27, 2022 2:27 am @Oso You've just taken two steps back. Each post was an improvement over the previous ones; my post also simplified infratec's last code and improved its runtime complexity. He's a PB guru, he knows his stuff.
I've got ya. My apologies for 'throwing a spanner in the works', as they say :cry: