FindField() ~ Return field offset in reference string

RichardL
Enthusiast
Posts: 532
Joined: Sat Sep 11, 2004 11:54 am
Location: UK

FindField() ~ Return field offset in reference string

Post by RichardL »

Hi,
A trivial but useful routine that returns the field number of a string within a separator-delimited reference string.
RichardL

Code: Select all

Procedure FindField(String$,Search$,Sep$,CaseFlag=0)
  ; R1 (1747 mSec / 1 000 000)
  ; Unsuitable when Search$ may occur as a substring of a longer field in String$.
  ; ASCII only.
  ; Return the field number for 'Search$' in the longer string 'String$'
  ; employing separators 'Sep$'.
  ; Default is a case-independent search. Set CaseFlag 'True' to match case.
  
  Protected p                ; Position of the first match, then the separator count
  
  If FindString(Search$,Sep$)
    MessageRequester("ERROR: FindField()","'Search$' cannot include 'Sep$'")
    ProcedureReturn 0
  EndIf
  
  ; Case independent match.
  If CaseFlag = 0
    String$ = UCase(String$)
    Search$ = UCase(Search$)
  EndIf
  
  ; Find match in string
  p = FindString(String$,Search$)
  
  ; Find corresponding field number
  If p
    p = CountString(Left(String$,p),Sep$)
    ProcedureReturn p + 1
  Else
    ProcedureReturn 0
  EndIf
  
EndProcedure

Procedure R2FindField(String$,Search$,Sep$,CaseFlag=0)
  ; R2 (2527 mSec / 1 000 000)
  ; Fixes the defect in R1
  ; ASCII only.
  ; Return the field number for 'Search$' in the longer string 'String$'
  ; employing separators 'Sep$'.
  ; Default is a case-independent search. Set CaseFlag 'True' to match case.

  Protected n
  
  If FindString(Search$,Sep$)
    MessageRequester("ERROR: FindField()","'Search$' cannot include 'Sep$'")
    ProcedureReturn 0
  EndIf
  
  ; Case independent match...
  If CaseFlag = 0
    String$ = UCase(String$)
    Search$ = UCase(Search$)
  EndIf
  
  ; Compare fields
  For n = 1 To CountString(String$,Sep$) + 1
    If StringField(String$,n,Sep$) = Search$
      ProcedureReturn n
    EndIf
  Next
  
  ProcedureReturn 0
  
EndProcedure

Procedure R3FindField(String$,Find$,Sep$,CaseFlag=0)
  ; R3 (1601 mSec / 1 000 000) - Single scan method. More code, but faster
  ; ASCII only: the pointers below advance one byte per character.
  ; Return the field number for 'Find$' in the longer string 'String$'
  ; employing separators 'Sep$'.
  ; Default is a case-independent search. Set CaseFlag 'True' to match case.
  
  Protected *FK    = @Find$                  ; String to find     Ex: DICK
  Protected SepC   = PeekC(@Sep$)            ; Separator          Ex: |
  Protected Result = 0  
  Protected FieldCount = 0
  Protected LF     = Len(Find$)
  Protected MCount = 0
  Protected C, F                             ; Current chars from reference and find strings
  Protected *F     = *FK-1
  Protected *S     = @String$ -1             ; Reference string   Ex: Tom|Dick|Harry|
  
  If FindString(Find$,Sep$)
    MessageRequester("ERROR: FindField()","'Find$' cannot include 'Sep$'")
    ProcedureReturn 0
  EndIf  
  
  Repeat
    ; Advance pointers and get chars from Ref and Find strings
    *F + 1  : F = PeekC(*F)
    *S + 1  : C = PeekC(*S)
    
    ; End of BOTH fields, and they match! 
    If (C = SepC Or C = 0) And F = 0 And MCount=LF          
      Result = FieldCount + 1
      Break
    EndIf  
    
    ; Reached end of Ref string so finish
    If C = 0                   
      Break
    EndIf
    
    ; Reached end of REF field, prepare for next one
    If C = SepC                ; Ref field end
      FieldCount + 1
      *F = *FK-1               ; Re-start Find
      MCount = 0
      Continue 
    EndIf
    
    ; Reached end of FIND field, run over remaining REF chars
    If F = 0     
      While PeekC(*S) <> SepC And PeekC(*S) <> 0 : *S+ 1 : Wend
      *S - 1
      *F = *FK-1               ; Re-start Find
      MCount = 0
      Continue
    EndIf
    
    ; Compare field chars and count matching chars
    If Not CaseFlag            ; Case independent, so move 'a-z' to 'A-Z'
      If C >= 'a' And C <= 'z' : C - 32 : EndIf
      If F >= 'a' And F <= 'z' : F - 32 : EndIf
    EndIf
    
    If C = F     
      MCount + 1
    Else
      While PeekC(*S) <> SepC And PeekC(*S) <> 0 : *S+ 1 : Wend
      *S - 1
      *F = *FK-1              
      MCount = 0
    EndIf
    
  ForEver
  
  ProcedureReturn Result
EndProcedure

TestCount = 100000
TS = ElapsedMilliseconds()
For n = 1 To TestCount
  T = FindField("the|cat|sat|On the |mat","on the ","|",1)     ; 0    1
  T + FindField("the|cat|sat|on the |mat|","COW","|")          ; 0    2
  T + FindField("the|cat|sat|on the |mat|","LONGCOW","|")      ; 0    3
  T + FindField("the |cat|sat|on the |mat","the ","|")         ; 1    4
  T + FindField("cat","CAT","|")                               ; 1    5
  T + FindField("the|CAT|sat|On the |mat","CAT","|")           ; 2    6
  T + FindField("the|CAT|sat|On the |mat","SAT","|")           ; 3    7 
  T + FindField("these|cats|sat|on the |at","on the ","|")     ; 4    8
  T + FindField("mon|tue|wed|thu|fri|sat|sun","Fri","|")       ; 5    9
  T + FindField("the,name,of this,tomcat,is,tom,", "TOM", ",") ; 6    10
  ;                                                            ====     
  ;                                                             22  
  If TestCount = 1
    T + FindField("the|cat|sat|on|the|mat","Bang|","|")        ; ERROR and 0
    If T <> 22 : MessageRequester("Error","Sort it...") : EndIf 
  EndIf
Next
MessageRequester("Timer... 1M tests",Str(10*(n-1))+ " Tests in "+Str(ElapsedMilliseconds()-TS)+" mSec")
;http://www.youtube.com/watch?v=27ugSKW4-QQ
Last edited by RichardL on Sun Dec 08, 2013 10:49 am, edited 1 time in total.
Little John
Addict
Posts: 4791
Joined: Thu Jun 07, 2007 3:25 pm
Location: Berlin, Germany

Re: FindField() ~ Return field offset in reference string

Post by Little John »

Hi,

The idea is good, but your code does not work as I would expect.
IMHO the procedure should match whole fields only. For instance, the following example should yield 6 (the field "tom"), but it returns 4 (the "tom" inside "tomcat").

Code: Select all

Procedure FindField(String$,Search$,Sep$,CaseFlag=0)
  ; Return the field number for 'Search$' in the longer string 'String$'
  ; employing separators 'Sep$'.
  ; Default is to case independent search. Set flag 'True' to match cases.
 
  If FindString(Search$,Sep$)
    MessageRequester("ERROR: FindField()","'Search$' cannot include 'Sep$'")
    ProcedureReturn 0
  EndIf
 
  ; Case independent match...
  If CaseFlag = 0
    String$ = UCase(String$)
    Search$ = UCase(Search$)
  EndIf
 
  ; Find match in string
  P = FindString(String$,Search$)
 
  ; Find corresponding field number
  If P
    P = CountString(Left(String$,P),Sep$)
    ProcedureReturn P + 1
  Else
    ProcedureReturn 0
  EndIf
 
EndProcedure


Debug FindField("the,name,of this,tomcat,is,tom", "tom", ",")
Last edited by Little John on Thu Dec 05, 2013 7:09 am, edited 1 time in total.
skywalk
Addict
Posts: 4219
Joined: Wed Dec 23, 2009 10:14 pm
Location: Boston, MA

Re: FindField() ~ Return field offset in reference string

Post by skywalk »

Little John wrote:Hi,
the idea is good, but your code does not work as I would expect.
IMHO the procedure should only find whole fields. I.e., the following example should yield 6, but it returns 4.
:wink:

Code: Select all

Debug FindField("the,name,of this,tomcat,is,Tom", "Tom", ",", 1)
The nice thing about standards is there are so many to choose from. ~ Andrew Tanenbaum

Re: FindField() ~ Return field offset in reference string

Post by Little John »

skywalk wrote:
Little John wrote:Hi,
the idea is good, but your code does not work as I would expect.
IMHO the procedure should only find whole fields. I.e., the following example should yield 6, but it returns 4.
:wink:

Code: Select all

Debug FindField("the,name,of this,tomcat,is,Tom", "Tom", ",", 1)
Case sensitive or not case sensitive is not the point. I changed the example in my previous post to make it clearer.

Re: FindField() ~ Return field offset in reference string

Post by RichardL »

Hi Little John,
Thanks for your comment. In the project that spawned this routine, the search string was always unique within the reference string, so the problem did not arise. I have labelled this version 'R1'.

I have produced two alternatives that address the issue you raise.
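'R2' solves the problem by comparing each complete field against the search string, at some cost in speed (2527 mSec versus 1747 mSec per 1 000 000 calls).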
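
To make the difference concrete, here is a minimal sketch. SubstringField() and WholeField() are names made up for illustration; they are cut-down stand-ins for R1 and R2, without the case handling or the separator check.

```purebasic
; Sketch only: hypothetical helper names, reduced from R1 and R2 for illustration.
Procedure SubstringField(String$, Search$, Sep$)
  ; R1 style: find the first substring hit, then count the separators before it
  Protected p
  p = FindString(String$, Search$)
  If p
    ProcedureReturn CountString(Left(String$, p), Sep$) + 1
  EndIf
  ProcedureReturn 0
EndProcedure

Procedure WholeField(String$, Search$, Sep$)
  ; R2 style: compare each complete field with the search string
  Protected n
  For n = 1 To CountString(String$, Sep$) + 1
    If StringField(String$, n, Sep$) = Search$
      ProcedureReturn n
    EndIf
  Next
  ProcedureReturn 0
EndProcedure

Debug SubstringField("the,name,of this,tomcat,is,tom", "tom", ",") ; 4, the hit inside "tomcat"
Debug WholeField("the,name,of this,tomcat,is,tom", "tom", ",")     ; 6, the whole field "tom"
```

The substring version stops at the first occurrence of the search text, wherever it sits inside a field, which is why it is only safe when the search string is unique in the reference string.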

'R3' makes a single scan of the reference string, looking for a field match. I include it because I asked myself how I would have written the same thing in 6502 assembler, which I wrote a lot of in the '70s and '80s, and while out on a walk I thought of this alternative. It was for my own satisfaction only, but although the code is longer, it is the fastest of the three when I test it... by a very small amount.
Also, R3 tolerates a separator at the end of the reference string, which I sometimes include.
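
As a side note on trailing separators, the small sketch below (not part of R1 to R3; the variable names are only for illustration) lists the fields that a CountString()/StringField() loop visits when the reference string ends with a separator. The extra final field is simply empty.

```purebasic
; Sketch only: shows the fields seen when the reference string ends with a separator.
Define Ref$ = "Tom|Dick|Harry|"
Define n
Define Fields

Fields = CountString(Ref$, "|") + 1   ; 3 separators give 4 fields

For n = 1 To Fields
  Debug Str(n) + ": [" + StringField(Ref$, n, "|") + "]"
Next
; The 4th field is the empty string after the final '|'.
```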

I have replaced the code in the original posting with all three versions; enable one by naming its procedure FindField().

Best regards
Richard
Last edited by RichardL on Sat Feb 15, 2014 12:54 pm, edited 1 time in total.
wilbert
PureBasic Expert
Posts: 3942
Joined: Sun Aug 08, 2004 5:21 am
Location: Netherlands

Re: FindField() ~ Return field offset in reference string

Post by wilbert »

It's very easy to improve the speed of your R3 version by replacing the PeekC() calls with structured pointer access (*F\c and *S\c):

Code: Select all

Procedure R3FindField(String$,Find$,Sep$,CaseFlag=0)
  ; R3 (1601 mSec / 1 000 000) - Single scan method. More code, but faster
  ; ASCII only.
  ; Return the field number for 'Find$' in the longer string 'String$'
  ; employing separators 'Sep$'.
  ; Default is a case-independent search. Set CaseFlag 'True' to match case.
  
  Protected *FK    = @Find$                  ; String to find     Ex: DICK
  Protected SepC   = PeekC(@Sep$)            ; Separator          Ex: |
  Protected Result = 0  
  Protected FieldCount = 0
  Protected LF     = Len(Find$)
  Protected MCount = 0
  Protected *F.Character = *FK-1
  Protected *S.Character = @String$ -1       ; Reference string   Ex: Tom|Dick|Harry|
  
  If FindString(Find$,Sep$)
    MessageRequester("ERROR: FindField()","'Find$' cannot include 'Sep$'")
    ProcedureReturn 0
  EndIf  
  
  Repeat
    ; Advance pointers and get chars from Ref and Find strings
    *F + 1  : F = *F\c
    *S + 1  : C = *S\c
    
    ; End of BOTH fields, and they match! 
    If (C = SepC Or C = 0) And F = 0 And MCount=LF          
      Result = FieldCount + 1
      Break
    EndIf  
    
    ; Reached end of Ref string so finish
    If C = 0                   
      Break
    EndIf
    
    ; Reached end of REF field, prepare for next one
    If C = SepC                ; Ref field end
      FieldCount + 1
      *F = *FK-1               ; Re-start Find
      MCount = 0
      Continue 
    EndIf
    
    ; Reached end of FIND field, run over remaining REF chars
    If F = 0     
      While *S\c <> SepC And *S\c <> 0 : *S+ 1 : Wend
      *S - 1
      *F = *FK-1               ; Re-start Find
      MCount = 0
      Continue
    EndIf
    
    ; Compare field chars and count matching chars
    If Not CaseFlag            ; Case independent, so move 'a-z' to 'A-Z'
      If C >= 'a' And C <= 'z' : C - 32 : EndIf
      If F >= 'a' And F <= 'z' : F - 32 : EndIf
    EndIf
    
    If C = F     
      MCount + 1
    Else
      While *S\c <> SepC And *S\c <> 0 : *S+ 1 : Wend
      *S - 1
      *F = *FK-1              
      MCount = 0
    EndIf
    
  ForEver
  
  ProcedureReturn Result
EndProcedure
Windows (x64)
Raspberry Pi OS (Arm64)

Re: FindField() ~ Return field offset in reference string

Post by RichardL »

Good morning Wilbert,

Thank you.
That is not a syntax I'm familiar with. For comparison, it clocks at 1213 mSec on the same machine I used before (R3 was 1601 mSec).
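
For anyone else new to it, the following is a minimal sketch of the structured-pointer idiom, separate from the FindField() code. The variable names are invented for the example. A pointer declared as *P.Character can read the character it points at through the \c field, with no function call. Pointer arithmetic is in bytes, so stepping by SizeOf(Character) is used here to stay correct whatever the character size is.

```purebasic
; Sketch only: walks a string one character at a time with a structured pointer.
Define Text$ = "Tom|Dick"
Define *P.Character
Define Count = 0

*P = @Text$                         ; *P points at the first character

While *P\c                          ; \c reads the character value; 0 marks the end of the string
  If *P\c = '|' : Count + 1 : EndIf
  *P + SizeOf(Character)            ; advance by one character (the offset is in bytes)
Wend

Debug Count                         ; 1, the single '|' in Text$
```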

Anyone for replacing the core loop with assembler...? :wink:

Richard

Re: FindField() ~ Return field offset in reference string

Post by Little John »

Thanks to both of you, Richard and Wilbert!
Mistrel
Addict
Posts: 3415
Joined: Sat Jun 30, 2007 8:04 pm

Re: FindField() ~ Return field offset in reference string

Post by Mistrel »

That's a lot of code for a simple problem. You could just as easily use CountString() to count the separators and then loop over that many fields to find your match. You've also reimplemented a poorer version of StringField(), one that loses Unicode support.

See here for my StringField() implementation, which supports whole-string delimiters (written back before PureBasic supported them). You can easily adapt it for case support:

http://forum.purebasic.com/english/view ... 12&t=42013