
FindField() ~ Return field offset in reference string

Posted: Wed Dec 04, 2013 3:15 pm
by RichardL
Hi,
A trivial but useful routine that returns the field number (offset) of a string within a separator-delimited reference string.
RichardL

Procedure FindField(String$,Search$,Sep$,CaseFlag=0)
  ; R1 (1747 mSec / 1 000 000)
  ; Unsuitable when Search$ may also occur as a substring of another field in String$
  ; ASCII only.
  ; Return the field number for 'Search$' in the longer string 'String$'
  ; employing separators 'Sep$'.
  ; Default is a case-independent search. Set 'CaseFlag' to #True to match case.
  
  If FindString(Search$,Sep$)
    MessageRequester("ERROR: FindField()","'Search$' cannot include 'Sep$'")
    ProcedureReturn 0
  EndIf
  
  ; Case independent match.
  If CaseFlag = 0
    String$ = UCase(String$)
    Search$ = UCase(Search$)
  EndIf
  
  ; Find match in string
  p = FindString(String$,Search$)
  
  ; Find corresponding field number
  If p
    p = CountString(Left(String$,p),Sep$)
    ProcedureReturn p + 1
  Else
    ProcedureReturn 0
  EndIf
  
EndProcedure

Procedure R2FindField(String$,Search$,Sep$,CaseFlag=0)
  ; R2 (2527 mSec / 1 000 000)
  ; Fixes the defect in R1
  ; ASCII only.
  ; Return the field number for 'Search$' in the longer string 'String$'
  ; employing separators 'Sep$'.
  ; Default is a case-independent search. Set 'CaseFlag' to #True to match case.

  Protected n
  
  If FindString(Search$,Sep$)
    MessageRequester("ERROR: FindField()","'Search$' cannot include 'Sep$'")
    ProcedureReturn 0
  EndIf
  
  ; Case independent match...
  If CaseFlag = 0
    String$ = UCase(String$)
    Search$ = UCase(Search$)
  EndIf
  
  ; Compare fields
  For n = 1 To CountString(String$,Sep$) + 1
    If StringField(String$,n,Sep$) = Search$
      ProcedureReturn n
    EndIf
  Next
  
  ProcedureReturn 0
  
EndProcedure

Procedure R3FindField(String$,Find$,Sep$,CaseFlag=0)
  ; R3 (1601 mSec / 1 000 000) - Single scan method. More code, but faster
  ; ASCII only.
  ; Return the field number for 'Find$' in the longer string 'String$'
  ; employing separators 'Sep$'.
  ; Default is a case-independent search. Set 'CaseFlag' to #True to match case.
  
  Protected *FK    = @Find$                  ; String to find     Ex: DICK
  Protected SepC   = PeekC(@Sep$)            ; Separator          Ex: |
  Protected Result = 0  
  Protected FieldCount = 0
  Protected LF     = Len(Find$)
  Protected MCount = 0
  Protected *F     = *FK-1
  Protected *S     = @String$ -1             ; Reference string   Ex: Tom|Dick|Harry|
  
  If FindString(Find$,Sep$)
    MessageRequester("ERROR: FindField()","'Find$' cannot include 'Sep$'")
    ProcedureReturn 0
  EndIf  
  
  Repeat
    ; Advance pointers and get chars from Ref and Find strings
    *F + 1  : F = PeekC(*F)
    *S + 1  : C = PeekC(*S)
    
    ; End of BOTH fields, and they match! 
    If (C = SepC Or C = 0) And F = 0 And MCount=LF          
      Result = FieldCount + 1
      Break
    EndIf  
    
    ; Reached end of Ref string so finish
    If C = 0                   
      Break
    EndIf
    
    ; Reached end of REF field, prepare for next one
    If C = SepC                ; Ref field end
      FieldCount + 1
      *F = *FK-1               ; Re-start Find
      MCount = 0
      Continue 
    EndIf
    
    ; Reached end of FIND field, run over remaining REF chars
    If F = 0     
      While PeekC(*S) <> SepC And PeekC(*S) <> 0 : *S+ 1 : Wend
      *S - 1
      *F = *FK-1               ; Re-start Find
      MCount = 0
      Continue
    EndIf
    
  ; Compare field chars and count matching chars
    If Not CaseFlag            ; Case independent, so move 'a-z' to 'A-Z'
      If C >= 'a' And C <= 'z' : C - 32 : EndIf
      If F >= 'a' And F <= 'z' : F - 32 : EndIf
    EndIf
    
    If C = F     
      MCount + 1
    Else
      While PeekC(*S) <> SepC And PeekC(*S) <> 0 : *S+ 1 : Wend
      *S - 1
      *F = *FK-1              
      MCount = 0
    EndIf
    
  ForEver
  
ProcedureReturn Result
EndProcedure

TestCount = 100000   ; Set to 1 to also run the correctness check below
TS = ElapsedMilliseconds()
For n = 1 To TestCount
  T = FindField("the|cat|sat|On the |mat","on the ","|",1)     ; 0    1
  T + FindField("the|cat|sat|on the |mat|","COW","|")          ; 0    2
  T + FindField("the|cat|sat|on the |mat|","LONGCOW","|")      ; 0    3
  T + FindField("the |cat|sat|on the |mat","the ","|")         ; 1    4
  T + FindField("cat","CAT","|")                               ; 1    5
  T + FindField("the|CAT|sat|On the |mat","CAT","|")           ; 2    6
  T + FindField("the|CAT|sat|On the |mat","SAT","|")           ; 3    7 
  T + FindField("these|cats|sat|on the |at","on the ","|")     ; 4    8
  T + FindField("mon|tue|wed|thu|fri|sat|sun","Fri","|")       ; 5    9
  T + FindField("the,name,of this,tomcat,is,tom,", "TOM", ",") ; 6    10
  ;                                                            ====     
;                                                             22  (R2/R3; R1 gives 20 because test 10 returns 4)
  If TestCount = 1
    T + FindField("the|cat|sat|on|the|mat","Bang|","|")        ; ERROR and 0
    If T <> 22 : MessageRequester("Error","Sort it...") : EndIf 
  EndIf
Next
MessageRequester("Timer... 1M tests",Str(10*(n-1))+ " Tests in "+Str(ElapsedMilliseconds()-TS)+" mSec")
;http://www.youtube.com/watch?v=27ugSKW4-QQ

Re: FindField() ~ Return field offset in reference string

Posted: Wed Dec 04, 2013 6:57 pm
by Little John
Hi,

the idea is good, but your code does not work as I would expect.
IMHO the procedure should only find whole fields. I.e., the following example should yield 6, but it returns 4.

Procedure FindField(String$,Search$,Sep$,CaseFlag=0)
  ; Return the field number for 'Search$' in the longer string 'String$'
  ; employing separators 'Sep$'.
  ; Default is a case-independent search. Set 'CaseFlag' to #True to match case.
 
  If FindString(Search$,Sep$)
    MessageRequester("ERROR: FindField()","'Search$' cannot include 'Sep$'")
    ProcedureReturn 0
  EndIf
 
  ; Case independent match...
  If CaseFlag = 0
    String$ = UCase(String$)
    Search$ = UCase(Search$)
  EndIf
 
  ; Find match in string
  P = FindString(String$,Search$)
 
  ; Find corresponding field number
  If P
    P = CountString(Left(String$,P),Sep$)
    ProcedureReturn P + 1
  Else
    ProcedureReturn 0
  EndIf
 
EndProcedure


Debug FindField("the,name,of this,tomcat,is,tom", "tom", ",")
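To see where the 4 comes from, R1's two steps can be run by hand on Little John's string. This is only an illustration; the variable names s$ and p are arbitrary.

```purebasic
; Reproduce R1's two steps on Little John's example (case-independent path)
s$ = UCase("the,name,of this,tomcat,is,tom")

; Step 1: a plain substring search hits "TOM" inside "TOMCAT" first
p = FindString(s$, "TOM")
Debug p                                  ; 18

; Step 2: count separators up to that position to get the field number
Debug CountString(Left(s$, p), ",") + 1  ; 4 (the field "TOMCAT"), not 6
```

Any approach that only looks for a substring has this problem; the field boundaries have to be part of the comparison.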

Re: FindField() ~ Return field offset in reference string

Posted: Wed Dec 04, 2013 11:58 pm
by skywalk
Little John wrote:
Hi,
the idea is good, but your code does not work as I would expect.
IMHO the procedure should only find whole fields. I.e., the following example should yield 6, but it returns 4.
:wink:

Debug FindField("the,name,of this,tomcat,is,Tom", "Tom", ",", 1)

Re: FindField() ~ Return field offset in reference string

Posted: Thu Dec 05, 2013 7:12 am
by Little John
skywalk wrote:
Little John wrote:
Hi,
the idea is good, but your code does not work as I would expect.
IMHO the procedure should only find whole fields. I.e., the following example should yield 6, but it returns 4.
:wink:

Debug FindField("the,name,of this,tomcat,is,Tom", "Tom", ",", 1)
Case sensitivity is not the point. I changed the example in my previous post to make it clearer.

Re: FindField() ~ Return field offset in reference string

Posted: Sun Dec 08, 2013 10:46 am
by RichardL
Hi Little John,
Thanks for your comment. In the project that spawned this routine, the search string was always unique within the reference string, so the problem did not arise. I have annotated this version 'R1'.

I have produced two alternatives that address the issue you raise.
'R2' solves the problem.

'R3' uses a single scan of the reference string and looks for a field match. I include it because I asked myself how I would have written the same thing in 6502 assembler, which I wrote a lot of in the '70s and '80s, and while on a walk I thought of an alternative way of solving the problem. This was for my own satisfaction only, but although it results in longer code, it is the fastest when I test it... by a very small amount.
Also, R3 will tolerate a separator at the end of the reference string, which I sometimes use.

I have replaced the code in the original posting with all three versions... enable one by renaming its Procedure to FindField().
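A further way to get whole-field matching while keeping R1's single FindString() call (not posted in the thread) is to wrap both strings in the separator, so that ",TOM," can only match where a complete field is "TOM". The name PaddedFindField() is hypothetical, and the sketch assumes separators that cannot overlap themselves, such as the single characters used in every example here.

```purebasic
; Hypothetical R1 variant (not from the thread): whole-field matching by
; wrapping both strings in separators before calling FindString().
Procedure.i PaddedFindField(String$, Search$, Sep$, CaseFlag = 0)
  Protected Padded$, p

  If FindString(Search$, Sep$)       ; same restriction as R1-R3
    ProcedureReturn 0
  EndIf

  If CaseFlag = 0                    ; case-independent by default
    String$ = UCase(String$)
    Search$ = UCase(Search$)
  EndIf

  ; ",THE,NAME,...,TOM," can only contain ",TOM," around a whole field
  Padded$ = Sep$ + String$ + Sep$
  p = FindString(Padded$, Sep$ + Search$ + Sep$)
  If p = 0
    ProcedureReturn 0
  EndIf

  ; Separators up to and including the one that opens the match = field number
  ProcedureReturn CountString(Left(Padded$, p + Len(Sep$) - 1), Sep$)
EndProcedure

Debug PaddedFindField("the,name,of this,tomcat,is,tom", "tom", ",")   ; 6
```

The price is one extra string concatenation per call; whether that beats R2 or R3 would have to be measured with the timing loop in the first post. A trailing separator in the reference string is tolerated, since ",TOM,," still contains ",TOM,".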

Best regards
Richard

Re: FindField() ~ Return field offset in reference string

Posted: Sun Dec 08, 2013 10:58 am
by wilbert
It's very easy to improve the speed of your R3 version by replacing the PeekC() calls with typed pointers (*F.Character, *S.Character) that are read as *F\c and *S\c:

Procedure R3FindField(String$,Find$,Sep$,CaseFlag=0)
  ; R3 (1601 mSec / 1 000 000) - Single scan method. More code, but faster
  ; ASCII only.
  ; Return the field number for 'Find$' in the longer string 'String$'
  ; employing separators 'Sep$'.
  ; Default is a case-independent search. Set 'CaseFlag' to #True to match case.
  
  Protected *FK    = @Find$                  ; String to find     Ex: DICK
  Protected SepC   = PeekC(@Sep$)            ; Separator          Ex: |
  Protected Result = 0  
  Protected FieldCount = 0
  Protected LF     = Len(Find$)
  Protected MCount = 0
  Protected *F.Character = *FK-1
  Protected *S.Character = @String$ -1       ; Reference string   Ex: Tom|Dick|Harry|
  
  If FindString(Find$,Sep$)
    MessageRequester("ERROR: FindField()","'Find$' cannot include 'Sep$'")
    ProcedureReturn 0
  EndIf  
  
  Repeat
    ; Advance pointers and get chars from Ref and Find strings
    *F + 1  : F = *F\c
    *S + 1  : C = *S\c
    
    ; End of BOTH fields, and they match! 
    If (C = SepC Or C = 0) And F = 0 And MCount=LF          
      Result = FieldCount + 1
      Break
    EndIf  
    
    ; Reached end of Ref string so finish
    If C = 0                   
      Break
    EndIf
    
    ; Reached end of REF field, prepare for next one
    If C = SepC                ; Ref field end
      FieldCount + 1
      *F = *FK-1               ; Re-start Find
      MCount = 0
      Continue 
    EndIf
    
    ; Reached end of FIND field, run over remaining REF chars
    If F = 0     
      While *S\c <> SepC And *S\c <> 0 : *S+ 1 : Wend
      *S - 1
      *F = *FK-1               ; Re-start Find
      MCount = 0
      Continue
    EndIf
    
  ; Compare field chars and count matching chars
    If Not CaseFlag            ; Case independent, so move 'a-z' to 'A-Z'
      If C >= 'a' And C <= 'z' : C - 32 : EndIf
      If F >= 'a' And F <= 'z' : F - 32 : EndIf
    EndIf
    
    If C = F     
      MCount + 1
    Else
      While *S\c <> SepC And *S\c <> 0 : *S+ 1 : Wend
      *S - 1
      *F = *FK-1              
      MCount = 0
    EndIf
    
  ForEver
  
ProcedureReturn Result
EndProcedure

Re: FindField() ~ Return field offset in reference string

Posted: Sun Dec 08, 2013 11:11 am
by RichardL
Good morning Wilbert,

Thank you.
That is not a syntax I'm familiar with. For comparison, it clocks 1213 mSec on the same machine I used before.
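For anyone else new to this syntax: declaring a pointer with a structure type (*p.Character) lets *p\c read the character at that address, which is what PeekC(*p) does without the procedure call. The sketch below walks a string this way; CountSep() is a hypothetical name, and it does the same job as CountString() with a one-character separator. Pointer arithmetic in PureBasic is in bytes, so the sketch steps by SizeOf(Character); the "+ 1" steps in R3 assume one-byte characters, i.e. an ASCII build.

```purebasic
; Hypothetical helper: walk a string with a typed pointer instead of PeekC().
; *p\c reads the character at *p; stepping by SizeOf(Character) keeps the
; walk correct in both ASCII and Unicode builds.
Procedure.i CountSep(String$, SepC.c)
  Protected *p.Character = @String$
  Protected n = 0

  While *p\c <> 0              ; strings are zero-terminated
    If *p\c = SepC
      n + 1
    EndIf
    *p + SizeOf(Character)     ; advance one character, not one byte
  Wend

  ProcedureReturn n
EndProcedure

Debug CountSep("Tom|Dick|Harry|", '|')   ; 3
```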

Anyone for replacing the core loop with assembler...? :wink:

Richard

Re: FindField() ~ Return field offset in reference string

Posted: Mon Dec 09, 2013 12:59 pm
by Little John
Thanks to both of you, Richard and Wilbert!

Re: FindField() ~ Return field offset in reference string

Posted: Mon Dec 30, 2013 3:09 pm
by Mistrel
That's a lot of code for a simple problem. You could just as easily use CountString() with the separator to get the number of fields and then loop over that many fields to find your match. You've also reimplemented a poorer version of StringField() which loses Unicode support.
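That loop is essentially R2. Because StringField() is stateless, each call has to locate field n from the start of the string again, so the loop rescans the early fields on every pass. A hypothetical single-pass alternative (StepFindField() is not from the thread) keeps Unicode support by sticking to string functions: it jumps from separator to separator with FindString()'s StartPosition parameter and compares each whole field with Mid().

```purebasic
; Hypothetical single-pass variant (not from the thread): jump from one
; separator to the next with FindString()'s StartPosition parameter and
; compare each whole field with Mid(). Uses only string functions, so it
; works in ASCII and Unicode builds alike.
Procedure.i StepFindField(String$, Search$, Sep$, CaseFlag = 0)
  Protected FieldStart = 1, FieldEnd, FieldNo = 1, RefLen

  If CaseFlag = 0                         ; case-independent by default
    String$ = UCase(String$)
    Search$ = UCase(Search$)
  EndIf
  RefLen = Len(String$)

  Repeat
    FieldEnd = FindString(String$, Sep$, FieldStart)  ; separator ending this field
    If FieldEnd = 0
      FieldEnd = RefLen + 1                           ; last field runs to the end
    EndIf
    If Mid(String$, FieldStart, FieldEnd - FieldStart) = Search$
      ProcedureReturn FieldNo                         ; whole-field match
    EndIf
    If FieldEnd > RefLen                              ; no separators left
      Break
    EndIf
    FieldStart = FieldEnd + Len(Sep$)                 ; first char of next field
    FieldNo + 1
  ForEver

  ProcedureReturn 0
EndProcedure

Debug StepFindField("the,name,of this,tomcat,is,tom", "tom", ",")   ; 6
```

Like R2, it treats a trailing separator as an empty last field, and a Search$ containing the separator simply never matches instead of raising an error.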

See here for my StringField() implementation, which supports whole-string delimiters (from before PureBasic supported them). You can easily adapt it for case-insensitive matching:

http://forum.purebasic.com/english/view ... 12&t=42013