StringField - Alterable functions - Feasibility demos

Posted: Wed May 05, 2021 7:11 pm
by Saki

The code here is only a feasibility demo.

A small difference:
From a purely intuitive point of view, I consider my variant easier to understand.
To make clear to the interested reader what this is about:
mk-soft's PB version finds this field at index 2, because two stars occur.

My version finds it at index 1, because it is the first field that is enclosed by stars.

Index 2 mk_soft
"*Hello*World"

Index 1 my version
"*Hello*World"

That is all.
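
To make the index difference concrete, here is a minimal sketch that uses only the built-in StringField(), not the module below. The leading star produces an empty first field, which is why "Hello" sits at index 2 there. The variable name s$ is only for illustration.

```purebasic
Define s$ = "*Hello*World"

Debug StringField(s$, 1, "*") ; empty field: nothing precedes the first star
Debug StringField(s$, 2, "*") ; Hello
Debug StringField(s$, 3, "*") ; World
```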

This code is free.
Anyone who can improve or optimize it further is welcome to do so.

Hint:
The result of analyzing these test codes is the
extremely fast and comprehensive StringField_Tool_BF, which you can find here:

https://www.purebasic.fr/english/viewto ... 12&t=77219

Have fun with it.

Variant 1 - Usual procedure

Code:

DeclareModule StringField_BF
  EnableExplicit
  Declare.s StringField_BF(string$, index, separator$)
EndDeclareModule

Module StringField_BF
  Global NewList index() : AddElement(index())
  Procedure.s StringField_BF(string$, index, separator$)
    ; StringField - By Saki - Unicode - This code is free for using and enhancing
    Protected i, ii, pos_1, pos_2, length_result, comp, amount_indexes, count_index
    Protected len_separator=StringByteLength(separator$), *pointer.word, jump_in, result$
    Protected *string=@string$, *separator=@separator$
    If Not PeekW(*string) : ProcedureReturn "" : EndIf
    If index<1 : ProcedureReturn "" : EndIf
    ClearList(index()) : AddElement(index()); #########################################
    If comp=CompareMemory(*string, *separator, len_separator)
      jump_in=1 : index+1
    EndIf
    i=-2
    Repeat
      i+2
      comp=CompareMemory(*string+i, *separator, len_separator)
      If comp+jump_in
        count_index+1
        jump_in=0 : ii+1 : i+len_separator-2 : amount_indexes+1
        AddElement(index()) : index()=i+2
      EndIf 
      *pointer=*string+i+len_separator
    Until count_index>index Or Not *pointer\w 
    If index>count_index : index=count_index : EndIf
    amount_indexes=ii : i=0 
    If amount_indexes
      Repeat
        SelectElement(index(), i) : pos_1=index()
        SelectElement(index(), i+1) : pos_2=index()
        length_result=pos_2-pos_1-len_separator
        If pos_2-pos_1>0
          If length_result>0
            result$=Space(length_result>>1)
            CopyMemory(*string+pos_1, @result$, length_result)
          Else 
            result$=#Null$
          EndIf
          If i=index : ProcedureReturn result$ : EndIf
        EndIf
        i+1
      Until i>index Or i>amount_indexes-1
    EndIf
  EndProcedure
EndModule
UseModule StringField_BF

Define separator$=" "
Define index, result$, string$

string$=" Hello I am a splitted String "

Define multiplier=8

Define i
For i=1 To multiplier
  string$+string$
Next i

Debug Len(string$)
Debug "================"

Define len_string=Len(string$)

Define time=ElapsedMilliseconds()

For index=1 To len_string/6+1
  
  result$=StringField_BF(string$, index, separator$)
  
  If result$<>"" : Debug result$ : EndIf
Next 

MessageRequester("", Str(ElapsedMilliseconds()-time))

Re: StringField_BF - A alterable function

Posted: Wed May 05, 2021 7:59 pm
by Paul
I'm still not seeing how your version here is better than mk-soft's version?

Code:

separator$=" "
string$ = " Hello I am a splitted string "

cnt = CountString(string$,separator$)
For k = 1 To cnt
  result$=StringField(string$,k,separator$)
  If result$
    Debug result$
  EndIf
Next

Re: StringField_BF - A alterable function

Posted: Wed May 05, 2021 8:23 pm
by Saki
Why should it be better?

It works differently!
The behavior is quite similar, but also very different.

Re: StringField_BF - A alterable function

Posted: Wed May 05, 2021 8:46 pm
by Paul
I just thought that if you're replacing a single PB command with 50 or 60 lines of code, it must do something much more.

So... No worries :wink:
Some people like many lines of code and others prefer simple and efficient.

Cheers!

Re: StringField_BF - A alterable function

Posted: Wed May 05, 2021 8:59 pm
by Saki
Then work with it a bit and you'll see.
Strings can be very different.
If one code gives you problems in certain situations, you can try the other one.
Neither is perfect, but they are different. :wink:

It is just sample code for T&T.
It does not have to be better or worse, larger or smaller than other code.

It is simply code that users can analyze, change and learn from, not a command.

Re: StringField_BF - A alterable function

Posted: Wed May 05, 2021 9:36 pm
by BarryG
Saki wrote: Wed May 05, 2021 8:59 pm
Both are not perfect
What's not perfect about StringField()?

Re: StringField_BF - A alterable function

Posted: Wed May 05, 2021 9:44 pm
by Saki
Work intensively with these functions, then you will see what you need or want, or not.

Re: StringField_BF - A alterable function

Posted: Fri May 07, 2021 5:41 pm
by mk-soft
Sorry ...

Code:

Procedure.s StringField_MK(string$, index, separator$)
  string$ = Trim(string$, separator$)
  ProcedureReturn StringField(string$, index, separator$)
EndProcedure

Define separator$=" "

Define index, result$, string$

string$=" Hello I am a splitted string "

For index=1 To 6
  result$=StringField_MK(string$, index, separator$) ; BF
  
  ; result$=StringField(string$, index, separator$)  ; PB
  If result$<>"" : Debug result$ : EndIf
Next

P.S.
Maybe you find your StringField evaluation better, but it is logically wrong and only suitable in special cases.

Serious errors occur at the latest when you process data files.
The first or last column of a data file with separators (",", ";", TAB) can be empty; the data then shifts and becomes invalid.
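
The shift can be sketched with the built-in functions alone. The record below is a made-up example whose first column is empty. Stripping the leading separator first (as the Trim() wrapper above does) moves every column one position to the left.

```purebasic
Define line$ = ";Smith;London"      ; hypothetical record, first column empty
Define trimmed$ = Trim(line$, ";")  ; leading separator removed

Debug StringField(line$, 2, ";")    ; Smith  (column 2, as stored)
Debug StringField(trimmed$, 2, ";") ; London (columns shifted by one)
```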

Re: StringField_BF - A alterable function

Posted: Fri May 07, 2021 8:47 pm
by Saki
Hi, yes, the code was not yet error-free.
I have just replaced it with another one.

I will keep working on it.

From a purely intuitive point of view, I consider my variant easier to understand.
To make clear to the interested reader what this is about:
mk-soft's version finds this field at index 2, because two stars occur.

My version finds it at index 1, because it is the first field that is enclosed by stars.

Index 2 mk_soft
"*Hello*World"

Index 1 my version
"*Hello*World"

That is all.

Of course you can always post in my threads,
even criticise or contribute code examples or changes.

This string fiddling is a chapter of its own, really heavy stuff. :lol:

Re: StringField_BF - A alterable function

Posted: Fri May 07, 2021 10:10 pm
by the.weavster
You've concatenated a long string for your demo, but then you've only iterated over the first 6 fields out of > 229k, making your timer rather worthless.
Saki wrote: Wed May 05, 2021 7:11 pm

Code:

Define time=ElapsedMilliseconds()

For index=1 To 6 ; <== Only Six!
  
  result$=StringField_BF(string$, index, separator$) ; BF
  
  ;If result$<>"" : Debug result$ : EndIf
Next 

MessageRequester("", Str(ElapsedMilliseconds()-time))

If you'd tried iterating over all of them, you'd discover your function is painfully slow.

Re: StringField_BF - A alterable function - Feasibility demo

Posted: Sat May 08, 2021 5:28 pm
by Saki
Hi, many thanks for the info, that's right, I hadn't noticed.

Code updated.

I think anything faster than this new code is unfortunately not possible with PB code.

It will certainly be interesting to see what the new compiler does with it.

Re: StringField_BF - A alterable function - Feasibility demo

Posted: Sat May 08, 2021 7:22 pm
by the.weavster
Saki wrote: Wed May 05, 2021 7:11 pm

Code:

Define multiplier=5 ; <== was 15

You've massively shortened the delimited string so now you only have 225 fields rather than 229377 :lol:
I could have a weekend break in the time it takes to call your StringField_BF function 229377 times.


:idea: Here's an alternative methodology that completes the task in an acceptable time frame even with 229377 fields. Rather than using StringField it splits the delimited string into a list. It will even accept an escape character if needs be. Unless the delimited string is very short this method should be much more efficient as you only have to traverse the string once.

Code:

EnableExplicit

Procedure.s String_Chunk(*source, nSize)
    Protected *bfr = AllocateMemory(nSize + 4)
    CopyMemory(*source, *bfr, nSize)
    Protected out$ = PeekS(*bfr)
    FreeMemory(*bfr)
    ProcedureReturn out$
EndProcedure

Procedure String_Split(val$, delim$, esc$, List OutList.s())
    Protected escdel$  = esc$ + delim$
    Protected nSizeDel = StringByteLength(delim$)
    Protected nSizeEsc = StringByteLength(escdel$)
    Protected nSizeVal = StringByteLength(val$)
    If nSizeDel = 0 Or nSizeVal = 0
        ProcedureReturn
    EndIf
    Define txt$ = "", nEnd = 0, nLen = 0, nStart = 0, bEscape = #False
    While nEnd <= nSizeVal
        If esc$ <> "" And CompareMemory(@val$ + nEnd, @escdel$, nSizeEsc)
            nEnd + nSizeEsc
            nLen + nSizeEsc
            bEscape = #True
        ElseIf CompareMemory(@val$ + nEnd, @delim$, nSizeDel)
            txt$ = String_Chunk(@val$ + nStart, nLen)
            If bEscape
                txt$ = ReplaceString(txt$, escdel$, delim$)
                bEscape = #False
            EndIf
            AddElement(OutList())
            OutList() = txt$
            nEnd + nSizeDel
            nStart = nEnd
            nLen = 0
        Else
            nEnd + 1
            nLen + 1
        EndIf
    Wend
    If nLen > 0
        txt$ = String_Chunk(@val$ + nStart, nLen)
        If bEscape : txt$ = ReplaceString(txt$, escdel$, delim$) : EndIf
        AddElement(OutList())
        OutList() = txt$
    EndIf
EndProcedure


CompilerIf #PB_Compiler_IsMainFile
    
    Define separator$ = " "
    Define index, result$, string$
    
    string$ = " Hello I am a splitted string "
    
    Define multiplier = 15
    
    Define i
    For i = 1 To multiplier
        string$ + string$
    Next i
    Debug "String length: " + Str(Len(string$))
    
    NewList FieldList.s()
    DisableDebugger
    Define time = ElapsedMilliseconds(), endtime = 0
    String_Split(string$, separator$, "", FieldList())
    ResetList(FieldList())
    While NextElement(FieldList())
        result$ = FieldList()
        ;If result$<>"" : Debug result$ : EndIf
    Wend
    endtime = ElapsedMilliseconds()-time
    EnableDebugger
    
    Debug "List length: " + Str(ListSize(FieldList()))
    MessageRequester("", Str(endtime)) ; show the time measured before the Debug output
    
    End
    
CompilerEndIf

Re: StringField_BF - A alterable function - Feasibility demo

Posted: Sat May 08, 2021 9:07 pm
by Saki
Hi, yes, of course, if the prophet does not come to the mountain, the mountain must come to the prophet. :wink:

The unnecessary repetition is one of the main problems here.

You have solved that very nicely.

I have added a second code above that works around the slow string generation.
The code immediately became twice as fast.

Well, working on this thing has deepened my knowledge and insight, that was the purpose of it.

I'm very curious to see what the new compiler does with it.
Unfortunately, it probably won't change anything in the string story.
I myself expect massive other problems, but I don't have the insight and I don't want to speculate about things I don't understand.
It's good that it's coming; there's no way around it.

You are very welcome in this thread!

Re: StringField_BF - A alterable function - Feasibility demo

Posted: Sat May 08, 2021 9:36 pm
by the.weavster
On reflection there may be a minor bug with my function that needs a tweak - as the last character of our string is the separator I think there should be one more empty element on the end of my FieldList().

I'm about to settle down on the sofa with a Thatchers Cider so I won't fix it right now, just be aware if you're going to use it :D
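
For anyone checking this, a small sketch of the expected count, using only the built-in functions and a made-up short string: with StringField() semantics, the number of fields is the number of separators plus one, so a trailing separator leaves one empty field at the end.

```purebasic
Define s$ = "a b "                       ; hypothetical input ending with the separator
Define fields = CountString(s$, " ") + 1 ; 2 separators, so 3 fields

Debug fields                              ; 3
Debug "[" + StringField(s$, 3, " ") + "]" ; [] shows the trailing empty field
```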

Re: StringField_BF - A alterable function - Feasibility demo

Posted: Sat May 08, 2021 10:06 pm
by Saki
Yes, the real world is often preferable to the virtual one, the impressions can be much deeper and more intense. :D