
String length should be stored for string variables

Posted: Sat Jul 18, 2020 1:26 pm
by Sicro
As it is now:
  • Each string function must recalculate the string length. This is an extreme performance killer.
As it should be:
  • The length should be stored before the string in the memory:

    Code: Select all

    Structure DynamicStringVariable
      stringLength.i
      *stringPointer
    EndStructure
    
    Structure FixedStringVariable
      stringLength.i
      string.c[0]
    EndStructure

    Code: Select all

    Define text$ = "Example string"
    
    Debug @text$ ; Prints the value of *stringPointer
    
    Debug PeekI(@text$ - SizeOf(Integer)) ; Prints the value of stringLength
Edit:
A StringBuilder is not the solution, because we also want to read strings fast.
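
For comparison, the idea can be approximated in user code today. The following is only a sketch of a length-carrying string wrapper, not how PB stores strings internally; the names `LPString`, `LPString_Set` and `LPString_Len` are made up for this example.

```purebasic
; Hypothetical wrapper: keeps the character count next to the buffer,
; so reading the length never has to scan for the terminating null.
Structure LPString
  length.i      ; number of characters, updated by LPString_Set()
  *buffer       ; null-terminated character data
EndStructure

Procedure LPString_Set(*s.LPString, text$)
  If *s\buffer : FreeMemory(*s\buffer) : EndIf
  *s\length = Len(text$)                 ; the length is scanned once, here
  *s\buffer = AllocateMemory((*s\length + 1) * SizeOf(Character))
  If *s\buffer
    PokeS(*s\buffer, text$)              ; copies the text plus the null
  Else
    *s\length = 0                        ; allocation failed
  EndIf
EndProcedure

Procedure.i LPString_Len(*s.LPString)
  ProcedureReturn *s\length              ; constant time, no scan
EndProcedure

Define s.LPString
LPString_Set(@s, "Example string")
Debug LPString_Len(@s)   ; 14
Debug PeekS(s\buffer)    ; Example string
```

The drawback is that every native string function still receives a plain pointer and would rescan, which is why the request is for built-in support.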

Re: String length should be stored for string variables

Posted: Sat Jul 18, 2020 7:17 pm
by User_Russian
I asked for the same thing about 6 years ago. viewtopic.php?f=3&t=58892
But it is foolish to expect that it will be done sometime. Because requests are rarely fulfilled!
The requests section needs to be closed, because it's useless. viewtopic.php?f=3&t=75576

Re: String length should be stored for string variables

Posted: Sat Jul 18, 2020 7:40 pm
by kenmo
This would be a nice change, we can spare the extra bytes for faster performance now :)

However it would be a major change to the string library, and could break 20 years of existing code (memory bugs)

I would save this for maybe a big Version 5.0 update, and include it with other big changes - such as fully going to UTF-8 for PB strings, instead of the current UCS-2 style implementation.

Re: String length should be stored for string variables

Posted: Sat Jul 18, 2020 8:27 pm
by Saki
kenmo wrote: However it would be a major change to the string library, and could break 20 years of existing code (memory bugs)
The effort and the problems are certainly too great.
Many users don't know how the PB strings work, they just wonder when things get slow.
Others do not notice it at all because they do not use large strings.
For "Hello World", it is always very fast.
It is not a bug; it is by design.

Re: String length should be stored for string variables

Posted: Sat Jul 18, 2020 9:07 pm
by mk-soft
For Windows, a change to UTF-8 is not useful, because the APIs are all wide-character.
For macOS and Linux maybe.
But then there is no static string anymore and Array of Chars won't work anymore.
A UTF-8 character can take from 1 to 4 bytes in memory.
This leads to more problems than advantages.

The current string management works and is sufficient for 99.99% of applications.
If you process strings that are several megabytes long, you should consider whether your approach to processing the data is flawed.

Re: String length should be stored for string variables

Posted: Sat Jul 18, 2020 9:16 pm
by Saki
Moreover, UTF8 is quite slow due to the variable length and many things become more complicated.
I think there are more important areas to work on.
Starting there now would probably produce an avalanche of new bugs.

Re: String length should be stored for string variables

Posted: Sat Jul 18, 2020 9:24 pm
by STARGÅTE
mk-soft wrote:The current string management works and is sufficient for 99.99% of applications.
If you process strings that are several megabytes long, you should consider whether your approach to processing the data is flawed.
+1

I think some confuse a string with a memory buffer.

Re: String length should be stored for string variables

Posted: Sat Jul 18, 2020 9:35 pm
by User_Russian
Saki wrote:Others do not notice it at all because they do not use large strings.
A string of 100 thousand characters is small.
But PB needs more than 20 seconds to build it.

Code: Select all

DisableDebugger
s.s=""
t=ElapsedMilliseconds()
For i=1 To 100000
  s+"x"
Next
MessageRequester("", Str(ElapsedMilliseconds()-t)+" ms")
If the string contains a million characters, you get tired of waiting for the code to execute.
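
One way to sidestep the per-append cost, whatever its cause, is to write into a preallocated buffer and remember where the text ends. This is a minimal sketch that assumes the final length is known in advance; `#Count`, `*buffer` and `*pos` are names chosen for this example.

```purebasic
DisableDebugger
#Count = 100000

; One extra character for the terminating null.
*buffer = AllocateMemory((#Count + 1) * SizeOf(Character))
If *buffer
  t = ElapsedMilliseconds()
  *pos.Character = *buffer        ; tracked end of the text
  For i = 1 To #Count
    *pos\c = 'x'                  ; write at the known end, no scan needed
    *pos + SizeOf(Character)      ; advance by one character
  Next
  *pos\c = 0                      ; terminate the string
  s.s = PeekS(*buffer, #Count)    ; convert to a PB string once, at the end
  elapsed = ElapsedMilliseconds() - t
  FreeMemory(*buffer)
  MessageRequester("", Str(elapsed) + " ms, Len = " + Str(Len(s)))
EndIf
```

This helps with building a string, but it does nothing for reading an existing string's length quickly, which is the point of the request.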

Re: String length should be stored for string variables

Posted: Sat Jul 18, 2020 9:40 pm
by Saki
You have to edit the strings at the binary level!

Oh dear,
unsigned integers would be more important, yes, and even feasible without problems.

Re: String length should be stored for string variables

Posted: Sat Jul 18, 2020 9:54 pm
by mk-soft
0 ms ...

Code: Select all

DisableDebugger
s.s=""
t=ElapsedMilliseconds()
s = LSet("", 100000, "x")
MessageRequester("", Str(ElapsedMilliseconds()-t)+" ms")

Re: String length should be stored for string variables

Posted: Sat Jul 18, 2020 9:57 pm
by Saki
LOL, you were faster, mk-soft, but here is a binary 0 ms way:

Code: Select all

DisableDebugger
s.s=Space(100000)
t=ElapsedMilliseconds()
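; 100000 characters occupy 200000 bytes (2 bytes each); every PokeQ writes 4 characters
For i=0 To 100000*SizeOf(Character)-8 Step 8
 PokeQ(@s+i, $0078007800780078)
Next
MessageRequester("", Str(ElapsedMilliseconds()-t)+" ms")
EnableDebugger
ShowMemoryViewer(@s, 100000*SizeOf(Character))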
Debug Len(s)

Re: String length should be stored for string variables

Posted: Sat Jul 18, 2020 10:16 pm
by mk-soft
All kidding aside.

Depending on the application PB is fast enough.
You just have to split the data in a sensible way and not pack a huge string all at once.

For our IT I have to convert, fill and export data regularly.
The program processes about 600000 records with 40 fields each in about 10-20 seconds.
So it is fast enough.

Re: String length should be stored for string variables

Posted: Sat Jul 18, 2020 10:45 pm
by STARGÅTE
User_Russian wrote:
Saki wrote:Others do not notice it at all because they do not use large strings.
A string of 100 thousand characters is small.
But PB needs more than 20 seconds to build it.

Code: Select all

DisableDebugger
s.s=""
t=ElapsedMilliseconds()
For i=1 To 100000
  s+"x"
Next
MessageRequester("", Str(ElapsedMilliseconds()-t)+" ms")
If the string contains a million characters, you get tired of waiting for the code to execute.
To save the length of the string (the topic of this thread!) would not help to make such code faster, because it is the re-allocation of memory each time, which is the slow part.

Re: String length should be stored for string variables

Posted: Sat Jul 18, 2020 11:40 pm
by User_Russian
STARGÅTE wrote:To save the length of the string (the topic of this thread!) would not help to make such code faster, because it is the re-allocation of memory each time, which is the slow part.
Re-allocation of memory takes little time.

Code: Select all

DisableDebugger
t=ElapsedMilliseconds()
For i=1 To 100000
  *p=ReAllocateMemory(*p, i)
Next
MessageRequester("", Str(ElapsedMilliseconds()-t)+" ms")
Most of the time is spent on calculating the length of the string. The longer the string the more time is needed.
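
To check how much the length calculation itself depends on string size, the two cases can be timed directly. This is only a rough sketch; the variable names are chosen for this example and the numbers will vary by machine and PB version.

```purebasic
DisableDebugger
Define small$ = Space(10)
Define large$ = Space(1000000)
Define.i i, n, t1, t2

; 1000 length queries on a 10-character string
t1 = ElapsedMilliseconds()
For i = 1 To 1000
  n + Len(small$)
Next
t1 = ElapsedMilliseconds() - t1

; 1000 length queries on a 1000000-character string
t2 = ElapsedMilliseconds()
For i = 1 To 1000
  n + Len(large$)
Next
t2 = ElapsedMilliseconds() - t2

; n is shown only so that the Len() results are actually used
MessageRequester("Len() timing", "10 chars: " + Str(t1) + " ms" + #LF$ + "1000000 chars: " + Str(t2) + " ms" + #LF$ + "checksum: " + Str(n))
```

If the second figure grows roughly in proportion to the string size, the length is being recomputed by scanning on every call.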

Re: String length should be stored for string variables

Posted: Sat Jul 18, 2020 11:46 pm
by Saki
Yep, re-allocation of memory is very fast.
Searching for the terminator is what takes most of the time.