
Re: String length should be stored for string variables

Posted: Mon Jul 20, 2020 1:35 pm
by mk-soft
I quickly put together some code to create very large strings fast.
It can also be adapted to UTF-8, for example to save the data directly as UTF-8. :wink:

For me it takes about 50 ms to add the 100000 characters

Code:

; Moved to Tricks 'n' Tips
Link: Fast String

Re: String length should be stored for string variables

Posted: Mon Jul 20, 2020 3:34 pm
by wilbert
mk-soft wrote: I quickly put together some code to create very large strings fast.
It can also be adapted to UTF-8, for example to save the data directly as UTF-8. :wink:
It's even possible to store the encoding format along with the string and create string procedures that can use ASCII, Unicode or UTF-8 internally. 8)

Re: String length should be stored for string variables

Posted: Sat Jul 25, 2020 12:20 pm
by Sicro
User_Russian wrote: I asked for the same thing about 6 years ago. viewtopic.php?f=3&t=58892
But it is foolish to expect that it will ever be done, because requests are rarely fulfilled!
The requests section should be closed, because it's useless. viewtopic.php?f=3&t=75576
I always try to find existing feature requests before I open one myself. Your thread later drifted too much toward string builders (concatenating strings).
With my thread I wanted to restart the discussion about fast strings from scratch, so that fast reading of strings also gets more attention.

At least the recent request to update the third-party libraries was fulfilled.
But I agree with you that feature requests should get more responses from the PB team. For example, if a feature request has not been implemented for a long time, the PB team could tell us where the problems are, so that we might help find a solution. With feedback from the PB team, creating feature requests would also be more fun, because we would know that the PB team is paying attention to them.
kenmo wrote: However, it would be a major change to the string library, and could break 20 years of existing code (memory bugs)
Currently I don't see anything in existing code that would break after this feature request is integrated.
mk-soft wrote: The current string management works and is sufficient for 99.99% of applications.
Here in the forum you often see code that uses pointers instead of the normal string functions when speed matters. I think the PB string functions should be the preferred way to work with strings and should therefore be fast, so that pointer code is only needed in special cases.
STARGÅTE wrote: I think some people confuse a string with a memory buffer.
With a string size of a few megabytes there should be no problems. If the size is much larger, I also recommend using memory buffers rather than string variables, because AllocateMemory() lets you react to a failure: if the memory cannot be allocated, the program does not have to crash. We then have the opportunity to intervene and decide what should happen in case of failure. Unfortunately, I then have to work with memory functions, although I actually want to use the string functions, because I am working with strings. A difficult thing.
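A minimal sketch of that failure handling, assuming an arbitrary 64 MB request: AllocateMemory() returns 0 when the allocation fails, so the program can choose its own reaction instead of crashing.

```purebasic
; Arbitrary example size: 64 MB
size = 64 * 1024 * 1024

*buffer = AllocateMemory(size, #PB_Memory_NoClear)
If *buffer = 0
  ; Allocation failed: decide what to do instead of crashing
  Debug "Could not allocate " + Str(size) + " bytes"
Else
  ; Use the block as storage for null-terminated text
  PokeS(*buffer, "Hello")
  result$ = PeekS(*buffer)
  Debug result$
  FreeMemory(*buffer)
EndIf
```

The price is the one described above: the text now lives in a memory block, so it has to be handled with PokeS()/PeekS() instead of the normal string functions.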
Rinzwind wrote: Also, storing the length with the string makes it possible to use strings for any binary storage (since null doesn't have to have a special meaning anymore if you choose so). Can be quite convenient.
My feature request is not about abolishing the null terminator. Memory buffers should still be used for binary data. Unlike with string variables, we cannot write the string length in front of the string in a memory buffer, because existing code might then no longer work correctly. Even after this feature request is implemented, PeekS(), for example, must still search for the terminating null character.
helpy wrote: -1

If the string length were stored internally with each PB string, a new problem would arise ;-)
This problem would occur if you manipulate a PB string using pointers and memory functions, i.e. write directly to the string memory using PokeS() or a *PointerToCharacter. Manipulating a string this way would not update the internal string length, and PB functions would not work correctly ... :-(
Extending string variables with memory functions (e.g. PokeS()) is already risky, because PB's string management doesn't notice it:

Code:

Define text$ = "Hello"
PokeS(@text$, "Hello world!") ; Overflow in a string memory block
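A safer variant, as a sketch: PokeS() accepts an optional Length parameter (in characters), so the write can be limited to the length the string already has. This only keeps the write inside the existing block; it does not tell PB's string management about any change.

```purebasic
Define text$ = "Hello"

; Write at most Len(text$) characters; PokeS() then adds the null character,
; which lands where the original terminator already was
PokeS(@text$, "Hello world!", Len(text$))

Debug text$ ; Hello
```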
NicTheQuick wrote: Isn't there something like MemorySize() for a string's buffer? How does this usually work together with AllocateMemory()?
If the operating system already knows how big the memory buffer for a given pointer is, this could be another idea.
For performance reasons, PB always allocates more memory for strings than necessary, so that the memory does not have to be reallocated every time the string grows a little. So your hypothetical MemorySize() function would return the string size plus the extra reserved memory, not the real string size.
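The same difference between buffer size and string length can be seen with a block from AllocateMemory(), where MemorySize() does exist. A small sketch; the 64-byte size is arbitrary:

```purebasic
*buffer = AllocateMemory(64)                 ; zero-filled 64-byte block
If *buffer
  PokeS(*buffer, "Hi")                       ; writes "Hi" plus a null character
  bufferSize = MemorySize(*buffer)           ; size of the block in bytes
  stringLength = MemoryStringLength(*buffer) ; characters before the null
  Debug bufferSize   ; 64
  Debug stringLength ; 2
  FreeMemory(*buffer)
EndIf
```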

@mk-soft: Thanks for your code, but it only partially solves the problem. You would have to rewrite every PB function that works with strings, so this has to be implemented natively in PB.

Re: String length should be stored for string variables

Posted: Sat Jul 25, 2020 1:26 pm
by mk-soft
The string management of PureBasic is old and needs to be updated.

Here is a sad result for comparison ...
Link: viewtopic.php?f=13&t=75750#p557919

So a big plus for the feature request after all.

Re: String length should be stored for string variables

Posted: Sat Jul 25, 2020 1:59 pm
by Saki
mk-soft wrote: The string management of PureBasic is old and needs to be updated.

Here is a sad result for comparison ...
Link: viewtopic.php?f=13&t=75750#p557919

So a big plus for the feature request after all.
Great, mk-soft. First and foremost, I think we can say it's buggy :(

Re: String length should be stored for string variables

Posted: Sat Jul 25, 2020 2:51 pm
by User_Russian
Sicro wrote: I edited my first post to add a note about the fixed string structure. With fixed strings we don't need to store the string length, because the memory buffer surely corresponds exactly to the size needed to store the string, so the string length can easily be calculated:

Code:

stringLength = MemorySize(*fixedString) / SizeOf(Character)
Your code will show the maximum length of the string, not how many characters it stores.
Compare with this code.

Code:

s.s{128}      ; fixed string with room for 128 characters

s="1234"
Debug Len(s)  ; 4, not 128

Re: String length should be stored for string variables

Posted: Sat Jul 25, 2020 6:13 pm
by Sicro
@User_Russian: Argh, yes, of course. I reverted my edit.

Re: String length should be stored for string variables

Posted: Sat Oct 15, 2022 10:17 am
by BarryG
kenmo wrote: Sat Jul 18, 2020 7:40 pm However, it would be a major change to the string library, and could break 20 years of existing code (memory bugs)
How? It requires no changes to our source code, so old compiled executables would still work the same, and newly compiled executables would just use the new string code. So what's there to break?

[Edit] Fixed a typo.

Re: String length should be stored for string variables

Posted: Tue Oct 18, 2022 4:12 am
by kenmo
Hi @BarryG, that was two years ago, so I don't know exactly which bugs I was thinking of. But off the top of my head:

- sometimes in PB, people allocate a string variable, and then poke a different string over it (or an API call writes over it)

- sometimes people poke a NUL char to truncate a string shorter

In these two cases, the stored string length could become out of sync with the actual contents... I'm assuming null chars would still be terminators in this hypothetical new PureBasic string library.
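For reference, a short sketch of how the second case behaves with today's null-terminated strings: Len() simply follows the poked null character, which a stored length field would not do.

```purebasic
text$ = "Hello World"

; Overwrite the space (character index 5) with a null character
PokeC(@text$ + 5 * SizeOf(Character), 0)

Debug text$      ; Hello
Debug Len(text$) ; 5
```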

Re: String length should be stored for string variables

Posted: Sun Nov 13, 2022 12:09 pm
by Sicro
kenmo wrote: Tue Oct 18, 2022 4:12 am - sometimes in PB, people allocate a string variable, and then poke a different string over it (or an API call writes over it)
Yes, I obviously didn't think of that when I created the feature request, although I have done it in my own code sometimes. This is mostly done with WinAPI functions, because, unlike the API functions of the other operating systems, they are compatible with PB strings.

In this case the following has to be done after the API function call, which means that old code has to be adapted, so there is no backward compatibility:

Code:

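result$ = Space(255)          ; reserve a buffer of 255 characters
WinAPI_Function(@result$)     ; placeholder for an API call that writes into the buffer
result$ = PeekS(@result$)     ; re-read up to the null character (creates a new string)
Debug result$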
or with a new function UpdateStringLength() that does not require creating a new string:

Code:

result$ = Space(255)
WinAPI_Function(@result$)
; -1 = search for the null character
UpdateStringLength(result$, -1)
Debug result$

Code:

result$ = Space(255)
length = WinAPI_Function(@result$)
UpdateStringLength(result$, length)
Debug result$
kenmo wrote: Tue Oct 18, 2022 4:12 am - sometimes people poke a NUL char to truncate a string shorter
That seems to be pretty rare. I can't remember where I've seen it.

----------------

Also, the new format for fixed strings that I suggested in the opening post is problematic, because fixed strings are embedded directly in a structure, whereas for normal strings only a pointer is stored. Implementing this feature would change the structure layout, which breaks backward compatibility with old code.
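The layout difference can be checked with SizeOf(). A sketch with two made-up structures, assuming the default structure alignment (no Align keyword), under which PureBasic does not pad fields: the fixed string occupies its full size inside the structure, the normal string only a pointer, so adding a length field to fixed strings would shift sizes and offsets.

```purebasic
Structure WithFixed
  id.l
  name.s{8}   ; 8 characters stored directly inside the structure
EndStructure

Structure WithDynamic
  id.l
  name.s      ; only a pointer to the string is stored here
EndStructure

Debug SizeOf(WithFixed)   ; 4 + 8 * SizeOf(Character)
Debug SizeOf(WithDynamic) ; 4 + SizeOf(Integer)
```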

----------------

The alternatives with full backward compatibility with old code would be:
  • The new string variable type is implemented as an addition to the existing string variable type. All PB string functions must then support two different string variable types.

    Code:

    newString.z = "Hello"
    oldString.s = newString
    oldString + " World"
    newString = oldString
    Debug newString ; Outputs 'Hello World'
    Debug oldString ; Outputs 'Hello World'

    Code:

    oldString.s{5} = "Hello World"
    newString.z{5} = "Hello World"
    Debug oldString ; Outputs 'Hello'
    Debug newString ; Outputs 'Hello'
    
    The new '.z' suffix declares the new string variable type.
  • Each string function gets an optional parameter that can be used to pass the already known string length to the function, so that the function does not have to calculate the string length again.

    Code:

    oldString.s = "Hello World"
    length = Len(oldString)
    Debug Left(oldString, 5, length) ; Outputs 'Hello'
    Debug Right(oldString, 5, length) ; Outputs 'World'
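What the optional length parameter would save can be approximated with existing functions. LeftKnown() and RightKnown() are made-up helper names, not PB functions: they take the length from the caller and copy characters with PeekS(), so RightKnown() can jump straight to its start position instead of scanning for the null character.

```purebasic
; Hypothetical helpers: the caller supplies the already known length
Procedure.s LeftKnown(*text, count, length)
  If count <= 0 : ProcedureReturn "" : EndIf
  If count > length : count = length : EndIf
  ProcedureReturn PeekS(*text, count)
EndProcedure

Procedure.s RightKnown(*text, count, length)
  If count <= 0 : ProcedureReturn "" : EndIf
  If count > length : count = length : EndIf
  ; Start directly at the last 'count' characters
  ProcedureReturn PeekS(*text + (length - count) * SizeOf(Character), count)
EndProcedure

oldString.s = "Hello World"
length = Len(oldString)                 ; computed once
Debug LeftKnown(@oldString, 5, length)  ; Hello
Debug RightKnown(@oldString, 5, length) ; World
```

Passing the address instead of the string avoids the copy PB makes for string parameters.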
    

Re: String length should be stored for string variables

Posted: Sun Nov 13, 2022 1:13 pm
by mk-soft
I don't see a problem here, since PB allocates the memory for the string, but the string functions only read up to the null character. There is no memory leak either, since all memory is freed, even if the null character is no longer at the end.

For an API that takes a string buffer as a parameter, the length of the buffer must always be specified as well.
Very old APIs that only support ASCII have to be used with a memory buffer anyway.

Code:


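; API dummy: writes "Hello World" into *String if the buffer is large enough
; and returns the byte length; with *String = 0 it only returns the required size
Procedure AnyApiW(*String, cbByte)
  Protected r1.s, len
  r1 = "Hello World"
  len = StringByteLength(r1)
  If *String
    If len <= cbByte
      PokeS(*String, r1)
      ProcedureReturn len
    EndIf
  Else
    ProcedureReturn len
  EndIf
EndProcedure

Debug "****"
; Case 1: oversized buffer, the string functions stop at the null character
t1.s = Space(1024)
r1 = AnyApiW(@t1, StringByteLength(t1))
Debug Left(t1, 5)
Debug Right(t1, 5)

Debug "****"
; Case 2: ask for the required size first, then create a matching buffer
r1 = AnyApiW(0, 0)
t1.s = Space(r1 >> 1)
r1 = AnyApiW(@t1, StringByteLength(t1))
Debug Left(t1, 5)
Debug Right(t1, 5)

Debug "****"
; Case 3: fixed string inside a structure, followed by a spare null word
Structure sData
  iVal.i
  text.s{20}
  null.w
EndStructure

Define d1.sData
r1 = AnyApiW(@d1\text, 40)
ShowMemoryViewer(@d1, SizeOf(sData))
Debug Left(d1\text, 5)
Debug Right(d1\text, 5)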


Re: String length should be stored for string variables

Posted: Sun Nov 13, 2022 5:53 pm
by Sicro
@mk-soft

After this feature is implemented, the PB string functions Left(), Right() etc. would no longer search for a null character, but would use the value of the string length field stored in front of the string. The problem: the WinAPI function does not update the value of the string length field.

Code:

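; Stand-in for a WinAPI function that writes a string into the caller's buffer
Procedure WinAPI_Function(*value)
  PokeS(*value, "Test")
EndProcedure

Define value$ = "Example"
; value$\charLength = 7   (hypothetical length field)

WinAPI_Function(@value$)

Debug value$
; Outputs 'Testple'
; value$\charLength = 7   (not updated by the API call)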

Re: String length should be stored for string variables

Posted: Sun Nov 13, 2022 6:49 pm
by mk-soft
The normal string functions stop at the null character, just like PB does now.
If PB switched to a BSTR-style string type, this would of course lead to problems.
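For comparison, a layout in the style of a Windows BSTR (length-prefixed string) can be sketched by hand in a memory block: a 4-byte byte length in front of the characters, plus a trailing null so null-terminated readers still work. NewLString(), LStringLength() and FreeLString() are made-up names for this illustration; real BSTRs are managed by the Windows API, not built like this. The last part shows the problem discussed above: writing through the pointer leaves the prefix stale.

```purebasic
; Hypothetical layout: [byte length (.l)] [characters] [null character]
Procedure NewLString(text$)
  Protected byteLength = StringByteLength(text$)
  Protected *block = AllocateMemory(4 + byteLength + SizeOf(Character))
  If *block = 0 : ProcedureReturn 0 : EndIf
  PokeL(*block, byteLength)
  PokeS(*block + 4, text$)          ; also writes the terminating null
  ProcedureReturn *block + 4        ; point at the first character, like a BSTR
EndProcedure

Procedure LStringLength(*text)
  ; Constant time: read the prefix instead of scanning for the null
  ProcedureReturn PeekL(*text - 4) / SizeOf(Character)
EndProcedure

Procedure FreeLString(*text)
  If *text : FreeMemory(*text - 4) : EndIf
EndProcedure

*s = NewLString("Hello World")
If *s
  before = LStringLength(*s)
  Debug before              ; 11

  ; Writing through the pointer, as an API would, leaves the prefix untouched
  PokeS(*s, "Hi")
  after = LStringLength(*s)
  peeked$ = PeekS(*s)
  Debug after               ; still 11: the stored length is now stale
  Debug peeked$             ; Hi
  FreeLString(*s)
EndIf
```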

Re: String length should be stored for string variables

Posted: Sun Nov 13, 2022 11:22 pm
by AZJIO
Sicro wrote: Sun Nov 13, 2022 5:53 pm After this feature is implemented, the PB string functions Left(), Right() etc. no longer search for a null character
I was trying to figure out which way is faster. If the length of the string is known, you use "For i=1 To Len", and if it is not known, you use the character test "While *c\c". In either case, each step checks something: either whether the counter has passed the string length, or whether the character is null.
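Both loop styles side by side, as a sketch; CountSpacesKnown() and CountSpacesScan() are illustrative names. The first relies on a length that is already known (here taken from Len(), which in current PB itself has to scan the string), the second stops at the null character:

```purebasic
Procedure CountSpacesKnown(*c.Character, length)
  Protected i, count
  For i = 1 To length               ; counter check on each step
    If *c\c = ' ' : count + 1 : EndIf
    *c + SizeOf(Character)
  Next
  ProcedureReturn count
EndProcedure

Procedure CountSpacesScan(*c.Character)
  Protected count
  While *c\c                        ; null check on each step
    If *c\c = ' ' : count + 1 : EndIf
    *c + SizeOf(Character)
  Wend
  ProcedureReturn count
EndProcedure

text$ = "a b c d"
Debug CountSpacesKnown(@text$, Len(text$)) ; 3
Debug CountSpacesScan(@text$)              ; 3
```

The per-step cost is similar; the stored length mainly pays off when a function needs the length up front, e.g. to jump to the end of the string.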

The idea of a new .z string type is interesting: it would allow the new feature without removing the old type, and one could decide later whether to remove the old way.
wilbert wrote: Mon Jul 20, 2020 6:11 am And some functions like Split and Join would be a welcome addition.
I assume the authors do not add such string functions because they can be built independently from the existing functionality. Why not add a section to the help file with examples of interesting solutions? Many beginners find it difficult to write such functions on their own, so offering them as ready-made code in the help file would make the language more attractive while learning it.
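As an example of the kind of helper such a help section could contain, here is a small split procedure built only from existing functions (CountString() and StringField()). SplitToArray() is a made-up name, and the example uses a single-character delimiter:

```purebasic
; Hypothetical helper: splits text$ at every delimiter$ into parts()
; and returns the number of parts
Procedure SplitToArray(text$, delimiter$, Array parts.s(1))
  Protected count = CountString(text$, delimiter$) + 1
  Protected i
  ReDim parts(count - 1)
  For i = 1 To count
    parts(i - 1) = StringField(text$, i, delimiter$)
  Next
  ProcedureReturn count
EndProcedure

Dim fields.s(0)
n = SplitToArray("red,green,blue", ",", fields())
For i = 0 To n - 1
  Debug fields(i)
Next
```

StringField() searches from the start of the string for every index, so this is fine for short lists but not an optimized implementation.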

Re: String length should be stored for string variables

Posted: Mon Nov 14, 2022 10:28 am
by juergenkulow

Code:

CompilerIf #PB_Compiler_OS <> #PB_OS_Linux Or #PB_Compiler_Processor <> #PB_Processor_x64 : CompilerError "Linux x64 only" : CompilerEndIf
; Linux x64 only: reads the 8 bytes stored directly in front of the string data
For i=1 To 35 Step 1
  s.s="TEST"+Space(i)
  *plen.Integer=@s-8
  slen=Len(s)
  ; value in front of the string, string length, (value - 19) / 2
  Debug Str(*plen\i)+" "+Str(slen)+" "+Str((*plen\i-19)/2)
Next
ShowMemoryViewer(@s-8,Len(s)*2+10)
; 33 5 7
; 33 6 7
; 33 7 7
; 49 8 15
; 49 9 15
; 49 10 15
; 49 11 15
; 49 12 15
; 49 13 15
; 49 14 15
; 49 15 15
; 65 16 23
; 65 17 23
; 65 18 23
; 65 19 23
; 65 20 23
; 65 21 23
; 65 22 23
; 65 23 23
; 81 24 31
; 81 25 31
; 81 26 31
; 81 27 31
; 81 28 31
; 81 29 31
; 81 30 31
; 81 31 31
; 97 32 39
; 97 33 39
; 97 34 39
; 97 35 39
; 97 36 39
; 97 37 39
; 97 38 39
; 97 39 39
; 0000000000773878  61 00 00 00 00 00 00 00 54 00 45 00 53 00 54 00  a.......T.E.S.T.
; 0000000000773888  20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00   . . . . . . . .
; 0000000000773898  20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00   . . . . . . . .
; 00000000007738A8  20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00   . . . . . . . .
; 00000000007738B8  20 00 20 00 20 00 20 00 20 00 20 00 20 00 20 00   . . . . . . . .
; 00000000007738C8  20 00 20 00 20 00 00 00                           . . ...