BSTR* fast dynamic string datatype


Re: BSTR* fast dynamic string datatype

Post by Keya »

nco2k, btw here's the problem I'm trying to address with the structure: the main issue is that a .String isn't the address of a string, it's the address of a pointer to the string:

Code: Select all

Define *mystr.String
*mystr=AllocateMemory(SizeOf(String))  ;room for one string pointer: 4 bytes on x86, 8 on x64
*mystr\s = "abc"
Debug Hex(@*mystr)      ;4350F4
Debug Hex(*mystr)       ;3A1E90 all three are dispersed, not consecutive
Debug Hex(@*mystr\s)    ;391EA8
So I can't just have "*bstr = [Size][StringData]" and return *bstr+4, because @ +4 is the string data, not a pointer to a string.
So for example this fails:

Code: Select all

Procedure.i BSTR()
  *bstr = AllocateMemory(512)
  Debug "Alloc @ " + Hex(*bstr)
  PokeL(*bstr,4)
  PokeS(*bstr+4, "abcd", #PB_Ascii)
  ProcedureReturn *bstr+4
EndProcedure

Define *mystr.String 
*mystr = BSTR()
Debug "*mystr = " + Hex(*mystr)
Debug *mystr\s  ;invalid, trying to read the data as the address
This works, but still not quite there:

Code: Select all

Structure BSTR
  bufaddr.i  ;always points to @buf[0]
  size.l     ;string size
  buf.a[0]   ;string data
EndStructure

Procedure.i BSTR() ;create a 4-byte Bstr "abcd"
  *bstr.BSTR = AllocateMemory(512)
  *bstr\bufaddr = @*bstr\buf[0]
  *bstr\size = 4
  PokeS(*bstr\bufaddr, "abcd", #PB_Ascii)  ;write at buf[0]; a hard-coded +8 is only correct on x86
  ProcedureReturn *bstr
EndProcedure

Define *mystr.String 
*mystr = BSTR()
Debug "*mystr = " + Hex(*mystr)
Debug *mystr\s
Trying to make the size and string consecutive now

Re: BSTR* fast dynamic string datatype

Post by Keya »

Think I've got it, everything is consecutive in memory now. It seems the only difference from a true BSTR is the addition of the address pointer before the size, and because I'm returning the address of that pointer instead of the address of the string data (required for PB strings), the size is also retrieved differently, at PeekL(*bstr+SizeOf(Integer)), which is *bstr+4 on x86.

Code: Select all

Structure BSTR
  bufaddr.i  ;always points to @buf[0]  (as PB Strings need a pointer to the string, not the string address directly)
  size.l     ;string size
  buf.a[0]   ;string data
EndStructure

Procedure.i BSTR() ;create a 5-byte Bstr "abcde"
  *bstr.BSTR    = AllocateMemory(SizeOf(BSTR) + 5 + 1)  ;header + 5 data bytes + terminating null
  *bstr\bufaddr = @*bstr\buf[0]
  *bstr\size    = 5
  PokeS(*bstr\bufaddr, "abcde", #PB_Ascii)
  ProcedureReturn *bstr
EndProcedure

Define *mystr.String = BSTR()
Debug "Text="+*mystr\s
Debug "Size="+Str(PeekL(*mystr+SizeOf(Integer))) ;v1 - size follows the pointer field (+4 on x86, +8 on x64)
Debug "Size="+Str(PeekL(PeekI(*mystr)-4)) ;v2 - to access it via "-4" it's 2x Peeks due to ptr-to-ptr
Because *mystr.String is a pointer-to-string-pointer and not pointer-to-data I can't quite envisage how a true BSTR could be constructed, but at the end of the day:
1) it's still accessible as a normal string via \s
2) the address returned is the address of the string (just like normal PB strings, and similar to true BStr returning address of string)
3) PeekL() wouldn't normally be used to get the string size anyway - that's what bLen() is for
4) it's still a perfectly valid BStr structure if you give it the address @size, skipping the address variable.
5) it's all in the one memory allocation now (my first demo uses two as the String was stored separately)
So I don't think this difference (having the pointer at the start) is particularly important, and as the pointer is the only difference BSTR* has turned out to be a good name hehe
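
A minimal sketch of the single-allocation [pointer][size][data] layout described above, sized from the text instead of hard-coded and read through structure fields rather than fixed offsets. The names PBStr, NewPBStr and PBStrLen are made up for this illustration and are not part of the library in this thread. It assumes the data is stored in the program's native string format so that \s can read it directly.

```purebasic
EnableExplicit

; Hypothetical names (PBStr, NewPBStr, PBStrLen): illustration only.
Structure PBStr
  bufaddr.i   ; points at buf[0], so the block can be read through a .String pointer
  size.l      ; payload size in bytes, excluding the terminating null
  buf.c[0]    ; character data in the program's native string format
EndStructure

Procedure.i NewPBStr(Text.s)
  Protected bytes = StringByteLength(Text)
  ; header + payload + one null character; AllocateMemory() zero-fills by default
  Protected *p.PBStr = AllocateMemory(SizeOf(PBStr) + bytes + SizeOf(Character))
  If *p
    *p\bufaddr = @*p\buf[0]
    *p\size    = bytes
    If bytes
      CopyMemory(@Text, *p\bufaddr, bytes)
    EndIf
  EndIf
  ProcedureReturn *p
EndProcedure

Procedure.i PBStrLen(*p.PBStr)
  ; length in characters, taken from the stored size (no scan for a null)
  ProcedureReturn *p\size / SizeOf(Character)
EndProcedure

Define *s.String = NewPBStr("abcde")
If *s
  Debug "Text=" + *s\s                                     ; read-only access as a PB string
  Debug "Len="  + Str(PBStrLen(*s))
  Debug "Size=" + Str(PeekL(*s + OffsetOf(PBStr\size)))    ; same field, by offset
  FreeMemory(*s)
EndIf
```

Using OffsetOf() and SizeOf() keeps the same source working on x86 and x64, where the pointer field is 4 and 8 bytes respectively. Only read through \s here: assigning to it would let PB's string manager try to free or resize memory it does not own.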
Last edited by Keya on Sun Mar 19, 2017 7:06 am, edited 1 time in total.

Re: BSTR* fast dynamic string datatype

Post by Keya »

PB can convert strings to bstr with its Pseudotype support:

Code: Select all

Prototype.i protMakeBstr(bstr.p-bstr)

Procedure MakeBstr_(*bstr.String)
  Debug "String=" + *bstr\s  ;doesn't show, as it's a ptr to the data, not address of string ptr
  ShowMemoryViewer(*bstr-4,30) ;but it is correct in memory - converted to unicode and stored with length @ -4
EndProcedure

MakeBstr.protMakeBstr = @MakeBstr_()
MakeBstr("test")
but this also demonstrates the BStr<>PB String incompatibility problem with how bstr is a ptr to the start of the data, not the address of the pointer to the start of data - so i think something like my above solution in previous post that includes the additional pointer field is unavoidable, but doesn't break the structure anyway - apart from the pointer at the start the rest of the structure is true BStr, and can therefore be accessed as such @ *bstr+Sizeof(Integer)+4

infratec, yes it seems bstr is always stored as unicode, regardless of ascii/unicode compile, so that would solve any Mid() issue i guess. I like the flexibility of offering Ascii also though for when its known a priori there'll only be Ascii chars and not any utf8

Re: BSTR* fast dynamic string datatype

Post by mk-soft »

@Keya
Very nice for fast string management, but it would be better to rename your BSTR to FastStr, then there is no discussion about the structures used.
BSTR is not the same as FastStr, and that's that!

Here is some code I wrote to create a BSTR manually.
BSTR is only needed under Windows and there are ready APIs.

Update for Keya :wink:

Code: Select all

; BSTR Functions; Created by mk-soft; Date 19.03.2017

; *****************************************************************************

Structure udtArrayChar
  c.c[0]
EndStructure

Structure udtBStr;Align 4
  len.l
  str.udtArrayChar
EndStructure

; -----------------------------------------------------------------------------

Procedure CreateBStr(Text.s)
  Protected *bstr.udtBStr, len
  len = StringByteLength(Text)
  *bstr = AllocateMemory(len + SizeOf(Long) + SizeOf(character))
  *bstr\len = len
  CopyMemory(@Text, @*bstr\str, len)
  ProcedureReturn @*bstr\str
EndProcedure

; -----------------------------------------------------------------------------

Procedure FreeBStr(*Bstr)
  Protected *mem = *Bstr - SizeOf(Long)
  FreeMemory(*mem)
EndProcedure

; -----------------------------------------------------------------------------

Procedure ConcatBStr(T1, T2)
  Protected *t1.udtBStr, *t2.udtBStr, *r1.udtBStr, len
  *t1 = t1 - SizeOf(Long)
  *t2 = t2 - SizeOf(Long)
  If *t1\len And *t2\len
    len = *t1\len + *t2\len
    *r1 = AllocateMemory(len + SizeOf(Long) + SizeOf(character))
    *r1\len = len
    CopyMemory(@*t1\str, @*r1\str, *t1\len)
    CopyMemory(@*t2\str, @*r1\str + *t1\len, *t2\len)
  ElseIf *t1\len
    len = MemorySize(*t1)
    *r1 = AllocateMemory(len)
    CopyMemory(*t1, *r1, len)
  ElseIf *t2\len
    len = MemorySize(*t2)
    *r1 = AllocateMemory(len)
    CopyMemory(*t2, *r1, len)
  Else
    *r1 = AllocateMemory(SizeOf(long) + SizeOf(character))
  EndIf
  ProcedureReturn @*r1\str
EndProcedure

; -----------------------------------------------------------------------------

Procedure _AddBStr(BStr, Text.s)
  Protected *bstr.udtBStr, len, len2
  *bstr = Bstr - SizeOf(Long)
  len = StringByteLength(Text)
  len2 = *bstr\len + len
  *bstr = ReAllocateMemory(*bstr, len2 + SizeOf(Long) + SizeOf(character))
  CopyMemory(@Text, @*bstr\str + *bstr\len, len)
  *bstr\len = len2
  ProcedureReturn @*bstr\str
EndProcedure

Macro AddBstr(BStr, Text)
  BStr = _AddBStr(BStr, Text)
EndMacro

; -----------------------------------------------------------------------------

Procedure LenBStr(*Bstr)
  Protected *mem.udtBStr = *Bstr - SizeOf(Long)
  ProcedureReturn (*mem\len / SizeOf(character))
EndProcedure

; -----------------------------------------------------------------------------

Procedure LeftBStr(BStr, Length)
  Protected *r1.udtBStr, *BStr.udtBStr, len
  *BStr.udtBStr = Bstr - SizeOf(Long)
  len = Length * SizeOf(character)
  If len > *BStr\len
    len = *BStr\len
  EndIf
  *r1 = AllocateMemory(len + SizeOf(Long) + SizeOf(character))
  *r1\len = len
  CopyMemory(@*BStr\str, @*r1\str, len)
  ProcedureReturn @*r1\str
EndProcedure

; -----------------------------------------------------------------------------

Procedure RightBStr(BStr, Length)
  Protected *r1.udtBStr, *BStr.udtBStr, len, pos
  *BStr.udtBStr = Bstr - SizeOf(Long)
  len = Length * SizeOf(character)
  If len > *BStr\len
    len = *BStr\len
  EndIf
  *r1 = AllocateMemory(len + SizeOf(Long) + SizeOf(character))
  *r1\len = len
  pos = *BStr\len - len
  CopyMemory(@*BStr\str + Pos, @*r1\str, len)
  ProcedureReturn @*r1\str
EndProcedure

; -----------------------------------------------------------------------------

Procedure MidBStr(BStr, Position, Length = 0)
  Protected *r1.udtBStr, *BStr.udtBStr, len, ofs
  *BStr.udtBStr = Bstr - SizeOf(Long)
  ofs = (Position - 1) * SizeOf(character)
  len = Length * SizeOf(character)
  Repeat
    If ofs >= *BStr\len Or ofs < 0
      *r1 = AllocateMemory(SizeOf(Long) + SizeOf(character))
      Break
    EndIf
    If Not len
      len = *BStr\len
    EndIf
    If ofs + len > *BStr\len
      len = *BStr\len - ofs
    EndIf
    *r1 = AllocateMemory(len + SizeOf(Long) + SizeOf(character))
    *r1\len = len
    CopyMemory(@*BStr\str + ofs, @*r1\str, len)
  Until #True
  ProcedureReturn @*r1\str
EndProcedure

; -----------------------------------------------------------------------------

Procedure.s BStrString(*BStr)
  Protected *value.String = @*Bstr
  ProcedureReturn *value\s
EndProcedure

; -----------------------------------------------------------------------------

Procedure BStrVal(*BStr)
  Protected *value.String = @*Bstr
  ProcedureReturn Val(*value\s)
EndProcedure

; -----------------------------------------------------------------------------

Procedure.f BStrValF(*BStr)
  Protected *value.String = @*Bstr
  ProcedureReturn ValF(*value\s)
EndProcedure

; -----------------------------------------------------------------------------

Procedure.d BStrValD(*BStr)
  Protected *value.String = @*Bstr
  ProcedureReturn ValD(*value\s)
EndProcedure

; -----------------------------------------------------------------------------
; -----------------------------------------------------------------------------
; -----------------------------------------------------------------------------

;- Test

CompilerIf #PB_Compiler_Debugger
  
  Define t1, t2, t3, r1
  t1 = CreateBStr("Hello World")
  t2 = CreateBStr(", Purebasic Power")
  t3 = ConcatBStr(t1,t2)
  AddBStr(t3, " !")
  Debug LenBStr(t3)
  Debug BStrString(t3)
  r1 = LeftBStr(t3, 5)
  Debug BStrString(r1)
  FreeBStr(r1)
  r1 = RightBStr(t3, 7)
  Debug BStrString(r1)
  FreeBStr(r1)
  r1 = MidBStr(t3, 14, 9)
  Debug BStrString(r1)
  Debug LenBStr(r1)
  FreeBStr(r1)
  r1 =CreateBStr("12345.12345")
  Debug BStrVal(r1)
  Debug BStrValF(r1)
  Debug BStrValD(r1)
  FreeBStr(r1)
  FreeBStr(t1)
  FreeBStr(t2)
  FreeBStr(t3)
  
CompilerElse
  Define t1, append$   ;a BString and a normal String$
  
  append$ = ""
  Time1=ElapsedMilliseconds()
  For i = 1 To 20000
    append$ + "Append This"
  Next i
  Time2=ElapsedMilliseconds()
  
  A$ = "Str$ Time=" + Str(Time2 - Time1) + ~"ms\n" 
  
  t1 = CreateBStr("")
  Time1=ElapsedMilliseconds()
  For i = 1 To 20000
    AddBStr(t1, "Append This")
  Next i
  Time2=ElapsedMilliseconds()
  
  A$ + "BSTR Time=" + Str(Time2 - Time1) + "ms"
  
  MessageRequester("BSTR timings",A$)
  ;Str$ Time=7716ms
  ;BSTR Time=26ms
CompilerEndIf
:wink:

P.S. Only Unicode
Last edited by mk-soft on Sun Mar 19, 2017 10:20 pm, edited 1 time in total.
My Projects ThreadToGUI / OOP-BaseClass / EventDesigner V3
PB v3.30 / v5.75 - OS Mac Mini OSX 10.xx - VM Window Pro / Linux Ubuntu
Downloads on my Webspace / OneDrive

Re: BSTR* fast dynamic string datatype

Post by Sicro »

To speed up the expansion of strings even more, you can reserve generous storage space up front so that the memory does not have to be extended on every append.

I have also recently written a module for quickly expanding strings:
https://github.com/SicroAtGIT/PureBasic ... trings.pbi
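
The over-allocation idea can be sketched as follows: keep a capacity separate from the used size and double it whenever it runs out, so most appends need no reallocation. This is only an illustration under that assumption, not the API of the linked module; GrowStr and GrowStr_Append are made-up names.

```purebasic
EnableExplicit

; Hypothetical names (GrowStr, GrowStr_Append): illustration only.
Structure GrowStr
  *buf        ; character buffer in the program's native string format
  used.i      ; bytes in use, excluding the terminating null
  cap.i       ; bytes currently allocated
EndStructure

Procedure.i GrowStr_Append(*g.GrowStr, Text.s)
  Protected bytes = StringByteLength(Text)
  Protected need  = *g\used + bytes + SizeOf(Character)
  Protected newCap, *mem
  If bytes = 0
    ProcedureReturn #True
  EndIf
  If need > *g\cap
    ; grow geometrically so reallocation happens only occasionally
    newCap = *g\cap * 2
    If newCap < need : newCap = need : EndIf
    If newCap < 64   : newCap = 64   : EndIf
    If *g\buf
      *mem = ReAllocateMemory(*g\buf, newCap)
    Else
      *mem = AllocateMemory(newCap)
    EndIf
    If *mem = 0
      ProcedureReturn #False   ; the old buffer, if any, is still valid
    EndIf
    *g\buf = *mem
    *g\cap = newCap
  EndIf
  CopyMemory(@Text, *g\buf + *g\used, bytes)
  *g\used + bytes
  PokeC(*g\buf + *g\used, 0)   ; keep the data null-terminated for PeekS()
  ProcedureReturn #True
EndProcedure

Define g.GrowStr, i
For i = 1 To 20000
  GrowStr_Append(@g, "Append This")
Next
Debug g\used / SizeOf(Character)   ; length in characters
Debug Left(PeekS(g\buf), 22)
FreeMemory(g\buf)
```

Doubling trades some unused memory for far fewer calls to ReAllocateMemory(); a fixed growth step would also work but reallocates more often on long strings.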
Why OpenSource should have a license :: PB-CodeArchiv-Rebirth :: Pleasant-Dark (syntax color scheme) :: RegEx-Engine (compiles RegExes to NFA/DFA)
Manjaro Xfce x64 (Main system) :: Windows 10 Home (VirtualBox) :: Newest PureBasic version

Re: BSTR* fast dynamic string datatype

Post by Keya »

in your demo is there no way to access the strings except by using PeekS? I've done my best to keep mine compatible with \s so it can be treated directly as a PB string for read ops, ie mine are *mybstr.String's not *mybstr.bstrstruct
mk-soft wrote:but it would be better to rename your BSTR to FastStr, then there is no discussion about the structures used.
But it IS a BStr, the only difference is the address pointer at the start, but every byte from then on is identical to BStr, and can be referenced as such very easily as TrueBStr = *bstr+Sizeof(Integer), while at the same time it's directly accessible as a PB string as *bstr\s
BSTR is only needed under Windows and there are ready APIs.
wellll, fast string handling is also needed in Linux and Mac where the same issues of C-style null-terminated strings equally apply - they have the exact same issue. It doesn't necessarily need to be in the form of BStr though, it just seemed a good model to try :)

Re: BSTR* fast dynamic string datatype

Post by mk-soft »

Unfortunately, this is not entirely true.
In a 'BSTR' the pointer points directly to the text, and the length is stored in the 4 bytes before it.
In your version it is not a pointer to the text, but a pointer to the pointer to the text. Thus, it is not compatible with BSTR.

Code: Select all

t1 = bstr(0, "Hello World")
ShowMemoryViewer(t1 - 4, 32)
CallDebugger
t2 = SysAllocString_("Hello World")
ShowMemoryViewer(t2 - 4, 32)
But it is very fast :wink:

Re: BSTR* fast dynamic string datatype

Post by nco2k »

ha, i was so tired yesterday, that i accidentally replied to you in private instead of public. :lol:

like explained in the link i posted earlier, a real bstr is [size.l][unicode string][null.w] and the returned pointer of a bstr function points to [unicode string] and not [size.l]. i think a lot of confusion here is due to the fact that you use two buffers. one for the info and one for the actual string, while a real bstr uses only one buffer, like in the example from mk-soft. only that a real bstr is always unicode and actually binary safe, hence the name binary-string.

when you write ABC[0]DEF[0], purebasic will stop after hitting the first null and return only ABC[0], while a real bstr could return ABC[0]DEF[0]. a bstr doesnt search for null, it reads the [size.l] value and copies everything of that size, whether it contains null or not. thats why they are so fast. purebasic strings are not binary safe, so if you want proper bstr handling, you would have to use CompareMemory() etc. but the point of this thread is not really to re-create bstr handling, but making pbstr handling faster. :D
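
a rough sketch of what binary-safe comparison could look like, assuming both buffers are laid out as [size.l][data] with the size in bytes. BinEqual is a made-up helper name for this illustration. it compares by stored size with CompareMemory(), so a null inside the data does not end the comparison, while PeekS() on the same buffer stops at the first null.

```purebasic
EnableExplicit

; Hypothetical helper (BinEqual): illustration only.
; Both arguments point at a buffer laid out as [size.l][data], size in bytes.
Procedure.i BinEqual(*a, *b)
  Protected lenA.l = PeekL(*a), lenB.l = PeekL(*b)
  If lenA <> lenB
    ProcedureReturn #False
  EndIf
  If lenA = 0
    ProcedureReturn #True
  EndIf
  ProcedureReturn CompareMemory(*a + SizeOf(Long), *b + SizeOf(Long), lenA)
EndProcedure

; Build "ABC" + null + "DEF" (7 characters) behind a size prefix
Define chars = 7
Define bytes = chars * SizeOf(Character)
Define *x = AllocateMemory(SizeOf(Long) + bytes + SizeOf(Character))
Define *y = AllocateMemory(SizeOf(Long) + bytes + SizeOf(Character))
PokeL(*x, bytes)
PokeS(*x + SizeOf(Long), "ABC")                             ; writes ABC plus a null
PokeS(*x + SizeOf(Long) + 4 * SizeOf(Character), "DEF")     ; continues after that null
CopyMemory(*x, *y, SizeOf(Long) + bytes)

Debug PeekS(*x + SizeOf(Long))    ; a normal PB read stops at the first null
Debug BinEqual(*x, *y)            ; compares all 7 characters

; change the last character, which lies after the embedded null
PokeC(*y + SizeOf(Long) + 6 * SizeOf(Character), 'X')
Debug PeekS(*y + SizeOf(Long))    ; PeekS() never reaches the changed character
Debug BinEqual(*x, *y)            ; the size-based compare does

FreeMemory(*x) : FreeMemory(*y)
```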

and yes, .String needs a pointer to a pointer, thats why you have to either carry an additional variable in your structure, or simply use .String in the functions that require it:

Code: Select all

String$ = "ABCDEF"
*Memory = AllocateMemory(StringByteLength(String$) + SizeOf(Character))
PokeS(*Memory, String$)

Procedure$ MyLeft(*MyStr, Length)
  Protected *String.String = @*MyStr
  ProcedureReturn Left(*String\s, Length)
EndProcedure

Debug MyLeft(*Memory, 3)
c ya,
nco2k
If OSVersion() = #PB_OS_Windows_ME : End : EndIf

Re: BSTR* fast dynamic string datatype

Post by Keya »

mk-soft wrote:Unfortunately, this is not entirely true. In a 'BSTR' the pointer points directly to the text, and the length is stored in the 4 bytes before it. In your version it is not a pointer to the text, but a pointer to the pointer to the text. Thus, it is not compatible with BSTR.
Please read again what i said - every byte from then on (after the pointer) is identical to BStr, and can be referenced as such very easily as TrueBStr = *bstr+Sizeof(Integer), while at the same time it's directly accessible as a PB string as *bstr\s. The only difference is the pointer at the start, which can be ignored when using it as a BStr, but its presence makes it fully compatible as a PB string while also being fully BStr compatible. Its structure is literally [pointer][True BStr]

Re: BSTR* fast dynamic string datatype

Post by mk-soft »

What nco2k writes is correct.

Perhaps take another look at my code to see how to get by without the double AllocateMemory.
I have updated my previous code for this.

I think your idea is very good. :wink: :wink: :wink:

On Windows, the API should be used if you want to work with external functions or DLLs that need a BSTR.

:wink:

Re: BSTR* fast dynamic string datatype

Post by Keya »

mk-soft wrote:Perhaps take another look at my code to see how to get by without the double AllocateMemory.
sorry if i'm misunderstanding you, but while I did 2 allocations in my first post in my last example i'm only doing 1 allocation - everything is exactly the same as true BStr apart from a pointer at the start of it :) and again to use it as a true Bstr the address is simply *bstr+Sizeof(Integer), as opposed to referencing it as a string with *bstr\s - hopefully best of both worlds. But is the BSTR format the best approach for this? I don't know, but if anything it has the advantage of Windows compatibility and that doesn't really seem to come at any particular cost

Re: BSTR* fast dynamic string datatype

Post by Fig »

I think we should go with the "fast string" idea and leave BSTR aside.
Because, as was highlighted, the BSTR APIs to create them are not ios/linux compatible (and accessing them with regular PB functions will ignore anything after the first Chr(0)).
That said, we no longer need to care about compatibility with real BSTR.
On Windows, we can write a translation procedure to convert them to real BSTRs if needed.

The structure looks very good and should be adopted.
FastString\string
FastString\size

If everybody agrees, why not start from that to write efficient functions in asm?
It will become the new standard PB faststring.
Last edited by Fig on Sun Jun 25, 2017 12:35 pm, edited 1 time in total.
There are 2 methods to program bugless.
But only the third works fine.

Win10, Pb x64 5.71 LTS

Re: BSTR* fast dynamic string datatype

Post by Mijikai »

Code: Select all

Structure POWERBASIC_Str
  Ptr.l
  Size.l
  ;StringBuffer
EndStructure
Recently i looked at some powerbasic code which seems to do exactly that ->
http://www.purebasic.fr/english/viewtop ... 13&t=68613
I think its a fast & smart way to deal with strings. :)

(theres also some code that shows how i deal with the memory allocation...)