BSTR* fast dynamic string datatype

Post by Keya »

Ok, nearly BSTR ... very subtle difference - BSTR is normally: [4 bytes Length][String data]
But to allow for relatively seamless integration with PB's String datatype i'm using this:

Code: Select all

Structure BSTR
 size.l
 s.s ;POINTER to the string, not the actual string data itself
EndStructure
That allows, for example, Messagebox(*bstr\s), whereas with a true BSTR you would have to write Messagebox(@*bstr+4), which the compiler rejects because functions like Messagebox require a string parameter.

Simple rule of thumb - when READING you can treat it as a normal string, eg. Debug "The string is " + *bstr\s
But whenever WRITING you must use the bXxx() functions.

In the following simple SpeedTest.pb, which appends an 11-character string 20000 times, PB's native string concatenation took 7716ms and BSTR took 26ms (both with the debugger enabled):

Code: Select all

XIncludeFile("BSTR.pbi")

Define *append.BSTR, append$   ;a BString and a normal String$

append$ = ""
Time1=ElapsedMilliseconds()
For i = 1 To 20000
  append$ + "Append This"
Next i
Time2=ElapsedMilliseconds()
Debug "Str$ Time=" + Str(Time2-Time1)+"ms" ;+ " String=" + append$

*append.BSTR = BSTR(0, "")
Time1=ElapsedMilliseconds()
For i = 1 To 20000
  bAppend(*append, "Append This")
Next i
Time2=ElapsedMilliseconds()
Debug "BSTR Time=" + Str(Time2-Time1)+"ms" ;+ " String=" + *append\s

;Str$ Time=7716ms
;BSTR Time=26ms
Here's BSTR.pbi:

Code: Select all

;### BSTR Dynamic Strings ###
;by Keya. Public domain.
;Unicode + Ascii, 32 + 64, Win+Linux+Mac.


Structure BSTR
  size.l    ;"ABC" = 3 (Asc) or 6 (unicode)
  s.s       ;Never directly WRITE to this; always use bFunctions() to ensure \Size is updated.
EndStructure


CompilerIf #PB_Compiler_Unicode
  #PB_StringFormat = #PB_Unicode
CompilerElse
  #PB_StringFormat = #PB_Ascii
CompilerEndIf  

CompilerIf #PB_Compiler_Unicode
  Macro UNICALC(value)
    (value)*2
  EndMacro
CompilerElse
  Macro UNICALC(value)
    (value)
  EndMacro
CompilerEndIf


Procedure BSTR(*bstr.BSTR, string$) ;Create/overwrite bStrings
  strlen = StringByteLength(string$)
  If Not *bstr
    *bstr = AllocateMemory(SizeOf(BSTR))
    *newaddr = AllocateMemory(strlen+2)
    PokeI(*bstr+4, *newaddr)
  Else
    If *bstr\size <> strlen
      *newaddr = ReAllocateMemory(@*bstr\s, strlen+2)
      PokeI(*bstr+4, *newaddr)
    EndIf
  EndIf
  PokeL(*bstr, strlen)
  CopyMemory(@string$, @*bstr\s, strlen + SizeOf(Character))
  ProcedureReturn *bstr
EndProcedure


Procedure bAppend(*bstr.BSTR, string$) ;Append String to bString
  strlen = StringByteLength(string$)
  oldlen = *bstr\size
  newsize = oldlen + strlen
  *newaddr = ReAllocateMemory(@*bstr\s, newsize+2)
  PokeI(*bstr+4, *newaddr)
  CopyMemory(@string$, @*bstr\s + oldlen, strlen + SizeOf(Character))
  PokeL(*bstr, newsize)
EndProcedure


Procedure bFree(*bstr.BSTR) ;Free. Not really required (there's no Free for normal strings)
  If *bstr 
    FreeMemory(@*bstr\s)
    PokeL(*bstr,0): PokeI(*bstr+4,0)
    FreeMemory(*bstr)
  EndIf
EndProcedure


Procedure.i bLen(*bstr.BSTR)
  CompilerIf #PB_Compiler_Unicode
    ProcedureReturn *bstr\size >> 1
  CompilerElse
    ProcedureReturn *bstr\size
  CompilerEndIf
EndProcedure


Procedure.i bStringByteLength(*bstr.BSTR)
  ProcedureReturn *bstr\size
EndProcedure


Procedure.s bMid(*bstr.BSTR, startpos, length=0, format=#PB_StringFormat)
  If length<=0: length=*bstr\size: EndIf
  ProcedureReturn PeekS(@*bstr\s + UNICALC(startpos)-SizeOf(Character), length, format)
EndProcedure


Macro bRight(bstr, length)
  PeekS(@bstr\s + bstr\size - UNICALC(length), length, #PB_StringFormat)
EndMacro
; Procedure.s bRight(*bstr.BSTR, length, format=#PB_StringFormat)
;   ProcedureReturn PeekS(@*bstr\s + *bstr\size - UNICALC(length), length, format)
; EndProcedure


Macro bLeft(bstr, length)
  PeekS(@bstr\s, length, #PB_StringFormat)
EndMacro
;Procedure.s bLeft(*bstr.BSTR, length, format=#PB_StringFormat)
;  ProcedureReturn PeekS(@*bstr\s, length, format)
;EndProcedure


Macro bTrim(pbstr,character=" ")
  Trim(pbstr\s,character)
EndMacro

Macro bLTrim(pbstr,character=" ")
  LTrim(pbstr\s,character)
EndMacro

Macro bRTrim(pbstr,character=" ")
  RTrim(pbstr\s,character)
EndMacro

Macro bLCase(pbstr)
  LCase(pbstr\s)
EndMacro

Macro bUCase(pbstr)
  UCase(pbstr\s)
EndMacro

Macro bFindString(pbstr, stringtofind, startposition=0,mode=#PB_String_CaseSensitive)
  FindString(pbstr\s, stringtofind, startposition, mode)
EndMacro
And here's a more comprehensive Demo.pb.
Most functions are implemented as macros. I've only added about a third of the string functions, as I wanted to keep this first post fairly short; I don't envisage any problems adding full support. The demo calls each normal string function alongside its BSTR counterpart purely for comparison - you don't need a normal string in order to use a BSTR.

Code: Select all

XIncludeFile("BSTR.pbi")

Define *bstr1.BSTR, normal$   ;a BString and a normal String


*bstr1 = BSTR(0,"First")        ;create string "First"
normal$ = "First"
Debug "Full: " + normal$
Debug "Full: " + *bstr1\s

*bstr1 = BSTR(*bstr1,"Second")  ;overwrite string to become "Second"
normal$ = "Second"
Debug "Full: " + normal$
Debug "Full: " + *bstr1\s

Debug "Left: " + Left(normal$,3)
Debug "Left: " + bLeft(*bstr1,3)

Debug "Right: " + Right(normal$, 3)
Debug "Right: " + bRight(*bstr1, 3)

Debug "Mid: " + Mid(normal$, 2, 3)
Debug "Mid: " + bMid(*bstr1, 2, 3)

normal$ = normal$ + " appended   "
*bstr1 = BSTR(*bstr1,*bstr1\s + " appended   ")
Debug "Append: " + normal$  + "[End]"
Debug "Append: " + *bstr1\s + "[End]"

Debug "Trim: " + Trim(normal$)  + "[End]"
Debug "Trim: " + bTrim(*bstr1) + "[End]"

Debug "Len: " + Str(Len(normal$))
Debug "Len: " + Str(bLen(*bstr1))

Debug "StrByteLen: " + Str(StringByteLength(normal$))
Debug "StrByteLen: " + Str(bStringByteLength(*bstr1))

Debug "FindString: " + Str(FindString(normal$, "pen"))
Debug "FindString: " + Str(bFindString(*bstr1, "pen"))

Debug "UCase: " + UCase(normal$)
Debug "UCase: " + bUcase(*bstr1)

bFree(*bstr1)   ;not really needed

Post by davido »

@Keya,
Impressive!!

Your speed test demo looks even better without the debugger:

Code: Select all

XIncludeFile("BSTR.pbi")

Define *append.BSTR, append$   ;a BString and a normal String$

append$ = ""
Time1=ElapsedMilliseconds()
For i = 1 To 20000
  append$ + "Append This"
Next i
Time2=ElapsedMilliseconds()

A$ = "Str$ Time=" + Str(Time2 - Time1) + ~"ms\n" 

*append.BSTR = BSTR(0, "")
Time1=ElapsedMilliseconds()
For i = 1 To 20000
  bAppend(*append, "Append This")
Next i
Time2=ElapsedMilliseconds()

A$ + "BSTR Time=" + Str(Time2 - Time1) + "ms"

MessageRequester("BSTR timings",A$)
;Str$ Time=7716ms
;BSTR Time=26ms
On my machine it was: 4513ms versus 4ms

Post by Kwai chang caine »

Thanks for sharing 8)

My result on W10 X64 (v5.60 x86)
Str$ Time=13698ms
BSTR Time=49ms
If I'm not mistaken, BSTR is mainly used by Microsoft for VB6, like SafeArray, no?
For my own knowledge, why do you need it? :oops:

Post by Keya »

Kwai chang caine wrote:For my knowledge, why you need it ? :oops:
By storing the length of the string and updating it with each modification, it overcomes the performance problem of null-terminated strings, where appending, for example, first requires scanning the entire string to determine its length. It also means you can use them to store data that contains nulls, such as file data. And because of the way I've structured it, it can still be read directly as a standard PB string via *bstr\s, so it keeps the flexibility of PB's native strings and full access to PB's native string functions.
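To make the idea concrete, here is a minimal standalone sketch of a buffer that remembers its own byte length, so each append writes straight at the known end instead of scanning for the terminator. The names LenBuf and LenBufAppend are hypothetical and not part of BSTR.pbi; this is only an illustration of the principle, not the module's actual code.

```purebasic
; Hypothetical names for illustration only (not part of BSTR.pbi).
Structure LenBuf
  size.i      ; bytes currently used, excluding the terminator
  *mem        ; raw buffer holding the character data
EndStructure

Procedure LenBufAppend(*b.LenBuf, *src, bytes)
  ; Grow the buffer to hold the old data, the new data and a terminator
  Protected *new = ReAllocateMemory(*b\mem, *b\size + bytes + SizeOf(Character))
  If *new = 0
    ProcedureReturn #False                     ; on failure the old buffer is left as it was
  EndIf
  *b\mem = *new
  CopyMemory(*src, *b\mem + *b\size, bytes)    ; write at the known end, no scan needed
  *b\size = *b\size + bytes
  PokeC(*b\mem + *b\size, 0)                   ; keep it readable as a normal string
  ProcedureReturn #True
EndProcedure

Define buf.LenBuf
Define part$ = "Append This"
buf\mem = AllocateMemory(SizeOf(Character))    ; starts out as an empty, terminated string
LenBufAppend(@buf, @part$, StringByteLength(part$))
LenBufAppend(@buf, @part$, StringByteLength(part$))
Debug PeekS(buf\mem)    ; the two appended pieces, read back as a normal string
Debug buf\size          ; byte length, known without scanning
FreeMemory(buf\mem)
```

Because the length travels with the buffer, the cost of an append depends on the size of the piece being added (plus whatever the reallocation costs), not on a scan of everything stored so far.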

Post by walbus »

This old thread, I believe, demonstrates the same approach :wink:

http://www.purebasic.fr/english/viewtop ... 12&t=66419

Post by Keya »

not really, no. This isn't a Hex2Dec function either.

Post by walbus »

A little :wink:
Last edited by walbus on Sun Apr 02, 2017 5:29 pm, edited 1 time in total.

Post by nco2k »

why are you using PokeL(*bstr, newsize) instead of *bstr\size = newsize?

i would also try to use *str.string instead of str.s in your structure. this way you could get rid of PokeI() too and even PeekS(), if you jump and terminate the string (*str\c=0) at the required position.

also (R)Trim has to search for the end of the string first, before it can start chopping off the characters. since you already know the length of the string, maybe it would be faster to write your own routine.

btw, true bstrings always point to the first character of the string and not to the length prefix. you wouldnt have to write *bstr+4 anyway. :wink:

doesnt pb automatically free a local string at the end of a procedure? that would not be the case when using AllocateMemory(), so you definitely should use bFree() to avoid leaks.
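A rough sketch of the (R)Trim idea above, assuming the byte length is already known: start at the last character and walk backwards, so the string is never scanned from the front. RTrimKnownLength is a hypothetical helper, not part of BSTR.pbi, and it works on a plain writable buffer rather than on the BSTR structure itself.

```purebasic
; Hypothetical helper for illustration only (not part of BSTR.pbi).
; Assumes *text points at a writable, null-terminated buffer and byteSize
; is its length in bytes, not counting the terminator.
Procedure.i RTrimKnownLength(*text, byteSize)
  Protected *c.Character
  While byteSize > 0
    *c = *text + byteSize - SizeOf(Character)   ; last character, found without scanning
    If *c\c <> ' '
      Break
    EndIf
    byteSize = byteSize - SizeOf(Character)     ; drop one trailing space
  Wend
  *c = *text + byteSize
  *c\c = 0                                      ; terminate at the new end
  ProcedureReturn byteSize                      ; caller stores this as the new size
EndProcedure

Define s$ = "appended   "
Define newSize = RTrimKnownLength(@s$, StringByteLength(s$))
Debug "[" + PeekS(@s$) + "]"
Debug newSize
```

In a BSTR version the returned value would be written back to the size field, which is what keeps the stored length and the terminator in agreement.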

c ya,
nco2k

Post by Keya »

nco2k wrote:why are you...
This v0.1 is just me thinking out loud, wondering what it might look like and trying various things. It was in response to the request for fast strings which pops up every now and then, but there wasn't really any other example i could find.

Basically all i'm trying to do is 1) maintain the flexibility of the native string type and access to all string functions, while 2) seamlessly incorporating the Size field so operations can be made straight away without scanning the string.
nco2k wrote:why are you using PokeL(*bstr, newsize) instead of *bstr\size = newsize?
quantum physics. In my mind it was in a superposition of both states, but due to inadequate caffeine levels when I typed it it came out as the Poke version. In most other universes i correctly typed *bstr\size, when i wasn't typing out the complete works of Shakespeare or putting cats in radioactive boxes.

Ok it was just bad coding not quantum physics, work with me here.
nco2k wrote:i would also try to use *str.string instead of str.s in your structure. this way you could get rid of PokeI() too and even PeekS(), if you jump and terminate the string (*str\c=0) at the required position.
seems problematic:

Code: Select all

Structure BSTR
  size.l
  s1.s
  s2.String
EndStructure

Define *bstr1.BSTR
Debug "Full: " + *bstr1\s1  ;OK
Debug "Full: " + *bstr1\s2  ;CompilerError "A structure can't automatically be converted to a string"
nco2k wrote:also (R)Trim has to search for the end of the string first, before it can start chopping off the characters. since you already know the length of the string, maybe it would be faster to write your own routine.
Yes i have to be careful to consider which existing PB native string functions scan the string, as they're the ones this module needs to provide support/alternatives for
nco2k wrote:btw, true bstrings always point to the first character of the string and not to the length prefix. you wouldnt have to write *bstr+4 anyway. :wink:
Sounds good but can you provide an example of the structure and how you would reference it? thanks
nco2k wrote:doesnt pb automatically free a local string at the end of a procedure? that would not be the case when using AllocateMemory(), so you definitely should use bFree() to avoid leaks.
Good thinking, yes i think PB automatically includes an invisible "_PB_SysFreeString(string)" (or something) for local strings in procedures so bFree would be needed for that

Post by infratec »

After a first short look:

Code: Select all

Macro UNICALC(value)
  (value)*2
 EndMacro
Should be

Code: Select all

Macro UNICALC(value)
  (value << 1)
 EndMacro
Shift for speed and brackets outside for safety.
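The point about outer brackets can be seen in a small sketch. DoubleUnsafe and DoubleSafe are hypothetical macro names used only for this comparison. The safe variant also keeps the inner brackets around the argument, so that something like n+1 is doubled as a whole regardless of how the shift operator ranks against + in the operator priority table.

```purebasic
; Hypothetical macro names for illustration only.
Macro DoubleUnsafe(value)
  (value)*2
EndMacro

Macro DoubleSafe(value)
  ((value) << 1)
EndMacro

Define a.i = 100 / DoubleUnsafe(5)   ; expands to 100 / (5)*2, evaluated left to right
Define b.i = 100 / DoubleSafe(5)     ; expands to 100 / ((5) << 1)
Debug a   ; 40, not the intended result
Debug b   ; 10
```

A macro is plain text substitution, so anything the caller writes next to it becomes part of the same expression unless the macro body is fully bracketed.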


Also

Code: Select all

*newaddr = AllocateMemory(strlen+2)
can be replaced with

Code: Select all

*newaddr = AllocateMemory(strlen+SizeOf(Character))
Or a CompilerIf, which saves a + calculation.

And I think UTF8 will result in trouble.

Bernd
Last edited by infratec on Sat Mar 18, 2017 11:21 pm, edited 1 time in total.

Post by Keya »

infratec wrote:And I think UTF8 will result in trouble.
Bernd
why? no problem determining its length via StringByteLength

Post by infratec »

bMid() and UTF8 :?:

The startposition is wrong if an UTF8 character is before the startposition.

I think the only way to solve this is to use always Unicode internally.
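As a small illustration of the concern, the sketch below compares character positions with byte offsets. It assumes a Unicode build (the default in recent PB versions); in an ASCII build Chr(10349) cannot be represented and the numbers will differ. The problem described only arises if the buffer actually holds UTF-8 data; a buffer in PB's native fixed-width format keeps a simple character-to-byte mapping.

```purebasic
; Assumes a Unicode build; results differ in an ASCII build.
Define text$ = "1234" + Chr(10349) + "ABCD"
Define pos = 6   ; 1-based character position of "A"

Debug Len(text$)                              ; 9 characters
Debug StringByteLength(text$, #PB_Unicode)    ; 18 bytes: fixed 2 bytes per character
Debug StringByteLength(text$, #PB_UTF8)       ; 11 bytes: the exotic character takes 3

; Byte offset of position 6 in each encoding
Debug (pos - 1) * 2                                       ; 10 in a Unicode buffer
Debug StringByteLength(Left(text$, pos - 1), #PB_UTF8)    ; 7 in a UTF-8 buffer
```

With a fixed-width format the offset is a multiplication; with UTF-8 it depends on every character before the start position, so it has to be computed by walking the data.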

Post by Keya »

ahh good thinking! im not getting a problem showing up in a test though? this seems fine on my XP in both Asc and Uni, with bMid giving the same output as Mid

Code: Select all

XIncludeFile("BSTR.pbi")

Define *bstr1.BSTR, normal$   ;a BString and a normal String

normal$ = "1234"+Chr(10349)+"ABCD"   ;includes exotic UTF8 char
*bstr1 = BSTR(0,normal$)
;  31 00 32 00 33 00 34 00 6D_28   1.2.3.4.m(
;  41 00 42 00 43 00 44 00 00 00   A.B.C.D...

Debug "Full: " + normal$
Debug "Full: " + *bstr1\s

Debug "Mid: " + Mid(normal$, 3, 3)
Debug "Mid: " + bMid(*bstr1, 3, 3)
Debug "Mid: " + Mid(normal$, 4, 3)
Debug "Mid: " + bMid(*bstr1, 4, 3)
Debug "Mid: " + Mid(normal$, 5, 3)
Debug "Mid: " + bMid(*bstr1, 5, 3)
The PeekS internal to bMid uses the string format field:

Code: Select all

CompilerIf #PB_Compiler_Unicode
  #PB_StringFormat = #PB_Unicode
CompilerElse
  #PB_StringFormat = #PB_Ascii
CompilerEndIf 

Procedure.s bMid(*bstr.BSTR, startpos, length=0, format=#PB_StringFormat)
  If length<=0: length=*bstr\size: EndIf
  ProcedureReturn PeekS(@*bstr\s + UNICALC(startpos)-SizeOf(Character), length, format)
EndProcedure

Post by nco2k »

>> This v0.1 is just me thinking out loud
fair enough. :D

>> In most other universes i correctly typed *bstr\size
actually Peek/Poke can be faster than using a structure sometimes. looks like pb can optimize those calls quite a bit. at least that used to be the case a long time ago. worth trying out both to see whats better. although i would prefer structures because of their readability, but thats just me.

>> seems problematic

Code: Select all

;something like this

*Test1 = @"ABC"
*Test2 = @"DEF"

*String.String = @*Test1
Debug *String\s

*String.String = @*Test2
Debug *String\s

;its also compatible with regular string functions

Debug Left(*String\s, 1)
Debug Mid(*String\s, 2, 1)
Debug Right(*String\s, 1)

;then you can trim it, by placing a nullbyte

*Character.Character = @*String\s
*Character + SizeOf(Character) + SizeOf(Character)
*Character\c = 0
Debug *String\s
>> Sounds good but can you provide an example
no, i meant functions like SysAllocString_() that return a bstring:

Code: Select all

*String = SysAllocString_("ABC")
Debug PeekS(*String, -1, #PB_Unicode)
Debug PeekL(*String - SizeOf(Long))
you said:
Keya wrote:with a true BSTR you would have to do Messagebox(@*bstr+4)
which is not the case:
MSDN wrote:A BSTR is a pointer. The pointer points to the first character of the data string, not to the length prefix.
so dont worry about it. ;)

https://msdn.microsoft.com/de-de/librar ... s.85).aspx

>> Yes i have to be careful to consider which existing PB native string functions scan the string
everything that goes right-to-left. :D

c ya,
nco2k

Post by Keya »

nco2k do you mean simply switching the order of Size and StringPtr in the structure, so that StringPtr is first? I think the only reason i put the size field first was because i thought BSTR was [Size][Buffer of variable size...] so you would always want Size first with that format, but in my case it's just [Size][Pointer to buffer], so there probably is no need for Size to go first, it would probably make sense to put the String first in v0.2

btw i'm avoiding SysAllocString_("ABC"), at least for now, as it's not available on all OSes