Using strings larger than 64K

PB
PureBasic Expert
PureBasic Expert
Posts: 7581
Joined: Fri Apr 25, 2003 5:24 pm

Post by PB »

Code updated for 5.20+ (according to the PureBasic v4.00 documentation: "Added: Unlimited length strings in both ascii and unicode mode").

Before v4.00, PureBasic's strings were limited to a maximum size of 64K, but
there are times when you wish they could be bigger. So here is a tip by
Fred from another thread, re-posted here (with a few modifications)
since it's so useful and shouldn't be overlooked by those who need it. :)
Just call the buffer-resizing procedure at the start of your app, before you
use any string code. Don't make the size too big, because the bigger the
buffer, the slower your string routines will perform.

Code: Select all

; PureBasic can now handle strings of virtually unlimited size
; and is no longer constrained by the previous 64K limit.

; Assign a string of exactly 1048576 characters (1 MB in ASCII mode):
; 1 ("s") + 1048574 spaces + 1 ("e").
a$ = "s" + Space(1048574) + "e"

; Prove it by showing Len and start/end chars of string.
Debug Str(Len(a$)) + " " + Left(a$, 1) + " " + Right(a$, 1)
UPDATE: DEU.exe reports that this tip now works, see here:
viewtopic.php?t=13271

Thus, all "bugs" listed below in this thread are no longer relevant. :)
Last edited by PB on Mon Feb 28, 2005 5:10 am, edited 3 times in total.
Kale
PureBasic Expert
PureBasic Expert
Posts: 3000
Joined: Fri Apr 25, 2003 6:03 pm
Location: Lincoln, UK

Post by Kale »

Code: Select all

Debug PeekS(@a$)
Doesn't seem to work :? the first character doesn't show.
--Kale

PB
PureBasic Expert
PureBasic Expert
Posts: 7581
Joined: Fri Apr 25, 2003 5:24 pm

Post by PB »

I assume it's a bug with PeekS then, perhaps because Fred never expected
to tell us how to use >64K strings. In any case, this tip has been working
fine for me over the last week or two. You can see by my Debug line that
the 1 MB string is intact and correct. :)

To test it, open a large text file (say 500K), copy all the text, then do
a$=UCase(GetClipboardText()) and SetClipboardText(a$). Then paste
the text back into Notepad, and you'll see that the entire 500K of a$
will be in upper case, which proves that this tip definitely works.
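
The clipboard test described above can be written out as a short sketch. It assumes a large block of text has already been copied to the clipboard; the final Debug line is an added check and not part of the original tip.

```purebasic
; Assumes a large block of text has already been copied to the clipboard.
a$ = UCase(GetClipboardText())  ; read the clipboard and upper-case it
SetClipboardText(a$)            ; write the result back, ready for pasting

; Added check (not in the original post): show how many characters came through.
Debug Len(a$)
```

If the length shown matches the size of the text you copied, the whole string survived the round trip.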
Dare2
Moderator
Moderator
Posts: 3321
Joined: Sat Dec 27, 2003 3:55 am
Location: Great Southern Land

Post by Dare2 »

Maybe debug can't handle strings that size.

After the routine, try

r$=PeekS(@a$)
Debug Str(Len(r$))+" "+Left(r$,1)+" "+Right(r$,1)

which shows the string is okay.
Danilo
Addict
Addict
Posts: 3036
Joined: Sat Apr 26, 2003 8:26 am
Location: Planet Earth

Re: Using strings larger than 64K

Post by Danilo »

PB wrote:Don't make the size too big, because the bigger the buffer,
the slower your string routines will perform.
I don't think that's correct. IMO the speed has nothing to do with
the string buffer size, only with the actual length of the string.
If you set the buffer to 20MB and use only 10-char strings, it's
not slower. It's slower if you use 20MB strings, yes... logical :D
PB wrote:

Code: Select all

; Now assign a string to be exactly 1048576 bytes (1 MB). 
a$="s"+Space(1048574)+"e"
That's not correct, because the string buffer includes the terminating 0 (ASCIIZ).
If you set the string buffer to 1,000,000 bytes, you can only use strings
of up to 999,999 characters.
If you use bigger sizes, other memory gets overwritten and the result
is undefined. Sometimes it crashes, sometimes it works, but either way
other memory got overwritten by your code sample.
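
As a rough sketch of that arithmetic, assuming one byte per character (ASCII mode): the constant #BufferSize and the variable MaxChars are hypothetical names used only for illustration.

```purebasic
; Hypothetical buffer size in bytes (an assumption for illustration).
#BufferSize = 1048576

; One byte is reserved for the terminating 0, assuming 1 byte per character.
MaxChars = #BufferSize - 1

; "s" and "e" take two characters, so pad with MaxChars - 2 spaces.
a$ = "s" + Space(MaxChars - 2) + "e"

Debug Len(a$) ; 1048575
```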

Anyway, this trick doesn't work with some functions:

Code: Select all

Procedure SetStringManipulationBufferSize(Bytes) 
  PBStringBase.l = 0 
  PBMemoryBase.l = 0 
  !MOV eax, dword [PB_StringBase] 
  !MOV [esp+4],eax 
  !MOV eax, dword [PB_MemoryBase] 
  !MOV [esp+8],eax 
  HeapReAlloc_(PBMemoryBase, #GMEM_ZEROINIT, PBStringBase, Bytes) 
  !MOV dword [_PB_StringBase],eax 
EndProcedure 

; Set the buffer size for all strings to 1 MB. 
SetStringManipulationBufferSize(1048576)
A$ = Space(1000000)+"abc  d "


; CRASH 1 - ReplaceString()
A$ = ReplaceString(A$,"abc","def")


; CRASH 2 - RemoveString()
A$ = RemoveString(A$,"abc")


; DOESNT WORK - PeekS() (or debugger?)
mem = AllocateMemory(1,1000000,0)
*mem.LONG = mem
For a = 1 To 1000000/4
  *mem\l = '4321'
  *mem + 4
Next a

Debug PeekS(mem,$FFFF) ; 65k max, works
Debug PeekS(mem)       ; doesnt return a string
Debug PeekS(mem,66000) ; doesnt return a string
I have seen other things that don't work correctly with this
system, but can't remember them at the moment (I think it involved
the EditorGadget or another gadget and the GetGadgetText() function).

Did anybody try ReadString() and WriteStringN() with big strings,
let's say 1MB? I don't want to test this... :lol:
cya,
...Danilo
...:-=< http://codedan.net/work >=-:...
-= FaceBook.com/DaniloKrahn =-
PB
PureBasic Expert
PureBasic Expert
Posts: 7581
Joined: Fri Apr 25, 2003 5:24 pm

Re: Using strings larger than 64K

Post by PB »

> Its slower if you use 20MB strings, yes

That's what I meant, of course. :)

> this trick doesnt work with some functions

Fair enough -- I didn't test it with all String functions, but it seems okay
with most of them. Besides, the two crashes you showed can easily be
replaced with Procedures to get them working again, and the benefit of
being able to use long strings would be worth it (IMO).

Anyway, it's Fred's tip -- not mine -- I was just posting it here so it's
easier to find in the forums. :P
Danilo
Addict
Addict
Posts: 3036
Joined: Sat Apr 26, 2003 8:26 am
Location: Planet Earth

Re: Using strings larger than 64K

Post by Danilo »

PB wrote:Besides, the two crashes you showed can easily be replaced
with Procedures to get them working again, and the benefit of
being able to use long strings would be worth it (IMO).
IMO Fred should fix the 2 or 3 functions that don't work
with large strings and add SetStringBufferSize() natively
into PureBasic.
cya,
...Danilo
...:-=< http://codedan.net/work >=-:...
-= FaceBook.com/DaniloKrahn =-
techjunkie
Addict
Addict
Posts: 1126
Joined: Wed Oct 15, 2003 12:40 am
Location: Sweden

Re: Using strings larger than 64K

Post by techjunkie »

Danilo wrote:IMO Fred should fix the 2 or 3 functions that don't work
with large strings and add SetStringBufferSize() natively
into PureBasic.
Couldn't agree more... It's a "pain in the ***" with the 64k limit...
(\__/)
(='.'=) This is Bunny. Copy and paste Bunny into your
(")_(") signature to help him gain world domination.
ricardo
Addict
Addict
Posts: 2438
Joined: Fri Apr 25, 2003 7:06 pm
Location: Argentina

Post by ricardo »

One question:

Setting a buffer size of one megabyte but using smaller strings (I can't know the real size of the strings in advance, but I'm sure they will be smaller): does that cause any side effects?

Does it slow the app down, make it use more RAM, or anything else?
ARGENTINA WORLD CHAMPION
freak
PureBasic Team
PureBasic Team
Posts: 5940
Joined: Fri Apr 25, 2003 5:21 pm
Location: Germany

Post by freak »

Only the buffer for string operations is changed here. Each string that is
stored in memory only uses as much memory as its length requires.

So the memory usage only increases by the size you set for the buffer, nothing
more. There should also be no speed difference.

Timo
quidquid Latine dictum sit altum videtur
Kris_a
User
User
Posts: 92
Joined: Sun Feb 15, 2004 8:04 pm
Location: Manchester, UK

Post by Kris_a »

Why did Fred (or whoever it was) use a Word for strings rather than a Long, I wonder? :roll: (It seems a bit pointless, since a Long would probably be faster to process, too.)
freak
PureBasic Team
PureBasic Team
Posts: 5940
Joined: Fri Apr 25, 2003 5:21 pm
Location: Germany

Post by freak »

This limit has nothing to do with a word limit.
It is the limit of the string operation buffer, which is statically allocated.
quidquid Latine dictum sit altum videtur
GPI
PureBasic Expert
PureBasic Expert
Posts: 1394
Joined: Fri Apr 25, 2003 6:41 pm

Re: Using strings larger than 64K

Post by GPI »

techjunkie wrote:Couldn't agree more... It's a "pain in the ***" with the 64k limit...
That's funny. I have never had a problem with the size of the string buffer. I think it is more than big enough (maybe too big for some cases).

For me, a string is a short sentence with some characters. When I need a long text, I use AllocateMemory()...
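
A minimal sketch of that approach, assuming the single-argument AllocateMemory() form of later PureBasic versions (the code earlier in this thread uses the older three-argument form). The constant #TextSize and the pointer *text are hypothetical names for illustration.

```purebasic
; Hypothetical size, chosen only for illustration.
#TextSize = 1048576

; Allocate a zero-filled 1 MB block instead of relying on a string variable.
*text = AllocateMemory(#TextSize)
If *text
  ; Write a short zero-terminated ASCII text at the start of the block.
  PokeS(*text, "long text goes here", -1, #PB_Ascii)

  ; Read it back as a string to show the block holds the text.
  Debug PeekS(*text, -1, #PB_Ascii)

  FreeMemory(*text)
EndIf
```

The trade-off is that the block has to be sized, checked and freed by hand, which is the bookkeeping a plain string variable otherwise hides.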

But a compiler command (not a function!) or an option in the compiler options of the IDE would be nice.
PB
PureBasic Expert
PureBasic Expert
Posts: 7581
Joined: Fri Apr 25, 2003 5:24 pm

Re: Using strings larger than 64K

Post by PB »

> I have never had a problem with the size of the string buffer

If you ever use the GetClipboardText command, you run the risk of your
app crashing if the clipboard holds >64K. You never know what the user
has put into the clipboard, and having to set up a memory buffer to deal
with it isn't very Basic-like. Same applies to ANY string commands where
you don't know the size of the string (ReadString, etc) -- are we supposed
to stop using strings totally and start using memory buffers instead? :)
NoahPhense
Addict
Addict
Posts: 1999
Joined: Thu Oct 16, 2003 8:30 pm
Location: North Florida

Re: Using strings larger than 64K

Post by NoahPhense »

PB wrote:> I have never had a problem with the size of the string buffer

If you ever use the GetClipboardText command, you run the risk of your
app crashing if the clipboard holds >64K. You never know what the user
has put into the clipboard, and having to set up a memory buffer to deal
with it isn't very Basic-like. Same applies to ANY string commands where
you don't know the size of the string (ReadString, etc) -- are we supposed
to stop using strings totally and start using memory buffers instead? :)
But isn't a string just a memory buffer anyhow? An array of characters,
if you will.. ;)

- np