I'd like to get around 1,000,000 words into a string array (or list) as fast as possible, and it would be nice to keep the resulting program as small as possible.
Currently a text file (20 MB) containing a list of words is read with ReadString into a string array. Two things aren't ideal: reading the words takes about a second, and the data file is not very compact (the zipped size is much smaller).
Now I'm thinking about preparing a compressed data section which could be decompressed to memory, but then I would also need to keep a million offset (and length) values, which again needs a lot of space.
I don't believe there's another option, like GetStringArrayDataFromMemory(?Start,Words(),ArraySize(Words())), but maybe someone has another idea about what could be done here...
Decompressing data to a string array on the fly...
- Michael Vogel
- Addict
- Posts: 2797
- Joined: Thu Feb 09, 2006 11:27 pm
- Contact:
- NicTheQuick
- Addict
- Posts: 1504
- Joined: Sun Jun 22, 2003 7:43 pm
- Location: Germany, Saarbrücken
- Contact:
Re: Decompressing data to a string array on the fly...
If every word in that memory section ends with a null byte you can simply use PeekS() on every word to peek it from the decompressed memory section. So you do not need to store lengths and offsets of each word in your list. Simply go from null byte to null byte and peek the words in between.
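The walk from null byte to null byte described above can be sketched roughly like this. It is only a minimal sketch: the procedure name PeekWords and the label wordData are made up for illustration, and it assumes the block holds at least as many null-terminated words as the array has elements, stored in PureBasic's native string format.

```purebasic
; Hypothetical helper: copies null-terminated words from *memory into words().
; Assumption: the block contains at least ArraySize(words()) + 1 strings
; in PureBasic's native string format.
Procedure PeekWords(Array words.s(1), *memory)
  Protected *pos = *memory
  Protected i
  For i = 0 To ArraySize(words())
    ; PeekS() reads up to the next null character and returns a copy
    words(i) = PeekS(*pos)
    ; Skip the word and its terminating null character
    *pos + (MemoryStringLength(*pos) + 1) * SizeOf(Character)
  Next
EndProcedure

Dim peeked.s(2)
PeekWords(peeked(), ?wordData)
Debug peeked(0)
Debug peeked(1)
Debug peeked(2)

DataSection
  wordData: ; Stands in for the decompressed memory block
  Data.s "word1", "word2", "word3"
EndDataSection
```

Because PeekS() copies every word, the memory block can be freed afterwards, at the cost of one string allocation per word.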
And if you are familiar with pointers, you do not even have to peek the strings; just set each string pointer to the beginning of its word. Of course, your words have to be encoded as Unicode after decompressing. For example, like this:
Code:
Procedure initWordsArray(Array words.String(1), *memory)
  Protected *char.Character = *memory
  Protected cWords.i = ArraySize(words()) + 1
  Protected i.i = 0
  While i < cWords
    ; Let words(i)\s point straight at the word inside *memory (no copy is made),
    ; so treat these strings as read-only and keep *memory alive
    PokeI(@words(i), *char)
    ; Advance past this word and its terminating null character
    *char + (MemoryStringLength(*char) + 1) * SizeOf(Character)
    i + 1
  Wend
EndProcedure

Dim words.String(2)
initWordsArray(words(), ?words)
Debug words(0)\s
Debug words(1)\s
Debug words(2)\s

DataSection
  words: ; Imagine this is your decompressed memory block
  Data.s "word1", "word2", "word3"
EndDataSection
Last edited by NicTheQuick on Sun Sep 02, 2018 11:55 am, edited 1 time in total.
The english grammar is freeware, you can use it freely - But it's not Open Source, i.e. you can not change it or publish it in altered way.
Re: Decompressing data to a string array on the fly...
What is this for, and what will you do with the data afterwards?
Could caching access to data on external media be effective enough, for example with
http://minimdb.com/minimono.html
or
an SQLite (:memory:) cache for frequently used words plus an on-disk SQLite database for rarely used ones?
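For the in-memory half of that suggestion, a minimal sketch with PureBasic's built-in SQLite support might look like the following. The table name, column name, and sample words are assumptions for illustration, and the on-disk database for rarely used words is left out.

```purebasic
UseSQLiteDatabase()

; ":memory:" asks SQLite for a database that lives only in RAM
If OpenDatabase(0, ":memory:", "", "")
  ; Hypothetical schema: a single table holding one word per row
  DatabaseUpdate(0, "CREATE TABLE words (word TEXT PRIMARY KEY)")

  ; Sample data only; a real program would insert its own word list
  For i = 1 To 3
    DatabaseUpdate(0, "INSERT INTO words (word) VALUES ('word" + Str(i) + "')")
  Next

  ; Read the words back in sorted order
  If DatabaseQuery(0, "SELECT word FROM words ORDER BY word")
    While NextDatabaseRow(0)
      Debug GetDatabaseString(0, 0)
    Wend
    FinishDatabaseQuery(0)
  EndIf

  CloseDatabase(0)
EndIf
```

Whether this beats a plain string array for a million words would have to be measured; it mainly pays off when lookups by key matter more than sequential reading.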
Dawn will come inevitably.
Re: Decompressing data to a string array on the fly...
1) What kind of access do you need in the end? Read-only or not?
2) Is each word different from all the others?
- Michael Vogel
Re: Decompressing data to a string array on the fly...
Oh, what great ideas!
As I only need to read all words and they are sorted alphabetically, it may take much less memory if I remove the leading characters of each word that are identical to those of its predecessor.
Instead of AAA [0] AAAB [0] AABC [0] BAAA...
...it would be AAA [3] B [2] BC [0] BAAA...
I need to check that now. Hey, using the following simple method, the words need only 2 MB now:

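Code:
; Init(), word() and #Words are defined elsewhere in the program (not shown)
Procedure ident(*a.Character,*b.Character)
  ; Returns the number of leading characters shared by both strings
  Protected i
  While *a\c And *a\c=*b\c ; stop at the terminator so identical words cannot overrun
    i+1
    *a+SizeOf(Character)
    *b+SizeOf(Character)
  Wend
  ProcedureReturn i
EndProcedure

If Init(1)
  ;Debug word(1)
  z=Len(word(1))
  For i=2 To #Words
    n=ident(@word(i-1),@word(i))
    ;Debug word(i)+", "+Str(n)
    z+Len(word(i))-n ; only the non-shared tail of each word is counted
  Next i
  Debug z+#Words ; total characters plus one per word (presumably for the [n] count)
EndIf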
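The code above only measures the size. A rough sketch of actually packing and unpacking the words could look like this. The record layout is an assumption and not the format from the post: one byte with the number of leading characters shared with the previous word (capped at 255), followed by the rest of the word as ASCII with a null byte. The names SharedPrefix, PackWords and UnpackWords are made up, and words containing non-ASCII characters would need a different encoding.

```purebasic
; Assumed record layout (hypothetical):
;   1 byte  = characters shared with the previous word (0..255)
;   n bytes = remaining characters as ASCII, followed by a null byte

Procedure.i SharedPrefix(a.s, b.s)
  Protected n = 0
  While n < 255 And n < Len(a) And n < Len(b) And Mid(a, n + 1, 1) = Mid(b, n + 1, 1)
    n + 1
  Wend
  ProcedureReturn n
EndProcedure

; Writes all words to *out and returns the number of bytes used.
; The caller must provide a buffer that is large enough.
Procedure.i PackWords(Array words.s(1), *out)
  Protected *p = *out, i, n, previous.s, rest.s
  For i = 0 To ArraySize(words())
    n = SharedPrefix(previous, words(i))
    PokeA(*p, n) : *p + 1
    rest = Mid(words(i), n + 1)        ; the part that differs from the predecessor
    PokeS(*p, rest, -1, #PB_Ascii)     ; also writes the terminating null byte
    *p + Len(rest) + 1
    previous = words(i)
  Next
  ProcedureReturn *p - *out
EndProcedure

; Rebuilds every word from the previous word's prefix plus the stored rest.
Procedure UnpackWords(Array words.s(1), *in)
  Protected *p = *in, i, n, previous.s, rest.s
  For i = 0 To ArraySize(words())
    n = PeekA(*p) : *p + 1
    rest = PeekS(*p, -1, #PB_Ascii)
    *p + Len(rest) + 1
    previous = Left(previous, n) + rest
    words(i) = previous
  Next
EndProcedure

Dim source.s(3)
source(0) = "AAA" : source(1) = "AAAB" : source(2) = "AABC" : source(3) = "BAAA"
Dim restored.s(3)

*buffer = AllocateMemory(64)
If *buffer
  size = PackWords(source(), *buffer)
  UnpackWords(restored(), *buffer)
  Debug size
  For i = 0 To 3 : Debug restored(i) : Next
  FreeMemory(*buffer)
EndIf
```

Since every word depends on its predecessor, this layout only suits sequential reading, which matches the read-all-words use case described above. The packed block could still be run through a general compressor before it goes into the data section.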