Decompressing data to a string array on the fly...

Posted: Sun Sep 02, 2018 11:37 am
by Michael Vogel
I'd like to get around 1,000,000 words into a string array (or list) as fast as possible, and it would be nice to keep the resulting program as small as possible.

Currently, a text file (20MB) containing the list of words is read into a string array with ReadString(). Two things bother me: reading the words takes about a second, and the data file is not very compact; its zipped size is much smaller.

Now I'm thinking about preparing a compressed DataSection which could be decompressed to memory, but then I would also need to keep a million offset (and length) values, which again takes a lot of space.

I don't believe there's a ready-made option, something like GetStringArrayDataFromMemory(?Start,Words(),ArraySize(Words())), but maybe someone has another idea about what could be done here...

Re: Decompressing data to a string array on the fly...

Posted: Sun Sep 02, 2018 11:55 am
by NicTheQuick
If every word in that memory section ends with a null byte you can simply use PeekS() on every word to peek it from the decompressed memory section. So you do not need to store lengths and offsets of each word in your list. Simply go from null byte to null byte and peek the words in between.
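A minimal sketch of the PeekS() variant, assuming the block holds the words back to back in the program's own string format, each followed by a null character. The procedure name FillWordsByPeek and the label PeekWords are made up for this example:

```purebasic
; Hypothetical helper: copies null-terminated words from *memory into words().
Procedure FillWordsByPeek(Array words.s(1), *memory)
  Protected *pos = *memory
  Protected i, last = ArraySize(words())

  For i = 0 To last
    words(i) = PeekS(*pos)                            ; reads up to the next null character
    *pos + (Len(words(i)) + 1) * SizeOf(Character)    ; skip the word and its terminator
  Next
EndProcedure

Dim words.s(2)
FillWordsByPeek(words(), ?PeekWords)

Debug words(0)
Debug words(1)
Debug words(2)

DataSection
  PeekWords: ; stands in for the decompressed memory block
  Data.s "word1", "word2", "word3"
EndDataSection
```

Here every word is copied into a normal PureBasic string, so the array can be changed later without touching the source block.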

And if you are familiar with pointers you do not even have to peek the strings; just point each array element at the beginning of its word. Of course your words have to be in Unicode (PureBasic's own string format) after decompressing. For example like this:

Code:

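; Points every element of words() at its word inside *memory instead of copying it.
; The block must hold null-terminated words in PureBasic's string format (Unicode).
; Treat the array as read-only afterwards, since the strings were not allocated
; by PureBasic's string manager.
Procedure initWordsArray(Array words.String(1), *memory)
	Protected *char.Character = *memory
	Protected cWords.i = ArraySize(words()) + 1
	Protected i.i = 0
	
	While i < cWords
		; Overwrite the string pointer stored in the String structure
		PokeI(@words(i), *char)
		
		; Skip the word and its null terminator
		*char + (MemoryStringLength(*char) + 1) * SizeOf(Character)
		i + 1
	Wend
EndProcedure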

Dim words.String(2)

initWordsArray(words(), ?words)

Debug words(0)\s
Debug words(1)\s
Debug words(2)\s

DataSection
	words: ; Imagine this is your decompressed memory block
		Data.s "word1", "word2", "word3"
EndDataSection
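How such a decompressed block could come into existence is sketched below with the Packer library. This is only a round trip under assumptions: the word list is generated on the spot, stored as ASCII to keep it small, and compressed in the same program, whereas a real program would compress once offline and embed the result (for example with IncludeBinary in a DataSection). Because the block is ASCII here, the words are read with PeekS() and the #PB_Ascii flag; the pointer trick above would need a Unicode block.

```purebasic
UseZipPacker()

#WordCount = 1000
#WordBytes = 9                          ; "word0000" plus its null terminator, in ASCII
#RawSize   = #WordCount * #WordBytes

; Build a stand-in word list: null-terminated ASCII words, back to back.
Define *raw = AllocateMemory(#RawSize)
Define i
For i = 0 To #WordCount - 1
  PokeS(*raw + i * #WordBytes, "word" + RSet(Str(i), 4, "0"), -1, #PB_Ascii)
Next

; Compress it. In a real program this step would run once, offline.
Define *packed = AllocateMemory(#RawSize)
Define packedSize = CompressMemory(*raw, #RawSize, *packed, #RawSize, #PB_PackerPlugin_Zip)

; Decompress at startup. The original size must be known, so store it with the data.
Define *block = AllocateMemory(#RawSize)
Define blockSize = UncompressMemory(*packed, packedSize, *block, #RawSize, #PB_PackerPlugin_Zip)

If packedSize > 0 And blockSize = #RawSize
  Debug "packed " + Str(#RawSize) + " bytes into " + Str(packedSize)
  Debug PeekS(*block, -1, #PB_Ascii)                                   ; first word
  Debug PeekS(*block + (#WordCount - 1) * #WordBytes, -1, #PB_Ascii)   ; last word
EndIf
```

No offsets or lengths are stored anywhere; the null terminators alone mark where each word ends.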

Re: Decompressing data to a string array on the fly...

Posted: Sun Sep 02, 2018 11:55 am
by useful
What is this for, and what will you do with the data afterwards?
You could cache access to data on external media effectively enough, for example with
http://minimdb.com/minimono.html
or
an in-memory SQLite database (:memory:) as a cache for frequently used words plus an on-disk SQLite database for rarely used ones?
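For the in-memory half of that suggestion, a rough sketch with PureBasic's Database library could look like this. The table and column names are invented for the example, and whether this is fast enough for a million words is not tested here:

```purebasic
UseSQLiteDatabase()

Define found.s

; Hypothetical table and column names, chosen for this sketch only.
If OpenDatabase(0, ":memory:", "", "")
  DatabaseUpdate(0, "CREATE TABLE words (word TEXT PRIMARY KEY)")

  ; One transaction around all inserts avoids a commit per row.
  DatabaseUpdate(0, "BEGIN")
  DatabaseUpdate(0, "INSERT INTO words (word) VALUES ('word1')")
  DatabaseUpdate(0, "INSERT INTO words (word) VALUES ('word2')")
  DatabaseUpdate(0, "INSERT INTO words (word) VALUES ('word3')")
  DatabaseUpdate(0, "COMMIT")

  ; Fetch all words from 'word2' onwards, in sorted order.
  If DatabaseQuery(0, "SELECT word FROM words WHERE word >= 'word2' ORDER BY word")
    While NextDatabaseRow(0)
      found + GetDatabaseString(0, 0) + " "
    Wend
    FinishDatabaseQuery(0)
  EndIf

  CloseDatabase(0)
EndIf

Debug found
```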

Re: Decompressing data to a string array on the fly...

Posted: Sun Sep 02, 2018 12:07 pm
by Olliv
1) How will the data be accessed in the end? Read-only or not?
2) Is every word different from all the others?

Re: Decompressing data to a string array on the fly...

Posted: Sun Sep 02, 2018 1:46 pm
by Michael Vogel
Oh, what great ideas!
As I only need to read all the words and they are sorted alphabetically, it may take much less memory if I remove from each word the leading characters that are identical to those of its predecessor.

Instead of AAA [0] AAAB [0] AABC [0] BAAA...
...it would be AAA [3] B [2] BC [0] BAAA...

I had to check that right away - hey, according to the following simple calculation, the words would need only 2MB this way :lol:

Code:

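; Returns the number of leading characters that *a and *b have in common
Procedure ident(*a.Character,*b.Character)
	
	Protected i
	
	; Stop at the null terminator so identical words cannot run past their end
	While *a\c And *a\c=*b\c
		i+1
		*a+SizeOf(Character)
		*b+SizeOf(Character)
	Wend
	
	ProcedureReturn i
	
EndProcedure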

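; Init(), word() and #Words come from the rest of the program (not shown)
If Init(1)

	;Debug word(1)
	z=Len(word(1))                ; the first word is stored in full
	For i=2 To #Words
		n=ident(@word(i-1),@word(i))
		;Debug word(i)+", "+Str(n)
		z+Len(word(i))-n          ; only the differing tail is stored
	Next i
	Debug z+#Words                ; plus one count byte per word
	
EndIf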
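The code above only measures the size. Reading such data back into an array could look roughly like the sketch below. The byte layout is an assumption made for this example, not necessarily the one used in the real program: every word is stored as one count byte (how many leading characters to take from the previous word) followed by the remaining ASCII characters, all words contain only characters with codes of 32 or above, and a single 0 byte closes the list. DecodeFrontCoded and the label FrontCoded are made-up names.

```purebasic
; Hypothetical decoder for the layout described above.
Procedure DecodeFrontCoded(Array words.s(1), *src.Ascii)
  Protected i, keep, previous.s
  Protected *start

  For i = 0 To ArraySize(words())
    keep = *src\a                 ; characters shared with the previous word
    *src + 1

    *start = *src
    While *src\a >= 32            ; bytes below 32 are counts, so they end the tail
      *src + 1
    Wend

    words(i) = Left(previous, keep) + PeekS(*start, *src - *start, #PB_Ascii)
    previous = words(i)
  Next
EndProcedure

Dim words.s(3)
DecodeFrontCoded(words(), ?FrontCoded)

Debug words(0)
Debug words(1)
Debug words(2)
Debug words(3)

DataSection
  FrontCoded: ; AAA, AAAB, AABC, BAAA in front-coded form
  Data.a 0, 'A', 'A', 'A'
  Data.a 3, 'B'
  Data.a 2, 'B', 'C'
  Data.a 0, 'B', 'A', 'A', 'A'
  Data.a 0
EndDataSection
```

With a count limited to values below 32, shared prefixes longer than 31 characters would have to be cut off at 31, which costs a few bytes but keeps the format unambiguous.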

Re: Decompressing data to a string array on the fly...

Posted: Sun Sep 02, 2018 3:16 pm
by useful
It was invented over 50 years ago. :D
https://en.wikipedia.org/wiki/MUMPS