Decompressing data to a string array on the fly...

Post by Michael Vogel »

I'd like to load around 1,000,000 words into a string array (or list) as fast as possible, and it would be nice to keep the resulting program as small as possible.

Currently a text file (20 MB) containing the word list is read into a string array with ReadString. Two things aren't ideal: reading the words takes about a second, and the data file is not very compact (the zipped size is much smaller).

Now I'm thinking about preparing a compressed data section that could be decompressed to memory, but then I would also need to keep a million offset (and length) values, which again takes a lot of space.

I don't believe there's a ready-made option like GetStringArrayDataFromMemory(?Start,Words(),ArraySize(Words())), but maybe someone has another idea about what could be done here...

Post by NicTheQuick »

If every word in the decompressed memory block ends with a null terminator, you can simply call PeekS() on each word to read it from that block. You then do not need to store the length and offset of each word in your list: just walk from terminator to terminator and peek the words in between.
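
A minimal sketch of that PeekS() walk, assuming the block holds exactly as many words as the array has elements and that they are stored in PureBasic's native Unicode string format. The names ReadWords and wordData are made up for this illustration.

```purebasic
; Sketch: copy null-terminated words from a memory block into a string array.
; ReadWords and wordData are hypothetical names used only for this example.
Procedure ReadWords(Array words.s(1), *memory)
  Protected *pos = *memory
  Protected i, last = ArraySize(words())
  
  For i = 0 To last
    words(i) = PeekS(*pos)                          ; copies up to the next null character
    *pos + (Len(words(i)) + 1) * SizeOf(Character)  ; skip the word and its terminator
  Next
EndProcedure

Dim words.s(2)
ReadWords(words(), ?wordData)

Debug words(0)
Debug words(1)
Debug words(2)

DataSection
  wordData: ; stands in for the decompressed memory block
  Data.s "word1", "word2", "word3"
EndDataSection
```

Because PeekS() copies each word, the array owns its strings and the decompressed block can be released afterwards, at the cost of one copy per word.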

And if you are familiar with pointers, you do not even have to peek the strings: just set each string pointer to the beginning of its word. Of course, your words have to be encoded in Unicode after decompressing. For example:

Code:

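Procedure initWordsArray(Array words.String(1), *memory)
	Protected *char.Character = *memory
	Protected cWords.i = ArraySize(words()) + 1
	Protected i.i = 0
	
	While i < cWords
		; Point the string field of element i directly at the word in memory (no copy)
		PokeI(@words(i), *char)
		
		; Skip this word and its null terminator to reach the start of the next word
		*char + (MemoryStringLength(*char) + 1) * SizeOf(Character)
		i + 1
	Wend
EndProcedure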

Dim words.String(2)

initWordsArray(words(), ?words)

Debug words(0)\s
Debug words(1)\s
Debug words(2)\s

DataSection
	words: ; Imagine this is your decompressed memory block
		Data.s "word1", "word2", "word3"
EndDataSection
Last edited by NicTheQuick on Sun Sep 02, 2018 11:55 am, edited 1 time in total.

Post by useful »

What is it for, and what will you do with the data next?
You could cache access to data on external media effectively enough, for example with
http://minimdb.com/minimono.html
or
an SQLite :memory: cache (for frequently used words) plus an SQLite database on disk (for rarely used ones)?
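
As a rough illustration of the SQLite idea, here is a sketch using PureBasic's built-in database library with an in-memory database. The table name, column name, and sample words are assumptions made for this example, and the on-disk part is left out.

```purebasic
; Sketch: keep words in an in-memory SQLite database and look some up.
; The table "words" and its column "word" are illustrative assumptions.
UseSQLiteDatabase()

If OpenDatabase(0, ":memory:", "", "")
  DatabaseUpdate(0, "CREATE TABLE words (word TEXT PRIMARY KEY)")
  DatabaseUpdate(0, "INSERT INTO words (word) VALUES ('AAA'), ('AAAB'), ('AABC'), ('BAAA')")
  
  ; List every stored word that starts with "AA"
  If DatabaseQuery(0, "SELECT word FROM words WHERE word LIKE 'AA%' ORDER BY word")
    While NextDatabaseRow(0)
      Debug GetDatabaseString(0, 0)
    Wend
    FinishDatabaseQuery(0)
  EndIf
  
  CloseDatabase(0)
Else
  Debug "Could not open the in-memory database"
EndIf
```

Opening a file name instead of ":memory:" would give the on-disk variant. Whether this beats a plain string array for a million words depends on the access pattern and would need measuring.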

Post by Olliv »

1) How will the data be accessed in the end? Read-only or not?
2) Is each word different from all the others?

Post by Michael Vogel »

Oh, what great ideas!
As I only need to read all the words and they are sorted alphabetically, it may take much less memory if I remove the leading characters that each word shares with its predecessor.

Instead of AAA [0] AAAB [0] AABC [0] BAAA...
...it would be AAA [3] B [2] BC [0] BAAA...

I needed to check that right away, and hey: using the following simple method, the words now need only 2 MB :lol:

Code:

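Procedure ident(*a.Character,*b.Character)
	
	; Returns the number of leading characters that the strings at *a and *b share
	Protected i
	
	; Stop at the end of *a so identical strings don't run past the terminator
	While *a\c And *a\c=*b\c
		i+1
		*a+SizeOf(Character)
		*b+SizeOf(Character)
	Wend
	
	ProcedureReturn i
	
EndProcedure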

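If Init(1)

	; Init(), word() and #Words belong to the rest of the program (not shown here)
	;Debug word(1)
	z=Len(word(1)) ; the first word is stored in full
	For i=2 To #Words
		n=ident(@word(i-1),@word(i)) ; characters shared with the predecessor
		;Debug word(i)+", "+Str(n)
		z+Len(word(i))-n ; only the differing tail is stored
	Next i
	Debug z+#Words ; plus one count value per word
	
EndIf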
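
The code above only measures the packed size. A decoder for such a scheme could look roughly like the sketch below. The byte layout is an assumption for illustration, not necessarily the format actually used: each word's remaining letters are stored as ASCII bytes, followed by one byte below 32 that ends the word and tells how many leading characters the next word shares with it. DecodeWords and packedWords are made-up names.

```purebasic
; Sketch of a decoder for the prefix scheme (assumed byte layout, see above).
; DecodeWords and packedWords are hypothetical names used only for this example.
Procedure DecodeWords(Array words.s(1), *packed.Ascii)
  Protected i, keep, prev.s, *start
  
  For i = 0 To ArraySize(words())
    *start = *packed
    While *packed\a >= 32        ; letters run up to the next count byte
      *packed + 1
    Wend
    ; Rebuild the word: shared prefix of the predecessor plus the stored tail
    prev = Left(prev, keep) + PeekS(*start, *packed - *start, #PB_Ascii)
    words(i) = prev
    keep = *packed\a             ; prefix length shared with the next word
    *packed + 1
  Next
EndProcedure

Dim words.s(3)
DecodeWords(words(), ?packedWords)

For i = 0 To ArraySize(words())
  Debug words(i)
Next

DataSection
  packedWords: ; AAA [3] B [2] BC [0] BAAA [0]
  Data.a 'A', 'A', 'A', 3, 'B', 2, 'B', 'C', 0, 'B', 'A', 'A', 'A', 0
EndDataSection
```

With the sample data this should reproduce AAA, AAAB, AABC and BAAA. A single count byte below 32 limits the shared prefix to 31 characters, so longer shared prefixes would need a different encoding.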

Post by useful »

It was invented over 50 years ago. :D
https://en.wikipedia.org/wiki/MUMPS