I'm currently working with LZW compression and I've reached a point where I have my algorithms producing correct results with the full spectrum of test data. This comes after some several days of fighting 'to the death' with it, I don't mind admitting. However, in order to get here I've deferred doing a couple of things 'the right way' and just wrote something that I knew to be crap-on-a-stick but could be overhauled and optimized later. Now it's later! Here's what I did:
The objective
You have to produce a stream of codes. Each code can hold a value from 0 - 0xFFF and there is no 'leading waste' on it. For example, '0001' is the same value as '1' but contains wasted digits. You start with a 'minimum code size' which can vary from 3 to 9 bits and as you progress you raise your code size to accomodate larger values. At some, none or several points conditions will occur where your next code size would be 13 bits long and then you clear out and reinitialize your codemap, write out a CLR code at your current 12-bit size and go back to using your minimum code size. It will continue to grow from there again and on and on until you're done. When all the codes are in the stream they must be written out as packed bytes. This is the part I wanted to 'just do the simplest way' until I had the algorithms working and then revisit it. So here's what I have right now:
Linked list of codes represented here as binary strings, string length is code size:
'0101'
'0010'
'0111'
'1111'
'01010'
'01101'
'11111'
'011011'
...[]
I'm just concatenating the strings like this:
codestream$ = RSet(Bin(CodeTable(ib),#PB_Long),codesize,"0") + codestream$
which for the values shown produces this:
0110111111101101010101111011100100101
and then I make the length a multiple of 8 like this:
newsize = Len(codestream$)+(8-(Len(codestream$)%8))
codestream$ = RSet(codestream$, newsize, "0")
which results in this:
0000110111111101101010101111011100100101
and then I produce my final list of bytes to be written:
NewList codesout.a()
For i=Len(codestream$)-7 To 1 Step -8
AddElement(codesout())
codesout() = Val("%"+Mid(codestream$,i,8))
Next
and now my codestream is all packed into bytes and this is the list:
25
F7
AA
FD
0D
These are the right bytes and after a long fight they make me very happy. But string operations as an approach to this task are no solution for a final product, I'm sure you'll agree! Especially as real-world usage of this produces strings many thousands of chars long. So - working with numbers instead of strings, using any or all techniques such as bitshifting, masking with binary AND OR XOR NOT etc., what's a good fast way to do this? ASM solutions are fine, but if you're going to reply with asm please explain how it works so the rest of us can understand it.
I hope I've described the task adequately, if not just ask. Thanks in advance for any help you can offer, it's appreciated!
Once this is finished KCC can have his Gif animator



