Page 1 of 5

Packing bytes: a little help from my friends

Posted: Wed Nov 13, 2013 8:52 pm
by netmaestro
Are you looking for a nice little challenge to occupy your mind for a while?

I'm currently working with LZW compression and I've reached a point where I have my algorithms producing correct results with the full spectrum of test data. This comes after some several days of fighting 'to the death' with it, I don't mind admitting. However, in order to get here I've deferred doing a couple of things 'the right way' and just wrote something that I knew to be crap-on-a-stick but could be overhauled and optimized later. Now it's later! Here's what I did:

The objective
You have to produce a stream of codes. Each code can hold a value from 0 - 0xFFF and there is no 'leading waste' on it. For example, '0001' is the same value as '1' but contains wasted digits. You start with a 'minimum code size' which can vary from 3 to 9 bits and as you progress you raise your code size to accomodate larger values. At some, none or several points conditions will occur where your next code size would be 13 bits long and then you clear out and reinitialize your codemap, write out a CLR code at your current 12-bit size and go back to using your minimum code size. It will continue to grow from there again and on and on until you're done. When all the codes are in the stream they must be written out as packed bytes. This is the part I wanted to 'just do the simplest way' until I had the algorithms working and then revisit it. So here's what I have right now:

Linked list of codes represented here as binary strings, string length is code size:

'0101'
'0010'
'0111'
'1111'
'01010'
'01101'
'11111'
'011011'
...[]

I'm just concatenating the strings like this:

codestream$ = RSet(Bin(CodeTable(ib),#PB_Long),codesize,"0") + codestream$

which for the values shown produces this:

0110111111101101010101111011100100101

and then I make the length a multiple of 8 like this:

newsize = Len(codestream$)+(8-(Len(codestream$)%8))
codestream$ = RSet(codestream$, newsize, "0")

which results in this:

0000110111111101101010101111011100100101

and then I produce my final list of bytes to be written:

NewList codesout.a()
For i=Len(codestream$)-7 To 1 Step -8
AddElement(codesout())
codesout() = Val("%"+Mid(codestream$,i,8))
Next

and now my codestream is all packed into bytes and this is the list:

25
F7
AA
FD
0D

These are the right bytes and after a long fight they make me very happy. But string operations as an approach to this task are no solution for a final product, I'm sure you'll agree! Especially as real-world usage of this produces strings many thousands of chars long. So - working with numbers instead of strings, using any or all techniques such as bitshifting, masking with binary AND OR XOR NOT etc., what's a good fast way to do this? ASM solutions are fine, but if you're going to reply with asm please explain how it works so the rest of us can understand it.

I hope I've described the task adequately, if not just ask. Thanks in advance for any help you can offer, it's appreciated!

Once this is finished KCC can have his Gif animator :mrgreen:

Re: Packing bytes: a little help from my friends

Posted: Wed Nov 13, 2013 10:22 pm
by davido
Hi netmaestro,

Could you do something like this:

Code: Select all

Sum = 0

A = %0101
B = %0010
C = %0111
D = %1111
E = %01010
F = %01101
G = %11111
H = %011011

Sum = H
Sum << 5
Sum + G
Sum << 5
Sum + F
Sum << 5
Sum + E
Sum << 4
Sum + D
Sum << 4
Sum + C
Sum << 4
Sum + B
Sum << 4
Sum + A 
I am assuming that you know the length of each 'code'

Looks a bit simplistic; hope I'm not wasting you time.

Re: Packing bytes: a little help from my friends

Posted: Wed Nov 13, 2013 10:27 pm
by netmaestro
Looks a bit simplistic
Your logic is good. It would work pretty much as written if I had a datatype of unlimited size for Sum. As the largest native type we have is 64 bits long though, it will need a bit more complexity. But that's the idea for sure, thanks for your response.

Re: Packing bytes: a little help from my friends

Posted: Thu Nov 14, 2013 12:06 am
by infratec
Hi,

maybe something like this:

Code: Select all

Structure DataStruc
  Bits.i
  Pattern.i
EndStructure

Dim DataArray.DataStruc(7)

DataArray(0)\Bits = 4
DataArray(0)\Pattern = %0101

DataArray(1)\Bits = 4
DataArray(1)\Pattern = %0010

DataArray(2)\Bits = 4
DataArray(2)\Pattern = %0111

DataArray(3)\Bits = 4
DataArray(3)\Pattern = %1111

DataArray(4)\Bits = 5
DataArray(4)\Pattern = %01010

DataArray(5)\Bits = 5
DataArray(5)\Pattern = %01101

DataArray(6)\Bits = 5
DataArray(6)\Pattern = %11111

DataArray(7)\Bits = 6
DataArray(7)\Pattern = %011011

Sum = 0
Bits = 0

*Buffer = AllocateMemory(100)
Ptr = 0

For i = 0 To 7
  Sum | (DataArray(i)\Pattern << Bits)
 
  Debug "S:" + Bin(Sum)
 
  Bits + DataArray(i)\Bits
 
  If Bits > 7
    PokeA(*Buffer + Ptr, Sum & $FF)
    Debug "B:" + Bin(Sum & $FF, #PB_Byte)
    Sum >> 8
    Debug "NS:" + Bin(Sum)
    Bits - 8
    Ptr + 1
  EndIf
 
Next i

If Bits
  PokeA(*Buffer + Ptr, Sum & $FF)
EndIf

ShowMemoryViewer(*Buffer, Ptr + 1)
CallDebugger
Bernd

Re: Packing bytes: a little help from my friends

Posted: Thu Nov 14, 2013 12:27 am
by netmaestro
Thanks for helping! It looks promising but the memory is showing as:

DB 6F ED 4F 0A

where the desired output is:

25 F7 AA FD 0D

If those were lottery numbers we'd have to keep working for another week :mrgreen: I'll try and see what's happening :?

Re: Packing bytes: a little help from my friends

Posted: Thu Nov 14, 2013 2:01 am
by idle
this works
0110111111101101010101111011100100101
25 F7 AA FD 0D

Code: Select all

Macro Setbits(buf,val,bitcount) 
   *tb.word 
   *tb = buf + ((pos) >> 3) 
   *tb\w | (val << (pos % 8)) 
    pos + bitcount 
EndMacro 

Global *buf,pos 
*buf = AllocateMemory(16) 

Setbits(*buf,%0101,4)
Setbits(*buf,%0010,4)
Setbits(*buf,%0111,4)
Setbits(*buf,%1111,4)
Setbits(*buf,%01010,5)
Setbits(*buf,%01101,5)
Setbits(*buf,%11111,5)
Setbits(*buf,%011011,6) 

Debug RSet(Bin(PeekQ(*buf),#PB_Quad),pos,"0") 

Global output.s 
For a = 0 To 4 
   output + RSet(Hex(PeekA(*buf+a),#PB_Ascii),2,"0") + " " 
Next    

Debug output 


Re: Packing bytes: a little help from my friends

Posted: Thu Nov 14, 2013 2:43 am
by netmaestro
Seems perfect, Idle! Now to plug it into the algorithm and see how it works with live data. I'm building a structured linked list with the codes and running it like this:

Code: Select all

Macro Setbits(buf,val,bitcount) 
   *tb.word 
   *tb = buf + ((pos) >> 3) 
   *tb\w | (val << (pos % 8)) 
    pos + bitcount 
EndMacro 

Global *buf,pos 
*buf = AllocateMemory(16) 

Structure code
  code.l
  size.l
EndStructure

NewList codes.code()

AddElement(codes())
With codes()
  \size = 4
  \code = %0101
EndWith

AddElement(codes())
With codes()
  \size = 4
  \code = %0010
EndWith

AddElement(codes())
With codes()
  \size = 4
  \code = %0111
EndWith

AddElement(codes())
With codes()
  \size = 4
  \code = %1111
EndWith

AddElement(codes())
With codes()
  \size = 5
  \code = %01101
EndWith

AddElement(codes())
With codes()
  \size = 5
  \code = %01101
EndWith

AddElement(codes())
With codes()
  \size = 5
  \code = %11111
EndWith

AddElement(codes())
With codes()
  \size = 6
  \code = %011011
EndWith

ForEach codes()
  Setbits(*buf,codes()\code,codes()\size)
Next

ShowMemoryViewer(*buf, 16)
CallDebugger
The compressor will spit out the codes with their sizes, then I can count through the list for a total of the sizes, byte-align it, allocate a memory block and run your routine to fill it. I'll let you know how it goes. If it works it'll be at least fifty times faster than what I have now :D

Re: Packing bytes: a little help from my friends

Posted: Thu Nov 14, 2013 5:19 am
by netmaestro
It's failing after several hundred codes. Once it fails you'd think every code after that would be wrong but that isn't the case. In the smaller tests it passes but once you get over 700 or so (it's not at the same place every time) it throws in a wrong one and the next hundred or so might be right and then there's another bad one. Is the macro designed to handle code sizes from 3 to 12?

I'm working on figuring out why this is happening, I suspect it might be failing when the code sizes grow past a certain point.

Re: Packing bytes: a little help from my friends

Posted: Thu Nov 14, 2013 5:57 am
by idle
I tested it up to 12 bits but I changed the pointer to a long

Code: Select all

Macro Setbits(buf,val,bitcount) 
   *tb.long 
   *tb = buf + ((pos) >> 3) 
   *tb\l | (val << (pos % 8)) 
    pos + bitcount 
EndMacro 


Re: Packing bytes: a little help from my friends

Posted: Thu Nov 14, 2013 6:01 am
by netmaestro
I was about to make the identical post! :lol:

Works now, going to see how the speed is :mrgreen:

Re: Packing bytes: a little help from my friends

Posted: Thu Nov 14, 2013 6:06 am
by netmaestro
Looking good! The processing time is a small fraction of the string version and it's passed all my tests.

Thanks for your help, it's much appreciated!

Re: Packing bytes: a little help from my friends

Posted: Thu Nov 14, 2013 6:12 am
by idle
Wait till wilbert gets his hands on it, It'll become a fraction of fraction! :wink:

Re: Packing bytes: a little help from my friends

Posted: Thu Nov 14, 2013 6:14 am
by netmaestro
Yep! It'll leave skidmarks on the desktop and I won't understand a friggin' line of it :lol:

Re: Packing bytes: a little help from my friends

Posted: Thu Nov 14, 2013 6:18 am
by netmaestro
Actually this one is very fast. The total time to process an image with 256 colors resulting in a 135kb file, including the NeuQuant & dither, is 303 ms. More than acceptable.

Re: Packing bytes: a little help from my friends

Posted: Thu Nov 14, 2013 6:26 am
by idle
seems reasonable to me, If there's a way to do it without the mod it would be quicker