Page 1 of 1

Don't hassle the Huff'

Posted: Sun Sep 21, 2008 2:30 pm
by Fluid Byte
I am currently working on a implementation of the Huffman algorithm in PB. I really thought I have done this in a couple of days but it's taking much longer than expected now. At the moment I am able to count the character frequency, construct the Huffman-Tree and create the Huffman-Code for each character.

For example I have the input string "MISSISSIPPI". That would result in the following Huffman-Codes: S = 0, I = 11, P = 101, M = 100. When you use these codes to encode the string you get: 100110011001110110111. Now you split it up in put into three bytes: 10011001 / 10011101 / 10111.

I hope that this is correct so far. Because I found a Huffmancode on this forum wich results in 5 instead of 3 Bytes for the encoded string.

My question is now how do you decode the string again? I think in order to be able to decode the string even after the applications ends you have to save some addtional information in the encoded string. At least I read that on some sites I found via Google.

Posted: Sun Sep 21, 2008 3:15 pm
by Road Runner
The only other information you should need to decode your bit stream is the symbol table you just generated:S = 0, I = 11, P = 101, M = 100

You scan your coded data from the start and compare with each symbol in your symbol table. When you get a match, you have found the next symbol, you remove those bits and start again to look for the next symbol.

e.g.
Code = 100110011001110110111

Try 1 bit, value = 1, no match in symbol table.
Try 2 bits, value = 10, no match in symbol table.
Try 3 bits, value = 100, matches M in symbol table

Remove those 3 bits and start again to find next symbol
Code = 110011001110110111
Try 1 bit, value = 1, no match in symbol table.
Try 2 bits, value = 11, matches I in symbol table

Remove those 2 bits and start again to find next symbol
Code = 0011001110110111
Try 1 bit, value = 0, matches S in symbol table

Remove that 1 bit and start again to find next symbol
Code = 011001110110111
Try 1 bit, value = 0, matches S in symbol table

Remove that 1 bit and start again to find next symbol
Code = 11001110110111
Try 1 bit, value = 1, no match in symbol table.
Try 2 bits, value = 11, matches I in symbol table

Remove those 2 bits and start again to find next symbol
Code = 001110110111
Try 1 bit, value = 0, matches S in symbol table

Remove that 1 bit and start again to find next symbol
Code = 01110110111
Try 1 bit, value = 0, matches S in symbol table

Remove that 1 bit and start again to find next symbol
Code = 1110110111
Try 1 bit, value = 1, no match in symbol table.
Try 2 bits, value = 11, matches I in symbol table

Remove those 2 bits and start again to find next symbol
Code = 10110111
Try 1 bit, value = 1, no match in symbol table.
Try 2 bits, value = 10, no match in symbol table.
Try 3 bits, value = 101, matches P in symbol table

Remove those 3 bits and start again to find next symbol
Code = 10111
Try 1 bit, value = 1, no match in symbol table.
Try 2 bits, value = 10, no match in symbol table.
Try 3 bits, value = 101, matches P in symbol table

Remove those 3 bits and start again to find next symbol
Code = 11
Try 1 bit, value = 1, no match in symbol table.
Try 2 bits, value = 11, matches I in symbol table

Remove those 2 bits. There are no remaining bits in the code so the decode is finished and the result is MISSISSIPPI

..editted to correct cut and paste error in P.

Posted: Sun Sep 21, 2008 4:03 pm
by blueb
Fluid Byte...

You might take a look at: http://www.pbcrypto.com/view.php?algorithm=huffman

These are PowerBASIC files that focus on encryption techniques.

--blueb

Posted: Mon Sep 22, 2008 2:58 pm
by Fluid Byte
Hehe, now that's a really detailed way of doing it. Actually it would have been enuff to show the decoding of the first two characters but you never know. :wink:

Thank you both, got it working now. :)

Posted: Mon Sep 22, 2008 3:51 pm
by Kaeru Gaman
thanx for the example, Fluid & RoadRunner...

never cared about Huff before, know I know what it is...

Posted: Mon Sep 22, 2008 4:54 pm
by Fluid Byte
@Kaeru:
Here is the page that got me started with Huffman (really easy to understand):

http://www.ziegenbalg.ph-karlsruhe.de/m ... ierung.htm

It features an excellent explanation of the algorithm and uses the same demoword as here.

Here's another site with a good info as well plus a nice javascript to demonstrate the algorithm in realtime:

http://www.inf.fh-flensburg.de/lang/alg ... uffman.htm

Both in German only, sorry. :P

Posted: Mon Sep 22, 2008 6:46 pm
by Rook Zimbabwe
Interesting link to how the code came about:

http://www.huffmancoding.com/david/algorithm.html

8)

Long winded English language explanation:

http://www.maths.abdn.ac.uk/~igc/tch/mx ... ode59.html