Don't hassle the Huff'

Just starting out? Need help? Post your questions and find answers here.
User avatar
Fluid Byte
Addict
Addict
Posts: 2336
Joined: Fri Jul 21, 2006 4:41 am
Location: Berlin, Germany

Don't hassle the Huff'

Post by Fluid Byte »

I am currently working on a implementation of the Huffman algorithm in PB. I really thought I have done this in a couple of days but it's taking much longer than expected now. At the moment I am able to count the character frequency, construct the Huffman-Tree and create the Huffman-Code for each character.

For example I have the input string "MISSISSIPPI". That would result in the following Huffman-Codes: S = 0, I = 11, P = 101, M = 100. When you use these codes to encode the string you get: 100110011001110110111. Now you split it up in put into three bytes: 10011001 / 10011101 / 10111.

I hope that this is correct so far. Because I found a Huffmancode on this forum wich results in 5 instead of 3 Bytes for the encoded string.

My question is now how do you decode the string again? I think in order to be able to decode the string even after the applications ends you have to save some addtional information in the encoded string. At least I read that on some sites I found via Google.
Windows 10 Pro, 64-Bit / Whose Hoff is it anyway?
Road Runner
User
User
Posts: 48
Joined: Tue Oct 07, 2003 3:10 pm

Post by Road Runner »

The only other information you should need to decode your bit stream is the symbol table you just generated:S = 0, I = 11, P = 101, M = 100

You scan your coded data from the start and compare with each symbol in your symbol table. When you get a match, you have found the next symbol, you remove those bits and start again to look for the next symbol.

e.g.
Code = 100110011001110110111

Try 1 bit, value = 1, no match in symbol table.
Try 2 bits, value = 10, no match in symbol table.
Try 3 bits, value = 100, matches M in symbol table

Remove those 3 bits and start again to find next symbol
Code = 110011001110110111
Try 1 bit, value = 1, no match in symbol table.
Try 2 bits, value = 11, matches I in symbol table

Remove those 2 bits and start again to find next symbol
Code = 0011001110110111
Try 1 bit, value = 0, matches S in symbol table

Remove that 1 bit and start again to find next symbol
Code = 011001110110111
Try 1 bit, value = 0, matches S in symbol table

Remove that 1 bit and start again to find next symbol
Code = 11001110110111
Try 1 bit, value = 1, no match in symbol table.
Try 2 bits, value = 11, matches I in symbol table

Remove those 2 bits and start again to find next symbol
Code = 001110110111
Try 1 bit, value = 0, matches S in symbol table

Remove that 1 bit and start again to find next symbol
Code = 01110110111
Try 1 bit, value = 0, matches S in symbol table

Remove that 1 bit and start again to find next symbol
Code = 1110110111
Try 1 bit, value = 1, no match in symbol table.
Try 2 bits, value = 11, matches I in symbol table

Remove those 2 bits and start again to find next symbol
Code = 10110111
Try 1 bit, value = 1, no match in symbol table.
Try 2 bits, value = 10, no match in symbol table.
Try 3 bits, value = 101, matches P in symbol table

Remove those 3 bits and start again to find next symbol
Code = 10111
Try 1 bit, value = 1, no match in symbol table.
Try 2 bits, value = 10, no match in symbol table.
Try 3 bits, value = 101, matches P in symbol table

Remove those 3 bits and start again to find next symbol
Code = 11
Try 1 bit, value = 1, no match in symbol table.
Try 2 bits, value = 11, matches I in symbol table

Remove those 2 bits. There are no remaining bits in the code so the decode is finished and the result is MISSISSIPPI

..editted to correct cut and paste error in P.
Last edited by Road Runner on Sun Sep 21, 2008 6:53 pm, edited 1 time in total.
User avatar
blueb
Addict
Addict
Posts: 1111
Joined: Sat Apr 26, 2003 2:15 pm
Location: Cuernavaca, Mexico

Post by blueb »

Fluid Byte...

You might take a look at: http://www.pbcrypto.com/view.php?algorithm=huffman

These are PowerBASIC files that focus on encryption techniques.

--blueb
- It was too lonely at the top.

System : PB 6.21(x64) and Win 11 Pro (x64)
Hardware: AMD Ryzen 9 5900X w/64 gigs Ram, AMD RX 6950 XT Graphics w/16gigs Mem
User avatar
Fluid Byte
Addict
Addict
Posts: 2336
Joined: Fri Jul 21, 2006 4:41 am
Location: Berlin, Germany

Post by Fluid Byte »

Hehe, now that's a really detailed way of doing it. Actually it would have been enuff to show the decoding of the first two characters but you never know. :wink:

Thank you both, got it working now. :)
Windows 10 Pro, 64-Bit / Whose Hoff is it anyway?
User avatar
Kaeru Gaman
Addict
Addict
Posts: 4826
Joined: Sun Mar 19, 2006 1:57 pm
Location: Germany

Post by Kaeru Gaman »

thanx for the example, Fluid & RoadRunner...

never cared about Huff before, know I know what it is...
oh... and have a nice day.
User avatar
Fluid Byte
Addict
Addict
Posts: 2336
Joined: Fri Jul 21, 2006 4:41 am
Location: Berlin, Germany

Post by Fluid Byte »

@Kaeru:
Here is the page that got me started with Huffman (really easy to understand):

http://www.ziegenbalg.ph-karlsruhe.de/m ... ierung.htm

It features an excellent explanation of the algorithm and uses the same demoword as here.

Here's another site with a good info as well plus a nice javascript to demonstrate the algorithm in realtime:

http://www.inf.fh-flensburg.de/lang/alg ... uffman.htm

Both in German only, sorry. :P
Windows 10 Pro, 64-Bit / Whose Hoff is it anyway?
User avatar
Rook Zimbabwe
Addict
Addict
Posts: 4322
Joined: Tue Jan 02, 2007 8:16 pm
Location: Cypress TX
Contact:

Post by Rook Zimbabwe »

Interesting link to how the code came about:

http://www.huffmancoding.com/david/algorithm.html

8)

Long winded English language explanation:

http://www.maths.abdn.ac.uk/~igc/tch/mx ... ode59.html
Binarily speaking... it takes 10 to Tango!!!

Image
http://www.bluemesapc.com/
Post Reply