
Lecture 2: Huffman Coding

Created By: Yusuf Pisan

Formatted to GitHub Markdown syntax by Ryan Peters

Be sure to check the other lectures out after you finish this one!




Overview


Assignment 1: TurtleProgram

Each program is a series of strings, which come in pairs (2 strings at a time)

Draw the UML class diagram: public functions and private variables

Write the constructors and operator<< first, so you can easily display your objects

You are overloading multiple operators: <<, ==, !=, =, +, +=

Signatures must match exactly (including friend functions)

Remember to dynamically allocate the array to be just the right size (not string[100])
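A minimal sketch of these pieces, assuming a hypothetical `TurtleProgram` that stores its strings in a dynamically allocated `std::string` array sized exactly to its contents. The two-string constructor and the member names are assumptions for illustration, not the assignment's specification.

```cpp
#include <iostream>
#include <string>
#include <utility>

// Hypothetical sketch: the (command, argument) constructor and the storage
// layout are assumptions, not the assignment's required interface.
class TurtleProgram {
  friend std::ostream &operator<<(std::ostream &out, const TurtleProgram &tp);

public:
  TurtleProgram() = default;
  // Assumed: one instruction arrives as a pair of strings, e.g. ("F", "10").
  TurtleProgram(const std::string &cmd, const std::string &arg)
      : size_(2), data_(new std::string[2]{cmd, arg}) {}
  TurtleProgram(const TurtleProgram &other)
      : size_(other.size_), data_(new std::string[other.size_]) {
    for (int i = 0; i < size_; ++i) data_[i] = other.data_[i];
  }
  ~TurtleProgram() { delete[] data_; }

  // Copy-and-swap: `other` is already a copy, so self-assignment is safe.
  TurtleProgram &operator=(TurtleProgram other) {
    std::swap(size_, other.size_);
    std::swap(data_, other.data_);
    return *this;
  }

  // Append: allocate a new array of exactly the combined size.
  TurtleProgram &operator+=(const TurtleProgram &other) {
    std::string *combined = new std::string[size_ + other.size_];
    for (int i = 0; i < size_; ++i) combined[i] = data_[i];
    for (int i = 0; i < other.size_; ++i) combined[size_ + i] = other.data_[i];
    delete[] data_;
    data_ = combined;
    size_ += other.size_;
    return *this;
  }

  TurtleProgram operator+(const TurtleProgram &other) const {
    TurtleProgram result(*this);
    result += other;
    return result;
  }

  bool operator==(const TurtleProgram &other) const {
    if (size_ != other.size_) return false;
    for (int i = 0; i < size_; ++i)
      if (data_[i] != other.data_[i]) return false;
    return true;
  }
  bool operator!=(const TurtleProgram &other) const { return !(*this == other); }

private:
  int size_ = 0;
  std::string *data_ = nullptr;
};

// Prints the strings separated by spaces, e.g. "[F 10 R 90]".
std::ostream &operator<<(std::ostream &out, const TurtleProgram &tp) {
  out << '[';
  for (int i = 0; i < tp.size_; ++i) out << (i > 0 ? " " : "") << tp.data_[i];
  return out << ']';
}
```

Taking the parameter of `operator=` by value (copy-and-swap) is one common way to get a correct assignment operator with little code; an explicit self-assignment check followed by a deep copy works too.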


Common Programming Mistakes

Use compiler flags -Wall -Wextra -Wpedantic -Weffc++ and even -Werror

Review code examples: http://faculty.washington.edu/pisan/cpp/snippets.html


Binary Search Tree - Definition

What happens when inserting items from a sorted list?

Write a BST for int in the simplest way possible, with only a single constructor and an add function.
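One possible minimal version, assuming duplicates go to the right subtree. `IntBST` and `inOrder()` are hypothetical names; the destructor and deleted copy operations are small additions so the sketch neither leaks nor double-frees.

```cpp
#include <sstream>
#include <string>

// Simplest int BST: a single constructor plus add(). The destructor and the
// deleted copy operations are extras so the sketch does not leak or double-free.
class IntBST {
public:
  IntBST() = default;
  IntBST(const IntBST &) = delete;
  IntBST &operator=(const IntBST &) = delete;
  ~IntBST() { destroy(root); }

  void add(int value) { insert(root, value); }

  // In-order traversal visits values in sorted order; handy for checking add().
  std::string inOrder() const {
    std::ostringstream out;
    inOrder(root, out);
    return out.str();
  }

private:
  struct Node {
    int item;
    Node *left;
    Node *right;
  };
  Node *root = nullptr;

  // Smaller values go left; equal or larger values go right.
  static void insert(Node *&node, int value) {
    if (node == nullptr)
      node = new Node{value, nullptr, nullptr};
    else if (value < node->item)
      insert(node->left, value);
    else
      insert(node->right, value);
  }

  static void inOrder(const Node *node, std::ostringstream &out) {
    if (node == nullptr) return;
    inOrder(node->left, out);
    out << node->item << ' ';
    inOrder(node->right, out);
  }

  static void destroy(Node *node) {
    if (node == nullptr) return;
    destroy(node->left);
    destroy(node->right);
    delete node;
  }
};
```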

Use the code from the textbook. Generalize it to take any data type and to have proper getters/setters.

Assignment #2 will use trees.


Binary Search Tree

search(BST, target)
  if (BST is empty)
    item not found
  else if target == data in BST
    item found
  else if target < data
    search(left subtree, target)
  else
    search(right subtree, target)
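The same search written in C++ as a sketch, generalized with a template. It assumes `ItemType` supports `<` and `==`, and uses `std::unique_ptr` children (C++14 `std::make_unique`); the `add` helper exists only to build a tree to search.

```cpp
#include <memory>

// Assumes ItemType supports operator< and operator==.
template <class ItemType>
struct BinaryNode {
  ItemType item;
  std::unique_ptr<BinaryNode> left;
  std::unique_ptr<BinaryNode> right;
  explicit BinaryNode(const ItemType &value) : item(value), left(), right() {}
};

// Insert used only to build a tree to search (duplicates go right).
template <class ItemType>
void add(std::unique_ptr<BinaryNode<ItemType>> &node, const ItemType &value) {
  if (!node)
    node = std::make_unique<BinaryNode<ItemType>>(value);
  else if (value < node->item)
    add(node->left, value);
  else
    add(node->right, value);
}

// Direct translation of search(BST, target).
template <class ItemType>
bool search(const std::unique_ptr<BinaryNode<ItemType>> &node, const ItemType &target) {
  if (!node) return false;                // BST is empty: item not found
  if (target == node->item) return true;  // item found
  if (target < node->item) return search(node->left, target);
  return search(node->right, target);
}
```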

What is the worst-case complexity of search in a badly constructed (degenerate) tree?
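A small experiment, assuming a plain BST with no rebalancing: inserting already-sorted values makes every new node a right child, so the tree degenerates into a linked list of height n and search becomes O(n). The same seven values in a different order give a tree of height 3. `Node`, `add`, and `height` are hypothetical names.

```cpp
#include <algorithm>
#include <memory>

struct Node {
  int item;
  std::unique_ptr<Node> left, right;
  explicit Node(int v) : item(v), left(), right() {}
};

// Plain BST insert with no rebalancing (duplicates go right).
void add(std::unique_ptr<Node> &node, int value) {
  if (!node)
    node = std::make_unique<Node>(value);
  else if (value < node->item)
    add(node->left, value);
  else
    add(node->right, value);
}

// Height counts nodes on the longest root-to-leaf path; empty tree = 0.
int height(const std::unique_ptr<Node> &node) {
  return node ? 1 + std::max(height(node->left), height(node->right)) : 0;
}
```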

Create a balanced BST for A, B, C, D, E, F
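One way to build a balanced BST from sorted input, sketched under the assumption that the lower middle element is chosen as the root at each step: the middle of the list becomes the root and each half is built the same way recursively. For A, B, C, D, E, F this gives root C with preorder C A B E D F. `buildBalanced` and `preorder` are hypothetical names.

```cpp
#include <memory>
#include <string>
#include <vector>

struct Node {
  char item;
  std::unique_ptr<Node> left, right;
  explicit Node(char c) : item(c), left(), right() {}
};

// Build a balanced BST from sorted items[lo..hi]: the middle item becomes the
// root, the left half the left subtree, the right half the right subtree.
std::unique_ptr<Node> buildBalanced(const std::vector<char> &items, int lo, int hi) {
  if (lo > hi) return nullptr;
  int mid = lo + (hi - lo) / 2;  // lower middle when the count is even
  auto node = std::make_unique<Node>(items[mid]);
  node->left = buildBalanced(items, lo, mid - 1);
  node->right = buildBalanced(items, mid + 1, hi);
  return node;
}

// Preorder listing (root, left, right) to inspect the shape.
std::string preorder(const Node *node) {
  if (!node) return "";
  return node->item + preorder(node->left.get()) + preorder(node->right.get());
}
```

Choosing the upper middle instead (D as root) gives a different, equally balanced tree.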

What does it mean for a tree to be balanced, full, or complete?
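Definitions vary between textbooks. The checks below assume the meanings that match the proof that follows: "full" means every level is completely filled (some texts call this "perfect" and use "full" for "every node has 0 or 2 children"), "complete" means full except possibly the last level, which is filled from left to right, and "balanced" means the heights of every node's two subtrees differ by at most 1. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cstdlib>
#include <memory>

struct Node {
  int item;
  std::unique_ptr<Node> left, right;
  explicit Node(int v) : item(v), left(), right() {}
};

int height(const Node *n) {
  return n ? 1 + std::max(height(n->left.get()), height(n->right.get())) : 0;
}

int countNodes(const Node *n) {
  return n ? 1 + countNodes(n->left.get()) + countNodes(n->right.get()) : 0;
}

// Full (every level filled): a tree of height h has at most 2^h - 1 nodes,
// with equality exactly when every level is filled.
bool isFull(const Node *n) { return countNodes(n) == (1 << height(n)) - 1; }

// Complete: number nodes level by level from 0 (children of i are 2i+1 and
// 2i+2); the tree is complete iff every used index is below the node count.
bool indicesFit(const Node *n, int index, int count) {
  if (!n) return true;
  if (index >= count) return false;
  return indicesFit(n->left.get(), 2 * index + 1, count) &&
         indicesFit(n->right.get(), 2 * index + 2, count);
}
bool isComplete(const Node *n) { return indicesFit(n, 0, countNodes(n)); }

// Balanced: at every node, the subtree heights differ by at most 1.
bool isBalanced(const Node *n) {
  if (!n) return true;
  return std::abs(height(n->left.get()) - height(n->right.get())) <= 1 &&
         isBalanced(n->left.get()) && isBalanced(n->right.get());
}
```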


UML for Binary Tree

[Figure: UML class diagram for BinaryTree]

Implementation choices


Why Trees?


Group Exercise: Prove

Prove: A full binary tree of height $h \ge 0$ has $2^h - 1$ nodes


Group Exercise: Prove - Solution

Prove: A full binary tree of height $h \ge 0$ has $2^h - 1$ nodes

Proof by induction

Basis: When $h = 0$, the full binary tree is empty, and it contains $0 = 2^0 - 1$ nodes.

Inductive hypothesis: Assume that a full binary tree of height $k$ has $2^k - 1$ nodes whenever $0 \le k < h$.

Inductive conclusion

We must show that a full binary tree of height $h$ has $2^h - 1$ nodes.

Let $T$ be a full binary tree of height $h > 0$. Its left and right subtrees $T_L$ and $T_R$ are full binary trees of height $h - 1$, so by the inductive hypothesis each has $2^{h-1} - 1$ nodes. The number of nodes in $T$ is 1 (for the root) plus the nodes in $T_L$ and $T_R$:

$$1 + (2^{h-1} - 1) + (2^{h-1} - 1) = 1 + 2(2^{h-1} - 1) = 1 + 2^h - 2 = 2^h - 1$$
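A quick numeric sanity check, not a substitute for the proof: the hypothetical `buildFull` makes a full tree of height h from a root and two full subtrees of height h - 1, mirroring the inductive step, and the node count can then be compared with $2^h - 1$ for small h.

```cpp
#include <memory>

struct Node {
  std::unique_ptr<Node> left, right;
};

// A full tree of height h is a root plus two full subtrees of height h - 1,
// the same decomposition the inductive step uses; height 0 is the empty tree.
std::unique_ptr<Node> buildFull(int h) {
  if (h == 0) return nullptr;
  auto root = std::make_unique<Node>();
  root->left = buildFull(h - 1);
  root->right = buildFull(h - 1);
  return root;
}

int countNodes(const Node *n) {
  return n ? 1 + countNodes(n->left.get()) + countNodes(n->right.get()) : 0;
}
```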

Group Exercise: Insert Elements

Insert the letters in “Huffman Coding” to create a binary search tree
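One way to check an answer, assuming case-sensitive ASCII comparison (so every uppercase letter sorts before every lowercase one), the space skipped, and repeated letters ignored. Under those assumptions the insertion order is H, u, f, m, a, n, C, o, d, i, g, and the preorder traversal of the resulting tree is HCufadmigno. Comparing letters case-insensitively or inserting duplicates would give a different tree. The function names are hypothetical.

```cpp
#include <cctype>
#include <memory>
#include <string>

struct Node {
  char item;
  std::unique_ptr<Node> left, right;
  explicit Node(char c) : item(c), left(), right() {}
};

// Insert, ignoring duplicates (an assumption; the exercise may handle them differently).
void add(std::unique_ptr<Node> &node, char c) {
  if (!node)
    node = std::make_unique<Node>(c);
  else if (c < node->item)
    add(node->left, c);
  else if (c > node->item)
    add(node->right, c);
  // equal: already in the tree, nothing to do
}

// Insert every letter of the text in order, skipping non-letters such as the space.
std::unique_ptr<Node> buildFromLetters(const std::string &text) {
  std::unique_ptr<Node> root;
  for (char c : text)
    if (std::isalpha(static_cast<unsigned char>(c))) add(root, c);
  return root;
}

std::string preorder(const Node *n) {
  if (!n) return "";
  return n->item + preorder(n->left.get()) + preorder(n->right.get());
}
```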


Tree as Array

Not the most natural or common representation, but an important one

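const int MAX_NODES = 100; // maximum number of nodes (value assumed)

template<class ItemType>
class TreeNode
{
private:
   ItemType item;        // Data portion
   int      leftChild;   // Index to left child (e.g., -1 if none)
   int      rightChild;  // Index to right child (e.g., -1 if none)
};

// Data members of the array-based tree class (enclosing class template omitted)
TreeNode<ItemType> tree[MAX_NODES]; // array of nodes
int root;                           // index of root
int free;                           // index of free list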
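A sketch of how `add` can work with this layout, assuming -1 marks a missing child and the free list is chained through the `leftChild` field of unused slots. The class name `ArrayBST` and the accessor functions are hypothetical, and `free` is renamed `freeIndex` to avoid confusion with `std::free`.

```cpp
// Sketch of a BST stored in a fixed-size array. Children are array indices,
// and -1 means "no child". Class name, MAX_NODES default, and chaining the
// free list through leftChild are assumptions for illustration.
template <class ItemType, int MAX_NODES = 100>
class ArrayBST {
public:
  ArrayBST() : tree(), root(-1), freeIndex(0) {
    // Initially every slot is free; link slot i to slot i + 1.
    for (int i = 0; i < MAX_NODES; ++i) {
      tree[i].leftChild = (i + 1 < MAX_NODES) ? i + 1 : -1;
      tree[i].rightChild = -1;
    }
  }

  // Returns false when the array has no free slot left.
  bool add(const ItemType &value) {
    if (freeIndex == -1) return false;
    int slot = freeIndex;              // take the first free slot
    freeIndex = tree[slot].leftChild;  // advance the free list
    tree[slot] = TreeNode{value, -1, -1};
    if (root == -1) {
      root = slot;
      return true;
    }
    int cur = root;
    while (true) {
      // Smaller items go left; equal or larger items go right.
      int &next = (value < tree[cur].item) ? tree[cur].leftChild : tree[cur].rightChild;
      if (next == -1) {
        next = slot;
        return true;
      }
      cur = next;
    }
  }

  int rootIndex() const { return root; }
  const ItemType &itemAt(int index) const { return tree[index].item; }
  int leftOf(int index) const { return tree[index].leftChild; }
  int rightOf(int index) const { return tree[index].rightChild; }

private:
  struct TreeNode {
    ItemType item;   // Data portion
    int leftChild;   // Index to left child
    int rightChild;  // Index to right child
  };
  TreeNode tree[MAX_NODES];  // array of nodes
  int root;                  // index of root, -1 if empty
  int freeIndex;             // head of the free list ("free" in the lecture)
};
```

Inserting "H", "u", "f" fills slots 0, 1, 2: slot 0 holds H with right child 1, and slot 1 holds u with left child 2.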


Group Exercise: Array Representation

Represent “Huffman Coding” tree as an array


Tree as Linked Nodes

// Version 1: int items, raw pointers
class BinaryNode
{
private:
   int          item;          // Data portion
   BinaryNode * leftChildPtr;  // Pointer to left child
   BinaryNode * rightChildPtr; // Pointer to right child
};

// Version 2: any item type, raw pointers
template<class ItemType>
class BinaryNode
{
private:
   ItemType               item;          // Data portion
   BinaryNode<ItemType> * leftChildPtr;  // Pointer to left child
   BinaryNode<ItemType> * rightChildPtr; // Pointer to right child
};

// Version 3: any item type, smart pointers (needs #include <memory>)
template<class ItemType>
class BinaryNode
{
private:
   ItemType                         item;          // Data portion
   shared_ptr<BinaryNode<ItemType>> leftChildPtr;  // Pointer to left child
   shared_ptr<BinaryNode<ItemType>> rightChildPtr; // Pointer to right child
};

Smart Pointers

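shared_ptr - shared ownership: keeps a reference count and deletes the object when the last shared_ptr to it is destroyed or reset; otherwise used much like a regular pointer

unique_ptr - unique ownership: exactly one unique_ptr owns the object; it can be moved but not copied

weak_ptr - non-owning observer of an object managed by shared_ptr: does not add to the reference count and cannot be used to delete the object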
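A short demonstration of the three ownership rules, using only standard `<memory>` facilities (`std::make_unique` needs C++14). The function name is hypothetical; the asserts state what each step does.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

void smartPointerDemo() {
  // shared_ptr: shared ownership with a reference count.
  std::shared_ptr<std::string> a = std::make_shared<std::string>("tree");
  std::shared_ptr<std::string> b = a;  // copy: both now own the string
  assert(a.use_count() == 2);

  // weak_ptr: observes without owning, so the count does not change.
  std::weak_ptr<std::string> observer = a;
  assert(a.use_count() == 2);

  b.reset();  // b gives up ownership
  assert(a.use_count() == 1);
  a.reset();  // last owner gone: the string is deleted
  assert(observer.expired());
  assert(observer.lock() == nullptr);

  // unique_ptr: exactly one owner; it can be moved but not copied.
  std::unique_ptr<std::string> u1 = std::make_unique<std::string>("node");
  // std::unique_ptr<std::string> u2 = u1;  // would not compile: no copying
  std::unique_ptr<std::string> u2 = std::move(u1);
  assert(u1 == nullptr && *u2 == "node");
}
```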

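Box<string>* myptr = new Box<string>();                 // raw pointer
shared_ptr<Box<string>> mysharedptr(new Box<string>()); // smart pointer
...
delete myptr;         // raw pointer: you must delete it yourself
mysharedptr.reset();  // gives up ownership; deleted when the count reaches 0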

If interested, read C++ Interlude 4

Using smart pointers is optional

Do not mix smart pointers and regular pointers for the same object (e.g., never delete an object that a smart pointer owns)


Group Exercise: Order of Inserts

If this is our final binary search tree, find at least 2 possible insertion orders.
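Since the exercise's tree is not reproduced here, this sketch uses a hypothetical tree built from D, F, B, G, E, C, A. It illustrates one general approach: any order in which every parent is inserted before its descendants rebuilds the same BST, so both the preorder listing and the level-order (breadth-first) listing are valid insertion orders. The function names are hypothetical.

```cpp
#include <memory>
#include <queue>
#include <string>

struct Node {
  char item;
  std::unique_ptr<Node> left, right;
  explicit Node(char c) : item(c), left(), right() {}
};

void add(std::unique_ptr<Node> &node, char c) {
  if (!node)
    node = std::make_unique<Node>(c);
  else if (c < node->item)
    add(node->left, c);
  else
    add(node->right, c);
}

std::unique_ptr<Node> build(const std::string &order) {
  std::unique_ptr<Node> root;
  for (char c : order) add(root, c);
  return root;
}

// Preorder: each node is listed before all of its descendants.
std::string preorder(const Node *n) {
  if (!n) return "";
  return n->item + preorder(n->left.get()) + preorder(n->right.get());
}

// Level order: parents are listed before children, level by level.
std::string levelOrder(const Node *root) {
  std::string out;
  std::queue<const Node *> pending;
  if (root) pending.push(root);
  while (!pending.empty()) {
    const Node *n = pending.front();
    pending.pop();
    out += n->item;
    if (n->left) pending.push(n->left.get());
    if (n->right) pending.push(n->right.get());
  }
  return out;
}

// Same items in the same positions.
bool sameTree(const Node *a, const Node *b) {
  if (!a || !b) return a == b;
  return a->item == b->item && sameTree(a->left.get(), b->left.get()) &&
         sameTree(a->right.get(), b->right.get());
}
```

For this example tree, DBACFEG (preorder) and DBFACEG (level order) both rebuild it exactly.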


Huffman Coding

Used for compression (part of gzip, JPEG, and many other algorithms)

Takes advantage of repetition

Assign a code to each letter: short codes for frequent letters, longer codes for rare ones.

Each ASCII character is stored in 8 bits, which wastes a lot of space when some letters are far more common than others.

010011011001100110001110001

Where does one code end and the next one begin? We need a prefix code: no code may be a prefix of another code.
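A sketch of why prefix-free codes can be decoded without separators, using a hypothetical three-letter code (a = 0, b = 10, c = 11): read bits until they match a code, emit that letter, and start over. Because no code is a prefix of another, the first match is the only possible one. The function names are hypothetical.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// True if no code is a prefix of another code.
bool isPrefixFree(const std::map<char, std::string> &codes) {
  for (const auto &x : codes)
    for (const auto &y : codes)
      if (x.first != y.first && y.second.compare(0, x.second.size(), x.second) == 0)
        return false;  // x's code is a prefix of y's code
  return true;
}

// Decode by reading bits until they match a code.
std::string decode(const std::string &bits, const std::map<char, std::string> &codes) {
  std::map<std::string, char> symbolFor;
  for (const auto &entry : codes) symbolFor[entry.second] = entry.first;
  std::string out, current;
  for (char bit : bits) {
    current += bit;
    auto it = symbolFor.find(current);
    if (it != symbolFor.end()) {  // prefix-free: the first match is the only one
      out += it->second;
      current.clear();
    }
  }
  if (!current.empty()) throw std::invalid_argument("bits end in the middle of a code");
  return out;
}
```

With this table, 010110 splits uniquely as 0 | 10 | 11 | 0 and decodes to "abca".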

Extra: Details of gzip using LZ77 and Huffman at http://www.gzip.org/algorithm.txt

and more from Mark Adler (co-author of zlib and gzip)
https://stackoverflow.com/questions/20762094/how-are-zlib-gzip-and-zip-related-what-do-they-have-in-common-and-how-are-they


Huffman Coding - Algorithm

Group Exercise: Free Beer
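  1. Calculate the number of times each letter appears

  2. Create the Huffman tree: start with a one-node tree for each letter, weighted by its count, then repeatedly merge the two lowest-weight trees until only one tree remains

  3. Write out the code for each letter (label left branches 0 and right branches 1); the code is not unique, since ties and left/right choices can differ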
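A compact sketch of the three steps, assuming every character counts as a symbol (case-sensitive, including the space). Ties are broken by node index, which is one arbitrary choice among many; that is why the code itself is not unique, even though every Huffman tree gives the same total encoded length. The struct and function names are hypothetical.

```cpp
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Assumes a non-empty input. Nodes live in a vector; children are indices.
struct HuffNode {
  int weight;
  char symbol;  // meaningful only for leaves
  int left;     // -1 for a leaf
  int right;
};

// Step 1: count how many times each character appears.
std::map<char, int> countFrequencies(const std::string &text) {
  std::map<char, int> freq;
  for (char ch : text) ++freq[ch];
  return freq;
}

// Step 2: one leaf per character, then repeatedly merge the two
// lowest-weight trees. Returns the index of the root in `nodes`.
int buildHuffmanTree(const std::map<char, int> &freq, std::vector<HuffNode> &nodes) {
  using Entry = std::pair<int, int>;  // (weight, node index)
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;  // min-heap
  for (const auto &entry : freq) {
    nodes.push_back(HuffNode{entry.second, entry.first, -1, -1});
    pq.push(Entry(entry.second, static_cast<int>(nodes.size()) - 1));
  }
  while (pq.size() > 1) {
    Entry a = pq.top();
    pq.pop();
    Entry b = pq.top();
    pq.pop();
    nodes.push_back(HuffNode{a.first + b.first, '\0', a.second, b.second});
    pq.push(Entry(a.first + b.first, static_cast<int>(nodes.size()) - 1));
  }
  return pq.top().second;
}

// Step 3: walk the tree, appending 0 for a left branch and 1 for a right one.
void assignCodes(const std::vector<HuffNode> &nodes, int index, const std::string &prefix,
                 std::map<char, std::string> &codes) {
  const HuffNode &node = nodes[index];
  if (node.left == -1) {
    codes[node.symbol] = prefix.empty() ? "0" : prefix;  // one-symbol input edge case
    return;
  }
  assignCodes(nodes, node.left, prefix + '0', codes);
  assignCodes(nodes, node.right, prefix + '1', codes);
}

std::map<char, std::string> huffmanCodes(const std::string &text) {
  std::vector<HuffNode> nodes;
  int root = buildHuffmanTree(countFrequencies(text), nodes);
  std::map<char, std::string> codes;
  assignCodes(nodes, root, "", codes);
  return codes;
}

std::string encode(const std::string &text, const std::map<char, std::string> &codes) {
  std::string bits;
  for (char ch : text) bits += codes.at(ch);
  return bits;
}
```

For "Free Beer" (9 characters, 72 bits as 8-bit ASCII) the counts are e: 4, r: 2, and F, B, space: 1 each; e always gets a 1-bit code, and every Huffman tree encodes the phrase in 19 bits.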

Extra:

How much wood would a woodchuck chuck if a woodchuck could chuck wood?

After Class