26  BST as a Template Class + Huffman Trees

27 BST as a Template Class + Huffman Trees

Week 2, Session 2. CSS 343.

27.1 Warmup

Palindrome Number (#9) again. Now write operator== for a Fraction class and use it inside.

27.2 Learning objectives

  1. Write a complete template BST class satisfying the Rule of Three in C++.
  2. Explain why prefix codes enable unambiguous decoding.
  3. Construct a Huffman tree by hand from a frequency table.
  4. Implement Huffman construction using std::priority_queue with a custom comparator.
  5. Encode a short string using a Huffman code table.

27.3 A full template BST class

We have been writing BSTs with raw TreeNode* pointers and free functions. Time to wrap them.

// BST.h
#pragma once
#include <iostream>

template <typename T>
class BST {
 public:
  BST() : root_(nullptr) {}
  BST(const BST& other) : root_(deepCopy(other.root_)) {}
  BST& operator=(const BST& other);
  ~BST() { deleteAll(root_); }

  void insert(const T& value);
  bool contains(const T& value) const;
  void remove(const T& value);
  void inOrderPrint(std::ostream& os) const;

 private:
  struct Node {
    T data;
    Node* left;
    Node* right;
    Node(const T& v) : data(v), left(nullptr), right(nullptr) {}
  };
  Node* root_;

  void insertHelper(Node*& node, const T& value);
  bool containsHelper(const Node* node, const T& value) const;
  void removeHelper(Node*& node, const T& value);
  Node* deepCopy(const Node* node) const;
  void deleteAll(Node* node);
  Node* findMin(Node* node) const;
  void printHelper(const Node* node, std::ostream& os) const;
};

#include "BST.tpp"     // implementation file, included from header
// BST.tpp  (template implementation, included from BST.h)

template <typename T>
void BST<T>::insertHelper(Node*& node, const T& value) {
  if (node == nullptr) { node = new Node(value); return; }
  if (value < node->data) insertHelper(node->left, value);
  else if (node->data < value) insertHelper(node->right, value);
}

template <typename T>
void BST<T>::insert(const T& value) { insertHelper(root_, value); }

template <typename T>
bool BST<T>::containsHelper(const Node* node, const T& value) const {
  if (node == nullptr) return false;
  if (value < node->data) return containsHelper(node->left, value);
  if (node->data < value) return containsHelper(node->right, value);
  return true;
}

template <typename T>
bool BST<T>::contains(const T& value) const {
  return containsHelper(root_, value);
}

template <typename T>
typename BST<T>::Node* BST<T>::findMin(Node* node) const {
  while (node->left != nullptr) node = node->left;
  return node;
}

template <typename T>
void BST<T>::removeHelper(Node*& node, const T& value) {
  if (node == nullptr) return;
  if (value < node->data) { removeHelper(node->left, value); return; }
  if (node->data < value) { removeHelper(node->right, value); return; }
  // Match
  if (node->left == nullptr && node->right == nullptr) {
    delete node; node = nullptr;
  } else if (node->left == nullptr) {
    Node* tmp = node; node = node->right; delete tmp;
  } else if (node->right == nullptr) {
    Node* tmp = node; node = node->left; delete tmp;
  } else {
    Node* succ = findMin(node->right);
    node->data = succ->data;
    removeHelper(node->right, succ->data);
  }
}

template <typename T>
void BST<T>::remove(const T& value) { removeHelper(root_, value); }

template <typename T>
typename BST<T>::Node* BST<T>::deepCopy(const Node* node) const {
  if (node == nullptr) return nullptr;
  Node* copy = new Node(node->data);
  copy->left = deepCopy(node->left);
  copy->right = deepCopy(node->right);
  return copy;
}

template <typename T>
void BST<T>::deleteAll(Node* node) {
  if (node == nullptr) return;
  deleteAll(node->left);
  deleteAll(node->right);
  delete node;
}

template <typename T>
BST<T>& BST<T>::operator=(const BST& other) {
  if (this == &other) return *this;
  deleteAll(root_);
  root_ = deepCopy(other.root_);
  return *this;
}

template <typename T>
void BST<T>::printHelper(const Node* node, std::ostream& os) const {
  if (node == nullptr) return;
  printHelper(node->left, os);
  os << node->data << " ";
  printHelper(node->right, os);
}

template <typename T>
void BST<T>::inOrderPrint(std::ostream& os) const {
  printHelper(root_, os);
  os << "\n";
}

Three things worth noticing:

  1. #include "BST.tpp" at the end of BST.h. The .tpp (or .ipp) extension is convention for “template implementation file.” It is included like a header so the template definitions are visible at every instantiation site.
  2. Comparison is done via < only. No == between elements. a < b and b < a both false means equal. This matches std::set’s requirement and lets the template work for any type that has <.
  3. typename BST<T>::Node* as a return type. Because Node is a dependent name (depends on T), the compiler needs the typename keyword to know it is a type.

27.4 Huffman trees

A Huffman code is a variable-length prefix code derived from a frequency table. Frequent characters get short codes; rare characters get long codes. The result is a lossless compression scheme: the original data is recoverable, the compressed size is smaller.

27.4.1 Prefix codes

A code is a prefix code if no codeword is a prefix of another. This is what makes Huffman codes uniquely decodable. If a = 0 and b = 01, then 01 could be “b” or “a then start of something else” — ambiguous. Prefix codes forbid this.

Example: a = 0, b = 10, c = 110, d = 111. The string 1010110111 decodes uniquely: b b c d.

27.4.2 Building the tree

Algorithm:

  1. Make a leaf node for each character, weighted by its frequency.
  2. Insert all leaves into a min-heap (priority queue) keyed by weight.
  3. While more than one node remains: extract the two smallest, combine into a parent whose weight is the sum, and reinsert.
  4. The final node is the root.
struct HNode {
  char ch;
  int freq;
  HNode* left;
  HNode* right;
  HNode(char c, int f) : ch(c), freq(f), left(nullptr), right(nullptr) {}
  HNode(HNode* l, HNode* r) : ch(0), freq(l->freq + r->freq), left(l), right(r) {}
};

struct CompareHNode {
  bool operator()(const HNode* a, const HNode* b) const {
    return a->freq > b->freq;     // min-heap on freq
  }
};

HNode* buildHuffman(const std::map<char, int>& freq) {
  std::priority_queue<HNode*, std::vector<HNode*>, CompareHNode> pq;
  for (const auto& [ch, f] : freq) pq.push(new HNode(ch, f));

  while (pq.size() > 1) {
    HNode* a = pq.top(); pq.pop();
    HNode* b = pq.top(); pq.pop();
    pq.push(new HNode(a, b));
  }
  return pq.top();
}

27.4.3 Worked example

Frequencies: {a: 5, b: 2, c: 1, d: 3}.

Priority queue (sorted by freq, smallest first): c(1), b(2), d(3), a(5).

  1. Extract c(1) and b(2). Combine: cb(3). PQ: d(3), cb(3), a(5).
  2. Extract d(3) and cb(3). Combine: dcb(6). PQ: a(5), dcb(6).
  3. Extract a(5) and dcb(6). Combine: adcb(11). PQ: adcb(11).
  4. One node: done.
        adcb (11)
       /          \
      a (5)      dcb (6)
                  /    \
                d (3)   cb (3)
                        /    \
                      c (1)  b (2)

Reading the codes (left = 0, right = 1):

  • a: 0
  • d: 10
  • c: 110
  • b: 111

Encoding “aabbcd”: 0 0 111 111 110 10 = 0011111111010 = 13 bits. Fixed 2-bit code (4 chars need 2 bits each) would be 12 bits — slightly worse here because the input is too small for Huffman to win. On a 1 MB English text file, Huffman saves ~40%.

Huffman is the canonical example of a greedy algorithm: at every step, combine the two smallest. The greedy choice is provably optimal here — we will revisit this in Chapter 32.

27.5 Try it

Build a Huffman tree for "hello world". Compute the code table. Encode the original string and compare bit lengths against fixed 8-bit ASCII.

27.6 Self-check

1. Why must template implementations be visible at every use site?
b. Without seeing the body, the compiler cannot generate the concrete code.
2. The Huffman tree's priority queue is a:
c. We always pull the two least-frequent nodes. Min-heap on frequency.

27.7 Challenges

  1. Implement the BST template class above. Test with int, string, and a custom comparable class.
  2. Implement Huffman encoding and decoding. Encode a sentence; decode back; confirm equality.
  3. Modify Huffman to use canonical Huffman codes (codes are determined by code length, not by tree shape — useful for serialization).

27.8 Where this fits

You can now build trees from the ground up. Next chapter shifts from comparison-based trees to array-backed trees — the heap.

You are here Coming up
Template BST with Rule of Three, Huffman trees Chapter 25: heaps and heap sort