27  Heaps, Heap Sort, and Sorting Comparison

28 Heaps, Heap Sort, and Sorting Comparison

Week 3, Session 1. CSS 343.

28.1 Warmup

Kth Largest Element in an Array (#215). Given nums and k, return the kth largest element.

class Solution {
 public:
  int findKthLargest(vector<int>& nums, int k) {
    priority_queue<int, vector<int>, greater<int>> minHeap;
    for (int n : nums) {
      minHeap.push(n);
      if ((int)minHeap.size() > k) minHeap.pop();
    }
    return minHeap.top();
  }
};

A min-heap of size k. The smallest of the top-k is at the top. After processing all numbers, that smallest is the kth largest. O(n log k) time, O(k) space.

You used a priority_queue and did not see how it works inside. This chapter rips the cover off.

28.2 Learning objectives

  1. Map a heap to its array representation using parent/child index arithmetic.
  2. Trace insert (sift-up) and extractMax (sift-down) on a max-heap.
  3. Explain why heapify (build heap from array) is O(n), not O(n log n).
  4. Implement heap sort in-place and state its O(n log n) worst-case guarantee.
  5. Compare merge sort, quicksort, and heap sort on stability, in-place, and worst-case.

28.3 The heap property

A max-heap is a complete binary tree where every node is ≥ its children. The maximum is always at the root.

A complete binary tree fills levels top-to-bottom, left-to-right. The last level may be partially filled, but only on the right.

          50
         /  \
        30    40
       / \   /
      10 20 35

Every node ≥ its children. Maximum (50) is at the root.

28.4 Array representation

Because a heap is complete, we can store it in a plain array with no pointers. Index 0 is the root. For node at index i:

  • left child: 2i + 1
  • right child: 2i + 2
  • parent: (i - 1) / 2

The above tree as an array: [50, 30, 40, 10, 20, 35].

       0:50
      /    \
    1:30   2:40
   /  \   /
 3:10 4:20 5:35

Index 1’s parent is (1-1)/2 = 0 ✓. Index 5’s parent is (5-1)/2 = 2 ✓. Memorize these formulas.

28.5 Insert: sift up

To insert a value:

  1. Append to the end.
  2. While the new node is larger than its parent, swap them.
class MaxHeap {
 public:
  void push(int v) {
    data_.push_back(v);
    siftUp(data_.size() - 1);
  }

 private:
  vector<int> data_;

  void siftUp(int i) {
    while (i > 0) {
      int parent = (i - 1) / 2;
      if (data_[i] <= data_[parent]) break;
      swap(data_[i], data_[parent]);
      i = parent;
    }
  }
};

The new node travels up at most log₂ n steps. O(log n).

28.6 Extract max: sift down

Maximum is at index 0. To remove it:

  1. Swap the root with the last element.
  2. Pop the last element (the old root, removed).
  3. From the new root, sift down: swap with the larger child while violations exist.
class MaxHeap {
 public:
  int top() const { return data_[0]; }

  void pop() {
    swap(data_[0], data_.back());
    data_.pop_back();
    if (!data_.empty()) siftDown(0);
  }

 private:
  void siftDown(int i) {
    int n = data_.size();
    while (true) {
      int left = 2 * i + 1;
      int right = 2 * i + 2;
      int largest = i;
      if (left < n && data_[left] > data_[largest]) largest = left;
      if (right < n && data_[right] > data_[largest]) largest = right;
      if (largest == i) break;
      swap(data_[i], data_[largest]);
      i = largest;
    }
  }
};

siftDown is O(log n) — at most one swap per level.

28.7 heapify — O(n), not O(n log n)

Given an arbitrary array, can we make it a heap in linear time? Yes.

void heapify(vector<int>& v) {
  int n = v.size();
  for (int i = n / 2 - 1; i >= 0; --i) {
    siftDown(v, i, n);
  }
}

Start from the last non-leaf (index n/2 - 1) and sift down each node, walking back to the root. Leaves are already valid heaps by themselves.

The naive analysis says: n nodes, sift each O(log n), so O(n log n). But that overcounts — most nodes are near the leaves and sift down by very little. The careful sum:

\[ \sum_{h=0}^{\log n} \frac{n}{2^{h+1}} \cdot h = O(n) \]

(The proof uses the geometric series \(\sum h/2^h\) which converges to 2.) Result: heapify is O(n).

This matters because heap sort uses heapify once at the start.

28.8 Heap sort

void heapSort(vector<int>& v) {
  int n = v.size();

  // Build max-heap in place
  for (int i = n / 2 - 1; i >= 0; --i) siftDown(v, i, n);

  // Repeatedly extract max to the end
  for (int i = n - 1; i > 0; --i) {
    swap(v[0], v[i]);          // largest goes to position i
    siftDown(v, 0, i);          // sift down within reduced heap
  }
}

The result is ascending sorted order — because the max goes to the end of the array on each pass.

Complexity: heapify O(n), then n−1 extractions each O(log n). Total O(n log n), worst case (always).

Heap sort guarantees O(n log n) like merge sort, but uses O(1) extra space like quicksort. It is also slower in practice than both because the comparisons jump around in memory (poor cache behavior). When std::sort falls back from quicksort to avoid the O(n²) worst case, it switches to heap sort.

28.9 Sorting comparison

Sort Best Average Worst Space Stable In place
Insertion O(n) O(n²) O(n²) O(1) Yes Yes
Merge O(n log n) O(n log n) O(n log n) O(n) Yes No
Quick O(n log n) O(n log n) O(n²) O(log n) No Yes
Heap O(n log n) O(n log n) O(n log n) O(1) No Yes

std::sort is introsort: quicksort that switches to heap sort when recursion depth exceeds 2 log n. Combines quicksort’s average speed with heap sort’s worst-case guarantee.

28.10 STL connection

std::priority_queue<T> is a max-heap on T, backed by a vector<T>. The kth-largest warmup used a min-heap by passing greater<int> as the comparator. Now you know exactly what was happening: sift-up on push, sift-down on pop.

You can also build a heap with raw std::make_heap, std::push_heap, std::pop_heap from <algorithm> — that operates on a vector directly. Useful when you need fine-grained control.

28.11 Try it

Implement MaxHeap from scratch. Test with push(10), push(5), push(20), push(7), push(15). Print the underlying array — it should respect the heap property at every index.

Then pop() three times and confirm you get 20, 15, 10 in that order.

28.12 Self-check

1. In an array-backed heap, the parent of index 7 is at:
c. (7-1)/2 = 3.
2. Building a heap from an unsorted array (heapify) is:
b. The summation of work per level converges; details in CLRS chapter 6.

28.13 Challenges

  1. Kth Largest Element in a Stream (#703) — priority_queue.
  2. Merge k Sorted Lists (#23) — heap of list heads.
  3. Find Median from Data Stream (#295) — two heaps.
  4. Implement heapSort in place; verify on 100k random integers.

28.14 Where this fits

You can now build the array-backed heap. From here we leave tree-and-heap land and enter graph algorithms — same recursive shapes, more pieces.

You are here Coming up
Heap (insert, extract, heapify), heap sort, sorting comparison Chapter 26: graphs, BFS, DFS