30 Hash Tables
31 Hash Tables
Week 4, Session 2. CSS 343.
31.1 Warmup
Two Sum (#1). Brute force then optimized.
// O(n²)
vector<int> twoSum_brute(vector<int>& nums, int target) {
for (int i = 0; i < (int)nums.size(); ++i) {
for (int j = i + 1; j < (int)nums.size(); ++j) {
if (nums[i] + nums[j] == target) return {i, j};
}
}
return {};
}
// O(n) — hash map
vector<int> twoSum(vector<int>& nums, int target) {
unordered_map<int, int> seenAt;
for (int i = 0; i < (int)nums.size(); ++i) {
int complement = target - nums[i];
if (seenAt.count(complement) == 1) return {seenAt[complement], i};
seenAt[nums[i]] = i;
}
return {};
}The hash-map version is the canonical illustration of using O(n) extra space to drop a time-complexity factor. The trade is almost always worth it.
31.2 Learning objectives
- Evaluate a hash function for uniformity and efficiency.
- Trace separate chaining and linear probing on a small table.
- Explain why load factor > 0.7 triggers rehashing and state amortized O(1).
- Use
unordered_map/unordered_setwith custom key types via a hash functor. - Solve frequency-counting and two-sum problems optimally using hash maps.
31.3 What a hash table is
An array, plus a function that maps keys to array indices.
class StringMap {
vector<string> data_;
int hash(const string& key) const {
// some integer derived from key
}
};If hash is good, lookup is O(1) — compute the index, look at the slot. If two keys hash to the same slot — a collision — we need a strategy.
31.4 Hash functions
A good hash function:
- Distributes keys roughly uniformly across the table.
- Is fast to compute.
- Uses all parts of the key.
For integers: hash(k) = k % tableSize. For strings, a polynomial rolling hash:
size_t hash(const string& s, size_t tableSize) {
size_t h = 0;
for (char c : s) h = h * 31 + c;
return h % tableSize;
}Multiply by 31 (a small prime), add the next character. Each character influences the final value. Mod by table size to get an index.
std::hash<string> does essentially this with better constants and security mitigations. You can use it directly.
31.5 Collision resolution: separate chaining
Each slot holds a linked list (or vector) of all keys that hash there.
table[0] → [key1, key17, key29]
table[1] → []
table[2] → [key5]
table[3] → [key3, key19]
Lookup: hash the key, scan the bucket’s list.
template <typename K, typename V>
class HashMap {
public:
HashMap(size_t buckets = 16) : buckets_(buckets), size_(0) {}
void put(const K& key, const V& value) {
auto& bucket = buckets_[hash(key) % buckets_.size()];
for (auto& [k, v] : bucket) {
if (k == key) { v = value; return; }
}
bucket.push_back({key, value});
++size_;
if ((double)size_ / buckets_.size() > 0.75) rehash();
}
bool get(const K& key, V& out) const {
const auto& bucket = buckets_[hash(key) % buckets_.size()];
for (const auto& [k, v] : bucket) {
if (k == key) { out = v; return true; }
}
return false;
}
private:
vector<list<pair<K, V>>> buckets_;
size_t size_;
void rehash();
};31.6 Collision resolution: open addressing
If the slot is taken, probe to the next slot. Variants:
- Linear probing: slot, slot+1, slot+2, …
- Quadratic probing: slot, slot+1, slot+4, slot+9, …
- Double hashing: slot, slot + h2(key), slot + 2·h2(key), …
Linear probing is fast in practice (cache-friendly) but suffers from clustering — runs of occupied slots that grow longer. Once load factor exceeds ~0.5, performance degrades.
31.7 Load factor and rehashing
Load factor = (number of items) / (table size).
When the load factor exceeds a threshold (often 0.7), the table doubles in size and every key is re-hashed into the new table. That single operation is O(n), but it happens rarely enough that the amortized cost per insert is O(1).
template <typename K, typename V>
void HashMap<K, V>::rehash() {
vector<list<pair<K, V>>> old = std::move(buckets_);
buckets_.assign(old.size() * 2, {});
size_ = 0;
for (auto& bucket : old) {
for (auto& [k, v] : bucket) put(k, v);
}
}31.8 Custom hash functors
unordered_map<pair<int,int>, int> does not compile out of the box because pair has no built-in hash. Provide one:
struct PairHash {
size_t operator()(const pair<int,int>& p) const {
return hash<int>()(p.first) ^ (hash<int>()(p.second) << 1);
}
};
unordered_map<pair<int,int>, int, PairHash> grid;A simple combine — XOR with a shift — works for short pairs. For better mixing, use boost::hash_combine or implement a similar function.
Writing your own hash for a custom type is a common interview question and a real-world need. The mechanics are simple: combine the hashes of the fields. The art is making it well-distributed.
31.9 When NOT to use a hash table
- You need keys in sorted order → use
map/set. - You need fast range queries (e.g. all keys between 10 and 20) → balanced BST.
- Keys have very poor hash distribution → hash table degenerates.
- Adversarial input → security risk (hash-collision DoS attacks).
For most application code, unordered_map is the right default.
31.10 Try it
Insert {10, 20, 30, 15, 25} into a hash table of size 7 using key % 7. Show the table state with (a) separate chaining and (b) linear probing.
Separate chaining:
10 % 7 = 3, 20 % 7 = 6, 30 % 7 = 2, 15 % 7 = 1, 25 % 7 = 4. Table:[_, 15, 30, 10, 25, _, 20].Linear probing: same hashes, no collisions → same table.
Now insert 17 (17 % 7 = 3). Chaining: bucket 3 becomes [10, 17]. Linear probing: slot 3 occupied (10), try 4 (occupied, 25), try 5 (empty). Insert at 5.
31.11 Self-check
unordered_map::find amortized O(1) despite occasional O(n) rehashes?unordered_map<pair<int,int>, int> fails to compile because:31.12 Challenges
- Group Anagrams (#49) —
unordered_map<string, vector<string>>. - LRU Cache (#146) —
unordered_map+ doubly linked list. - Longest Substring Without Repeating Characters (#3) — sliding window + hash set.
- Implement
HashMap<K, V>with separate chaining; benchmark againstunordered_map.
31.13 Where this fits
Hash tables are the fast-lookup workhorse. The next chapter goes the other direction — guaranteed O(log n) via balanced trees.
| You are here | Coming up |
|---|---|
| Hash tables, chaining, probing, load factor, custom hash | Chapter 29: AVL trees and red-black trees |