40 Turing Machines, Computability, and Language Modeling

41 Turing Machines and Computability

Week 9, Session 2. CSS 343.

41.1 Warmup

Recall the DFA chapter. A DFA cannot recognize {aⁿbⁿ : n ≥ 0}. A pushdown automaton (DFA + stack) can. What can a Turing machine do that a pushdown automaton cannot?

Answer: count two stacks worth of things at once. A Turing machine recognizes {aⁿbⁿcⁿ : n ≥ 0}, which a PDA cannot.

This chapter is about the ceiling — the most powerful computation model, and what even it cannot do.

41.2 Learning objectives

Describe the components of a Turing machine and trace a simple TM.
Explain the Church-Turing thesis.
State the halting problem and explain why it is undecidable.
Classify problems as P, NP, or NP-complete with two concrete examples of each.
Describe a bigram language model.

41.3 Turing machines

A Turing machine has:

A two-way infinite tape, divided into cells.
A read/write head positioned at one cell.
A finite set of states Q.
A transition function δ : Q × Σ → Q × Σ × {L, R}.

In each step, the machine reads the current cell, writes (possibly the same symbol), moves the head one cell left or right, and transitions to a new state. If it enters a halting state, the computation ends. The output is whatever is on the tape (or simply accept/reject).

41.3.1 TM for `aⁿbⁿ`

Algorithm:

Find the first unmarked a. Mark it (replace with X). Walk right.
Find the first unmarked b. Mark it (replace with Y). Walk left.
Find the leftmost unmarked a.
Repeat. If you cannot find an a and the rest is all Ys, accept. If counts mismatch, reject.

Trace on “aabb”:

[a a b b]              → mark first a
[X a b b]              → walk right to first b, mark
[X a Y b]              → walk left to first unmarked a
[X X Y b]              → mark, walk right
[X X Y Y]              → all Ys remain. Accept.

A few dozen state transitions. The point is not that TMs are practical — they are not — but that they are the formal definition of “computable.”

41.4 Church-Turing thesis

Any function computable by any reasonable model of computation is computable by a Turing machine.

This is a thesis, not a theorem. It is not provable because “reasonable model” is informal. But every model anyone has invented — lambda calculus, register machines, your laptop — turns out to be Turing-equivalent. The thesis has held up for 90 years.

What this means: when we ask “is this problem computable?”, we mean “is there a Turing machine that solves it?” If yes, your laptop can solve it (modulo time and memory). If no, neither can your laptop, no matter how fast.

41.5 The halting problem

Given a program P and an input x, does P halt on x, or does it loop forever?

Claim: this problem is undecidable — no algorithm exists to solve it for all P and x.

Proof sketch (diagonalization):

Suppose we have a function HALT(P, x) that returns true iff P halts on x. Construct:

function Diagonal(P):
  if HALT(P, P):
    loop forever
  else:
    halt

Now ask: does Diagonal(Diagonal) halt?

If it halts, then HALT(Diagonal, Diagonal) returned true. But our code then loops forever. Contradiction.
If it loops forever, then HALT(Diagonal, Diagonal) returned false. But our code then halts. Contradiction.

Either way, contradiction. So HALT does not exist. The halting problem is undecidable. ∎

This is one of the most important results in computer science. It is also the same proof shape as Cantor’s diagonal argument that |ℝ| > |ℕ|, and Gödel’s incompleteness theorem. The pattern: assume the thing exists, build something that contradicts it.

The halting problem is the first uncomputable problem you meet. There are infinitely many. A surprising fraction of “static analyzer” features in compilers are best-effort approximations to undecidable questions.

41.6 Complexity classes

P: problems solvable in polynomial time on a deterministic TM. Examples: sorting, shortest path, primality (since 2002).
NP: problems whose solutions can be verified in polynomial time. Examples: subset sum, 3-SAT, traveling salesman (decision version).
NP-complete: the “hardest” problems in NP — if any one of them has a polynomial-time algorithm, all problems in NP do. Examples: 3-SAT, vertex cover, Hamiltonian path.

P ⊆ NP. Whether P = NP is the most famous open problem in computer science. $1M prize if you solve it.

41.7 Language modeling — a different kind of theory

Language models predict the next word. A bigram model uses one-word context:

\[ P(w_n | w_1 w_2 \ldots w_{n-1}) \approx P(w_n | w_{n-1}) \]

The Markov assumption: only the previous word matters. You estimate from a training corpus:

\[ P(w_n | w_{n-1}) = \frac{\text{count}(w_{n-1}, w_n)}{\text{count}(w_{n-1})} \]

unordered_map<string, unordered_map<string, int>> bigramCounts;
// after training:
double prob(const string& w1, const string& w2) {
  int joint = bigramCounts[w1][w2];
  int total = 0;
  for (const auto& [_, c] : bigramCounts[w1]) total += c;
  return (double)joint / total;
}

Modern LLMs use vastly larger context (the entire conversation) and very different math (transformers), but the bigram model captures the intuition: predict from observed frequencies.

41.8 The Chomsky hierarchy

Level	Language class	Recognized by
0	Recursively enumerable	Turing machine
1	Context-sensitive	Linear-bounded automaton
2	Context-free	Pushdown automaton
3	Regular	Finite automaton

Each level is strictly more powerful. C++ syntax is context-free. Natural language is context-sensitive (informally — a topic of ongoing research). Things like “does this program halt?” are beyond even Turing machines.

41.9 Try it

Write a regular expression that matches aⁿbⁿ for n ≤ 3. Now try aⁿbⁿ for arbitrary n. You can do the first (it is finite). You cannot do the second. Convince yourself why.

41.10 Self-check

1. The halting problem is undecidable. This means:

a. It takes exponential time.
b. It requires more memory than is available.
c. No algorithm exists that solves it for all inputs.
d. Only humans can solve it.

c. Proven by Turing in 1936 via diagonalization. Holds regardless of computer speed.

2. Verifying a proposed solution to a Sudoku puzzle is in:

a. P only
b. P (and therefore NP)
c. Undecidable
d. NP-complete

b. Verifying takes polynomial time (check rows, columns, boxes). Solving generalized Sudoku is NP-complete; verification is easier.

41.11 Challenges

Implement a TM simulator in C++ using map<pair<int, char>, tuple<int, char, int>> for transitions. Test on the aⁿbⁿ TM.
Build a bigram language model from a text file. Compute the probability of a few sample sentences.
Read the Wikipedia article on Cook-Levin theorem (3-SAT is NP-complete). State the theorem.

41.12 Where this fits

The end of new content. Next chapter is the final review; the chapter after closes the book.

You are here	Coming up
Turing machines, halting, P/NP/NP-complete, language models	Chapter 39: final exam review