A* Search, Visually: the Heuristic Is the Whole Game
A* is not a clever algorithm so much as Dijkstra plus a bet about the future. The same code becomes Dijkstra, greedy best-first, or A* depending on one term in the priority key — and admissibility is the single property that buys optimality.
A* (pronounced "A-star") gets taught as a brand-new algorithm — a clever specialised thing you reach for when Dijkstra is too slow. That framing hides what is going on. A* is Dijkstra plus a bet about the future. You keep the exact same priority queue, the exact same expand-the-cheapest-node loop, and you add one term to the score: a guess at how far each node still has to go. Change that one term and the same code becomes Dijkstra, greedy best-first search, or A*. Nothing else moves.
The baseline: Dijkstra is search with no opinion
Start from the algorithm with no bet at all. Edsger Dijkstra's 1959 shortest-path algorithm, run from a single source toward a goal, is uniform-cost search: a priority queue of frontier nodes, each keyed by — the cheapest cost found so far to reach node from the start. You pop the cheapest node, mark it settled, and relax its neighbours: if going through gives a cheaper route to a neighbour , you record the new and push onto the queue.
The key word is cheapest. Because you always settle the lowest- frontier node, you settle nodes in strict order of distance from the start. The closed set grows as a wavefront — every node at cost is settled before any node at cost .
Dijkstra is A* with a blindfold
Both algorithms pop the lowest-keyed node from a priority queue and relax its neighbours. The only difference is the key. Dijkstra sorts by "how far I have come." A* sorts by "how far I have come, plus how far I think I still have to go." Give A* a heuristic that always returns zero and it becomes Dijkstra, byte for byte. There is no separate Dijkstra in the lab — one search loop, three score functions.
The bet: add a guess, get f = g + h
Here is the move. To each node, attach a second number — a heuristic, an estimate of the remaining cost from to the goal. On a grid you get one for free: the straight-line or city-block distance to the goal, ignoring walls. Then sort the queue not by alone but by their sum.
That is the whole of A*. is what you have spent; is what you bet you still owe; is your best current guess at the total cost of the cheapest path through . Pop the lowest , settle it, relax neighbours, repeat. Equation (1) is the most consequential plus sign in search — every property below follows from it.
Read the two endpoints of that sum and you recover the whole family:
- Set everywhere. Then , and you are back to Dijkstra. The bet is "I have no idea," so the search has no opinion.
- Ignore entirely and sort by . Now you are doing greedy best-first search — always lunging at whichever frontier cell looks closest to the goal, never accounting for the cost already paid. It is fast and direct, and it cheerfully walks into dead-ends and detours because it never asks what a step cost.
A* is not a new algorithm. It is a dial between caution and ambition, and the heuristic is the dial.
Set the Algorithm control to Greedy in either lab above. The frontier streaks at the goal even harder than A* — and on a cluttered maze you will catch it returning a visibly kinked, longer path. It got there fast; it did not get there cheaply. A* sits between the extremes, weighting "where I've been" against "where I'm going," and that balance is what lets it be both fast and correct. But "correct" is a promise, and the promise has a precondition.
Why admissibility buys optimality
A* returns the genuinely shortest path only when the heuristic obeys one rule: it must never overestimate the true remaining cost. Call the actual cheapest cost from to the goal — the thing you do not know yet, the thing you are trying to compute. The condition is one inequality:
A heuristic satisfying equation (2) is admissible — optimistic, never pessimistic. Straight-line distance on a grid is admissible because no path can be shorter than a straight line; walls only ever make the real cost bigger. So the guess is always a lower bound, which is exactly what you need.
The proof intuition is short enough to carry in your head. Suppose A* is about to pop the goal with path cost , but a cheaper path of cost actually exists. That cheaper path passes through some node still sitting on the frontier — the start is settled, the goal is not yet, so somewhere along the cheaper route there is a first unsettled node. Assuming already carries its optimal cost-so-far, , its key is
The inequality is admissibility (); the last equality holds because lies on the optimal path, so its cost-so-far plus its true cost-to-go is exactly . So . But A* always pops the lowest first — it would have popped , and continued down the cheaper path, before ever popping the goal at cost . Contradiction. The goal is popped only at the true optimal cost.
Optimistic, not accurate
Admissibility is a one-sided constraint. The heuristic is allowed to be wildly low — is admissible, and that just gives you Dijkstra, slow but correct. What it must never do is guess high. The instant your heuristic overestimates even one node's cost-to-go, A* can settle the goal via a route it wrongly believed was cheap and walk past a better path it wrongly believed was expensive. You do not lose a little optimality — you lose the guarantee entirely. A heuristic that is "usually accurate" but occasionally optimistic is fine; one that is "usually accurate" but occasionally pessimistic is broken.
Consistency: settling each node exactly once
Admissibility buys the optimal path, but a slightly stronger property buys efficiency. A heuristic is consistent (or monotone) when the estimate never drops by more than the actual step cost between neighbours:
where is the cost of the edge from to a neighbour . This is the triangle inequality wearing a heuristic's clothes: the estimated distance from can't exceed one real step plus the estimate from where that step lands you. Consistency implies admissibility, and it gives you something concrete in return — along any path the values never decrease, so the first time A* settles a node it has already found the cheapest route to it. You never have to re-open a settled node. The closed set is final.
That is why a grid implementation can mark a cell closed and skip it forever after. Without consistency you would occasionally discover a cheaper route to an already-settled node and have to drag it back into the open set. The grid heuristics in the lab — Manhattan, Euclidean, Chebyshev — are all consistent, which is why the closed set in the visualisation only ever grows.
- Dijkstra
- ~1,100
- cells settled — floods every direction
- A* (w=1)
- ~280
- same optimal path, a quarter of the work
- Weighted A* (w=3)
- ~90
- fewer still, path now up to 3× longer
Those are representative counts for one mid-density maze — watch the expanded: readout in the lab's caption strip and you will see the same ordering hold on every fresh maze: Dijkstra dwarfs A*, and inflating the weight shrinks A* further. The exact numbers shift; the ranking never does.
Weighted A*: trading optimality for speed, visibly
Equation (2) gives you optimality. What if you do not need it? Real systems — game AI choosing a route inside a single frame, a robot replanning at 30 Hz — often want a good path fast more than the best path eventually. So you cheat, deliberately and by a known amount: inflate the heuristic by a weight and sort by .
The same dial, relabelled
The family is really one continuous dial, not three discrete algorithms. Send and the heuristic vanishes — you have Dijkstra. At you have optimal A*. As , becomes negligible and you have greedy best-first. "Dijkstra," "A*," and "greedy" are three labelled detents on a single weight knob. The lab exposes both ends: the weight slider sweeps the stretch (optimal A* up toward greedy), and the Algorithm dropdown snaps to the detents the slider can't reach — Dijkstra ( dropped entirely) and pure greedy ( dropped entirely).
Choosing h: match the heuristic to the moves
The heuristic is the whole game, so picking it well is most of the engineering. The constraint from equation (2) is sharp: must be a lower bound on real cost, which means it has to respect how the agent is actually allowed to move.
Flip the Heuristic control between the three options and watch the grid's connectivity change with it:
- Manhattan distance, , is the exact shortest distance on a 4-connected grid (no diagonals) with no walls. It is the tightest admissible heuristic there — A* with Manhattan on an open grid expands almost nothing but the path itself. Use it the instant diagonal moves are forbidden.
- Euclidean distance, , is the straight-line distance, and it is admissible on any grid: nothing physical beats a straight line. On an 8-connected grid where diagonals cost — which is what the lab charges — it is the honest lower bound, just a loose one, so it expands a touch more than it has to.
- Chebyshev distance, , counts the moves needed when a diagonal step costs the same as an orthogonal one. On the lab's grid each diagonal actually costs , so Chebyshev sits below the true cost — still admissible, just optimistic, which keeps the optimality guarantee intact while expanding a few more cells.
The rule under all three: a tighter admissible heuristic — one closer to without going over — expands fewer nodes. Pick Manhattan on an 8-connected grid and it overestimates (a diagonal shortcut beats the city-block count), breaking admissibility and the optimality guarantee with it. The lab enforces the safe pairing for you — choosing Manhattan switches the grid to 4-connected — precisely because mismatching the heuristic to the move set is the classic A* bug.
The heuristic is the only knob that matters
Everything else about A* is bookkeeping you can copy from a textbook: a binary heap, parent pointers, a closed set. The heuristic is the only place real judgement lives. A weak one wastes effort like Dijkstra; an inadmissible one silently returns wrong answers; a tight, admissible, consistent one gives you the provably shortest path for the least possible work. When A* underperforms, the heuristic is almost always why.
The same pattern, everywhere
A* is the canonical instance of informed search — the idea that an estimate of cost-to-go should steer exploration. Strip away the grid and the pattern reappears across computing. Network routing protocols run Dijkstra's algorithm (A* with ) to build forwarding tables; they skip the heuristic only because a router has no useful geometric guess about the rest of the internet. Game and robotics planners run weighted A* to hit a frame budget, trading a bounded slice of optimality for a guaranteed deadline.
The deeper relative is dynamic programming (DP). The relaxation step at the heart of every variant — "if going through gives a cheaper route to , update " — is the Bellman optimality equation, evaluated lazily in best-first order:
Dijkstra solves equation (4) by always expanding the node whose value is already final. A* solves the same equation with a heuristic that reorders which node you finalise next, so you reach the goal's value before computing everyone else's. The heuristic does not change what you compute — the optimal cost-to-come — only the order. That is why the admissibility proof works: an optimistic guess can reorder the work without corrupting the answer.
Seen this way, A* is not a pathfinding trick. It is a statement that a single optimistic lower bound on the future lets you solve a shortest-path problem in goal-directed order instead of breadth-first order — as true on a road network, a puzzle state graph, or a robot's configuration space as on the grid in the lab. Open it again, set the algorithm back to A*, and watch the cone form. You now know the cone is the heuristic, and the heuristic is the whole game.
Reading further
- Hart, Nilsson & Raphael (1968), A Formal Basis for the Heuristic Determination of Minimum Cost Paths — the original A* paper; short, and the source of the admissibility and optimality proofs sketched above.
- Russell & Norvig, Artificial Intelligence: A Modern Approach, ch. 3 (Informed Search) — the canonical textbook treatment of , admissibility versus consistency, and where A* sits in the search family.
- Dijkstra (1959), A Note on Two Problems in Connexion with Graphs — the two-and-a-half-page paper that defines the uniform-cost baseline A* generalises.
- Amit Patel, Introduction to A* (Red Blob Games) — the definitive interactive teaching case; the best place to build the same intuition this post argues for, with knobs of its own.
Try it in the lab
All effects →Gradient Descent
aiSGD, Momentum, RMSProp, and Adam racing down a loss landscape — ravines, saddles, and local minima.
optimizationdeep-learningtrainingOrbits
artBodies tracing a shared circle, leaving light trails.
trailstrigA* Pathfinder
aiA*, Dijkstra, and greedy best-first search — the heuristic pulling the frontier toward the goal.
searchgraphsa-star
More from the blog
How Python Dicts Really Work: Open Addressing, Probing, and the djb2 Lie
The dict is the most-used data structure in Python and almost nobody knows it uses open addressing, not chaining — so a single deletion leaves a tombstone, and a simple hash table is really a careful dance between load factor, probe sequences, and cache lines.
B-Trees vs LSM-Trees: The Two Religions of On-Disk Data
Every database you use bets on one of two storage engines: B-trees (read-optimised, update-in-place) or LSM-trees (write-optimised, append-and-compact). The choice isn't about speed but about which kind of amplification you're willing to pay.
Attention, From the Inside Out
Attention is just a weighted average whose weights the data computes by asking itself questions. A worked tour through scaled dot-product attention, temperature and sampling, and what a representative 46B-active / 1T-total Mixture-of-Experts spec actually means — with live matrices you can poke.