Software

17 JUNE 2026· 5 min read · Ben Ebsworth

A* Search, Visually: the Heuristic Is the Whole Game

A* is not a clever algorithm so much as Dijkstra plus a bet about the future. The same code becomes Dijkstra, greedy best-first, or A* depending on one term in the priority key — and admissibility is the single property that buys optimality.

softwarealgorithmscomputer scienceai

A* (pronounced "A-star") gets taught as a brand-new algorithm — a clever specialised thing you reach for when Dijkstra is too slow. That framing hides what is going on. A* is Dijkstra plus a bet about the future. You keep the exact same priority queue, the exact same expand-the-cheapest-node loop, and you add one term to the score: a guess at how far each node still has to go. Change that one term and the same code becomes Dijkstra, greedy best-first search, or A*. Nothing else moves.

The baseline: Dijkstra is search with no opinion

Start from the algorithm with no bet at all. Edsger Dijkstra's 1959 shortest-path algorithm, run from a single source toward a goal, is uniform-cost search: a priority queue of frontier nodes, each keyed by $g(n)$ — the cheapest cost found so far to reach node $n$ from the start. You pop the cheapest node, mark it settled, and relax its neighbours: if going through $n$ gives a cheaper route to a neighbour $m$ , you record the new $g(m)$ and push $m$ onto the queue.

The key word is cheapest. Because you always settle the lowest- $g$ frontier node, you settle nodes in strict order of distance from the start. The closed set grows as a wavefront — every node at cost $r$ is settled before any node at cost $r+1$ .

Dijkstra is A* with a blindfold

Both algorithms pop the lowest-keyed node from a priority queue and relax its neighbours. The only difference is the key. Dijkstra sorts by "how far I have come." A* sorts by "how far I have come, plus how far I think I still have to go." Give A* a heuristic that always returns zero and it becomes Dijkstra, byte for byte. There is no separate Dijkstra in the lab — one search loop, three score functions.

The bet: add a guess, get f = g + h

Here is the move. To each node, attach a second number $h(n)$ — a heuristic, an estimate of the remaining cost from $n$ to the goal. On a grid you get one for free: the straight-line or city-block distance to the goal, ignoring walls. Then sort the queue not by $g$ alone but by their sum.

f(n) = g(n) + h(n)

(1)The A* priority key: cost-so-far plus estimated cost-to-go.

That is the whole of A*. $g(n)$ is what you have spent; $h(n)$ is what you bet you still owe; $f(n)$ is your best current guess at the total cost of the cheapest path through $n$ . Pop the lowest $f$ , settle it, relax neighbours, repeat. Equation (1) is the most consequential plus sign in search — every property below follows from it.

Read the two endpoints of that sum and you recover the whole family:

Set $h(n) = 0$ everywhere. Then $f = g$ , and you are back to Dijkstra. The bet is "I have no idea," so the search has no opinion.
Ignore $g$ entirely and sort by $f = h$ . Now you are doing greedy best-first search — always lunging at whichever frontier cell looks closest to the goal, never accounting for the cost already paid. It is fast and direct, and it cheerfully walks into dead-ends and detours because it never asks what a step cost.

A* is not a new algorithm. It is a dial between caution and ambition, and the heuristic is the dial.

Set the Algorithm control to Greedy in either lab above. The frontier streaks at the goal even harder than A* — and on a cluttered maze you will catch it returning a visibly kinked, longer path. It got there fast; it did not get there cheaply. A* sits between the extremes, weighting "where I've been" against "where I'm going," and that balance is what lets it be both fast and correct. But "correct" is a promise, and the promise has a precondition.

Why admissibility buys optimality

A* returns the genuinely shortest path only when the heuristic obeys one rule: it must never overestimate the true remaining cost. Call $h^*(n)$ the actual cheapest cost from $n$ to the goal — the thing you do not know yet, the thing you are trying to compute. The condition is one inequality:

h(n) \le h^*(n) \quad \text{for every node } n

(2)Admissibility: the heuristic never overestimates the true cost-to-go. This single inequality is what makes A* optimal.

A heuristic satisfying equation (2) is admissible — optimistic, never pessimistic. Straight-line distance on a grid is admissible because no path can be shorter than a straight line; walls only ever make the real cost bigger. So the guess is always a lower bound, which is exactly what you need.

The proof intuition is short enough to carry in your head. Suppose A* is about to pop the goal with path cost $C$ , but a cheaper path of cost $C^* < C$ actually exists. That cheaper path passes through some node $n$ still sitting on the frontier — the start is settled, the goal is not yet, so somewhere along the cheaper route there is a first unsettled node. Assuming $n$ already carries its optimal cost-so-far, $g(n) = g^*(n)$ , its key is

f(n) = g(n) + h(n) \le g(n) + h^*(n) = C^*

The inequality is admissibility ( $h \le h^*$ ); the last equality holds because $n$ lies on the optimal path, so its cost-so-far plus its true cost-to-go is exactly $C^*$ . So $f(n) \le C^* < C$ . But A* always pops the lowest $f$ first — it would have popped $n$ , and continued down the cheaper path, before ever popping the goal at cost $C$ . Contradiction. The goal is popped only at the true optimal cost.

Optimistic, not accurate

Admissibility is a one-sided constraint. The heuristic is allowed to be wildly low — $h(n) = 0$ is admissible, and that just gives you Dijkstra, slow but correct. What it must never do is guess high. The instant your heuristic overestimates even one node's cost-to-go, A* can settle the goal via a route it wrongly believed was cheap and walk past a better path it wrongly believed was expensive. You do not lose a little optimality — you lose the guarantee entirely. A heuristic that is "usually accurate" but occasionally optimistic is fine; one that is "usually accurate" but occasionally pessimistic is broken.

Consistency: settling each node exactly once

Admissibility buys the optimal path, but a slightly stronger property buys efficiency. A heuristic is consistent (or monotone) when the estimate never drops by more than the actual step cost between neighbours:

h(n) \le c(n, n') + h(n')

(3)Consistency (the triangle inequality for heuristics): the estimate can't fall faster than you actually travel.

where $c(n, n')$ is the cost of the edge from $n$ to a neighbour $n'$ . This is the triangle inequality wearing a heuristic's clothes: the estimated distance from $n$ can't exceed one real step plus the estimate from where that step lands you. Consistency implies admissibility, and it gives you something concrete in return — along any path the $f$ values never decrease, so the first time A* settles a node it has already found the cheapest route to it. You never have to re-open a settled node. The closed set is final.

That is why a grid implementation can mark a cell closed and skip it forever after. Without consistency you would occasionally discover a cheaper route to an already-settled node and have to drag it back into the open set. The grid heuristics in the lab — Manhattan, Euclidean, Chebyshev — are all consistent, which is why the closed set in the visualisation only ever grows.

Dijkstra: ~1,100; cells settled — floods every direction
A* (w=1): ~280; same optimal path, a quarter of the work
Weighted A* (w=3): ~90; fewer still, path now up to 3× longer

Those are representative counts for one mid-density maze — watch the expanded: readout in the lab's caption strip and you will see the same ordering hold on every fresh maze: Dijkstra dwarfs A*, and inflating the weight shrinks A* further. The exact numbers shift; the ranking never does.

Weighted A*: trading optimality for speed, visibly

Equation (2) gives you optimality. What if you do not need it? Real systems — game AI choosing a route inside a single frame, a robot replanning at 30 Hz — often want a good path fast more than the best path eventually. So you cheat, deliberately and by a known amount: inflate the heuristic by a weight $w \ge 1$ and sort by $f = g + w \cdot h$ .

The same dial, relabelled

The family is really one continuous dial, not three discrete algorithms. Send $w \to 0$ and the heuristic vanishes — you have Dijkstra. At $w = 1$ you have optimal A*. As $w \to \infty$ , $g$ becomes negligible and you have greedy best-first. "Dijkstra," "A*," and "greedy" are three labelled detents on a single weight knob. The lab exposes both ends: the weight slider sweeps the $1 \le w \le 4$ stretch (optimal A* up toward greedy), and the Algorithm dropdown snaps to the detents the slider can't reach — Dijkstra ( $h$ dropped entirely) and pure greedy ( $g$ dropped entirely).

Choosing h: match the heuristic to the moves

The heuristic is the whole game, so picking it well is most of the engineering. The constraint from equation (2) is sharp: $h$ must be a lower bound on real cost, which means it has to respect how the agent is actually allowed to move.

Flip the Heuristic control between the three options and watch the grid's connectivity change with it:

Manhattan distance, $|\Delta x| + |\Delta y|$ , is the exact shortest distance on a 4-connected grid (no diagonals) with no walls. It is the tightest admissible heuristic there — A* with Manhattan on an open grid expands almost nothing but the path itself. Use it the instant diagonal moves are forbidden.
Euclidean distance, $\sqrt{\Delta x^2 + \Delta y^2}$ , is the straight-line distance, and it is admissible on any grid: nothing physical beats a straight line. On an 8-connected grid where diagonals cost $\sqrt{2}$ — which is what the lab charges — it is the honest lower bound, just a loose one, so it expands a touch more than it has to.
Chebyshev distance, $\max(|\Delta x|, |\Delta y|)$ , counts the moves needed when a diagonal step costs the same as an orthogonal one. On the lab's grid each diagonal actually costs $\sqrt{2}$ , so Chebyshev sits below the true cost — still admissible, just optimistic, which keeps the optimality guarantee intact while expanding a few more cells.

The rule under all three: a tighter admissible heuristic — one closer to $h^*$ without going over — expands fewer nodes. Pick Manhattan on an 8-connected grid and it overestimates (a diagonal shortcut beats the city-block count), breaking admissibility and the optimality guarantee with it. The lab enforces the safe pairing for you — choosing Manhattan switches the grid to 4-connected — precisely because mismatching the heuristic to the move set is the classic A* bug.

The heuristic is the only knob that matters

Everything else about A* is bookkeeping you can copy from a textbook: a binary heap, parent pointers, a closed set. The heuristic is the only place real judgement lives. A weak one wastes effort like Dijkstra; an inadmissible one silently returns wrong answers; a tight, admissible, consistent one gives you the provably shortest path for the least possible work. When A* underperforms, the heuristic is almost always why.

The same pattern, everywhere

A* is the canonical instance of informed search — the idea that an estimate of cost-to-go should steer exploration. Strip away the grid and the pattern reappears across computing. Network routing protocols run Dijkstra's algorithm (A* with $h = 0$ ) to build forwarding tables; they skip the heuristic only because a router has no useful geometric guess about the rest of the internet. Game and robotics planners run weighted A* to hit a frame budget, trading a bounded slice of optimality for a guaranteed deadline.

The deeper relative is dynamic programming (DP). The relaxation step at the heart of every variant — "if going through $n$ gives a cheaper route to $m$ , update $m$ " — is the Bellman optimality equation, evaluated lazily in best-first order:

g(m) = \min_{n} \big[\, g(n) + c(n, m) \,\big]

(4)The Bellman optimality equation for cost-to-come: A* and Dijkstra both solve this, differing only in the order they finalise nodes.

Dijkstra solves equation (4) by always expanding the node whose value is already final. A* solves the same equation with a heuristic that reorders which node you finalise next, so you reach the goal's value before computing everyone else's. The heuristic does not change what you compute — the optimal cost-to-come — only the order. That is why the admissibility proof works: an optimistic guess can reorder the work without corrupting the answer.

Seen this way, A* is not a pathfinding trick. It is a statement that a single optimistic lower bound on the future lets you solve a shortest-path problem in goal-directed order instead of breadth-first order — as true on a road network, a puzzle state graph, or a robot's configuration space as on the grid in the lab. Open it again, set the algorithm back to A*, and watch the cone form. You now know the cone is the heuristic, and the heuristic is the whole game.

Reading further

Hart, Nilsson & Raphael (1968), A Formal Basis for the Heuristic Determination of Minimum Cost Paths — the original A* paper; short, and the source of the admissibility and optimality proofs sketched above.
Russell & Norvig, Artificial Intelligence: A Modern Approach, ch. 3 (Informed Search) — the canonical textbook treatment of $f = g + h$ , admissibility versus consistency, and where A* sits in the search family.
Dijkstra (1959), A Note on Two Problems in Connexion with Graphs — the two-and-a-half-page paper that defines the uniform-cost baseline A* generalises.
Amit Patel, Introduction to A* (Red Blob Games) — the definitive interactive teaching case; the best place to build the same intuition this post argues for, with knobs of its own.

Try it in the lab

All effects →

More from the blog

Software17 JUNE 2026

How Python Dicts Really Work: Open Addressing, Probing, and the djb2 Lie

The dict is the most-used data structure in Python and almost nobody knows it uses open addressing, not chaining — so a single deletion leaves a tombstone, and a simple hash table is really a careful dance between load factor, probe sequences, and cache lines.

Software17 JUNE 2026

B-Trees vs LSM-Trees: The Two Religions of On-Disk Data

Every database you use bets on one of two storage engines: B-trees (read-optimised, update-in-place) or LSM-trees (write-optimised, append-and-compact). The choice isn't about speed but about which kind of amplification you're willing to pay.

Algorithms14 JUNE 2026

Attention, From the Inside Out

Attention is just a weighted average whose weights the data computes by asking itself questions. A worked tour through scaled dot-product attention, temperature and sampling, and what a representative 46B-active / 1T-total Mixture-of-Experts spec actually means — with live matrices you can poke.

ben ebsworth

Software

17 JUNE 2026· 5 min read · Ben Ebsworth

A* Search, Visually: the Heuristic Is the Whole Game

softwarealgorithmscomputer scienceai

The baseline: Dijkstra is search with no opinion

Dijkstra is A* with a blindfold

The bet: add a guess, get f = g + h

f(n) = g(n) + h(n)

(1)The A* priority key: cost-so-far plus estimated cost-to-go.

Read the two endpoints of that sum and you recover the whole family:

Set $h(n) = 0$ everywhere. Then $f = g$ , and you are back to Dijkstra. The bet is "I have no idea," so the search has no opinion.
Ignore $g$ entirely and sort by $f = h$ . Now you are doing greedy best-first search — always lunging at whichever frontier cell looks closest to the goal, never accounting for the cost already paid. It is fast and direct, and it cheerfully walks into dead-ends and detours because it never asks what a step cost.

A* is not a new algorithm. It is a dial between caution and ambition, and the heuristic is the dial.

Why admissibility buys optimality

h(n) \le h^*(n) \quad \text{for every node } n

(2)Admissibility: the heuristic never overestimates the true cost-to-go. This single inequality is what makes A* optimal.

f(n) = g(n) + h(n) \le g(n) + h^*(n) = C^*

Optimistic, not accurate

Consistency: settling each node exactly once

h(n) \le c(n, n') + h(n')

(3)Consistency (the triangle inequality for heuristics): the estimate can't fall faster than you actually travel.

Dijkstra: ~1,100; cells settled — floods every direction
A* (w=1): ~280; same optimal path, a quarter of the work
Weighted A* (w=3): ~90; fewer still, path now up to 3× longer

Weighted A*: trading optimality for speed, visibly

The same dial, relabelled

Choosing h: match the heuristic to the moves

Flip the Heuristic control between the three options and watch the grid's connectivity change with it:

Manhattan distance, $|\Delta x| + |\Delta y|$ , is the exact shortest distance on a 4-connected grid (no diagonals) with no walls. It is the tightest admissible heuristic there — A* with Manhattan on an open grid expands almost nothing but the path itself. Use it the instant diagonal moves are forbidden.
Euclidean distance, $\sqrt{\Delta x^2 + \Delta y^2}$ , is the straight-line distance, and it is admissible on any grid: nothing physical beats a straight line. On an 8-connected grid where diagonals cost $\sqrt{2}$ — which is what the lab charges — it is the honest lower bound, just a loose one, so it expands a touch more than it has to.
Chebyshev distance, $\max(|\Delta x|, |\Delta y|)$ , counts the moves needed when a diagonal step costs the same as an orthogonal one. On the lab's grid each diagonal actually costs $\sqrt{2}$ , so Chebyshev sits below the true cost — still admissible, just optimistic, which keeps the optimality guarantee intact while expanding a few more cells.

The heuristic is the only knob that matters

The same pattern, everywhere

g(m) = \min_{n} \big[\, g(n) + c(n, m) \,\big]

(4)The Bellman optimality equation for cost-to-come: A* and Dijkstra both solve this, differing only in the order they finalise nodes.

Reading further

Hart, Nilsson & Raphael (1968), A Formal Basis for the Heuristic Determination of Minimum Cost Paths — the original A* paper; short, and the source of the admissibility and optimality proofs sketched above.
Russell & Norvig, Artificial Intelligence: A Modern Approach, ch. 3 (Informed Search) — the canonical textbook treatment of $f = g + h$ , admissibility versus consistency, and where A* sits in the search family.
Dijkstra (1959), A Note on Two Problems in Connexion with Graphs — the two-and-a-half-page paper that defines the uniform-cost baseline A* generalises.
Amit Patel, Introduction to A* (Red Blob Games) — the definitive interactive teaching case; the best place to build the same intuition this post argues for, with knobs of its own.

Try it in the lab

All effects →

A* Search, Visually: the Heuristic Is the Whole Game

The baseline: Dijkstra is search with no opinion

The bet: add a guess, get f = g + h

Why admissibility buys optimality

Consistency: settling each node exactly once

Weighted A*: trading optimality for speed, visibly

Choosing h: match the heuristic to the moves

The same pattern, everywhere

Reading further

Try it in the lab

Gradient Descent

Orbits

A* Pathfinder

More from the blog

How Python Dicts Really Work: Open Addressing, Probing, and the djb2 Lie

B-Trees vs LSM-Trees: The Two Religions of On-Disk Data

Attention, From the Inside Out

A* Search, Visually: the Heuristic Is the Whole Game

The baseline: Dijkstra is search with no opinion

The bet: add a guess, get f = g + h

Why admissibility buys optimality

Consistency: settling each node exactly once

Weighted A*: trading optimality for speed, visibly

Choosing h: match the heuristic to the moves

The same pattern, everywhere

Reading further

Try it in the lab

Gradient Descent

Orbits

A* Pathfinder

More from the blog

How Python Dicts Really Work: Open Addressing, Probing, and the djb2 Lie

B-Trees vs LSM-Trees: The Two Religions of On-Disk Data

Attention, From the Inside Out