Theory¶

Minimax with strict tie-breaks¶

We solve states from the perspective of the side-to-move using exact minimax with memoization. The value v ∈ {+1, 0, −1} denotes win/draw/loss under perfect play. Among moves with equal v we break ties by distance-to-terminal (plies):

Win ≻ Draw ≻ Loss
Among wins/draws: prefer the shortest plies-to-end
Among losses: prefer the longest plies-to-end (delay the loss)

This induces a total preorder over legal actions. Our solver returns per-action q_values ∈ {+1,0,−1} and dtt_action (distance-to-terminal after taking the action), and the set of optimal_moves obeying the policy above.

Symmetry group (D4) and canonicalization¶

Tic-tac-toe on a 3×3 board admits the dihedral-4 (D4) symmetry group with 8 elements: {id, rot90, rot180, rot270, hflip, vflip, d1, d2}. We precompute index mappings for each transformation and use them to:

Canonicalize states by selecting the lexicographically smallest image across all 8 transforms.
Remap action indices under symmetry for augmentation and invariance checks.

For a board b and transform g ∈ D4, let g·b be the transformed board and g·a the transformed action index. The solver’s value is invariant: V(g·b) = V(b); optimal moves map equivariantly: g·ArgMax(b) = ArgMax(g·b).

Complexity and state graph¶

From the empty board, 5478 states are reachable (4520 nonterminal; terminal split: X=626, O=316, draw=16). This is a tiny graph and the exact backward induction + memoization completes in milliseconds on commodity hardware. Our features and datasets are O(#states) and fit comfortably in memory.