Kociemba's Algorithm: The Two-Phase Breakthrough #PID1.5

March 18, 2025

PID PID1 RubiksCubeSolver Kociemba algorithm two-phase solver pruning tables optimal solutions phase 1 phase 2 search space reduction lookup tables runtime analysis implementation performance tuning solver comparison

Thistlethwaite’s algorithm, covered in the previous post, established the foundational idea: partition the cube group G into a chain of nested subgroups and solve by moving through them in sequence. Four phases, each targeting a smaller subgroup, each using precomputed tables. The result is guaranteed solutions in at most 52 moves.

Herbert Kociemba’s 1992 insight was that four phases were unnecessary. The same structure – subgroup reduction, then IDA* search with pattern databases – works in two phases if you choose the intermediate subgroup carefully. The result is solutions averaging around 20 moves, computed in milliseconds. This is the algorithm used in virtually every practical cube solver today.

The Intermediate Subgroup H

Everything in Kociemba’s algorithm depends on the choice of the intermediate subgroup H. In Thistlethwaite’s notation this would be written around G_2 to G_3, but Kociemba’s H enforces all three of the following conditions simultaneously:

All 12 edges are correctly oriented (EO = 0 for all edges).
All 8 corners are correctly oriented (CO = 0 for all corners).
The four UD-slice edges {UF, UR, UB, UL} are all located within the middle layer (they may be permuted among themselves but none may occupy a top-layer or bottom-layer position).

The order of H is:

$$ |H| = 4! \times \frac{8!}{2} \times \frac{8!}{2} = 24 \times 20{,}160 \times 40{,}320 = 19{,}508{,}428{,}800 $$

The three factors correspond to: 4! arrangements of the UD-slice edges among themselves; 8!/2 arrangements of the 8 non-slice edges (the parity constraint within H restricts to even permutations); and 8! arrangements of the corners (all orientations are 0 within H, so only permutation varies). |H| is approximately 19.5 billion, which is |G| / 2,217,093,120 – the ratio |G|/|H| equals the size of Phase 1’s search space, for a reason that becomes clear below.

A critical property of H: within H, the cube can be solved using only the restricted generator set:

$$ S_2 = \{U, U', U2, D, D', D2, R, R', R2, L, L', L2, F2, B2\} $$

Quarter-turns of F and B are excluded because they disrupt edge orientation (condition 1) and corner orientation (condition 2). The restricted set has 14 generators, and the diameter of H under S_2 – God’s Number for the subgroup – is 18 moves.

The index [G : H] = |G| / |H| = 43,252,003,274,489,856,000 / 19,508,428,800 = 2,217,093,120. This index is the size of the coset space G/H, and it equals the Phase 1 search space.

Phase 1: Reaching H

Phase 1 finds the shortest sequence of moves (from the full 18-generator set) that takes the input state s into H. Membership in H is defined by three conditions, giving three coordinates:

Edge orientation (EO). Each of the 12 edges has an orientation bit: 0 (correct) or 1 (flipped). The parity constraint forces the sum of all 12 bits to be 0 mod 2, so only 11 bits are independent. The EO coordinate is an integer in [0, 2047] encoding these 11 bits. Phase 1 must bring EO to 0.

Corner orientation (CO). Each of the 8 corners has an orientation value in {0, 1, 2} representing the three possible twist states. The parity constraint forces the sum of all 8 values to be 0 mod 3, leaving 7 independent values. The CO coordinate is an integer in [0, 2186] encoding the 7-digit base-3 number. Phase 1 must bring CO to 0.

UD-slice edge membership (UD). The four UD-slice edges must occupy some 4 of the 12 edge positions. Their exact permutation within those 4 positions does not matter yet; only which 4 positions are occupied matters. This is a choice of 4 positions from 12, giving C(12, 4) = 495 possible configurations. The UD coordinate is an integer in [0, 494]. Phase 1 must bring UD to the specific value corresponding to all four slice edges in the middle slice.

The Phase 1 state space has cardinality:

$$ 2048 \times 2187 \times 495 = 2{,}217{,}093{,}120 $$

This equals [G : H] = |G| / |H|, confirming that the three coordinates exactly parameterize the coset space G/H: each distinct (EO, CO, UD) triple corresponds to a distinct coset of H in G.

The Phase 1 heuristic is a pattern database indexed by (EO, CO, UD). Building it requires a backward BFS from all H-states (states with EO=0, CO=0, UD=target) through the full 18-generator move set. Each BFS level d records, for each (EO, CO, UD) configuration first reached at level d, the value d as the minimum Phase 1 move count. Because the coordinates are independent and the BFS is exact, this table stores h_1(s) = d*(s, H), the exact minimum number of moves to enter H from s.

The full joint table of 2.2 billion entries requires approximately 1.1 GB if 4 bits per entry are used (Phase 1 solutions are always <= 12 moves, so values fit in 4 bits). For memory-constrained implementations, two smaller tables are used instead:

Table_EO_UD: indexed by (EO, UD), size 2048 * 495 = 1,013,760 entries (~500 KB).
Table_CO_UD: indexed by (CO, UD), size 2187 * 495 = 1,082,565 entries (~530 KB).

The combined heuristic is max(Table_EO_UD[EO, UD], Table_CO_UD[CO, UD]). This is weaker than the full joint table (it may underestimate more) but admissible and manageable in memory.

Phase 1 IDA* uses this heuristic to prune: from any state with coordinates (EO, CO, UD) and current move count g, if g + h_1(EO, CO, UD) > threshold, prune the subtree. Typical Phase 1 solution lengths are 7 to 12 moves.

Phase 2: Solving within H

Once the cube is in H, Phase 2 solves it using only S_2 = {U, U’, U2, D, D’, D2, R, R’, R2, L, L’, L2, F2, B2}. This generator set preserves membership in H: none of the 14 generators disrupt edge orientation, corner orientation, or UD-slice containment.

Within H, the cube state is fully described by three coordinates:

Corner permutation (CP). All 8 corners have orientation 0 (correct, by H membership), so only their permutation matters. The number of distinct permutations is 8! = 40,320. CP is an integer in [0, 40319], encoded using the Lehmer code of the permutation.

UD-slice edge permutation (EP_UD). The 4 UD-slice edges are in the middle layer (by H membership) and can be arranged in 4! = 24 ways. EP_UD is an integer in [0, 23].

Non-slice edge permutation (EP_NS). The remaining 8 edges (4 in the U layer, 4 in the D layer) can be arranged among those 8 positions. Within H, the parity constraint restricts these to even permutations, giving 8!/2 = 20,160 configurations. EP_NS is an integer in [0, 20159].

The Phase 2 state space has size:

$$ 40{,}320 \times 24 \times 20{,}160 = 19{,}508{,}428{,}800 = |H| $$

as expected. The three coordinates together uniquely identify every state in H.

Two precomputed tables guide Phase 2 IDA*:

Table_CP. For each of the 40,320 corner permutations, the minimum number of S_2 moves to solve all corners (reach CP = identity). Built by backward BFS over the corner permutation subspace under S_2. Size: 40,320 entries (~40 KB).

Table_EP. For each of 24 * 20,160 = 483,840 combined edge permutation states (EP_UD, EP_NS), the minimum number of S_2 moves to solve both edge groups simultaneously. Built by backward BFS over the combined edge subspace under S_2. Size: 483,840 entries (~500 KB).

The combined Phase 2 heuristic is:

$$ h_2(s) = \max(\text{Table\_CP}[CP],\; \text{Table\_EP}[EP\_UD, EP\_NS]) $$

Admissibility holds because each individual table is computed by exact BFS and therefore never overestimates. Taking the max preserves admissibility while tightening the bound.

Phase 2 IDA* finds the shortest S_2 sequence bringing the cube from its current H-state to the identity. Typical Phase 2 solution lengths are 8 to 13 moves.

Two-Phase IDA* and Total Move Count

The naive approach – run Phase 1 to find any Phase 1 solution, then run Phase 2 to solve from there – produces valid solutions but not short ones. A Phase 1 solution of length 12 followed by a Phase 2 solution of length 13 gives 25 moves total, exceeding God’s Number by 5.

The key optimization is coordinating the two phases under a shared total budget. The algorithm proceeds as follows:

Compute an initial threshold T = h_1(start).
For each Phase 1 solution of length k (found by IDA* with threshold T - current Phase 2 lower bound): a. Apply the length-k Phase 1 move sequence to get state s’ in H. b. Run Phase 2 IDA* with depth limit T - k. c. If Phase 2 finds a solution, accept the combined (k + Phase2 length)-move solution. d. If not, continue searching for other Phase 1 solutions of length k, or increment k.
If no solution is found within threshold T, increment T and restart.

The coupling between phases is: for a fixed total budget T, the Phase 2 budget is T - k where k is the Phase 1 solution length. Shorter Phase 1 solutions leave more budget for Phase 2. Longer Phase 1 solutions may reach H in a more favorable state for Phase 2, but at higher Phase 1 cost. The joint search explores this tradeoff.

The IDA* threshold sequence T = h_1(start), h_1(start)+1, h_1(start)+2, … guarantees that the first solution found within any budget T has total length at most T. As T increases, progressively longer solutions become acceptable. In practice, the joint search finds solutions of 18 to 22 moves for typical scrambled states within 0.1 to 1 second on modern hardware.

Coordinate Representation and Move Tables

A practical Kociemba implementation does not manipulate the full sticker or facelet array during search. Instead, each cube state is represented as a 6-tuple of integers:

Phase 1: (flip, twist, slice)
           [0,2047] [0,2186] [0,494]

Phase 2: (corner_perm, edge4_perm, edge8_perm)
           [0,40319]  [0,23]   [0,20159]

For each coordinate c and each generator m in the relevant move set, a move table stores the new coordinate value after applying m:

move_flip[flip][m]          -> new_flip        (2048 * 18 entries)
move_twist[twist][m]        -> new_twist       (2187 * 18 entries)
move_slice[slice][m]        -> new_slice       (495 * 18 entries)
move_cp[corner_perm][m]     -> new_corner_perm (40320 * 14 entries)
move_ep4[edge4_perm][m]     -> new_edge4_perm  (24 * 14 entries)
move_ep8[edge8_perm][m]     -> new_edge8_perm  (20160 * 14 entries)

Applying a move during IDA* search is six table lookups and no geometric computation. The search loop processes states as 6-tuples of integers, evaluates the heuristic via two more table lookups, and recurses. On a modern CPU, this loop evaluates several million states per second on a single core.

The total precomputed table memory:

Table	Entries	Size
move_flip	2048 * 18	~36 KB
move_twist	2187 * 18	~38 KB
move_slice	495 * 18	~9 KB
move_cp	40320 * 14	~560 KB
move_ep4	24 * 14	<1 KB
move_ep8	20160 * 14	~280 KB
Phase 1 heuristic (split)	~2M	~1 MB
Table_CP	40320	~40 KB
Table_EP	483840	~500 KB
Total		~2.5 MB

With the full 2.2-billion-entry Phase 1 joint table, total memory rises to about 1.1 GB. The split-table approximation keeps memory around 3 to 50 MB depending on implementation choices, which fits comfortably in cache.

Symmetry Reduction

The rotation and reflection symmetry group of the cube has 48 elements. Two cube states related by a symmetry are algorithmically equivalent: if you know the solution to s, you can compute the solution to any symmetric image of s by conjugating the move sequence by the corresponding symmetry element.

Formally: let phi be a symmetry of the cube (a rotation or reflection), and let s’ = phi * s * phi^{-1} be the image of state s under phi. If m_1 m_2 … m_k solves s, then (phi m_1 phi^{-1})(phi m_2 phi^{-1})…(phi m_k phi^{-1}) solves s’, where phi m phi^{-1} is the conjugated generator.

This allows the pattern databases to be indexed by symmetry equivalence classes rather than individual states. The cube group G with its 48-element symmetry group S_{48} partitions into equivalence classes of average size 48 (some states are more symmetric and have smaller classes). This reduces:

The Phase 1 state space from 2,217,093,120 to approximately 138,000,000 equivalence classes (reduction by factor 16, using only the 16 symmetries that preserve the UD axis).
The Phase 1 heuristic table from ~1.1 GB to ~70 MB for the joint table, making it feasible to precompute and store.

The 16-symmetry reduction uses only the subgroup of S_{48} that preserves the UD axis (which Phase 1 treats specially due to the UD-slice edge condition). The full 48-element symmetry group gives a stronger reduction but requires more complex coordinate computation.

Symmetry reduction is a practical optimization, not required for correctness. Implementations without it still produce correct results with the same solution quality; they use more memory and run somewhat slower.

Implementation Checklist

A working Kociemba implementation requires these components in order:

Cube state representation. Arrays cp[8], co[8], ep[12], eo[12] for corner permutation, corner orientation, edge permutation, and edge orientation. The encoding convention (which face is which, which cubie index corresponds to which physical position) must be fixed and consistent.

Coordinate encoding functions. Six functions converting the full state representation to each coordinate integer. These are used once per solve (at initialization) and during precomputation. They must be inverses of the decoding functions.

Move application on full state. Functions applying each of the 18 (or 14) generators to the full state representation. Used during precomputation. Not used during search (where move tables are used instead).

Move table precomputation. For each coordinate and each generator: encode the starting coordinate, apply the move to a reference cube state, encode the resulting coordinate, store. This is done once at startup.

Heuristic table precomputation. BFS from the solved state over the relevant coordinate subspace, recording minimum distances. Phase 2 tables (Table_CP and Table_EP) complete in seconds. Phase 1 tables take longer; the split tables take minutes, the full joint table can take hours.

Phase 1 IDA.* Recursive DFS with f-cost threshold: at each node (flip, twist, slice) with current depth g, if g + h_1(flip, twist, slice) > threshold, return the minimum exceeded f-value for threshold update. Otherwise, apply each of the 18 moves (using move tables), prune immediate inverses and redundant moves (e.g., U followed by U’ or two consecutive U2), and recurse.

Phase 2 IDA.* Same structure, operating on (corner_perm, edge4_perm, edge8_perm) with the 14-generator restricted set and h_2 heuristic.

Two-phase coordination. For each Phase 1 solution found (each time IDA* reaches a state with flip=0, twist=0, slice=target), call Phase 2 with remaining budget T - k. Track the best combined solution found.

Correctness testing. Test each component independently: verify coordinate encoding/decoding are mutual inverses; verify move tables by checking that applying each generator and then its inverse returns the original coordinate; verify heuristic tables by checking consistency with BFS at small depths; verify the full solver against known solutions for specific test states.

Performance in Practice

Benchmark data from reference implementations on modern hardware:

Target solution length	Avg search time	Notes
<= 22 moves (first found)	0.01 to 0.05 s	Most states
<= 21 moves	0.05 to 0.2 s	Near-optimal in practice
<= 20 moves	0.2 to 2 s	God’s Number bound
<= 19 moves	5 to 30 s	Most states solvable in 19
Optimal (18-20 moves)	Minutes to hours	Rarely needed

Search time for shorter solutions grows sharply near the optimum. This is because the IDA* search tree at depth d grows as roughly b^d (b ≈ 13 after move pruning), and decreasing the target length by one requires the search to prove that no shorter solution exists, which means exploring the full tree to the previous depth limit.

For a robot cube solver, 21 moves in 50 milliseconds is a reasonable operating point. The 50 ms search time is negligible compared to the mechanical execution time of 21 moves, which at 5 moves per second is 4.2 seconds. The solver is never the bottleneck.

Comparison with Thistlethwaite’s Algorithm

The reduction from 4 phases to 2 is not merely an implementation simplification. Kociemba’s H is the intersection of three conditions that Thistlethwaite’s algorithm satisfies incrementally over its first three phases. Thistlethwaite requires all edges correctly oriented (entering G_1), then all corners correctly oriented and UD-slice edges in the slice (entering G_2), then all pieces in correct orbits (entering G_3). Kociemba requires all three of the first two conditions simultaneously before starting Phase 2.

The consequence is that Kociemba’s Phase 1 search space (2.2 billion states) is larger than any individual Thistlethwaite phase, but Phase 2 begins from a state satisfying all three conditions at once, giving the Phase 2 search a much more constrained and tractable starting point. Phase 2 search space is |H| = 19.5 billion, but the diameter of H under S_2 is only 18, so the search depth is bounded.

Thistlethwaite’s algorithm produces solutions in at most 52 moves. Kociemba’s produces solutions averaging 18 to 22 moves – closer to the 20-move God’s Number bound. The improved solution quality comes from the tighter Phase 2 structure: because Phase 2 begins in a state satisfying all orientation and slice conditions simultaneously, it can solve efficiently without the intermediate steps that Thistlethwaite’s later phases require.

Where This Leads

The algorithm is a complete mathematical and algorithmic object. What remains is implementation: writing the coordinate conversion functions, precomputing the tables, implementing the two-phase IDA* search, and verifying correctness systematically.

The coordinate representation is the most implementation-sensitive component. Encoding flip, twist, and slice as compact integers, and ensuring the move tables correctly update each coordinate under each of the 18 generators, requires careful indexing. A bug in the move table for one coordinate type produces a solver that appears correct on easy states (which have short solution paths that happen not to exercise the bug) and silently fails on harder ones.

After the solver is working in simulation, the next phase is connecting it to a physical system: reading cube state from a camera, converting move sequences into robot arm commands, and handling the mechanical realities of position tolerance and motor speed limits. That is where the hardware design work begins.