// computers › Adversarial & Game-Tree

Monte Carlo Tree Search (MCTS)

Play many random simulated games and favour the moves that win most often.

repeat: Select (UCB1) -> Expand -> Simulate (random rollout) -> Backpropagate wins

Frequently asked questions

Is MCTS like mentally playing out random finishes to a board game?

Yes - that is the core intuition. To choose a move, you mentally play out dozens of random finishes from each option and favour the move that wins most often. MCTS does exactly this: it runs many simulated games, tracks each move's win rate, and uses those statistics to decide. It needs no expert scoring of positions - just the ability to play games out - which is why it cracked Go, where positions are far too complex to evaluate by formula.

What are the four steps of MCTS?

Selection: starting at the root, descend the existing tree by repeatedly choosing children with a rule (UCB1) that balances trying known-good moves against exploring under-tried ones. Expansion: when you reach a node that has untried moves, add a new child for one of them. Simulation (rollout): from that new node, play the game out to the end with random or quick moves, getting a win or loss. Backpropagation: add that result to the visit and win counts of every node on the path back to the root. Repeat thousands of times, then pick the most-visited root move.

What is UCB1 and why does it matter?

UCB1 (Upper Confidence Bound) is the formula MCTS uses during selection to balance exploitation and exploration. It favours moves with high win rates (exploitation) but adds a bonus for moves tried few times (exploration), so promising-but-uncertain moves still get attention. Without this balance, MCTS would either fixate on the first move that looked good or waste effort spreading attention evenly. UCB1 is what lets the search concentrate on strong moves while never completely ignoring the rest - the key to its efficiency.

Why use random simulations instead of evaluating positions like minimax?

For games like Go, the branching factor is so huge (around 250 legal moves) and good positions so hard to score with a simple formula that minimax - even with alpha-beta - cannot search deep enough to play well. MCTS sidesteps the need for a position-evaluation function: it just plays games to the end and counts wins. The law of large numbers does the rest - average enough random outcomes and the genuinely strong moves reveal themselves statistically. This is why MCTS, not minimax, cracked computer Go.

Why does MCTS get stronger with more simulations?

Each simulation is a noisy sample of how good a move is. With few samples the win-rate estimates are unreliable; with thousands, they converge toward the true values, and UCB1 increasingly concentrates simulations on the best candidates, sharpening their estimates further. So unlike a fixed-depth minimax search, MCTS is an 'anytime' algorithm - you can stop whenever time runs out and take the best estimate so far, and giving it more time always helps. The demo's simulation slider shows this convergence directly.

How did MCTS contribute to AlphaGo?

AlphaGo combined MCTS with deep neural networks. Instead of purely random rollouts, it used a 'policy network' to suggest promising moves (guiding selection and expansion) and a 'value network' to evaluate positions (replacing or supplementing random play-outs). MCTS provided the search framework that turned these learned intuitions into concrete move choices, exploring the most promising lines deeply. This marriage of statistical tree search with learned evaluation is what let AlphaGo defeat top human players, a milestone long thought decades away.