// computers › Interval & Divide-and-Conquer

Binary Search

Sort first, then find an item fast by repeatedly halving the search range.

sort the data; then mid = (low + high) // 2; keep the half that can hold the target.

Frequently asked questions

Is this the 'higher or lower' number-guessing game?

Yes - it is precisely that game. A friend picks a number from 1 to 100 and only says 'higher' or 'lower'. You guess 50, then 75, then 62, and each answer halves what is left. You always win in about seven guesses because halving 100 seven times gets you to one. Binary search is the same idea on sorted data: each comparison throws away half the remaining items, giving its O(log n) speed.

Why does binary search need the data to be sorted first?

Binary search works by deciding which HALF of the data to throw away at each step, and that decision relies on order: if the middle item is too small, everything to its left must also be too small. On unsorted data that guarantee disappears, so the discarded half might still contain the target. If your data isn't sorted you either sort it once (worthwhile if you'll search many times) or fall back to linear search.

How can a million items really be searched in about 20 steps?

Each comparison halves what's left: 1,000,000 then 500,000 then 250,000 and so on. The number of halvings needed is log base 2 of the size, and log2 of a million is roughly 20. That's the whole point of O(log n) - doubling the data adds just one extra step, so even a billion items need only about 30 comparisons.

What is the famous 'mid = (low + high) / 2' overflow bug?

For decades many textbook implementations computed the midpoint as (low + high) / 2. On very large arrays low + high can exceed the maximum integer a machine word can hold, wrapping around to a negative number and crashing or misbehaving. The safe form is low + (high - low) / 2, which never overflows. A famous 2006 post by Joshua Bloch revealed this bug had lived in widely used libraries for years.

Where do real systems use binary search beyond simple lookups?

It powers 'git bisect', which finds the exact commit that introduced a bug by halving a range of commits. Databases use it inside index pages. It underlies 'binary search on the answer' - a problem-solving technique where you guess a numeric answer and halve the guess range based on whether the guess was too high or low, used in scheduling, capacity planning, and competitive programming.

When is binary search actually the wrong choice?

When the data changes constantly: keeping an array sorted after every insertion is expensive, so a balanced tree or hash table is usually better. It's also overkill for very small lists, where plain linear search is simpler and the speed difference is invisible. And it can't be used directly on linked lists, because jumping to the middle element isn't a cheap operation there.

What data structure is binary search built on?

A SORTED ARRAY - one of the most fundamental data structures in computer science. An array stores items in a contiguous block of memory, so any element can be read instantly from its index (this is called random access, and it's O(1)). Binary search leans entirely on two properties of that structure: instant access to the middle element, and the items being in sorted order. The sorted order is what guarantees that if the middle is too small, the whole left half can be discarded. Take away either property and the algorithm fails - which is why it can't run on an unsorted array or directly on a linked list (where reaching the middle isn't cheap). It's a clear illustration of a general principle: each search algorithm is really defined by the data structure underneath it. Binary search needs a sorted array; hash search needs a hash table; HNSW needs a graph.