JVM.today

Java technology blog by Arturs Licis

We all love binary search.
Straightforward, easy to understand algorithm; finds your number in a sorted array
in an appealing *O(log n)* time. But is it *that* simple to implement?

As it turns out, it’s not. The way it was implemented in Java is a great example of how a seemingly simple code should be treated with care. The bug was introduced early in Java 1.2 but was fixed only for Java 1.6.

It was Sedgewick’s and Wayne’s course on algorithms when I’ve heard of this for the first time. Some time later I came over this Josh Bloch’s post which has all essential information about the bug.

Let me guide you through this story once again, and share some trivia.

This is how the binary search looks in Java 1.2 (JDK-1.2.2_007 sources):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17

public static int binarySearch(long[] a, long key) {
int low = 0;
int high = a.length-1;
while (low <= high) {
int mid =(low + high)/2;
long midVal = a[mid];
if (midVal < key)
low = mid + 1;
else if (midVal > key)
high = mid - 1;
else
return mid; // key found
}
return -(low + 1); // key not found.
}

*Perfectionists, keep your anger on mixed tab/space indentation.*

So what’s the deal? Well, simply finding midpoint between two numbers denoting
position in an array: `int mid =(low + high)/2;`

. When `low + high`

exceeds max value of signed int (the only flavor you get in Java), it turns
into a negative value, and as a result, you get half of that negative value as ‘mid’.

Here’s a simple visualization of how it goes wrong:

There’re several solutions, but the one applied in Java is simple and efficient -
divide using bit shift operation, the *unsigned* one. As before, you get a negative
number as a result of overflow `(low + high)`

, but the next operation
won’t treat sign bit seriously, and will simply shift everything using zero as
a filler, thus getting us back on positive numbers’ side:

The bug was identified in Tiger (1.5), and was finally fixed in two years for 1.6. What I personally like most of this story are those comments under bug #5045882:

End of story? No. The same problem hit again just around the time when binary
search seemed to have been fixed for Java 1.6
(see bug #6437371),
this time in `TreeMap`

which also uses midpoint calculation:

Now, end of story? Not quite. Let’s pick some latest Java 9 source code, and do some research. No fancy tools, no AST analysis - basic grep (assuming variable names start with ‘h’ and ‘l’). Out of 85 results, 25 survived manual review to be presented below:

If you write tests, do boundary testing.