We all love binary search.
Straightforward, easy to understand algorithm; finds your number in a sorted array
in an appealing O(log n) time. But is it that simple to implement?
As it turns out, it’s not. The way it was implemented in Java is
a great example of how a seemingly simple code should be treated with care. The bug
was introduced early in Java 1.2 but was fixed only for Java 1.6.
Let me guide you through this story once again, and share some trivia.
The Bug and the Fix
This is how the binary search looks in Java 1.2 (JDK-1.2.2_007 sources):
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
publicstaticintbinarySearch(long[]a,longkey){intlow=0;inthigh=a.length-1;while(low<=high){intmid=(low+high)/2;longmidVal=a[mid];if(midVal<key)low=mid+1;elseif(midVal>key)high=mid-1;elsereturnmid;// key found}return-(low+1);// key not found.}
Perfectionists, keep your anger on mixed tab/space indentation.
So what’s the deal? Well, simply finding midpoint between two numbers denoting
position in an array: int mid =(low + high)/2;. When low + high
exceeds max value of signed int (the only flavor you get in Java), it turns
into a negative value, and as a result, you get half of that negative value as ‘mid’.
Here’s a simple visualization of how it goes wrong:
There’re several solutions, but the one applied in Java is simple and efficient -
divide using bit shift operation, the unsigned one. As before, you get a negative
number as a result of overflow (low + high), but the next operation
won’t treat sign bit seriously, and will simply shift everything using zero as
a filler, thus getting us back on positive numbers’ side:
History and Trivia
The bug was identified in Tiger (1.5), and was finally fixed in two years for 1.6.
What I personally like most of this story are those comments under bug
#5045882:
End of story? No. The same problem hit again just around the time when binary
search seemed to have been fixed for Java 1.6
(see bug #6437371),
this time in TreeMap which also uses midpoint calculation:
Now, end of story? Not quite. Let’s pick some latest Java 9 source code, and
do some research. No fancy tools, no AST analysis - basic grep (assuming variable
names start with ‘h’ and ‘l’). Out of 85 results, 25 survived manual review to
be presented below: