Post #259,336
6/19/06 8:51:42 AM
|
Binary search bug
Joshua Bloch writes [link|http://googleresearch.blogspot.com/2006/06/extra-extra-read-all-about-it-nearly.html|Extra, Extra - Read All About It: Nearly All Binary Searches and Mergesorts are Broken] that nearly all binary searches are broken for large arrays.
...The bug is in this line:
6: int mid =(low + high) / 2;
In Programming Pearls Bentley says that the analogous line "sets m to the average of l and u, truncated down to the nearest integer." On the face of it, this assertion might appear correct, but it fails for large values of the int variables low and high. Specifically, it fails if the sum of low and high is greater than the maximum positive int value (2^31 - 1). The sum overflows to a negative value, and the value stays negative when divided by two. In C this causes an array index out of bounds with unpredictable results. In Java, it throws ArrayIndexOutOfBoundsException.
This bug can manifest itself for arrays whose length (in elements) is 2^30 or greater (roughly a billion elements). This was inconceivable back in the '80s, when Programming Pearls was written, but it is common these days at Google and other places. In Programming Pearls, Bentley says "While the first binary search was published in 1946, the first binary search that works correctly for all values of n did not appear until 1962." The truth is, very few correct versions have ever been published, at least in mainstream programming languages.
Of course, this error cannot happen in a language like Smalltalk since there is no limited range of integers. In Smalltalk, if the result of an expression exceeds the range of small integers, the overflow is handled in the VM and some kernel classes and automatically promoted to some type that can handle the result. This means that the Smalltalk programmer is relieved from the annoying task to prove that integer values are inside proper ranges. This is a much better approach then Java, C/C++.
This is probably the tip of the iceberg of this class of errors because memory and data structures are only growing.
My conclusion from here would be that it is time to move away from using ints, longs, etc. and use the Smalltalk approach instead. I fail to see the benefit where the programmer has to choose between int and long.
|
Post #259,353
6/19/06 12:11:01 PM
|
The time to move away from C/C++ was 1995
My conclusion from here would be that it is time to move away from using ints, longs, etc. and use the Smalltalk approach instead. I fail to see the benefit where the programmer has to choose between int and long. The only time when a programmer should have to pick an exact memory size for a variable is when doing hardware level programming. At the OS level the speed advantage and the fact that a lot of it is hardware level gives a big advantage. That application programmers are still working in C and C++ is sad. And as somebody who is going back into C++ programming after not doing any since Windows 95 came out, let me tell you it has not gotten any prettier in the past decade. Jay
|
Post #259,371
6/19/06 2:37:36 PM
|
I'm glad you recognize that not all programming is web pages
The only time when a programmer should have to pick an exact memory size for a variable is when doing hardware level programming. Oh, you mean like, embedded software development. Like what I do every day. An environment where C++ shines (or, at least, has the potential to, were some large fraction of its potential practitioners of a mind to actually practice), and where a Smalltalk VM is about as useful as a toothache. I appreciate that fact that you recognize that one size still doesn't fit all.
jb4 "So don't pay attention to the approval ratings that say 68% of Americans disapprove of the job this man is doing. I ask you this, does that not also logically mean that 68% approve of the job he's not doing? Think about it. I haven't." — Stephen Colbert, at the White House Correspondent's Dinner 29Apr06
|
Post #259,355
6/19/06 1:08:34 PM
|
Not much of a problem
Specifying number size is annoying but if an array is holding more than 1 million entries, the application programmer should scrap it and use something else. Such as a hash table or database table.
More like that as memory grows, there'll be larger chunks in memory and more records on the hard drive.
Matthew Greet
Choose Life. Choose a job. Choose a career. Choose a family. Choose a fucking big television, choose washing machines, cars, compact disc players and electrical tin openers. Choose good health, low cholesterol, and dental insurance. Choose fixed interest mortgage repayments. Choose a starter home. Choose your friends. Choose leisurewear and matching luggage. Choose DIY and wondering who the fuck you are on a Sunday morning. Choose sitting on that couch watching mind-numbing, spirit-crushing game shows, stuffing fucking junk food into your mouth. Choose rotting away at the end of it all, pishing your last in a miserable home, nothing more than an embarrassment to the selfish, fucked up brats you spawned to replace yourself. Choose your future. Choose life... But why would I want to do a thing like that? I chose not to choose life. I chose somethin' else. And the reasons? There are no reasons. Who needs reasons when you've got heroin? - Mark Renton, Trainspotting.
|
Post #259,430
6/20/06 1:02:25 AM
|
T Bray had the same bug
[link|http://www.tbray.org/ongoing/When/200x/2003/03/22/Binary|reported back in March]
interpreted languages in general, eg. Perl, Ruby, PHP, REXX, Python, don't have this problem either....
Have fun, Carl Forde
|
Post #259,431
6/20/06 1:06:23 AM
|
Try to have an array that big...
in a language like Perl. Come on, I dare you.
And if you succeed, then I really want your computer!
Cheers, Ben
The great masses of people ... will more easily fall victims to a big lie than to a small one. -- Adolf Hitler
|
Post #259,513
6/21/06 12:54:34 AM
|
the difference between theory and practice is
greater in practice than it is in theory. :-)
Have fun, Carl Forde
|