but I don't think it'll address your criticism.
As far as I know, there is no mathematical proof that 2 (i.e., squaring the errors) delivers the most accurate result. Unlike OO, in math proofs either exist or they don't; there are no fuzzy grey areas (unless perhaps "accurate" is itself subjective; CRC and I went round and round on this). I once asked the math newsgroups about such a proof and got no definitive answers.
There's a lot of background mathematics you need to know to understand why the method of least squares is the most common way to find the best approximation to a curve.
I learned this about 24 years ago from "An Introduction to Linear Analysis" by Kreider, Kuller, Ostberg and Perkins (ISBN 0-201-03949-4), pp. 290-294. It's a good text.
My HTML knowledge isn't good enough to enter the equations, but I'll try to summarize things.
First, consider an experiment repeated n times which measures a value of c, say the specific gravity of a substance. These n experiments give values x1, x2, ..., xn. Since there are experimental errors, we expect that none of these values will be exactly equal to the true value c. So how do we find the best approximation for c from these measured values?
The way to do so is to view the n experimental values as the n components of a vector X = (x1, x2, ..., xn) in an n-dimensional space Rn. Similarly, the true value c can be thought of as an n-dimensional vector cY = (c, c, ..., c), where Y = (1, 1, ..., 1).
It then follows that the best approximation, c'Y, is determined by "the requirement that c'Y be the perpendicular projection of X onto the one-dimensional subspace of Rn spanned by Y. But as we saw in the preceding section, this projection is:
Y (X dot Y) / (Y dot Y) [where dot indicates the dot product of two vectors]
so that c' = (X dot Y) / (Y dot Y) = (x1 + x2 + ... + xn) / n
This, of course, is none other than the arithmetic average of the xi, and we now have a theoretical interpretation of the popular practice of averaging separate (independent) measurements of the same quantity."
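To make that concrete, here's a small sketch of my own (not from the book) showing that the projection coefficient c' = (X dot Y) / (Y dot Y) is just the arithmetic average; the measurement values are made up for illustration:

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Five noisy measurements of the same quantity (invented data).
X = [1.02, 0.98, 1.01, 0.99, 1.00]
Y = [1.0] * len(X)                 # Y = (1, 1, ..., 1)

c_prime = dot(X, Y) / dot(Y, Y)    # perpendicular-projection coefficient
mean = sum(X) / len(X)             # ordinary arithmetic average

print(c_prime, mean)               # the two agree
```

The dot product with Y just sums the components, and Y dot Y = n, so the formula collapses to the average.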
This same analysis can be extended to curves. In doing so, we still end up taking perpendicular projections and dot products, and thus we end up minimizing the sum of the squares of the differences between the measured and the true values.
So it just falls out of the mathematics. It's not arbitrary.
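Here's a sketch of the curve case with made-up data: fitting a line y = a + b*x by solving the normal equations, which is exactly the perpendicular-projection idea extended to the two-dimensional subspace spanned by (1, 1, ..., 1) and (x1, ..., xn):

```python
# Invented sample data, roughly linear with noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

n   = len(xs)
Sx  = sum(xs)
Sy  = sum(ys)
Sxx = sum(x * x for x in xs)
Sxy = sum(x * y for x, y in zip(xs, ys))

# Normal equations (projection onto the span of 1 and x):
#   n*a  + Sx*b  = Sy
#   Sx*a + Sxx*b = Sxy
det = n * Sxx - Sx * Sx
a = (Sy * Sxx - Sx * Sxy) / det
b = (n * Sxy - Sx * Sy) / det

print(f"least-squares line: y = {a:.3f} + {b:.3f} x")
```

The resulting (a, b) minimizes the sum of squared vertical distances from the data points to the line, with no arbitrary choices along the way.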
There are other, less common, methods for curve fitting and minimizing errors. Which method is best depends on how the "error" is defined. In the vast majority of cases, least squares is the most appropriate.
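A quick sketch of that point, with toy data of my own devising: minimizing the sum of squared errors yields the mean, while minimizing the sum of absolute errors yields the median, so the two definitions of "error" can pick different best estimates when an outlier is present:

```python
data = [1.0, 1.1, 0.9, 1.0, 10.0]   # last value is an outlier

def sse(c):
    """Sum of squared errors about c."""
    return sum((x - c) ** 2 for x in data)

def sae(c):
    """Sum of absolute errors about c."""
    return sum(abs(x - c) for x in data)

mean = sum(data) / len(data)           # minimizes sse
median = sorted(data)[len(data) // 2]  # minimizes sae (odd n)

# Nudging away from each minimizer only increases its own error measure.
assert sse(mean) <= min(sse(mean - 0.01), sse(mean + 0.01))
assert sae(median) <= min(sae(median - 0.01), sae(median + 0.01))
print(mean, median)
```

With the outlier, the squared-error estimate (2.8) is dragged far from the absolute-error estimate (1.0); which one is "best" depends entirely on the error definition you choose.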
If you and CRC went round and round about this, I don't think I'll convince you, but I thought I'd try to answer. This is about all I have to say on the subject.
HTH.
Cheers,
Scott.