
It's a standard fitting technique.
Many scientific measurements involve resolving multiple overlapping peaks from a noisy spectrum. It's done with a least-squares error-minimization technique, similar to the one used to find the "best fit" line through a set of points. It can get tricky depending on what type of curve you want to use to describe the individual peaks (e.g. Gaussian, Lorentzian, etc.).
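
(For the curious, here is a minimal sketch of what such a fit can look like in Python with scipy; the two-Gaussian model, peak positions, widths, and noise level are made-up illustration values, not anything from a real spectrum.)

import numpy as np
from scipy.optimize import curve_fit

def gaussian(x, amp, center, width):
    # One Gaussian peak; a Lorentzian or Voigt shape could be substituted.
    return amp * np.exp(-((x - center) ** 2) / (2 * width ** 2))

def two_peaks(x, a1, c1, w1, a2, c2, w2):
    # Model: sum of two overlapping peaks.
    return gaussian(x, a1, c1, w1) + gaussian(x, a2, c2, w2)

# Fake noisy "spectrum" with two overlapping peaks (illustration only).
x = np.linspace(0, 10, 500)
rng = np.random.default_rng(0)
y = two_peaks(x, 1.0, 4.0, 0.8, 0.6, 6.0, 1.2) + rng.normal(0, 0.02, x.size)

# Least-squares fit: curve_fit minimizes the sum of squared residuals.
guess = [1, 4, 1, 0.5, 6, 1]
params, cov = curve_fit(two_peaks, x, y, p0=guess)
print(params)  # recovered amplitudes, centers, widths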

More information is available by doing a Google search on "peak fitting". :-)

HTH.

[edit - typo.]

Cheers,
Scott.
Edited by Another Scott March 3, 2004, 05:53:15 PM EST
We had a big argument about least squares once
I argued that an exponent of 2 was arbitrary and that it was used to make the math easier rather than because it necessarily produces a better result. I once tried an exponent of 1 (absolute distance), but didn't find it worked as well as 2. I never got around to testing, say, 1.5.
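
(For anyone who wants to repeat that experiment, here's a rough sketch of how the comparison could be run in Python with scipy; the data, the line model, and the exponents tried are arbitrary illustration choices, not anything from the original test.)

import numpy as np
from scipy.optimize import minimize

# Made-up data scattered around a "true" line y = 2x + 1.
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, x.size)

def fit_line(p):
    # Fit y = a*x + b by minimizing sum(|error|**p) for a given exponent p.
    def cost(params):
        a, b = params
        return np.sum(np.abs(y - (a * x + b)) ** p)
    # Nelder-Mead because the p=1 cost is not smooth.
    return minimize(cost, x0=[1.0, 0.0], method="Nelder-Mead").x

for p in (1.0, 1.5, 2.0):
    a, b = fit_line(p)
    print(f"exponent {p}: slope {a:.3f}, intercept {b:.3f}")
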
________________
oop.ismad.com
It's not arbitrary...
But it's just one standard way of minimizing errors in a fit. There's a lot of rigorous mathematics behind it.

However, if, say, you're making hand grenades and they will explode if the force on the detonator is too big, it doesn't matter whether the average force over 10 trials is below the critical force: you're still dead if any single trial exceeds the critical value. :-)
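
(A trivial numeric illustration of that distinction, with made-up numbers:)

import numpy as np

# Ten made-up trial forces on the detonator; say the critical force is 100.
forces = np.array([80, 85, 90, 95, 88, 92, 101, 87, 83, 86])
print(forces.mean())  # 88.7 -- comfortably below 100 on average
print(forces.max())   # 101  -- but one trial exceeded the critical value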

Right tool for the job, etc., etc. :-)

Cheers,
Scott.
re: "Proof"
There's a lot of rigorous mathematics behind it.

As far as I know, there is no mathematical proof that 2 delivers the most accurate result. Unlike OO, proofs either exist or don't exist in math. No fuzzy grey areas (unless perhaps "accurate" is subjective; CRC and I went round and round on this). I asked math newsgroups about such a proof once, and got no definitive answers.

I agree that a lot of math uses the technique of squaring, but that does not necessarily make it the best. Like I said, squaring seems to make the computations simpler. But simple computations and accuracy are not necessarily the same thing.

________________
oop.ismad.com
I'll give a brief rundown...
but I don't think it'll address your criticism.

As far as I know, there is no mathematical proof that 2 delivers the most accurate result. Unlike OO, proofs either exist or don't exist in math. No fuzzy grey areas (unless perhaps "accurate" is subjective; CRC and I went round and round on this). I asked math newsgroups about such a proof once, and got no definitive answers.

There's a lot of background mathematics that you need to know to understand why the method of least squares is the most common method to best approximate a curve.

I learned this about 24 years ago from - "An Introduction to Linear Analysis" by Kreider, Kuller, Ostberg and Perkins, ISBN 0-201-03949-4. pp 290-294. It's a good text.

My HTML knowledge isn't good enough to enter the equations, but I'll try to summarize things.

First, consider an experiment repeated n times which measures a value of c, say the specific gravity of a substance. These n experiments give values x1, x2, ..., xn. Since there are experimental errors, we expect that none of these values will be exactly equal to the true value c. So how do we find the best approximation for c from these measured values?

The way to do so is to view the n experimental values as n components of a vector X = (x1, x2, ... xn) in an n-dimensional space Rn. Similarly, the true value c can be thought of as an n-dimensional vector cY = (c, c, ...c) where Y = (1, 1, ... 1).

It then follows that the best approximation, c'Y, is determined by "the requirement that c'Y be the perpendicular projection of X onto the one-dimensional subspace of Rn spanned by Y. But as we saw in the preceding section, this projection is:

Y (X dot Y) / (Y dot Y) [where dot indicates the dot product of 2 vectors]

so that c' = (X dot Y) / (Y dot Y) = (x1 + x2 + ... +xn) / n

This, of course, is none other than the arithmetic average of the xi, and we now have a theoretical interpretation of the popular practice of averaging separate (independent) measurements of the same quantity."
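
(Not from the book, just a quick numeric check of that projection formula, with made-up measurements:)

import numpy as np

# Five repeated measurements of the same quantity (made-up values).
X = np.array([2.31, 2.27, 2.35, 2.29, 2.33])
Y = np.ones_like(X)

# Perpendicular projection of X onto the line spanned by Y = (1, 1, ..., 1).
c_prime = np.dot(X, Y) / np.dot(Y, Y)
print(c_prime, X.mean())  # both print 2.31, the arithmetic average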

This same analysis can be extended to curves. In doing so, we still end up taking the perpendicular projection and dot products, and thus we end up minimizing the sum of the squares of the differences between the measured values and the fitted curve.
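
(For the curve case, the projection works out to the so-called normal equations; a minimal sketch with made-up data and a straight-line model:)

import numpy as np

# Made-up (x, y) data; model y = a + b*x.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])

# The columns of A (a constant term and x) span the subspace we project y onto.
A = np.column_stack([np.ones_like(x), x])

# Normal equations: A^T A c = A^T y, i.e. the perpendicular projection of y.
coeffs = np.linalg.solve(A.T @ A, A.T @ y)
print(coeffs)                                # [a, b] minimizing the sum of squared errors
print(np.linalg.lstsq(A, y, rcond=None)[0])  # numpy's least-squares solver agrees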

So it just falls out of the mathematics. It's not arbitrary.

There are other, less common, methods for curve fitting and minimizing errors. How the "error" is defined determines which method is best. In the vast majority of cases, least-squares is most appropriate.

If you and CRC went round and round about this, I don't think I'll convince you, but I thought I'd try to answer. This is about all I have to say on the subject.

HTH.

Cheers,
Scott.
Ouch. I'll have to reread it several times. My math is rusty
Can we stick with 2 dimensions for now?
________________
oop.ismad.com
There is
It's what's known as a "variational problem", such as: find the shape that encloses the maximum area within a closed curve of fixed length, having no nodes.

Least squares comes straight out of this type of analysis. Eventually it boils down to the Cauchy-Schwarz inequality and the resulting "triangle inequality" for vectors:

| |a| - |b| | <= |a + b| <= |a| + |b|

The vectors here are the sequences of coefficients that represent the approximating function in some basis.

If you are interested I can provide more detail.


-drl
The exponent 2 is NOT arbitrary
Coincidentally I happen to be reading a book on the history of statistics. So I can give you names and dates.

The method of least squares was invented by Legendre in 1805. (Gauss claimed to have invented it in 1795, but did not popularize it.) Legendre argued for it on grounds of mathematical simplicity: it was easy to apply, obviously fair, and in several special cases it simplified to the obviously right thing.

However, at that point least squares was just one of a great many averaging techniques that existed. Why did it beat out Boscovich's method, which was already in widespread use and minimized the sum of the absolute values of the errors?

The next step was provided by Gauss in a poorly reasoned section of Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium in 1809. What Gauss proved was that the only error distribution that made least squares best was the normal distribution, and conversely that the normal distribution made least squares best. (The poor reasoning was a circularity where he assumed that least squares was best, which led him to the normal distribution, which led him back to least squares.)

What do I mean by that? Well suppose that the measurements can be expected to be normally distributed with some mean and variance. Then for any given variance, the mean that makes the observed data most likely is the result of a least squares fit. Furthermore it is then possible to figure out the variance that best fits the data, allowing fairly detailed analysis to proceed.
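
(A quick numerical check of that statement, using simulated normal data rather than anything from the sources Ben is citing:)

import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
data = rng.normal(loc=5.0, scale=2.0, size=1000)  # simulated measurements

def neg_log_likelihood(mu, sigma=2.0):
    # Negative log of the normal likelihood for a fixed variance (constants dropped).
    return np.sum((data - mu) ** 2) / (2 * sigma ** 2)

best_mu = minimize_scalar(neg_log_likelihood, bounds=(0, 10), method="bounded").x
print(best_mu, data.mean())  # the most likely mean is the least-squares answer: the average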

Of course it wasn't stated that way because the concept of variance wasn't yet developed, nor was the normal curve well enough known to be called "normal".

I should note that the relationship between the normal curve and being able to carry out this analysis in an obvious way is special. For instance Laplace had tried previously to figure out a good error curve from a priori considerations and then analyze the error. He did come up with a reasonable family of curves, but the analysis got hopelessly difficult, and the best answer depended on which curve out of his family you chose. (Contrast the normal where the unknown variance doesn't affect what mean is best.)

In 1810 Laplace improved on Gauss. He already had a version of the Central Limit Theorem in 1785. Reading Gauss motivated him to combine the two and point out that the normal distribution was the natural result if the error term was itself the sum of several other error curves. Therefore there were good reasons to believe that the normal distribution was likely to come up, in which case least squares was the best possible estimate.

In 1811 Laplace further proved that of all estimation techniques of the form of one linear combination divided by another, the one that leads to least squares has (for a large number of observations) the smallest expected error in estimating the mean of the actual distribution, no matter what the error distribution is.

In 1812 Laplace presented both of these results in the Théorie analytique des probabilités. By 1815 least squares had entered standard use in French astronomy and among geometers there. It was standard in England as well by 1825.

In 1823 Gauss updated Laplace's 1811 analysis to point out that if one is interested in minimizing the expected square of the error in estimating the mean, then you can remove the condition "for a large number of observations" from Laplace's result. (The expected square of the error is, of course, the variance.)
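
(One way to see the flavor of that result is a small simulation, entirely my own illustration rather than Laplace's or Gauss's argument: compare the equal-weight average, which is what least squares gives for repeated measurements, against some other linear combination of the same observations.)

import numpy as np

rng = np.random.default_rng(3)
n, trials = 10, 100_000

# Any weights summing to 1 give an unbiased linear estimate of the true mean;
# the skewed weights are a made-up alternative to the plain average.
equal = np.full(n, 1.0 / n)
skewed = np.array([0.3, 0.2, 0.1, 0.1, 0.1, 0.05, 0.05, 0.05, 0.03, 0.02])

samples = rng.normal(0.0, 1.0, size=(trials, n))  # true mean is 0, variance 1
print(np.var(samples @ equal))   # ~0.10 (= 1/n), the smallest achievable
print(np.var(samples @ skewed))  # ~0.17, any other weights do worse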

Let me summarize this. To the extent that the normal curve is likely to be encountered in practice, the least squares method is the best estimate. Furthermore, of all estimation techniques that are simple to calculate (i.e. one linear combination divided by another), least squares is best (either by a particular measure, or by any reasonable measure you want if there are many measurements). These facts earned the technique rapid acceptance and caused it to replace previous methods of estimation.

Furthermore there are other possible estimation techniques that you can try that will be better with specific hypotheses and criteria. But the calculations involved with any of these are far more complex than least squares, and if you have enough data, all reasonable ones converge on the same answers.

I should also note that in many applications there are good physical interpretations behind least squares. For instance in a spectrum, the sum (OK, integral) of the squares of the signal is the energy of that signal. So by minimizing the sum of the squared error you are minimizing the unaccounted for energy in the signal.

Is this enough for you?

Ben
"good ideas and bad code build communities, the other three combinations do not"
- [link|http://archives.real-time.com/pipermail/cocoon-devel/2000-October/003023.html|Stefano Mazzocchi]
Physical events
Wouldn't the distribution of [link|http://www.blackcatsystems.com/GM/experiments/ex4.html|radioactive events] (standard deviation = square root of the mean) also lend itself to the physical interpretation?
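
(For reference, that square-root relationship is easy to check with a quick simulation; the count rate here is made up:)

import numpy as np

rng = np.random.default_rng(4)
counts = rng.poisson(lam=100.0, size=100_000)  # simulated radioactive counts per interval

print(counts.mean())           # ~100
print(counts.std())            # ~10
print(np.sqrt(counts.mean()))  # ~10: standard deviation = sqrt(mean) for Poisson counts
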
Not in an obvious way to me
The number of events observed is the energy delivered in a Poisson distribution of radiation.

I don't see that naturally leading to the physical interpretation, although the underlying phenomenon, measured (and described) in another way, does.

Cheers,
Ben
"good ideas and bad code build communities, the other three combinations do not"
- [link|http://archives.real-time.com/pipermail/cocoon-devel/2000-October/003023.html|Stefano Mazzocchi]
Let's see if I got this right
So what you are saying is that IF the errors follow a normal distribution, then 2 is theoretically the best matching exponent. However, if the actual error distribution is not a normal curve, then it may not be.

However, since most "natural" things have errors that follow a normal distribution, least-squares is used because, under the normal assumption, it is the simplest known way to calculate the answer.

One would have to know more about the nature of the actual error distribution, and know that it is not a normal distribution, to have any chance of doing better than least-squares.

I can live with that.
________________
oop.ismad.com
Yes
You can strengthen your "it may not be" though. The 1809 result from Gauss proved that if you have a family of possible error distributions which are the same up to translation (ie move the center) and scale (multiply by a constant factor), then least squares can only be best if that family is the normal distribution.

That is, if the real error curve differs from the normal at all, then least squares cannot be the best possible technique. The best one is very likely, however, to be far more complicated than "use a different exponent". There would be little point in going through this effort unless you had good reason to believe that the distribution was not normal, and you had a pretty good idea what the real distribution was.
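
(As a concrete, purely illustrative case: with double-exponential (Laplace) errors, a deliberately non-normal curve, minimizing the absolute error, which amounts to taking the median, estimates the true center better than the least-squares average. A quick simulation:)

import numpy as np

rng = np.random.default_rng(5)
trials, n = 50_000, 25

# Laplace-distributed errors around a true value of 0 -- a non-normal error curve.
samples = rng.laplace(loc=0.0, scale=1.0, size=(trials, n))

mean_mse = np.mean(np.mean(samples, axis=1) ** 2)      # least-squares estimate: the average
median_mse = np.mean(np.median(samples, axis=1) ** 2)  # least-absolute-error estimate: the median
print(mean_mse, median_mse)  # the median's error is noticeably smaller here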

Cheers,
Ben
"good ideas and bad code build communities, the other three combinations do not"
- [link|http://archives.real-time.com/pipermail/cocoon-devel/2000-October/003023.html|Stefano Mazzocchi]
     Spectrum matching? - (tablizer) - (12)
         It's a standard fitting technique. - (Another Scott) - (11)
             We had a big argument about least squares once - (tablizer) - (10)
                 It's not arbitrary... - (Another Scott) - (4)
                     re: "Proof" - (tablizer) - (3)
                         I'll give a brief rundown... - (Another Scott) - (1)
                             Ouch. I'll have to reread it several times. My math is rusty - (tablizer)
                         There is - (deSitter)
                 The exponent 2 is NOT arbitrary - (ben_tilly) - (4)
                     Physical events - (ChrisR) - (1)
                         Not in an obvious way to me - (ben_tilly)
                     Let's see if I got this right - (tablizer) - (1)
                         Yes - (ben_tilly)
