Statistics Terms and Equations

We need to approach "error" from two directions. Every observation we make and report in chemistry is subject to error: we can make no claim to perfect measurement. Most students initially think of systematic error, normally a "mistake" introduced in the performance of a measurement. This is a real concern, and we work to understand and eliminate systematic error, either through the design of the measurement or by recognizing and removing our own bias to "look for" a particular result. Once we have done this, we will still observe statistically random variations in our measurements, and these we can treat with statistical theory. That treatment gives us our final answer (the numerical measurement, or a value derived from our direct measurements) along with an estimate of "how good" the measurement is.

What is the value we report?

Normally, we will make some number N of observations, each a direct measurement. The most common "best value" to report is the mean value:
$$x̄ = (Σx_i)/N$$
where x̄ is the mean, x_i are the individual measurements, and N is the number of measurements.
There are often cases where we may wish to discard data that look discordant (see the Q Test); and if there is a known systematic error, the affected data may of course be excluded. In complex cases, you may want to apply different weighting to different data, but that is beyond the scope of our current work.
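The mean defined above can be sketched in a few lines of Python. The data set here is hypothetical (five replicate readings invented for illustration), and the function name `mean` is our own choice rather than anything prescribed by the text.

```python
# Hypothetical replicate measurements (e.g., titration volumes in mL).
measurements = [10.08, 10.11, 10.09, 10.10, 10.12]


def mean(values):
    """Return the arithmetic mean: the sum of the x_i divided by N."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one measurement")
    return sum(values) / n


x_bar = mean(measurements)
print(f"mean = {x_bar:.3f} mL")
```

This is the unweighted mean only; discarding discordant points or weighting the data would have to happen before the values are passed in.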

How good is our estimate?

Several measures are used. The simplest is just to report the range of the data; the real value is "in there somewhere." This is not a very satisfying answer.
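As a minimal illustration of reporting the range, the following Python sketch uses a hypothetical set of five replicate readings (invented for this example) and simply reports the smallest and largest values.

```python
# Hypothetical replicate measurements (e.g., titration volumes in mL).
measurements = [10.08, 10.11, 10.09, 10.10, 10.12]

lowest = min(measurements)
highest = max(measurements)
data_range = highest - lowest  # width of the interval spanned by the data

print(f"range: {lowest:.2f} to {highest:.2f} mL (width {data_range:.2f} mL)")
```

Note that the range depends only on the two most extreme readings, which is one reason it says little about how the data are distributed.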

Better is to calculate the "variance" in the data, as this is theoretically related to the expected Gaussian distribution of data if we had time to make an infinite number of observations.
The variance has the symbol S² and is defined by the equation:
$$S^2=1/{N-1}Σ(x_i-x̄)^2$$
With an infinite number of observations, the variance would approach the square of the standard deviation of the population. However, because our data are always limited, the better value to report is the estimated standard deviation of the mean, S_m, which describes how well our N measurements determine the mean rather than describing the theoretically infinite population of possible observations. The proper equation is
$$S_m=S/√N=1/√{N(N-1)}(Σ(x_i-x̄)^2)^½$$
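The two equations above can be checked numerically with a short Python sketch. The data are hypothetical (five replicate readings invented for illustration), and the function names `sample_variance` and `std_dev_of_mean` are our own labels for S² and S_m.

```python
import math

# Hypothetical replicate measurements (e.g., titration volumes in mL).
measurements = [10.08, 10.11, 10.09, 10.10, 10.12]


def sample_variance(values):
    """S^2: sum of squared deviations from the mean, divided by N - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def std_dev_of_mean(values):
    """S_m = S / sqrt(N), the estimated standard deviation of the mean."""
    return math.sqrt(sample_variance(values) / len(values))


s_squared = sample_variance(measurements)
s = math.sqrt(s_squared)
s_m = std_dev_of_mean(measurements)
print(f"S^2 = {s_squared:.6f}, S = {s:.4f}, S_m = {s_m:.4f}")
```

For these five values the sketch gives S² = 0.00025, S ≈ 0.0158, and S_m ≈ 0.0071. Because S_m equals S/√N, it shrinks as more measurements are added even when the scatter of individual readings stays the same.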

S_m can be used as a direct statement of precision; usually it is written as a number in parentheses after the value. The numeral in parentheses represents the uncertainty in the last significant figure of the measurement, so the uncertainty is rounded to that digit; for example, 10.100(7) means 10.100 ± 0.007. A more robust statement is the confidence interval, which can be defined at different levels of probability and is calculated using the Student t distribution.
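A confidence interval can be sketched in Python as the mean plus or minus t times S_m, where t is the Student t value for N - 1 degrees of freedom. This is the usual textbook form, assumed here rather than taken from the text above. The data are hypothetical, the function name `confidence_interval_95` is our own, and the small table holds standard two-sided 95% t values for 1 to 10 degrees of freedom only.

```python
import math

# Two-sided 95% Student t critical values, keyed by degrees of freedom (N - 1).
T_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}

# Hypothetical replicate measurements (e.g., titration volumes in mL).
measurements = [10.08, 10.11, 10.09, 10.10, 10.12]


def confidence_interval_95(values):
    """Return (mean, half_width) of the 95% confidence interval."""
    n = len(values)
    dof = n - 1
    if dof not in T_95:
        raise ValueError("this sketch only tabulates 2 to 11 measurements")
    x_bar = sum(values) / n
    # S_m from the equation above: sqrt(sum of squared deviations / (N(N-1))).
    s_m = math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n * (n - 1)))
    return x_bar, T_95[dof] * s_m


x_bar, half_width = confidence_interval_95(measurements)
print(f"{x_bar:.2f} +/- {half_width:.2f} mL (95% confidence)")
```

With only five measurements the t value (2.776) is noticeably larger than the value of about 1.96 that applies to very large data sets, so the interval is wider than S_m alone would suggest.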
