Oregon State University

Explanation

The core problem with reporting a mean and an estimated standard deviation is that, while they describe the statistical behavior of our finite data set, they don't directly answer the question "how good is the answer?" To answer it we have to correct explicitly for the finite number of observations: the normal distribution presupposes an infinite data set, which we will never have. The correction, worked out by W. S. Gosset (who published under the pseudonym "Student"), requires looking up a value of the t distribution for our number of observations and for the probability at which we want to answer the "how good?" question. A full theoretical description is developed in Shafer & Zhang, Ch. 8 and pp. 433ff, or in Garland, Nibler & Shoemaker, pp. 48-50.

The Student t Distribution Table

P1 (one-sided)  0.60   0.70   0.80   0.90   0.95   0.975  0.99   0.995
P2 (two-sided)  0.20   0.40   0.60   0.80   0.90   0.95   0.98   0.99
df
1               0.325  0.727  1.376  3.078  6.314  12.71  31.82  63.66
2               0.289  0.617  1.061  1.886  2.920  4.303  6.965  9.925
3               0.277  0.584  0.978  1.638  2.353  3.182  4.541  5.841
4               0.271  0.569  0.941  1.533  2.132  2.776  3.747  4.604
5               0.267  0.559  0.920  1.476  2.015  2.571  3.365  4.032
6               0.265  0.553  0.906  1.440  1.943  2.447  3.143  3.707
7               0.263  0.549  0.896  1.415  1.895  2.365  2.998  3.499
8               0.262  0.546  0.889  1.397  1.860  2.306  2.896  3.355
9               0.261  0.543  0.883  1.383  1.833  2.262  2.821  3.250
10              0.260  0.542  0.879  1.372  1.812  2.228  2.764  3.169
...
20              0.257  0.533  0.860  1.325  1.725  2.086  2.528  2.845
...
inf             0.253  0.524  0.842  1.282  1.645  1.960  2.326  2.576

How to use the table

You will first have calculated the mean of the data, x̄, and the estimated standard deviation of the mean, Sm, from the data set, after applying the Q test for discordance if necessary. Because the confidence interval extends on both sides of the mean, we use the two-sided probability P2 to choose the column. In chemistry we normally report a 95% confidence interval, so select the column with P2 = 0.95 (equivalently, P1 = 0.975). Results can, of course, be reported at other probabilities of bracketing the "true" value; the 95% level is the usual choice and corresponds to roughly two standard deviations on either side of the true mean (t = 1.960 for an infinite data set).
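The relation between the two header rows can be written down directly: a two-sided interval at probability P2 leaves (1 - P2)/2 in each tail, so it uses the one-sided column P1 = (1 + P2)/2. A minimal Python sketch, with a hypothetical helper named one_sided_level:

```python
def one_sided_level(p_two_sided: float) -> float:
    """Convert a two-sided probability P2 to the matching one-sided level P1.

    The two-sided interval leaves (1 - P2)/2 in each tail, so the one-sided
    cumulative probability is 1 - (1 - P2)/2 = (1 + P2)/2.
    """
    if not 0 < p_two_sided < 1:
        raise ValueError("P2 must lie strictly between 0 and 1")
    return (1 + p_two_sided) / 2


# Pair up the two header rows of the table, column by column.
for p2 in (0.20, 0.40, 0.60, 0.80, 0.90, 0.95, 0.98, 0.99):
    print(f"P2 = {p2:.2f}  ->  P1 = {one_sided_level(p2):.3f}")
```

Working in terms of the one-sided level is convenient because most software, like the table's P1 row, is indexed by a cumulative probability.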

We also have to account for the "degrees of freedom," listed as df in the table and often given the Greek symbol ν. If we had calculated nothing from the data set, df would equal N, the number of observations. However, we have already used the data to find the mean, x̄, which uses up one degree of freedom, so df = N-1. In a linear regression analysis we subtract one degree of freedom for every parameter the analysis returns: a simple straight-line fit returns a slope and an intercept, so df = N-2.
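The degrees-of-freedom rule amounts to subtracting the number of parameters estimated from the data from N. A small sketch, using a hypothetical helper named degrees_of_freedom:

```python
def degrees_of_freedom(n_observations: int, n_fitted_parameters: int = 1) -> int:
    """Degrees of freedom left after estimating parameters from the data.

    A mean uses up one parameter (df = N - 1); a straight-line fit returns a
    slope and an intercept (df = N - 2).
    """
    df = n_observations - n_fitted_parameters
    if df < 1:
        raise ValueError("need more observations than fitted parameters")
    return df


print(degrees_of_freedom(5))     # mean of 5 observations -> 4
print(degrees_of_freedom(8, 2))  # slope and intercept from 8 points -> 6
```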

We then apply the following equation:

Δ = t·Sm

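where Δ is the half-width of the 95% confidence interval, t is the value read from the table for the chosen P2 and df, and Sm is the estimated standard deviation of the mean, Sm = S/√N, with S the estimated standard deviation of the individual observations. The result is reported as: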

x̄ ± Δ (95%, N=no. of observations)
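Putting the steps together, the sketch below computes x̄ ± Δ for a small data set. It assumes that Sm is the estimated standard deviation of the mean, S/√N, with S the sample standard deviation. The readings are hypothetical numbers for illustration only, and the function name confidence_interval_95 and the T_95_TWO_SIDED lookup (copied from the P2 = 0.95 column of the table) are likewise illustrative. No Q test is applied here.

```python
import math
import statistics

# Two-sided 95% t values (the P2 = 0.95 column of the table), keyed by df.
T_95_TWO_SIDED = {1: 12.71, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
                  6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 20: 2.086}


def confidence_interval_95(data):
    """Return (mean, delta, N) with delta = t * Sm and Sm = S / sqrt(N)."""
    n = len(data)
    df = n - 1                       # the mean uses up one degree of freedom
    if df not in T_95_TWO_SIDED:
        raise ValueError(f"no tabulated t value for df = {df}")
    mean = statistics.mean(data)
    s = statistics.stdev(data)       # sample standard deviation (N - 1 denominator)
    s_m = s / math.sqrt(n)           # estimated standard deviation of the mean
    delta = T_95_TWO_SIDED[df] * s_m
    return mean, delta, n


# Hypothetical replicate measurements, for illustration only.
readings = [10.0, 10.2, 9.8, 10.1, 9.9]
mean, delta, n = confidence_interval_95(readings)
print(f"{mean:.2f} ± {delta:.2f} (95%, N={n})")
```

For these five hypothetical readings the script prints 10.00 ± 0.20 (95%, N=5): S ≈ 0.158, Sm ≈ 0.0707, and t = 2.776 for df = 4. Raising an error for untabulated df, rather than interpolating, keeps the sketch faithful to the table.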