Student's t Distribution: Confidence Limits

The core problem with reporting a mean and an estimated standard deviation is that, while they describe the statistical behavior of our finite data set, they do not directly answer the question "how good is the answer?" To answer it we have to correct explicitly for the finite number of observations: a normal distribution presupposes an infinite data set, which we clearly will never have. The correction, worked out by W. S. Gosset (who published under the pseudonym "Student"), requires finding the value of the t distribution that corresponds to both the number of observations and the probability with which we want to answer the "how good?" question. A full theoretical description is developed in Shafer & Zhang, Ch. 8 and pp. 433ff, or in Garland, Nibler & Shoemaker, pp. 48-50.

The Student t Distribution Table

P1 sided	t_.60	t_.70	t_.80	t_.90	t_.95	t_.975	t_.99	t_.995
P2 sided	t_.20	t_.40	t_.60	t_.80	t_.90	t_.95	t_.98	t_.99
df
1	0.325	0.727	1.376	3.078	6.314	12.71	31.82	63.66
2	0.289	0.617	1.061	1.886	2.920	4.303	6.965	9.925
3	0.277	0.584	0.978	1.638	2.353	3.182	4.541	5.841
4	0.271	0.569	0.941	1.533	2.132	2.776	3.747	4.604
5	0.267	0.559	0.920	1.476	2.015	2.571	3.365	4.032
6	0.265	0.553	0.906	1.440	1.943	2.447	3.143	3.707
7	0.263	0.549	0.896	1.415	1.895	2.365	2.998	3.499
8	0.262	0.546	0.889	1.397	1.860	2.306	2.896	3.355
9	0.261	0.543	0.883	1.383	1.833	2.262	2.821	3.250
10	0.260	0.542	0.879	1.372	1.812	2.228	2.764	3.169
-
20	0.257	0.533	0.860	1.325	1.725	2.086	2.528	2.845
-
inf	0.253	0.524	0.842	1.282	1.645	1.960	2.326	2.576

How to use the table

You will initially have calculated the mean of the data, x̄, and the estimated standard deviation, S_m, from the data set, after applying the Q test for discordant values if necessary. To calculate the confidence interval, we use the "two-sided" probability P2. In chemistry we normally report a 95% confidence interval, so select the column with P2 = 0.95. It is, of course, possible to present results with other probabilities of capturing the "real" value; the 95% choice corresponds to close to two standard deviations on either side of the "true" mean.
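The link between the two header rows of the table, and the "close to two standard deviations" remark, can be checked with a short sketch. It assumes only the Python standard library; the function names are illustrative and not part of the original text. The infinite-df limit of the t distribution is the normal distribution, so the normal quantile for P2 = 0.95 should match the 1.960 entry in the "inf" row.

```python
from statistics import NormalDist


def two_sided_from_one_sided(p1):
    """Convert a one-sided probability (P1 row) to the two-sided one (P2 row)."""
    return 2.0 * p1 - 1.0


def one_sided_from_two_sided(p2):
    """Convert a two-sided probability (P2 row) to the one-sided one (P1 row)."""
    return (1.0 + p2) / 2.0


# Normal-distribution multiplier for a 95% two-sided interval.
# This is the df = inf limit of the table, roughly 1.96 standard deviations.
z_95 = NormalDist().inv_cdf(one_sided_from_two_sided(0.95))
print(f"P1 = {one_sided_from_two_sided(0.95):.3f}, multiplier = {z_95:.3f}")
```

For any finite number of observations the t value is larger than this limit, which is why the table, and not the normal distribution, is used for small data sets.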

We also have to account for the "degrees of freedom," listed as df in the table and often given the Greek symbol ν. If we had not yet calculated anything from the data set, df would equal N, the number of observations. However, we have already used the data to find the mean, x̄, which uses up one degree of freedom, so df = N-1. Note that in a linear regression analysis we subtract one degree of freedom for every parameter the analysis returns: in the simple one-variable case the regression gives a slope and an intercept, so df = N-2.
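The counting rule above can be written as a small helper. This is a minimal sketch; the function name and the n_fitted parameter are hypothetical, introduced only for illustration.

```python
def degrees_of_freedom(n_obs, n_fitted=1):
    """Return df = N minus the number of parameters already taken from the data.

    n_fitted=1 covers the plain mean; n_fitted=2 covers a one-variable
    linear regression (slope and intercept).
    """
    df = n_obs - n_fitted
    if df < 1:
        raise ValueError("need more observations than fitted parameters")
    return df


print(degrees_of_freedom(5))               # mean of 5 observations: df = 4
print(degrees_of_freedom(8, n_fitted=2))   # straight-line fit to 8 points: df = 6
```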

We then apply the following equation:

Δ = tS_m

where Δ is the 95% confidence limit (the half-width of the confidence interval), t is the value read from the table, and S_m is the estimated standard deviation. The result is reported as:

x̄ ± Δ (95%, N=no. of observations)
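The whole procedure can be sketched in a few lines. The dictionary below copies the P2 = 0.95 column of the table; the function names and the sample numbers (mean 10.12, S_m 0.05, N = 5) are hypothetical and chosen only to illustrate the arithmetic. S_m is taken as an input, exactly as it appears in Δ = tS_m.

```python
# Two-sided 95% column (P2 = 0.95) copied from the table, keyed by df.
T_95 = {1: 12.71, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
        6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228, 20: 2.086}


def confidence_limit(s_m, n_obs):
    """Return delta = t * S_m for a 95% two-sided interval, with df = N - 1."""
    df = n_obs - 1
    if df not in T_95:
        # The table is abridged, so other df values need a fuller table.
        raise ValueError(f"df = {df} is not listed in the abridged table")
    return T_95[df] * s_m


def report(mean, s_m, n_obs):
    """Format the result in the reporting style used above."""
    delta = confidence_limit(s_m, n_obs)
    return f"{mean:.3f} ± {delta:.3f} (95%, N={n_obs})"


# Hypothetical data: five observations, so df = 4 and t = 2.776.
print(report(10.12, 0.05, 5))  # 10.120 ± 0.139 (95%, N=5)
```

Raising an error for an unlisted df, instead of interpolating, keeps the sketch faithful to the table as printed.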
