A principle of statistics is that even a truly random sample may include a value drawn from the (statistically unlikely) outer tails of the distribution. Such a value may be perfectly valid data, but its effect is to unnecessarily skew the calculated mean and uncertainty. The Q test lets us decide whether a value that appears markedly different from the rest can be excluded for the purpose of calculating the mean and the experimental uncertainty.
The equation:
$$Q_n = \left|\frac{x_{sus}-x_{near}}{x_{max}-x_{min}}\right|$$
Here, $Q_n$ is the Q value for the suspect data point ($x_{sus}$), $x_{near}$ is the data point closest to it, and $x_{max} - x_{min}$ is the overall range of the data, suspect value included. If $Q_n$ is greater than or equal to the critical value $Q_c$ in the following table (based on the total number of data points), then the suspect value must be excluded from any statistical treatment (finding the mean or the standard deviation of the mean).
| $N$ (number of observations) | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|
| $Q_c$ | 0.94 | 0.76 | 0.64 | 0.56 | 0.51 | 0.47 | 0.44 | 0.41 |
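The test can be sketched in code as follows. This is a minimal illustration, not a library routine: the function name `q_test` and the hard-coded critical-value table (the $Q_c$ values above) are assumptions of this sketch, and it tests only the more extreme of the two endpoints, since the suspect value is necessarily the smallest or largest observation.

```python
# Critical Q values from the table above, keyed by number of observations.
Q_CRIT = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56, 7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}

def q_test(data):
    """Return (suspect value, Q_n, reject?) for the more extreme endpoint."""
    xs = sorted(data)
    n = len(xs)
    if n not in Q_CRIT:
        raise ValueError("Q test table covers 3 to 10 observations")
    spread = xs[-1] - xs[0]          # x_max - x_min, suspect value included
    gap_low = xs[1] - xs[0]          # gap from smallest value to its neighbor
    gap_high = xs[-1] - xs[-2]       # gap from largest value to its neighbor
    # The suspect is whichever endpoint is farther from its nearest neighbor.
    if gap_low >= gap_high:
        suspect, gap = xs[0], gap_low
    else:
        suspect, gap = xs[-1], gap_high
    q_n = gap / spread
    return suspect, q_n, q_n >= Q_CRIT[n]
```

For example, for the five observations 10.1, 10.2, 10.3, 10.4, 12.0, the suspect is 12.0 with $Q_n = 1.6/1.9 \approx 0.84$, which exceeds $Q_c = 0.64$ for $N = 5$, so the value is excluded.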
Cautions:
- At most one data point may be excluded. You cannot apply the Q test iteratively to "winnow" the data.
- If, in a small data set, $Q_n$ is close to $Q_c$ but not large enough to justify exclusion, the median may be a better estimate than the mean.
- For more than 10 observations, a better criterion for exclusion is whether the suspect value's deviation from the mean of the other observations exceeds $2.6S$, where $S$ is the estimated standard deviation of the mean of the others. This corresponds to a 1% probability that a statistically valid observation would deviate this much.
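The large-sample criterion in the last caution can be sketched as below. This is an assumption-laden illustration: the function name `exceeds_2_6s` is hypothetical, and $S$ is read literally from the text as the standard deviation of the mean (standard error) of the other observations; if the author intended the sample standard deviation instead, drop the division by $\sqrt{N-1}$.

```python
import statistics

def exceeds_2_6s(data, suspect_index):
    """Test the > 2.6*S exclusion criterion for N > 10 observations.

    S is taken as the standard deviation of the mean of the other
    observations (sample standard deviation / sqrt(count of others)).
    """
    others = data[:suspect_index] + data[suspect_index + 1:]
    mean_others = statistics.mean(others)
    # Standard deviation of the mean of the others:
    s = statistics.stdev(others) / len(others) ** 0.5
    deviation = abs(data[suspect_index] - mean_others)
    return deviation > 2.6 * s
```

For instance, in a set of eleven readings clustered near 10 with one value of 15, the 15 deviates far more than $2.6S$ from the mean of the other ten and would be excluded, while none of the clustered values would be.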