The Q test for discordant data

One of the principles of statistics is that when sampling is truly random, one of our finite samples may come from the (statistically unlikely) tails of the distribution curve. Such a value may be perfectly valid data, but it will unduly skew the calculated mean and uncertainty. We can test whether a value that appears quite different from the rest may in fact be excluded when calculating the mean and the experimental uncertainty.

The equation:

$$Q_n = \left|\frac{x_{sus}-x_{near}}{x_{max}-x_{min}}\right|$$
Here, Q_n is the Q value for the suspect data point (x_sus), x_near is the data point closest in value to it, and (x_max - x_min) is the overall range of the data, suspect value included. If Q_n is greater than or equal to Q_c in the following table (based on the total number of data points), then the suspect value should be excluded from any statistical treatment (finding the mean or standard deviation of the mean).

N = number of observations	3	4	5	6	7	8	9	10
Q_c:	0.94	0.76	0.64	0.56	0.51	0.47	0.44	0.41
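The calculation above can be sketched in a few lines of Python. This is only an illustration, not a prescribed procedure: the function name `q_test` and the sample readings are hypothetical, and the sketch assumes the suspect value is whichever extreme (largest or smallest) lies farther from its nearest neighbor. The critical values are copied from the table.

```python
# Critical values Q_c copied from the table above, keyed by the number of observations N.
Q_CRIT = {3: 0.94, 4: 0.76, 5: 0.64, 6: 0.56, 7: 0.51, 8: 0.47, 9: 0.44, 10: 0.41}


def q_test(values):
    """Return (suspect, q_n, q_c, exclude) for a data set of 3 to 10 values.

    Assumption: the suspect point is the extreme value (minimum or maximum)
    with the larger gap to its nearest neighbor.
    """
    n = len(values)
    if n not in Q_CRIT:
        raise ValueError("the table covers only 3 to 10 observations")

    data = sorted(values)
    data_range = data[-1] - data[0]  # x_max - x_min, suspect value included
    if data_range == 0:
        raise ValueError("all values are identical; there is no suspect point")

    gap_low = data[1] - data[0]      # |x_sus - x_near| if the minimum is suspect
    gap_high = data[-1] - data[-2]   # |x_sus - x_near| if the maximum is suspect
    if gap_high >= gap_low:
        suspect, gap = data[-1], gap_high
    else:
        suspect, gap = data[0], gap_low

    q_n = gap / data_range
    q_c = Q_CRIT[n]
    return suspect, q_n, q_c, q_n >= q_c


# Hypothetical readings, invented purely for illustration.
readings = [10.10, 10.12, 10.15, 10.13, 10.90]
suspect, q_n, q_c, exclude = q_test(readings)
print(f"suspect={suspect}, Q_n={q_n:.3f}, Q_c={q_c}, exclude={exclude}")
```

For these made-up readings the gap between 10.90 and its neighbor 10.15 is 0.75 and the range is 0.80, so Q_n is about 0.94, which exceeds the tabulated Q_c of 0.64 for five observations.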

Cautions:

You can only exclude one data point at most! You cannot iteratively apply the Q test to "winnow" data.
If, in a small data set, Q_n is close to Q_c but not large enough to exclude the suspect value, the median may be a better estimate than the mean.
For more than 10 observations, a better criterion for exclusion is whether the suspect value deviates from the mean of the others by more than 2.6S, where S is the estimated standard deviation of the mean of the others. At that threshold there is only about a 1% probability that the observation is statistically valid.
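The criterion for larger data sets can be sketched in the same way. The function name `exclude_by_deviation` and the sample values are hypothetical. The sketch also makes one assumption worth stating loudly: it takes S to be the sample standard deviation of the remaining points (`statistics.stdev`). If S is instead meant as the standard error of their mean, divide it by the square root of the number of remaining points.

```python
from statistics import mean, stdev


def exclude_by_deviation(values, suspect_index, threshold=2.6):
    """Return (deviation, s, exclude) for one suspect point in a set of more than 10 values.

    Assumption: S is the sample standard deviation of the other points.
    """
    if len(values) <= 10:
        raise ValueError("this criterion is intended for more than 10 observations")

    # Mean and spread are computed from the other points only, suspect left out.
    others = [v for i, v in enumerate(values) if i != suspect_index]
    s = stdev(others)
    deviation = abs(values[suspect_index] - mean(others))
    return deviation, s, deviation > threshold * s


# Hypothetical data, invented for illustration; the last value is the suspect.
data = [5.0, 5.1, 4.9, 5.0, 5.2, 4.8, 5.1, 4.9, 5.0, 5.1, 4.9, 6.0]
deviation, s, exclude = exclude_by_deviation(data, suspect_index=len(data) - 1)
print(f"deviation={deviation:.3f}, S={s:.3f}, exclude={exclude}")
```

Consistent with the first caution, the sketch tests a single suspect point and is not meant to be applied repeatedly to the same data set.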
