Dr. Hanley's Blog: March 2005

The Q-test is a simply calculated test statistic that is presented in many textbooks on analytical chemistry as a sufficient criteria for throwing out “outliers” in data sets. Nearly every modern analytical chemistry textbook has a section on this test and it is presented early and often in the University level curriculum. I think it is presented for several reasons. First, it is an easy statistic to compute. There is very little work involved and (I think) textbook writers hope students will spend time pondering the statistics with all the extra time students have after completing the computation. Second, it is a relatively satisfying test because it allows people to eliminate unsatisfactory data. Even I have to confess there is something satisfying about getting rid of a data point that just seemed out of place. Third, it seems to grip the imagination of students in ways that t-tests and F-tests do not. Perhaps, they see it as useful somehow and, therefore, relevant.

Even I can concede all these points. Who can argue against an easy statistical test that catches the imagination of students and leaves a satisfying feeling on successful execution?

Well… me.

What is my problem with the Q-test? Well everything I just said and some. First, philosophically, I am deeply suspicious of people throwing away data. You may argue that the Q-test justifies the exclusion of data, so I am talking rubbish. I will get to this in a moment, but for now just take it as a philosophical disagreement. Second, the Q-test has an underlying assumption of normality. Third, there is a somewhat better test of outliers called Grubb’s test which is recommended in preference to the Q-test by ISO (International Standards Organization). Finally, in my experience as a practicing scientist I have seen far too much removal of inconvenient data. Much was completely unjustified (in particular a certain oil company where I worked which shall remain nameless).

Philosophical objection

Let me begin with an anecdote.

Once upon a time I was headed north along the freeway between the Boston and Manchester, New Hampshire. It was winter and really quite cold. I had a nice energy efficient diesel VW rabbit (Golf for you Europeans). Did I tell you it was cold? I think I did, it was damn cold. I was driving along minding my business and the car started slowing down for no good reason. It went slower and slower and rolled to a stop on the edge of the freeway. I got out and poked around under the hood. Did I tell you it was cold? Just in case you missed it, IT WAS COLD. I didn’t see anything wrong under the hood – didn’t really expect much. I contemplated getting out and walking, but it was too COLD. I got back in the car, turned it with the starter for a while and… it started. I managed to limp to a service station.

You are all wondering, “why is this person telling me this?” The real important question is why did my car stall and me nearly freeze to death? Too much water in my fuel is the answer. Working in an oil refinery a few years later I became suspicious that the real reason was that water is cheaper than diesel fuel and summer and fall diesel fuels can contain more water than winter diesel fuel. In cold weather, the water separates from the fuel, turns to ice, and plugs up the fuel filter. During the warmer months cutting the water content specification “close” or just ignoring the test because it takes too long makes the company money. So, if you throw out or “mend” the numbers (they called it “pencilling” at the oil refinery) you make everyone happy. The guys in the plant are happy because it is less work for them, management is happy because they make more money selling maximum water and every little bit counts… So the pressure is on. Throw out the offending number and ship the product.

Which brings us to the point of the anecdote, if you measure something every day for a month (say water content) and get a particular value then you get a high out-of-specification value, what do you do? Well… Do you apply the Q-test, reject the number, and not tell anyone? Hmm… I didn’t think so. A single value is sufficient to tell if your process is out of control. You do NOT throw that out. Why not? Not only is it bad practice, but you might cause someone lose their toes to frostbite.

Ok. I think I have convinced you not to use the Q-test indiscriminately. Clearly, there are situations where you don’t want to throw out “abnormal” data. What is murky is when it is appropriate to use. I can give you a guideline: Be sure you know what you are doing and why you are doing it, before throwing away ANY data. My philosophical position is throwing away data, “massaging” data, pencilling data, and all related activities can be extremely dangerous to your science. Think of children playing with dynamite. Be sure the little guys know what they are doing, hmm.

Statistical worries.

The problem of removing outliers in its most general sense is the idea that the outlier does not belong to the same distribution as the rest of the data. Sort of a dogs and cats kind of idea. Suppose you did an experiment weighing neighborhood animals. Further suppose that most of the animals in that neighborhood were dogs and relatively big dogs. If you were only looking at the mass you would probably notice a cat. It would appear as an outlier. The cat would weigh much less and when you saw that mass you might try to test whether it fit with the rest of the distribution. This is the concept.

Seems pretty clear, what is the problem? Well in reality there is seldom an a priori reason to remove a value. If we have prior knowledge that a cat has gotten into the dogpile, we would keep the categories separate and there is no need for statistical help. The Q-test is used in cases when there is no a priori reason to tell you that one of your points is a cat rather than a dog. Therein lies the issue. The Q-test tells you the probability that a particular data point is not part of the same distribution and the probability that you are making a wrong decision. The danger lies in the following two issues: bias in application of the Q-test and the rest of the “outlier’s” distribution.

Both of these are rather insidious. Bias is the most obvious. I have yet to see a student (or anyone else) systematically test every single data set for outliers. Rather, the Q-test is usually applied only when someone thinks the outcome of the test might make a difference to their conclusions. If you are doing the latter you are really up to no good. Not at all. Almost assuredly you are trying to establish tenuous conclusions on marginal data. If a single point makes that much difference, maybe you should think about your experimental design a little more carefully or run a few more experiments. Otherwise, you are probably just biasing the conclusions of a marginal experiment in the direction you would like it to go.

The more subtle problem with the Q-test is that that outlier may belong to another distribution. All distributions have “tails.” One may wonder where the rest of the outlier’s distribution is. Yes indeed. If you throw out the one outlier because it belongs to another distribution then philosophically you should remove all the data that belong to that distribution. You may wonder how to do this. I wonder about this too. No one has ever told me. I suspect it is not easy.

For interested students, this isakin to the concept of a and b errors.

What is reality?

This is a big question. As an experimentalist, I see reality as data. To my eyes there is no bad data, only incompletely understood data. Data may reflect artefacts, poor experimental design, or what have you, but… it is always correct. The interpretation may easily be flawed, but the data is always correct. Remember this.

The Q-test assumes that the data are normally distributed. The normal distribution is a theoretical notion (a very useful one, don’t get me wrong), not reality. I do not recall any of the large data sets that I have worked with really conforming to a normal distribution. They all tend to have broader “tails” than “expected.” I am convinced there is something to this, but don’t have the theoretical background to do much about it. I have heard that the Mathematician Mandelbrot (of fractal and Mandelbrot set fame) has been looking at aspects of this problem.

So why am I saying this? Just that data fitting a Normal distribution is less common than you might think. When you throw out data, what you may be doing is “forcing” your data to match a theory which is not applicable. This may be a case of bending reality (data) to fit a preconceived notion. Be careful.

What do you mean?

This is the most important point. What do you mean when you throw out data? Consider the weighing of pennies. You weigh a thousand of them and then throw out a subset. Does this mean these were not pennies? No, it just means they are different pennies. What about the person getting a 100 percent on an exam where the rest of the class was clustered tightly around a mean of 40? Is this person not a student? There is something different about the person, but what do you intend to do once you have made this distinction?

Which brings me to my final point? Throwing out a point should never be done. What you are doing is separating data based on whether it belongs in a single distribution. The process should not stop there. If it is not the same, what does this mean. Can you learn something from this?

Hopefully I have made my point. I am suspicious of the Q-test. It is too easy, it grips the imagination, and is a bit too satisfying for my taste. Use it cautiously if at all. Remember, these tests are not for “throwing away” data. They are for establishing the likelihood that it belongs to another distribution – not for establishing that the data is “bad.”

Dr. Hanley's Blog

Thursday, March 31, 2005

Quiz your analytical knowledge: Projects 1415

Tuesday, March 29, 2005

My problem with the Q-test (Long)