People often treat statistical hypothesis testing as if it were a magical ritual: if the right steps are taken in the right order, the truth will be revealed. This is unfortunate because the logic of statistical inference, although not obvious, is not all that difficult to understand. And people who understand it have a powerful tool for answering questions and making decisions.

Sometimes people prefer to make small decisions on the basis of some random event. For example, we decide who starts a game by the roll of a die or who clears the dishes by the toss of a coin. But most of the time people prefer to act on the basis of real things, not random events. Statistical reasoning helps us to do that.

A silly example will help explain how things work. What we need is a variable that probably isn't related to anything. Consider this one: whether the number of letters in a person's last name is odd or even; call it "odd/even." Now assume that we want to test the proposition "the average weight of odds is different from the average weight of evens." While it's silly to do so, it would be easy to gather data on the proposition. We could conduct a survey asking people the number of letters in their last names and their weight in pounds. We would then categorize people as either odds or evens and record that information along with their weight.

If we had to be sure that we had the correct answer, we would have to do a census by surveying everybody in the population, but usually we're interested in populations so large that interviewing everybody would be impractical. For that reason, we usually interview just a sample of the population. If we do our research with a reasonably sized random sample of people, a few hundred or so, we could calculate the average weight for each group of people. Although we have no reason to think that being an odd or an even has anything to do with a person's weight, we wouldn't expect the averages to be identical.
That's because we're dealing with a sample of the population rather than the whole population. In other words, we probably would attribute the difference in the averages to random events.

Now we get really silly and do our survey over and over again, hundreds of times. If we did that, most of the time the difference in the weights of odds and evens would be small. But sometimes it would be large, large enough to make us think that the two groups really did have different weights. The way statisticians usually set things up, the difference would be large enough to lead us to believe that the average weight of all odds in the population was different from the average weight of all evens about five percent of the time. In other words, we would see a "statistically significant difference" about five percent of the time even if there were no difference in the population. (Of course, that means we'd be wrong. That's what statisticians call a "Type I error": concluding there is a difference when one doesn't exist. The five percent is the probability of making that error when there really is no difference. A "Type II error" is concluding that no difference exists when one really does.)

Of course, we don't want to do our surveys over and over again. And, thanks to statisticians, we don't have to. They have figured out how to tell from the properties of a single sample whether a difference is large enough to be significant. Essentially, by assuming that the sample is a random sample, statisticians do lots of mathematics on the means, the variances, and other properties of the sample to determine significance. But we don't have to worry about that because computers do it for us.

So what do we do if we are doing a real investigation where we believe there really is a difference in the population? That is, we have a hypothesis. A hypothesis we could be pretty certain of is that, on average, men and women have different weights.
We could conduct our survey, recording each respondent's gender and weight, and put the data in a computer to do the calculations that the statisticians have figured out. What the computer would do is calculate the averages (and other things) and give us a probability. This probability tells us the percentage of times that we could expect a sample in which the averages are at least as different as those we see if the sample came from a population where there really is no difference. That is, the probability tests the null hypothesis that the sample comes from a population where there is no difference.

If that probability is low (usually that means less than five percent), we reject the null and accept the alternative hypothesis that the sample comes from a population where there really is a difference. Strictly speaking, it is the probability of drawing a sample like ours from a no-difference population, not the probability that the population has no difference. In other words, if the finding is significant, a sample like ours would be unusual if it came from a population where there are no differences. For that reason we conclude there really is a difference.

To summarize: Statistical hypothesis testing helps us keep from making decisions on the basis of random events.
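
For readers who like to see the reasoning run, here is a minimal Python sketch of both ideas: the probability the computer reports for one survey, and the silly exercise of repeating the survey many times. Everything specific in it is an assumption made for illustration, not part of the note: the population averages (170, 190, and 165 pounds), the 30-pound standard deviation, the 150 people per group, and the function names are all hypothetical. The probability is computed with a large-sample normal approximation, which stands in for whatever test a statistics package would actually run.

```python
import random
from statistics import NormalDist, mean, variance


def two_sample_p_value(group_a, group_b):
    """Two-sided p-value for a difference in means (large-sample normal approximation)."""
    # Standard error of the difference between the two sample averages.
    std_error = (variance(group_a) / len(group_a)
                 + variance(group_b) / len(group_b)) ** 0.5
    z = (mean(group_a) - mean(group_b)) / std_error
    # Chance of a difference at least this large, in either direction,
    # if the samples came from a population with no difference.
    return 2 * (1 - NormalDist().cdf(abs(z)))


def run_survey(rng, n_per_group, mean_a, mean_b, sd=30.0):
    """Draw one random sample of weights in pounds for each group (made-up population)."""
    group_a = [rng.gauss(mean_a, sd) for _ in range(n_per_group)]
    group_b = [rng.gauss(mean_b, sd) for _ in range(n_per_group)]
    return group_a, group_b


def share_significant(n_surveys, n_per_group, mean_a, mean_b, alpha=0.05, seed=1):
    """Repeat the survey n_surveys times; return the share of surveys with p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_surveys):
        group_a, group_b = run_survey(rng, n_per_group, mean_a, mean_b)
        if two_sample_p_value(group_a, group_b) < alpha:
            hits += 1
    return hits / n_surveys


if __name__ == "__main__":
    # Odds vs. evens: both groups share one population average, so the null is true.
    print("odds/evens:", share_significant(1000, 150, 170.0, 170.0))
    # Men vs. women: a hypothetical 25-pound difference in population averages.
    print("men/women:", share_significant(1000, 150, 190.0, 165.0))
```

Under these assumptions, the odds/evens share should come out somewhere near five percent, since every "significant" result there is a Type I error. The men/women share should be close to one, because a real difference of that size is hard to miss with samples this large.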
If you don't understand something in this Web note, please e-mail Dr. Sitton.
Revised 092811 - http://www.uamont.edu/FacultyWeb/sitton/crz/mrea/logic.html