Go to the assignments page in Course Info and download the clean300.sav SPSS data file to your computer. (If you've forgotten how to download a file, review how to do that in the Descriptive Statistics 1 Lab Notes.) Open the file in SPSS. It contains data from the Spring 2000 survey. Today we'll work with the Chi-square nonparametric statistic, which is covered in further detail in a Web note.
The Chi-Square Goodness of Fit Test
First, let's find out if the sample conforms to our expectations. One way of doing this is to examine the distributions of variables where we already know what the distribution should be. For example, when Com. 300 students conducted their surveys in the fall, they were instructed to interview an equal number of males and females. To find out if that happened, let's run frequencies on gender, which is Variable Q28 in the clean300.sav data. We covered running frequencies in the first lab on descriptive statistics, so we won't go over it here. (You may want to review that lab, however.) When we run frequencies on gender, we get output that looks like the following:
Obviously, this is very close to a 50-50 split by gender, but is it close enough that the difference can be attributed to the random chance we would expect because we are dealing with a sample? This amounts to testing the hypothesis that the proportions of males and females are not equal, which is a two-tailed hypothesis. Let's run a Chi-square Goodness of Fit test to test the hypothesis. To do this with our data set:
You should see a window like the one below.
Click to open a new window. When the window opens, scroll down the variable list and highlight gender (q28). Then click the arrow to move it into the test variables list. In the "Expected Values" box, make certain that "all categories equal" is checked. Click OK. You should get output like that shown below.
Frequencies
This table shows that if we expect the categories of gender to be equal, we would expect 226 males and 226 females. The residuals column indicates that there are four more males than would be expected, and four fewer females. Thus, the expected and observed values are very close. But are they close enough to be attributed to chance? Let's look at the next table.
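If you'd like to check the arithmetic behind this test, the goodness of fit calculation is easy to sketch by hand. The Python snippet below is only an illustration, not part of the SPSS procedure: the counts of 230 males and 222 females are worked out from the expected value of 226 and the residuals of 4 reported above, and the variable names are our own.

```python
import math

# Observed counts implied by the frequencies table: 226 expected per
# category, with residuals of +4 (males) and -4 (females).
observed = {"male": 230, "female": 222}

n = sum(observed.values())        # 452 respondents in all
expected = n / len(observed)      # "all categories equal" gives 226 each

# Chi-square statistic: sum of (observed - expected)^2 / expected
chi_square = sum((o - expected) ** 2 / expected for o in observed.values())

# Two categories give 1 degree of freedom. For df = 1 the upper-tail
# probability of the Chi-square distribution equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))

print(f"chi-square = {chi_square:.3f}, significance = {p_value:.3f}")
```

Run as written, this gives a significance of about .707, which is the figure to compare against the critical level of .05.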
The next table in the output shows a significance level of .707, which is much higher than our "critical" level of .05. Therefore we accept the null hypothesis. That is, we conclude that the difference in the proportion of males and females is attributable to chance.
Let's look at another variable where we might know what the distribution should look like before we gather the data. Consider Class Rank, variable Q1 in our data. When you run frequencies on Class Rank you should get a table like the one below.
Because this looks a bit odd, we decided to call the UT Office of Institutional Research and ask about the actual distribution of students by class rank. Let's assume the Institutional Research Office says that the student body is 33% Freshmen, 28% Sophomores, 22% Juniors, and 17% Seniors. Using these figures we can calculate expected values for our sample of 452. That is, we would expect roughly 150 Freshmen (33% of 452), 127 Sophomores (28% of 452), 99 Juniors (22% of 452), and 76 Seniors (17% of 452). Armed with these figures, we can do a Chi-square Goodness of Fit Test to see if the sample characteristics are in line with the known population figures. This tests the hypothesis that the frequencies are different in the population and the sample. To do this with our data set:
When the window opens, scroll down the variable list and highlight Class Rank (Q1). Then click the arrow to move it into the "Test Variables" box. Then click on the word "values" in the "expected values" box. This is where we'll enter the expected values we calculated above.
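The expected values themselves are just each population percentage multiplied by the sample size. Here's a small Python sketch of that arithmetic (the variable names are our own, purely for illustration):

```python
# Population shares quoted in this lab for each class rank, and the
# sample size of the survey.
shares = {"Freshmen": 0.33, "Sophomores": 0.28, "Juniors": 0.22, "Seniors": 0.17}
n = 452

# Expected count for each class rank = population share * sample size.
exact = {rank: share * n for rank, share in shares.items()}

for rank, value in exact.items():
    print(f"{rank:<11}{value:7.2f}")
```

Note that the unrounded products are 149.16, 126.56, 99.44 and 76.84, so the whole numbers used in this lab (150, 127, 99 and 76) differ slightly from strict rounding for Freshmen and Seniors, while still adding up to 452.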
When you've finished entering the expected values, click OK. You should get output like that shown below.
Chi-Square Test
Frequencies
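To see where the numbers in this output come from, here's a rough Python sketch of the same calculation. It's an illustration only: the expected counts are the ones entered in the dialog, the residuals are the ones reported in this lab's output, and the observed counts are recovered by adding the two. The closed-form expression used for the significance level applies only to 3 degrees of freedom (four categories minus one), and the variable names are our own.

```python
import math

ranks = ["Freshmen", "Sophomores", "Juniors", "Seniors"]
expected = [150, 127, 99, 76]     # values entered in the SPSS dialog
residuals = [-31, 4, -14, 41]     # observed minus expected, as reported

# Recover the observed counts from the expected counts and residuals.
observed = [e + r for e, r in zip(expected, residuals)]

# Chi-square statistic: sum of residual^2 / expected over the categories.
chi_square = sum(r ** 2 / e for r, e in zip(residuals, expected))

df = len(ranks) - 1               # 4 categories give 3 degrees of freedom

# For df = 3 the upper-tail probability of the Chi-square distribution is
# erfc(sqrt(x / 2)) + sqrt(2x / pi) * exp(-x / 2).
x = chi_square
p_value = math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

print(dict(zip(ranks, observed)))
print(f"chi-square = {chi_square:.2f}, df = {df}, significance = {p_value:.6f}")
```

The statistic comes out to roughly 30.6, and the significance is far smaller than .001, which SPSS displays as .000.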
In the top box, the first column shows the observed frequencies from the data, which are the same as those we got by running the Frequencies procedure. The second column shows the expected frequencies, which we entered. The third column shows the "residuals," which is statistician talk for the differences between the observed and expected values. From the residuals column we see that there are 31 fewer Freshmen than expected, 4 more Sophomores than expected, 14 fewer Juniors, and 41 more Seniors. These are large differences, but are they too large to be attributed to chance? To answer that question we look in the bottom box to find the significance level, which is reported as .000. Clearly, .000 is less than the critical level of .05, so we reject the null hypothesis that the expected and observed values are close enough for their differences to be attributable to chance. We accept the alternate hypothesis that the sample frequencies are different from the population frequencies.
The Chi-square Test for Independence
The Chi-square test for Independence operates by comparing observed frequencies to expected frequencies. Most of the time we don't know ahead of time what the expected frequencies will be, as we did in the examples above. Instead, the expected values must be calculated from the data that we have gathered. To see how this is done, let's test the hypothesis:
The independent variable in this case is the primary source of funding for a college education, which is indicated in Question 22. Six options for primary source of funding are offered.
Obviously, this variable needs to be recoded if its categories are to indicate simply whether students' college funds come from personal sources or from grants and scholarships. To set the variable up appropriately for the stated hypothesis, we need to create a new variable with responses 1, 2, 3, and 5 in one category, which indicates that the funding comes from personal sources (or sources that may have to be paid back). Responses 4 and 6 go in another category, which indicates that the student's primary source of college funding is grants or scholarships. (If you don't remember how to collapse categories, refer to Descriptive Statistics Lab 1.)
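The recoding logic can be sketched in a few lines of Python. This is only an illustration of the grouping described above, not the SPSS Recode procedure itself; the codes 1 and 2 for the new variable, the function name, and the sample responses are all made up for the example.

```python
# Codes for the new two-category variable (illustrative assumption).
PERSONAL, GRANTS = 1, 2

# Q22 responses 1, 2, 3 and 5 go in the "personal sources" category;
# responses 4 and 6 go in the "grants or scholarships" category.
recode_map = {1: PERSONAL, 2: PERSONAL, 3: PERSONAL, 5: PERSONAL,
              4: GRANTS, 6: GRANTS}

def recode_q22(value):
    """Return the collapsed funding category, or None for any other value."""
    return recode_map.get(value)

# Hypothetical responses for illustration only (not from clean300.sav).
sample_q22 = [1, 4, 6, 3, 5, 2, None]
recoded = [recode_q22(v) for v in sample_q22]
print(recoded)   # [1, 2, 2, 1, 1, 1, None]
```

Leaving unlisted values as None keeps missing answers from being counted in either category.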
How to get the Chi-square statistic
To do a Chi-square, go to Analyze, Descriptive Statistics, Crosstabs. This will open a window similar to the one below.
You will see rows and columns. You want the independent variable to be the column and the dependent variable to be the row. In this case, source of funds will be the column and willingness to pay will be the row. (HINT: The variables have been placed correctly in the window above. It's important to set up the data correctly in order to interpret the results correctly.)
In the statistics window, all you want to do is check "Chi-square," which is in the upper left-hand corner of the window. Click continue and you will be taken back to the crosstabs window. Then click the "cells" button (next to statistics), and a window like the one below will open.
In the cell window you want to check "observed" and "expected" counts in the upper left-hand corner and, in the lower left-hand corner, check column percentages. Then click continue. This will take you back to the crosstabs window. Click OK and you will get crosstabs with a Chi-square value. You should get output like that displayed below.
How to read the output?
Look at the Chi-square tests. The significance of the Chi-square value tells us whether the comparison is significant. We are using the critical level of .05. Is there significance? The Pearson Chi-square shows a significance of .122 for a two-tailed hypothesis. Since this is a one-tailed hypothesis, we divide that by 2, which equals .061. That's greater than .05, so we accept the null hypothesis that the proportions in the two samples are not different.
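To see how the expected counts and the Chi-square value in a crosstab are arrived at, here's a Python sketch. The counts in the table are hypothetical, invented for illustration (they are not the clean300.sav results), and the variable names are our own. Each expected count is the row total times the column total divided by the sample size; the halving at the end follows the one-tailed rule used in this lab.

```python
import math

# Hypothetical 2 x 2 crosstab (made-up counts, NOT from clean300.sav).
# Rows are the two categories of the dependent variable; columns are the
# two funding groups (personal sources, grants or scholarships).
observed = [
    [120, 40],
    [180, 90],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for each cell = row total * column total / n
expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]

# Pearson Chi-square: sum over cells of (observed - expected)^2 / expected
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# A 2 x 2 table has (2 - 1) * (2 - 1) = 1 degree of freedom; for df = 1
# the upper-tail probability equals erfc(sqrt(x / 2)).
p_two_tailed = math.erfc(math.sqrt(chi_square / 2))
p_one_tailed = p_two_tailed / 2   # this lab's rule for a one-tailed hypothesis

print([[round(e, 2) for e in row] for row in expected])
print(f"chi-square = {chi_square:.3f}, two-tailed = {p_two_tailed:.3f}, "
      f"one-tailed = {p_one_tailed:.3f}")
```

Because the expected counts are built from the row and column totals, they always add back up to those totals, which is a handy check when you read the "expected" figures in the SPSS crosstab.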
We'll need to use question 8, which is reproduced above, and question 7:
In this case variable 7 is the independent variable and variable 8 is the dependent variable.
If you don't understand something in this Web note, please e-mail Dr. Sitton.
Revised 092811. http://www.uamont.edu/FacultyWeb/sitton/crz/mrea/chi-squarelab.html