Mass Communication Research
Descriptive Statistics Lab 1
Downloading SPSS Datasets, Running Frequencies
Recoding Data and Exporting Output


This lab exercise details downloading SPSS data files from CourseInfo, running frequencies, recoding data to collapse percentage categories, and exporting SPSS output for use in technical reports. Before beginning, make sure you're on a PC with SPSS, available on most of the computers at UT labs and the library. You can look at the codesheet from the Fall 2000 survey to see the actual questions asked.

Downloading SPSS Data Sets
  1. Go to http://online.utk.edu/
  2. Log in to Blackboard and enter the Com 300 course.
  3. Hit the "Assignments" button on the left-hand side of your screen.
  4. Hit the link to the folder that says "SPSS data."
  5. Look for the Fall 2000 data set, and hit the link that says "CLEAN300.SAV."
  6. A box will pop up once you hit the link asking if you'd rather open the file or save it to disk. Choose the second option and save it to the desktop or to your disk. If you choose to save it to your disk, it should be in the A: drive.
  7. Double-click on the SPSS data file. This will launch SPSS.
Running Frequencies

When we launch the data set, the first thing we need to find out is if we correctly downloaded the data file. To do this, place the cursor on Analyze of the SPSS Menu, then pull down to Descriptive Statistics and then Frequencies. When you do this your screen should look like the figure below.

When you let up on the cursor, a window should appear that looks like the one below (without the variable in the right window).

The Fall 2000 SPSS data set should have 452 cases. We can check any variable to see whether it has the correct number of cases, so let's choose gender, which is question number 28. We need to move Q28 to the right window. To do this, scroll down the variable list in the left window and highlight Q28. Then click on the arrow between the two windows. This will move Q28 from the left window to the right window. The resulting display should look like the figure above.

Make sure the box in the lower right corner that says "Display frequency tables" is checked. Click OK. SPSS will process the data and provide output for the information you requested. It should look like the figure below.


Statistics

q28 Gender
N    Valid      452
     Missing      0

As noted in the Descriptive Statistics for News Research Web note, frequency distributions are a convenient and comprehensive way of reporting the results from such a survey item. The SPSS output file below indicates that there are 452 valid cases.



q28 Gender

                  Frequency   Percent   Valid Percent   Cumulative Percent
Valid   male            230      50.9            50.9                 50.9
        female          222      49.1            49.1                100.0
        Total           452     100.0           100.0

The table indicates that there are 230 males and 222 females in the sample.

Remember: We're supposed to communicate efficiently. One way of doing so is to turn the frequency distribution into a percentage, which is easier for the general public to understand. However, there are three columns indicating percentages: percent, valid percent, and cumulative percent. Which one should you use?

HINT: Use valid percent in most cases! (I'll explain WHY in a minute.)

The valid percent indicates the divisor is the number of valid cases. In this case, both the percent and valid percent columns indicate the same thing: Respondents were 50.9 percent male and 49.1 percent female. (Note: The cumulative percent column doesn't provide enlightening information when using dichotomous data.) Close your output file but do not save it.
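If it helps to see where the three percentage columns come from, here is a minimal Python sketch of the arithmetic. It is not SPSS code, and the function name frequency_table and its missing_codes parameter are made up for this illustration. The gender data are rebuilt from the counts reported in the table above.

```python
from collections import Counter

def frequency_table(values, missing_codes=()):
    """Return rows of (value, frequency, percent, valid percent, cumulative percent).

    Hypothetical helper for illustration only; it mimics the columns of an
    SPSS frequency table but is not part of SPSS.
    """
    counts = Counter(values)          # keeps the order in which values first appear
    total = len(values)               # all cases, valid or missing
    valid_total = sum(n for v, n in counts.items() if v not in missing_codes)

    rows = []
    cumulative = 0.0
    for value, n in counts.items():
        if value in missing_codes:
            continue                  # missing codes get no valid or cumulative percent
        percent = 100 * n / total              # divisor: every case in the file
        valid_percent = 100 * n / valid_total  # divisor: valid cases only
        cumulative += valid_percent
        rows.append((value, n, round(percent, 1),
                     round(valid_percent, 1), round(cumulative, 1)))
    return rows

# Rebuild Q28 from the reported counts: 230 males and 222 females.
gender = ["male"] * 230 + ["female"] * 222
gender_rows = frequency_table(gender)
for row in gender_rows:
    print(row)
```

Because Q28 has no missing cases, the two divisors are the same (452), which is why the percent and valid percent columns match for this variable.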

Let's look at another example. This time let's examine how students rated their overall experience at UT (Q2). The first thing we need to do is to find out how the variable is coded. To do this, click on the tab labeled "Variable View" at the bottom left of the SPSS window. Then scroll down to the row with the variable label (Q2), and click on the cell under the column labeled "values." A gray button will appear at the right side of the cell. Click on that and a window like the one below will open showing how the variable is coded.

For Q2, you can see that it has the following codes:
1="excellent"
2="very good"
3="good"
4="poor"
5="very poor"
99="don't know/no answer" (you'll need to scroll down to see the 99s)

Try another one, this time without the graphics to help you. Hit the tab labeled "data view" in the bottom left-hand corner of the screen. Place the cursor on Analyze, then pull down to Descriptive Statistics and then Frequencies. Move Q2 to the right-hand column (if there are variables in the right-hand column from previous analysis, you can remove them by clicking "reset."). Make sure the box that says "Display frequency tables" is checked. Click OK. You will receive output for the information you requested.

The SPSS output file indicates there are 452 valid cases, i.e., no one answered "don't know/no answer". In the next box (labeled Rate Overall Experience), the first column indicates responses on a Likert Scale. (What level of measurement is being used?) The frequency table summarizes the responses through frequencies and percentages: 47 students indicated their overall experience was Excellent (10.4 percent); 197 said it was Very Good (43.6 percent); 185 said it was Good (40.9 percent); 20 said it was Poor (4.4 percent); and three students said their overall experience was Very Poor (0.7 percent). Look at the output you obtained and make sure that you see these results.

HINT: Cumulative percent can tell us something in this case. Notice that the cumulative percent in the Very Good row is 54.0. What is this telling us? This number is the result of adding the frequencies of Excellent and Very Good (47 + 197 = 244) and dividing that number by the valid cases (244/452 = 0.5398; 0.5398 * 100 = 54.0 percent).
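To check the cumulative percent column by hand, we can keep a running total of the Q2 frequencies and divide each running total by the number of valid cases. The short Python sketch below does just that, using the counts reported above. The variable names are only for illustration.

```python
from itertools import accumulate

# Q2 frequencies as reported in the SPSS output, in code order 1 through 5.
labels = ["Excellent", "Very Good", "Good", "Poor", "Very Poor"]
counts = [47, 197, 185, 20, 3]

valid_n = sum(counts)               # number of valid cases
running = list(accumulate(counts))  # running totals down the table

# Cumulative percent = running total / valid cases * 100
cumulative_percent = [round(100 * r / valid_n, 1) for r in running]

for label, r, cum in zip(labels, running, cumulative_percent):
    print(f"{label:<10} running total = {r:>3}   cumulative percent = {cum}")
```

The Very Good row shows a running total of 244 (47 + 197), and 244/452 gives the 54.0 percent in the SPSS table.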

Why is this helpful? If we split the interval variable into two categories (which would convert it to a nominal-level dichotomous variable), we can communicate the findings better. In this case, we would make one category "good" ratings and one category "bad" ratings. (For an example of another way to do this, see recoding data to collapse categories).

By looking at the cumulative percent column, we could say "Almost 95 percent of UT students find the overall experience at UT to be good, with just over 10 percent indicating it's excellent." By a similar token, we could say "Slightly more than five percent of UT students indicate their overall experience is either poor or very poor." We get this by adding the poor and very poor valid percents together (4.4 percent + 0.7 percent = 5.1 percent).

Why not use the percent column?

The reason we prefer to use the valid percent column is that the first "percent" column can be misleading when there's missing data in the variable we're observing. Let's examine whether students believed Dr. Gilley's proposals would improve the value of a UT education (Q7), answered in the following fashion: (How is this variable measured? Why?)

  1. Improve a great deal
  2. Improve some
  3. Not improve
  4. Decrease Value
  5. Don't know/No answer
Once again, place the cursor on Analyze, then pull down to Descriptive Statistics and then Frequencies. Move Q7 to the right-hand column (move any previous variables back to the left-hand column). Make sure the box that says "Display frequency tables" is checked. Click OK. You will receive output for the information you requested.

Notice that there are only 426 valid cases, and 26 "missing cases" — this is data where the respondent refused to answer or didn't know an answer. These cases account for 5.8 percent of the 452 total cases in our SPSS data file.

Now look at the differences in percentages between the "percent" column and the "valid percent" column. What is this telling us? The percent column is based on all 452 cases, including the missing ones, while valid percent counts only the valid cases, i.e., those where the respondent actually gave an answer. For example, looking at the "percent" column, we see that 58 percent said Gilley's ideas would improve the value of a UT education "some", but that number increases to 61.5 percent when only counting valid cases. We'd report the valid percent with a note that answers to this question are based on a subsample.
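The only difference between the two columns is the divisor, as the Python sketch below shows. One caution: this Web note doesn't report the raw frequency for "Improve some," so the count of 262 used here is an assumption, picked because it reproduces the 58 percent and 61.5 percent figures. Check it against your own output.

```python
total_cases = 452                            # every case in the data file
valid_cases = 426                            # cases with an actual answer to Q7
missing_cases = total_cases - valid_cases    # 26 don't know/no answer

# ASSUMPTION: 262 is not reported in this lab; it is a count consistent
# with the percentages quoted above, used only to illustrate the arithmetic.
improve_some = 262

missing_percent = round(100 * missing_cases / total_cases, 1)
percent = round(100 * improve_some / total_cases, 1)        # divisor: all cases
valid_percent = round(100 * improve_some / valid_cases, 1)  # divisor: valid cases

print("Missing cases:", missing_cases, f"({missing_percent} percent of all cases)")
print("Percent:", percent)
print("Valid percent:", valid_percent)
```

The smaller divisor is why the valid percent is always at least as large as the percent whenever a variable has missing data.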

Recoding Data for Collapsing Categories

Often we need to manipulate our data before we have the computer calculate statistics for us. SPSS provides a lot of flexibility for collapsing categories through recoding. There are lots of steps in recoding, so it's easy to make a mistake. After you've made certain that the recoding has worked the way you want, then you can proceed with your analysis. An example will make this clear.

Let's return to the example of students rating their overall experience at UT (Q2). As previously noted, we could look at cumulative percent to break the answers into dichotomous categories of "good" and "bad" ratings. However, we could also recode it.

Instead of just recoding the old variable (which we might want later), let's create a new variable that will give us values for "good" and "bad" ratings. Now, to create our new variable, we go to Transform, and then Recode and choose "Into Different Variables." When you do this, a window like the one below will open.

When you let up on the cursor, a window something like the one below will open.

The first thing we want to do is send the variable we want to recode (in this case, Q2) into the right-hand column.

Next, type the name of the new variable we're creating in the box labeled "Name" in the "Output Variable" box. Let's name it "Ratings." Then click on "Change."

Next, click on the "Old Values and New Values" button on the left side of the window, which will open a new window like the one below. We want to collapse the categories into a dichotomous variable where the answers 1=excellent, 2=very good, and 3=good become a new response, where 1 represents "good" ratings. At the same time, we want to make the answers 4=poor and 5=very poor into a new response, where 2 represents "bad" ratings.

To fill in the blanks in the window, choose the option "Range" and enter "1" through "3." Next, in the box on the right side of the window labeled new value enter the value "1," and click the button labeled "Add."

Go back to the left side and enter "4" through "5"; then go to the right side, enter the value "2" as the new value, and click on the "Add" button. Since this is all the recoding you want to do, click on the "Continue" button. When you get back to the main recode window, click on "OK."

If you've done everything right, the computer will create a new variable named "Ratings." If you want to see it, you can scroll the window to the right when looking at the spreadsheet. You will find the new variable "Ratings" in the right-most column, where it follows the names of the survey administrators from the Fall 2000 semester.

It's generally a good idea to run frequencies on a variable you recoded or a new variable you've created to make sure that everything has worked the way you want it to. You already know how to do that. Give it a try and make sure that you have output like that shown below.



RATINGS

                 Frequency   Percent   Valid Percent   Cumulative Percent
Valid   1.00           429      94.9            94.9                 94.9
        2.00            23       5.1             5.1                100.0
        Total          452     100.0           100.0

Running frequencies on our new variable (which we'll find at the bottom of our variable list in the left box), we'll notice that there are 452 valid cases and two categories for ratings: Good - 429 students (94.9 percent) and Bad - 23 students (5.1 percent). Now we have a dichotomous variable.
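The same recode can be written out as a rule, which makes it easy to see why the new frequencies come out the way they do. The Python sketch below is only an illustration of the logic, not SPSS syntax. The function name recode_rating is made up, and the Q2 data are rebuilt from the frequencies reported earlier (47, 197, 185, 20, and 3).

```python
from collections import Counter

def recode_rating(q2):
    """Collapse the five Q2 codes into 1 = good ratings, 2 = bad ratings."""
    if 1 <= q2 <= 3:    # excellent, very good, good
        return 1
    if 4 <= q2 <= 5:    # poor, very poor
        return 2
    return None         # any other code (such as 99) gets no new value in this sketch

# Rebuild Q2 from the reported frequencies for codes 1 through 5.
q2_values = [1] * 47 + [2] * 197 + [3] * 185 + [4] * 20 + [5] * 3

ratings = [recode_rating(v) for v in q2_values]
rating_counts = Counter(ratings)
rating_percents = {code: round(100 * n / len(ratings), 1)
                   for code, n in rating_counts.items()}

print(rating_counts)    # frequencies for the new variable
print(rating_percents)  # percentages for the new variable
```

Adding up the old categories gives the same answer: 47 + 197 + 185 = 429 good ratings and 20 + 3 = 23 bad ratings. If the frequencies on your new variable don't match, go back and check the ranges you entered.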

Exporting Output for Use in Other Documents

Often we'd like to put a copy of SPSS output into another document. This is especially useful when we'd like to transfer an SPSS frequency table to a technical report. Fortunately, it's not hard to do.

If you see something that you'd like to use in the output window, just click on it. That will put a red arrow beside the element. Then click on the "File" button at the upper left corner of the window and scroll down to "Export." You should see something like the figure below.

Click on "Export" and you'll see a window like the one below.

Click on the "Browse" button, and choose a folder where you want to save the output. After you've done that, click on "OK." That's all you need to do.

Now you can go to the folder you selected. There you'll find a file named "output.htm" that you can open with Microsoft Word and copy and paste into another Word document. That's a handy thing to do if you need to provide a table for a technical report.

Of course, there are more things you can do to export output and you can explore them if you like. But this will do the trick.


If you don't understand something in this Web note, please e-mail Dr. Sitton.


©M. Mark Miller & Ronald W. Sitton 2009
Revised 092811 — http://www.uamont.edu/FacultyWeb/sitton/crz/mrea/spssdownload.html