Mass Communication Research
Intro to Statistics

INDEX SYLLABUS SCHEDULE e-MEDIA COMM-STOP

I. Statistics

    Statistics: the science that uses mathematical methods to collect, organize, summarize and analyze data.

    What you need to know: Statistics provide valid and reliable results only when the data collection and research methods follow established scientific procedures.

II. Descriptive Statistics

    Descriptive statistics are used to reduce data sets to allow for interpretation. The numbers function to organize what we know about a subject. Think of sports scores, home run records, measures of educational excellence, trends in media economics or public opinion.

    The term distribution refers to a collection of numbers.

    Frequency distribution means a collection of scores, ordered according to magnitude, together with their respective frequencies. Think of the frequency of something's magnitude. How about UT's national standing in football? What about its academic standing? What if the measure is financing education, as in funding levels compared to other states and universities? Where do we stand? What about the distribution of a population on a public opinion survey?

    See how we transform data into proportions or percentages? It helps us make sense of the world and make predictions.
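As a minimal sketch of the idea above, the following Python snippet builds a frequency distribution from a small list of scores and converts the counts to percentages. The scores and the function name `frequency_distribution` are made up for illustration; they are not course data.

```python
from collections import Counter

# Hypothetical scores on a 1-5 rating item (invented for illustration).
scores = [3, 5, 4, 3, 2, 5, 3, 4, 3, 1]


def frequency_distribution(values):
    """Return (score, frequency, percent) rows, ordered by magnitude (high to low)."""
    counts = Counter(values)
    total = len(values)
    return [
        (score, counts[score], 100 * counts[score] / total)
        for score in sorted(counts, reverse=True)
    ]


# Print one row per distinct score: the score, its frequency, and its percentage.
for score, frequency, percent in frequency_distribution(scores):
    print(f"score {score}: frequency {frequency}, {percent:.1f}%")
```

Ten raw numbers become five rows, and the percentage column shows at a glance that a score of 3 is the most common response.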

III. Summary Statistics: The idea is to summarize the characteristics of a population of people by measures of central tendency or variability.

    A. Measures of Central Tendency: What is a typical score? The mean, the median, and the mode are identical in a truly normal distribution, in which case we would report the mean. When the distribution is skewed, we prefer the median, and when it is irregularly shaped, we prefer the mode.

      Mode: The score occurring most often in a distribution of numbers. The only measure of central tendency appropriate for variables measured at the nominal level is the mode. This is only good common sense. If you have a dichotomous variable, e.g. gender, it's a good thing to know which score occurred most often. Using the same gender example, though, we'd notice that the mean, aka the average, wouldn't mean too much, i.e. if the mean is 1.8, is that a male or a female? 8/10 of a female? In other words, the mean doesn't make sense with nominal data.

      By a similar token, the median wouldn't tell us much of anything in nominal data. Using gender again, there are only two answers, 1 for males and 2 for females. So what's the middle score? If we had 35 respondents and 17 were males and 18 were females, the middle score would be a 2. But in actuality, this is just telling us there are more females than males in that sample, which is what the mode is telling us as it indicates the score occurring most often in a distribution.

      Median: The midpoint of a distribution (half the scores lie above it, half below it). Just look at the list. What's in the middle? The median is appropriate for variables measured at the ordinal level.

      Mean: The average of a set of scores. The sum of all the numbers in a list simply divided by the total number of scores in the list. The mean is an appropriate statistic for variables measured at the interval and ratio levels, but not for ordinal or nominal measures. And remember the mean is sensitive to outliers, or extreme scores. Can you think of an example in public opinion on public affairs where the extreme might skew the sample? What if a large block of religious-right voters refused to answer or tell the truth on public opinion polls but showed up at voting booths in large numbers? The polls might be wrong due to the skewed sample. This we consider a subversion of the research process and not good for the measures of our democracy.
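The three measures above can be tried out with Python's standard `statistics` module. This is a rough sketch: the gender sample is the 35-respondent example from the text (17 males coded 1, 18 females coded 2), while the `hours` list is hypothetical interval-level data invented to show how one extreme score pulls the mean but barely moves the median.

```python
import statistics

# Sample from the text: 17 males coded 1 and 18 females coded 2.
gender = [1] * 17 + [2] * 18

mode_gender = statistics.mode(gender)      # the code that occurs most often
median_gender = statistics.median(gender)  # the 18th of 35 ordered scores
mean_gender = statistics.mean(gender)      # computable, but not interpretable for nominal codes

print("mode:", mode_gender)
print("median:", median_gender)
print("mean:", round(mean_gender, 2))

# Hypothetical interval-level data: hours of television watched per week.
hours = [10, 12, 12, 13, 14, 15, 16]
hours_with_outlier = hours + [80]  # one extreme score added

print("mean without outlier:", round(statistics.mean(hours), 2))
print("mean with outlier:", statistics.mean(hours_with_outlier))
print("median without outlier:", statistics.median(hours))
print("median with outlier:", statistics.median(hours_with_outlier))
```

For the gender codes, the mode and the median both come out as 2, which only restates that females outnumber males, and the mean of about 1.51 describes no actual respondent. In the hours data, the single score of 80 raises the mean from about 13.1 to 21.5 while the median moves only from 13 to 13.5.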

    B. Measures of Dispersion or Variability

      Range: The difference between the highest and lowest scores. Rarely used as the sole measure in mass media research.

      Variance: A mathematical index of the degree to which scores deviate from, or are at variance with, the mean. A small variance indicates that most of the scores in the distribution lie fairly close to the mean; a large variance represents widely scattered scores. Valuable in analyzing multiple populations with ANOVA.

      Standard deviation: A measure of how widely scores are spread about the mean, equal to the square root of the variance. In a normal distribution, approximately 68 percent of all cases fall within one standard deviation above and below the mean. Approximately 95 percent of all cases fall within two standard deviations of the mean, and nearly all cases (about 99.7 percent) fall within three.
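The measures of dispersion above can be sketched with Python's standard library. The `scores` list is hypothetical, and the helper name `share_within` is invented for this example. The snippet treats the scores as a whole population (dividing by N); for a sample, `statistics.variance` and `statistics.stdev` divide by N - 1 instead. The last part uses a standard normal curve to check what share of cases falls within one, two and three standard deviations of the mean.

```python
import statistics
from statistics import NormalDist

# Hypothetical scores (invented for illustration); their mean is 5.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

score_range = max(scores) - min(scores)  # highest score minus lowest score
variance = statistics.pvariance(scores)  # mean squared deviation from the mean
std_dev = statistics.pstdev(scores)      # square root of the variance

print("range:", score_range)
print("variance:", variance)
print("standard deviation:", std_dev)


def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0, sigma=1)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
```

The percentages hold for normally distributed data only; a skewed or irregular distribution can put quite different shares of cases inside each band.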

IV. Inferential Statistics: Statistics allow researchers to make inferences, i.e. conclusions, about populations from which a sample has been taken. Tukey (1986) identifies four purposes of statistics:
  1. To aid in summarization.
  2. To aid in "getting at what is going on."
  3. To aid in extracting "information" from the data.
  4. To aid in communication.
Inferential statistical methods are divided into two categories: parametric and nonparametric. According to Wimmer and Dominick (2000), the primary difference between the two is that nonparametric statistics make no assumption about normally distributed data, while parametric statistics assume normality.

Statistic                   Appropriate data   Assumptions                                     Statistical analysis
Nonparametric (Chi-Square)  Nominal, Ordinal   No assumption about normally distributed data;  "Goodness-of-fit" test;
                                               sample values are independent                   Contingency Table Analysis
Parametric ("t")            Interval, Ratio    Assumes data deviate equally from the mean      t-Test


Some of the inferential statistical analyses we'll cover in this course include nonparametric procedures such as the chi-square "goodness-of-fit" test and Contingency Table Analysis, and the parametric procedures of t-Test, Analysis of Variance (ANOVA) and Correlation.
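To give a feel for one of these procedures, here is a minimal sketch of the chi-square "goodness-of-fit" statistic, computed by hand in Python as the sum of (observed - expected)^2 / expected across categories. It reuses the 17 males and 18 females from the central tendency example. The expected even split of 17.5 per category is an assumption made for illustration, and the function name `chi_square_gof` is hypothetical. The value 3.841 is the standard tabled critical value for 1 degree of freedom at the .05 level.

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed counts from the gender example: 17 males, 18 females.
observed = [17, 18]

# Assumption for illustration: the population is split evenly, so 35 / 2 per category.
expected = [17.5, 17.5]

chi_sq = chi_square_gof(observed, expected)

# Tabled critical value for df = 1 (two categories minus one) at the .05 level.
CRITICAL_VALUE_DF1_05 = 3.841
reject_even_split = chi_sq > CRITICAL_VALUE_DF1_05

print("chi-square:", round(chi_sq, 4))
print("reject the even split?", reject_even_split)
```

The statistic is far below the critical value, so a sample of 17 males and 18 females gives no reason to doubt an even split in the population. SPSS reports the same statistic along with an exact probability.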

Not so long ago, statistics were calculated by hand. Today, we have powerful computers to do the calculations, thus speeding up the research process. Though computers can do the calculations amazingly fast, there is still the possibility of human error while entering the data, or as my chemistry teacher used to say, "The problem's not with the calculator, but the calculator operator."

We'll use SPSS for the majority of our work, but there are many statistical sites online.

If you don't understand something in this Web note, please e-mail Dr. Sitton.


©M. Mark Miller and Ronald W. Sitton 2009
Revised 092811 — http://www.uamont.edu/FacultyWeb/sitton/crz/mrea/statintro.html