Fall Semester 2002


Lecture Notes Two: Measures of Variability
Here's the minute paper from last time:

The deVoe Report (June 2, 1980) quoted then U.S. President Jimmy Carter as saying "half the people in this country are living below the median income -- and this is intolerable." What is disputable and what is true in this quote?
Here are some of the correct answers:
"President Carter must have meant that whatever the actual value of the median income was, it was intolerable that half of the nation lived below that income."
Indeed, that was the answer I was looking for.

The Carter quote was obviously taken out of context. We don't know what he said right before the quoted text, but he might have actually expressed the median income in dollars. Here's an extremely contrived version of this hypothesis, for illustration purposes:

"We have calculated the median yearly income and we found it to be (say) $600. We think this is a problem that needs to be addressed immediately; half the people in this country are living below the median income -- and this is intolerable."
There were some answers saying that Carter probably meant the mean. He did not mention the (arithmetical) mean, and that's probably because he did not want to say anything about it. He only wanted to make a comment about the median. He found it too low, and he expressed a concern that the value is too low to be the upper limit of income for half of the nation.

What makes this example intriguing is that, taken out of its context, it drastically polarizes our assumptions about what is said, placing the focus of our understanding on the wrong aspect. To see how this happens in another example (for your enjoyment) witness the following English sentence.

The ship sailed past the harbor sank.
How does this sound? Well, here's the same sentence in its original context:
A small part of Napoleon's fleet tried to run the English blockade at the entrance to the harbor. Two ships, a sloop and a frigate, ran straight for the harbor while a third ship tried to sail past the harbor in order to draw enemy fire. The ship sailed past the harbor sank.
I hope this makes the point. The question, and the quote, were tricky.

You need to watch for tricks like this in real life too.

Last time we looked at some measures of central tendency.

The homework is asking you to compare them.

The minute paper for today will be communicated in class.

This will be a more mathematical lecture.

1. The Arithmetical Mean.

The arithmetical mean is defined as the sum of the scores divided by the number of scores.

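In equation form that is

$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$$

where X_1, ..., X_N are the scores and N is the number of scores.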

Properties of the mean:
  1. The mean is sensitive to the exact value of all the scores in the distribution.

  2. The sum of the deviations about the mean equals zero.
  3. The mean is very sensitive to extreme scores.

  4. The sum of the squared deviations of all the scores about their mean is a minimum.

    In other words, this formula (in which zeta is an unknown)

    $$\sum_{i=1}^{N} (X_i - \zeta)^2$$

    admits a minimum when zeta has this value

    $$\zeta = \bar{X} = \frac{1}{N} \sum_{i=1}^{N} X_i$$

    We need to verify that.

  5. Under most circumstances, of the measures used for central tendency, the mean is least subject to sampling variation. If we were repeatedly to take samples from a population on a random basis, the mean would vary from sample to sample. The same is true for the median and the mode. However, the mean varies less than these other measures of central tendency. This is very important in inferential statistics, and is a major reason why the mean is used in inferential statistics whenever possible.
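The verification asked for in property 4 can be sketched algebraically. The notation is an assumption made here: X_1, ..., X_N are the scores, \bar{X} is their mean, and zeta is the unknown. Add and subtract \bar{X} inside the square; the middle term then vanishes because of property 2 (the deviations about the mean sum to zero).

```latex
\begin{aligned}
\sum_{i=1}^{N} (X_i - \zeta)^2
  &= \sum_{i=1}^{N} \bigl[(X_i - \bar{X}) + (\bar{X} - \zeta)\bigr]^2 \\
  &= \sum_{i=1}^{N} (X_i - \bar{X})^2
     + 2(\bar{X} - \zeta) \sum_{i=1}^{N} (X_i - \bar{X})
     + N(\bar{X} - \zeta)^2 \\
  &= \sum_{i=1}^{N} (X_i - \bar{X})^2 + N(\bar{X} - \zeta)^2
\end{aligned}
```

The last term is never negative and is zero only when zeta equals the mean, so that is where the sum of squared deviations is smallest.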
2. The Median.

The median is defined as the scale value below which 50% of the scores (or measurements) fall.

Properties of the median:

  1. The median is less sensitive than the mean to extreme scores.

  2. Under usual circumstances, the median is more subject to sampling variability than the mean but less subject to sampling variability than the mode.
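The claim about sampling variability can be tried out with a small simulation. What follows is only a sketch under stated assumptions: the population (normal, mean 100, standard deviation 15), the sample size, and the number of samples are all made up for illustration, and the mode is left out because it is not well defined for continuous scores like these.

```python
import random
import statistics


def sampling_spread(num_samples=2000, sample_size=25, seed=0):
    """Draw repeated random samples and measure how much the sample mean
    and the sample median vary from one sample to the next."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means, medians = [], []
    for _ in range(num_samples):
        # Hypothetical population: normal with mean 100 and SD 15.
        sample = [rng.gauss(100, 15) for _ in range(sample_size)]
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
    # The standard deviation of each list is its sampling variation.
    return statistics.pstdev(means), statistics.pstdev(medians)


spread_of_means, spread_of_medians = sampling_spread()
print("spread of sample means:  ", round(spread_of_means, 2))
print("spread of sample medians:", round(spread_of_medians, 2))
```

For a population like this one the medians should come out noticeably more spread out than the means. Other population shapes can behave differently, which is why the notes say "under usual circumstances".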

3. The Mode.

The mode is defined as the most frequent score in the distribution.

Usually distributions are unimodal. When a distribution has two modes it is bimodal.
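To compare the three measures side by side, here is a short Python sketch. The scores 1, 2, 3, 2, 4 are the ones used in the lab experiment later in these notes; the extreme score of 100 and the bimodal list are made up for illustration.

```python
import statistics

scores = [1, 2, 3, 2, 4]        # the lab scores
with_extreme = scores + [100]   # same scores plus one made-up extreme score

mean_before = statistics.mean(scores)
mean_after = statistics.mean(with_extreme)
median_before = statistics.median(scores)
median_after = statistics.median(with_extreme)

# multimode returns every most-frequent score, so it also handles
# bimodal distributions.
modes = statistics.multimode(scores)
bimodal_modes = statistics.multimode([1, 2, 2, 3, 4, 4])

print("mean:  ", mean_before, "->", round(mean_after, 2))
print("median:", median_before, "->", median_after)
print("modes: ", modes, "and, for the bimodal list,", bimodal_modes)
```

One extreme score drags the mean a long way while the median barely moves, which is the sense in which the median is less sensitive to extreme scores.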

MEASURES OF VARIABILITY

1. The Range.

The range is defined as the difference between the highest and lowest score in the distribution.

2. Deviation Scores.

A deviation score tells how far away the raw score is from the mean of its distribution.
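As a quick illustration of the range and of deviation scores, here is a sketch in Python using the scores from the lab experiment later in these notes (the variable names are my own):

```python
scores = [1, 2, 3, 2, 4]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Deviation score: raw score minus the mean of the distribution.
mean = sum(scores) / len(scores)
deviations = [x - mean for x in scores]

print("range:", score_range)
print("deviations:", [round(d, 1) for d in deviations])
print("sum of deviations:", round(sum(deviations), 10))
```

A negative deviation means the score is below the mean, a positive one means it is above.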

3. The Standard Deviation.

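For a population of scores we have:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$$

where \mu is the population mean and N is the number of scores in the population.

For a sample we have:

$$s = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}}$$

where \bar{X} is the sample mean and N is the number of scores in the sample.

Alternative formula for the standard deviation:

$$s = \sqrt{\frac{\sum_{i=1}^{N} X_i^2 - \frac{\left(\sum_{i=1}^{N} X_i\right)^2}{N}}{N - 1}}$$

Properties of the standard deviation: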

  1. The standard deviation gives us a measure of dispersion relative to the mean. This differs from the range, which tells us directly the spread of the two most extreme scores.

  2. Like the mean, the standard deviation is sensitive to each score in the distribution. If a score is moved closer to the mean, then the standard deviation will become smaller. If a score shifts away from the mean, then the standard deviation will increase.

  3. Like the mean, the standard deviation is stable with regard to sampling fluctuations.

  4. Both the mean and the standard deviation can be manipulated algebraically. This is an important aspect, as it allows mathematics to be done with them for use in inferential statistics.
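The population and sample standard deviations, and property 2 above, can be checked with a few lines of Python. This is a sketch using the scores from the lab experiment below; the "raw score" shortcut for the sum of squared deviations is a standard identity and is included here as an assumption about which alternative formula is meant.

```python
import math
import statistics

scores = [1, 2, 3, 2, 4]
n = len(scores)
mean = sum(scores) / n

# Sum of squared deviations about the mean.
ss = sum((x - mean) ** 2 for x in scores)

population_sd = math.sqrt(ss / n)      # divide by N
sample_sd = math.sqrt(ss / (n - 1))    # divide by N - 1

# Raw-score shortcut: sum of X^2 minus (sum of X)^2 / N.
ss_shortcut = sum(x * x for x in scores) - sum(scores) ** 2 / n

# Property 2: moving a score (4 -> 3) closer to the mean shrinks the SD.
closer_sd = statistics.pstdev([1, 2, 3, 2, 3])

print("population SD:", round(population_sd, 4))
print("sample SD:    ", round(sample_sd, 4))
print("library check:", round(statistics.pstdev(scores), 4),
      round(statistics.stdev(scores), 4))
```

The library functions `statistics.pstdev` and `statistics.stdev` use the N and N - 1 divisors respectively, so they should agree with the hand calculation.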

In lab tomorrow you should check the following experiment:

Squared deviations and the mean

We will "prove" today (in lab) that the sum of the squared deviations of all the scores about their mean is a minimum. In other words, the formula below, in which zeta is an unknown (or variable)

$$\sum_{i=1}^{N} (X_i - \zeta)^2$$

admits a minimum for

$$\zeta = \bar{X}$$

that is, when zeta equals the arithmetical mean of the scores. Let's prove that (and in the process calculate other things as well).

Here are the steps:

  1. Open up Excel. New worksheet.

  2. Enter these numbers: 1, 2, 3, 2, 4 in cells A1:A5.

    That's our data (the scores).

  3. In E1 write this formula
    =average(a1:a5)
    For me that is 2.4 (the arithmetical mean of these 5 numbers).

  4. Let's calculate the deviations to the mean.

    In B1 write the formula for the first deviation:

    =A1-$E$1
    Notice that the second element has an absolute reference to column E row 1.

    When we paste this formula that will become relevant.

    The first deviation is -1.4 so the formula works fine.

  5. Select cell B1. Drag the lower right corner of the cell to B5. Release mouse button.

    The deviations are calculated.

  6. Select cell B5. The formula inside it should be:
    =A5-$E$1
    Excel updated only the relative components of the formula.

  7. In cell B6 enter this formula
    =sum(b1:b5)
    The value should be 0 (zero).

    We have just verified that the sum of all the deviations about the mean is zero.

  8. Let's square the deviations one by one.

    In cell C1 write this formula

    =b1^2
    The value is 1.96 for me, the square of -1.4 as expected.

  9. Now let's paste this formula throughout.

    Select C1. Drag lower right corner to C5. Release mouse button.

    Squared deviations have been calculated.

  10. Let's finish this first part.

    In cell C6 enter the following formula

    =sum(c1:c5)
    That's the sum of the squared deviations about the mean, and it's 5.2 for me.

  11. In E2 write this formula:
    =count(a1:a5)
    That's 5. I leave it to you to label your spreadsheet nicely.

  12. Calculate the standard deviation in E3:
    =sqrt(c6/e2)
  13. Do it again like this in E4
    =sqrt(sum(c1:c5)/count(c1:c5))
  14. Do it again in E5 as follows
    =stdevp(a1:a5)
  15. Do it again in E6 as follows
    =stdev(a1:a5)
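    That should clarify the difference between STDEVP and STDEV: STDEVP divides the sum of squared deviations by N (here 5, giving about 1.02) and matches E3 and E4, while STDEV divides by N - 1 (here 4, giving about 1.14).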

  16. Now let's work on the main point of this lab.

    In A9 through F9 enter 0, 1, 2, 3, 4, 5.

    We'll calculate the sum of squared deviations around each of these numbers.

  17. Enter this formula in A10:
    =($A1-A$9)^2
  18. Paste this formula to F10 (drag lower right corner to F10).

  19. Select cells A10:F10. Drag lower right corner to F14.

    The cells from A10 to F14 should now contain squared deviations.

  20. Think a bit about it.

  21. Now let's sum the squared deviations.

    In cell A15 enter this formula

    =sum(a10:a14)
  22. Paste this formula through F15.

  23. With A15:F15 selected click the Chart Wizard button.

    Choose "Line" and click "Finish". Still another way would be to

    1. select A9:F9 (the candidate values), then
    2. press and hold the Control key, then
    3. select A15:F15 (the sums of squared deviations), and then
    4. release the Control key. After this
    5. push the Chart Wizard button and
    6. choose the Scatter Plot type of chart.

  24. You're done. Notice that the curve is lowest near 2.4 (the arithmetical mean): of the values tried, 2 gives the smallest sum (6) and 3 the next smallest (7).

  25. You can now change the numbers and see the chart change.

    There will be a new mean, but that's where the minimum will also be.

  26. Please work through this on Friday in labs.

  27. You can use a longer sequence, with a different range.

  28. Please let me know if you have any questions or if you need help.

  29. At this point you should have finished reading Chapter 2 in Kirkup.

  30. Please take a look at Chapter 1 to make sure it's not entirely foreign to you.

  31. Next week we will start Chapter 3, and we will cover it selectively.

  32. So please look through all of it and identify points of potential difficulty.

  33. I will post the reading assignments for next week over the weekend.
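The spreadsheet experiment in steps 1 through 25 above can also be sketched outside Excel. The Python below is only an illustration, not part of the lab: the function name `sum_of_squares` is my own, and the cell references in the comments show which part of the worksheet each line mirrors.

```python
scores = [1, 2, 3, 2, 4]           # cells A1:A5
candidates = [0, 1, 2, 3, 4, 5]    # cells A9:F9


def sum_of_squares(values, center):
    """Sum of squared deviations of the values about a chosen center."""
    return sum((x - center) ** 2 for x in values)


# Row 15 of the worksheet: one sum per candidate center.
row_15 = [sum_of_squares(scores, c) for c in candidates]

# The same sum taken about the mean (cell C6).
mean = sum(scores) / len(scores)   # cell E1
at_mean = sum_of_squares(scores, mean)

for c, total in zip(candidates, row_15):
    print("center", c, "-> sum of squares", total)
print("center", mean, "-> sum of squares", round(at_mean, 4))
```

Every candidate gives a larger sum than the mean does. Changing `scores` moves the mean, and the minimum moves with it, just as in step 25.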

Lab notes for tomorrow will be posted in the morning.


Last updated: Oct 31, 2002 by Adrian German for A113