Fall Semester 2003

Our fourth time in lab, Friday November 7, 2003

Due today
Reading Assignment: Chapters 4 and 5 in the text.

Starting today
LAB ASSIGNMENT TWO (the questions listed below).

Due next time
Reading Assignment: Chapter 6 (Statistical Inference) in the text.

What is the lab assignment?
Answer the questions below, and for each one indicate the page in the book where the answer can be found.

When is it due?
Check the What's Due? page for details.

What's the best approach to this assignment?
Read the book, page by page. The questions below appear in the order in which the book covers their topics. These questions will be used on the Midterm and the Final Exams.

Here are the questions

Chapter 4: Describing Your Data.

  1. What does Chapter Four aim to cover?

  2. What is descriptive statistics?

  3. How is that different from inferential statistics?

  4. What is univariate statistics?

  5. What is the definition of a variable given by the book?

  6. What’s the difference between quantitative and qualitative variables? Give some examples.

  7. What’s another term by which we denote qualitative variables?

  8. What is the difference between ordinal and nominal variables?

  9. What kind of variables does this chapter (4) work with primarily?

  10. Which later chapter works with the other major kind of variables, and what kind of variables are they?

  11. What is a distribution (of values)? How is a frequency table useful in the study of it?

  12. What is the range of housing prices in HomeData.xls?

  13. What percentage of houses list for under $125,000 in HomeData.xls?

  14. How can you create a frequency table with StatPlus?

  15. How do you find what Excel’s FREQUENCY function does for you?

  16. Explain what bins are and how they are used in conjunction with frequency tables.

  17. What is the meaning (or purpose) of Figure 4-3?

  18. Complete the following sentence from the book: “Frequency tables are good at conveying specific information about a distribution, but they often lack ____________.”

  19. What is a histogram?

  20. Define the following terms: frequency, cumulative frequency, percentage, and cumulative percentage.

  21. What do we mean by “the shape of a distribution”? Why do we care about it?

  22. What is a “skewed” distribution? In how many ways can it be “skewed”? Can you give some examples?

  23. What is the opposite of a “skewed” distribution?

  24. What do we mean by a distribution’s “tails”?

  25. Why does the distribution of sample home prices in the HomeData.xls example seem to be positively skewed?

  26. In a positively skewed distribution which value is largest: the mean, the mode, or the median?

  27. In the example used in the book what information is revealed by breaking a histogram into categories? Do you agree with the book’s conclusion?

  28. What additional information is made available if we compare histograms side by side?

  29. What is the primary difference between the two histograms in the book when compared side by side?

  30. What does a stem and leaf plot show? What is its structure? How do you create it?

  31. Does a stem and leaf plot resemble a histogram? In what way?

  32. In what way is the stem and leaf plot at a disadvantage compared to a histogram? What can be done in those situations?

  33. Are there any advantages that the stem and leaf plot offers compared to a histogram?

  34. What page in the book shows how a stem and leaf plot can be created with StatPlus? Does it show a picture of the outcome? Is the output static or dynamic?

  35. What is a stem multiplier?

  36. Does the stem and leaf plot built in the book show a house priced at $169,500?

  37. What is the term used in the book for “statistics that summarize key elements of the distribution”?

  38. Define the following terms: pth percentile, quartiles, and interquartile range. What does the interquartile range indicate (roughly)?

  39. Apart from using a frequency table (which is cumbersome and time consuming for large distributions) what options does Excel offer for calculating percentiles and quartiles of a given distribution?

  40. What’s the main difference between the two Excel functions PERCENTILE and PERCENTRANK?

  41. Does the experiment on pages 138-140 support or refute the hypothesis that the houses in the northeast sector are more expensive? In what way?

  42. What are three measures that can be used to summarize the “typical” or “most representative” value in a data set? Which one of them is located at the 50th percentile? Does the number of values in the data set influence the calculation of any of these measures?

  43. One weakness of the mean is that it is influenced by extreme values. What is the example used in the book to demonstrate this aspect? What other measure of central tendency is more adequate for that example? What are the two competing perspectives (both justified) that the example mentions? What then is to be learned from this example?

  44. How do you calculate the 5% trimmed mean?

  45. When is the geometric mean most often used? When is the use of the geometric mean completely out of the question?

  46. Is the harmonic mean used at all? In what cases?

  47. What is the mode and when do you use it?

  48. What is the purpose of the experiment on pages 143-144?

  49. Why do we expect “the average home price in Albuquerque [to be] higher than the median value” in the experiment above?

  50. Define variability, range, and the deviation of each data point from the mean value. How are the deviations used in the definition of variance?

  51. What is the standard deviation? Why do we divide by (n-1) instead of just (n)?

  52. What do the Excel functions KURT and SKEW calculate?

  53. What does positive kurtosis mean (or signify)?

  54. Why is the variance so big (compared with the other values) in the experiment on pages 148-149?

  55. Interpret the skewness and kurtosis values you obtain in the experiment above.

  56. What are outliers? How do outliers arise (that is, what might cause them)?

  57. Are there any outliers in the BigTen experiment?

  58. What are possible solutions to the problem of outliers?

  59. How do you determine that a value is really an outlier? Define moderate/extreme outliers.

  60. What’s the purpose of Figure 4-24 in the book?

  61. Where can you find the concept tutorials?

  62. Who came up with and popularized the idea and use of boxplots?

  63. What is a boxplot? What does it show?

  64. In a boxplot what does it mean for the median line to be close to the first quartile?

  65. How many fences do you have in a boxplot (overall)?

  66. What do we mean by “severity” of outliers?

  67. Are the two whiskers of the same length in a boxplot? Who determines their length?

  68. What does the length of the whiskers show?

  69. Can you use the interactive tutorials to actually run experiments?

  70. How do you create a boxplot in Excel?

  71. What does the boxplot say about the Albuquerque data set?

Chapter 5: Probability Distributions

  1. What is the purpose of Chapter 5?

  2. What is a random phenomenon? Give an example.

  3. How does the book define the theoretical probability of an event?

  4. How do we quantify random phenomena through observation? What is the relative frequency?

  5. What does the law of large numbers state?

  6. What is a probability distribution?

  7. Give two examples of discrete probability distributions.

  8. Why does the book bring up the Poisson distribution?

  9. How many parameters does the formula for the Poisson distribution have?

  10. Give an example in which the Poisson distribution is usually applicable and explain why.

  11. How are discrete distributions usually displayed?

  12. How are continuous probability distributions different from the discrete probability distributions? Give an example.

  13. What is the probability of any specific value in a continuous probability distribution? (Next page: Why?)

  14. What is a PDF and where is it useful?

  15. Given a PDF curve how do we calculate the probability associated with a range of values?

  16. (Interactive Tutorial) What is the purpose of the first interactive tutorial (on pp. 173-175)?

  17. (Interactive Tutorial) What is a uniform distribution?

  18. (Interactive Tutorial) Give another example of a situation where a Poisson distribution would be applicable. What are the prerequisites in this situation?

  19. (Interactive Tutorial) On the last page of the tutorial, what is the probability of a value being between -0.96 and 0.99 (approximately -1 and 1)? How about between 1 and 2?

  20. (Interactive Tutorial) Using this last page in the tutorial can you obtain a value of 1 (that is, 100%) for the probability of a value being in a certain range? What would that range be?

  21. (Interactive Tutorial) What is the purpose of the two horizontal scroll bars below the chart? What do you use them for?

  22. What is a random variable?

  23. How is a discrete random variable different from a continuous random variable?

  24. Explain the use of upper and lowercase in the context of probability distributions.

  25. Define observation, sample, and random sample.

  26. In most cases we want our samples to be random samples to give a true picture of the underlying probability distribution. Please consider the following problem:

    During World War II many economists, mathematicians, and statisticians were members of Columbia University's Statistics Research Group, which did high-level consulting work for the armed services. As part of this group's work, statistician Abraham Wald was asked where to place armor on planes. It seemed obvious to the aircraft engineers that armor was needed at the places most frequently hit, as found in a large sample of battle-proven airplanes. After studying the bullet holes of a sample of returning planes, Wald's conclusion was to place the armor where bullet holes were least frequently found in these planes, and that's what he recommended.

    Now the questions:

    1. Was his reasoning justified?
    2. Was there anything wrong with the aircraft engineers' sampling design?
    3. Did they overlook anything?

    Part of the challenge in statistics is to remove all bias from sampling. This is difficult to do and subtle biases can creep into even the most carefully designed studies. Here's another problem:

    ABC's 20/20 television broadcast on July 16, 1993 reported on a study in which individuals who had lived to be 100 years of age or more were queried in the hope of finding common characteristics. The implication was drawn that if a younger person worked at acquiring the characteristics shared by these centenarians, then the probability of reaching such an old age increased. Why was this study design inappropriate for the implication drawn?

  27. (Interactive Tutorial) What is the second tutorial about?

  28. (Interactive Tutorial) How does the observed distribution change as you increase the number of shots taken?

  29. (Interactive Tutorial) The distribution of the shots around the target is described by a ________ density function because it involves two random variables.

  30. (Interactive Tutorial) How does the histogram in the tutorial describe the distribution of shots?

  31. What is the normal distribution?

  32. How many parameters does the normal distribution have?

  33. What are they?

  34. (Interactive Tutorial) What is the name of the third interactive tutorial in this chapter?

  35. (Interactive Tutorial) Name the distributions presented in the tutorial.

  36. What percent of the values in a normal distribution is located within one standard deviation to the right of the mean?

  37. What Excel functions can you use to work with the normal distribution?

  38. What does NORMDIST(40, 50, 4, TRUE) calculate?

  39. What does NORMDIST(40, 50, 4, FALSE) calculate?

  40. What does NORMINV(0.90, 50, 4) calculate?

  41. How do you check if your data is normally distributed?

  42. What is a normal score?

  43. What is a standard normal distribution?

  44. Explain what the normal probability plot is.

  45. (Interactive Tutorial) What is the name of the next file you work with?

  46. One step is missing in the instructions for the experiment on page 188. What is it?

  47. What does the normal probability plot indicate in this case?

  48. How do you calculate the expected batting average if you know the normal score?

  49. Does the normal probability plot show if the data is skewed or not?

  50. What does the title (Parameters and Estimators) of the section that starts on page 191 refer to?

  51. Explain what we mean by consistent estimators.

  52. Provide an answer to the following question (which appears at the bottom of page 191): "How large must a sample be to estimate accurately the value of mu?"

  53. Explain what we mean by the sampling distribution. Whose distribution is it?

  54. Explain what the standard error is and how it is related to the sampling distribution.

  55. State the Central Limit Theorem.

Last updated: Nov 4, 2003 by Adrian German for A113