What If You Just Have Summary Statistics?

Posted on May 2, 2012


This came from an AP-Stat listserv discussion April 2012:

Hello Tim, Corey, Dennis –

I also use Fathom and appreciate its power and flexibility in the educational setting. However, a serious limitation for me is Fathom’s inability to handle categorical data given as summarized values, such as those in the data tables of many AP textbook exercises. Question: Am I being unfair (or inaccurate) in this assessment? And, most importantly, how do you handle this limitation?

Maxwell Pereira
FDR High School
Brooklyn, NY

It is true that you can’t just create a data table of aggregate values and then do everything with it that you would do to case-by-case data.

However, in many situations you can accomplish what you need to do using other reasonably straightforward techniques. To do a test for Goodness of Fit, for example, you typically use case-by-case data. But if they are not available, you can simply enter the counts for the cells. The same is true for all of Fathom’s tests and estimates except ANOVA (I’m not sure why it is the exception).

Here is a new “test” object with Goodness of Fit chosen:

A fresh, new goodness-of-fit test

Same test with names edited and cell counts entered. Now a p-value appears.
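
For readers who want to see what a test does with nothing but cell counts, here is a minimal sketch in Python. It is not Fathom's implementation; the function names and the die-roll counts are made up for illustration, and the p-value uses the standard closed-form series for a chi-square tail with integer degrees of freedom.

```python
import math

def chi_square_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + 2 * Z(chi) * sum_r chi^(2r-1) / (1*3*...*(2r-1))
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    normal_pdf = math.exp(-half) / math.sqrt(2 * math.pi)
    return math.erfc(math.sqrt(half)) + 2 * normal_pdf * total

def goodness_of_fit(observed, expected_props=None):
    """Chi-square goodness-of-fit statistic and p-value from cell counts alone."""
    n = sum(observed)
    k = len(observed)
    if expected_props is None:
        expected_props = [1.0 / k] * k  # default hypothesis: equal proportions
    expected = [n * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, chi_square_sf(stat, k - 1)

# Hypothetical counts for 60 rolls of a die (not data from this post).
observed = [8, 12, 9, 11, 6, 14]
stat, p_value = goodness_of_fit(observed)
print(f"chi-square = {stat:.2f}, p = {p_value:.3f}")
```

The point is that the statistic depends only on the counts, which is why entering them directly is enough.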

What I say here for categorical values is true for numerical values as well: to get a CI for a mean you can enter the sample mean, SD, and sample size into an estimate, and Fathom will compute it.
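
The arithmetic behind such an interval can be sketched in a few lines of Python. This is an illustration, not what Fathom runs: the function name and the numbers are hypothetical, and it uses a normal (z) critical value because the Python standard library has no t quantile function. For small samples a t-based interval would be somewhat wider.

```python
import math
from statistics import NormalDist

def mean_ci_from_summary(mean, sd, n, confidence=0.95):
    """Approximate CI for a mean from summary statistics (z critical value)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Hypothetical summary values: sample mean 100, SD 15, n = 36.
low, high = mean_ci_from_summary(100, 15, 36)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```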

Fathom falls short when you have only summarized data but want to do something other than a traditional test or estimate. For example, suppose you have poll results: 500 out of 1000 men prefer chocolate, and 550 out of 1000 women do. Are the underlying proportions really different? In Fathom, you can easily test that difference (normal approximation, using z), or get an interval estimate for the difference of proportions at any confidence level you want. But if you want to do a randomization procedure of some kind, you need a collection with 2000 cases and two attributes (sex and preference), which actually makes sense given what you’re trying to do. (New post about how to do this.)
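
To make the contrast concrete, here is a rough Python sketch of both routes for the chocolate poll. It is not Fathom code. The function names, the "other" label for the non-chocolate answer, the pooled standard error, and the choice of 1000 shuffles are all assumptions for illustration. The z test needs only the four summary numbers; the randomization needs the counts expanded into 2000 individual cases first.

```python
import math
import random
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z test for a difference of proportions (pooled standard error)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x2 / n2 - x1 / n1) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

def expand_to_cases(counts):
    """Turn {(sex, preference): count} into one (sex, preference) tuple per case."""
    cases = []
    for (sex, pref), n in counts.items():
        cases.extend([(sex, pref)] * n)
    return cases

def randomization_p_value(cases, reps=1000, seed=0):
    """Shuffle the sex labels and see how often the difference is as extreme."""
    rng = random.Random(seed)
    sexes = [s for s, _ in cases]
    prefs = [p for _, p in cases]

    def diff(labels):
        yes = {"male": 0, "female": 0}
        total = {"male": 0, "female": 0}
        for s, p in zip(labels, prefs):
            total[s] += 1
            yes[s] += (p == "chocolate")
        return yes["female"] / total["female"] - yes["male"] / total["male"]

    observed = diff(sexes)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(sexes)  # break any link between sex and preference
        if abs(diff(sexes)) >= abs(observed) - 1e-12:
            extreme += 1
    return observed, extreme / reps

counts = {("male", "chocolate"): 500, ("male", "other"): 500,
          ("female", "chocolate"): 550, ("female", "other"): 450}
z, p_z = two_proportion_z(500, 1000, 550, 1000)
cases = expand_to_cases(counts)
observed_diff, p_rand = randomization_p_value(cases)
print(f"z = {z:.3f}, p = {p_z:.4f}; randomization p = {p_rand:.3f}")
```

The expansion step is the part Fathom does not do for you: the shuffle only means something when there are individual cases to relabel.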

One of the most annoying situations in Fathom is where you just want a graph of the counts of some categorical data, such as the enrollment in each school year. Your data have the class names in one column and the enrollments in another. You put the class names on one axis, and you get a bar graph where every bar has a height of 1. That is, Fathom counts one case for each class, but you want to see the enrollment.

You can, and here’s how: notice that there’s a formula at the bottom of a bar chart. It reads “count( )” by default; that is, Fathom displays frequency in a bar chart. But you can change that formula to anything you like. In this case, the kludge is to enter “mean(enrollment)”. Since there’s only one value for each class, the mean is the value.

Same graph, except we’ve edited the formula at the bottom.
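
Why the kludge works can be shown with a small Python sketch. The enrollment numbers and the function name are hypothetical, and this only imitates the idea of a summary formula applied per category; it is not how Fathom draws the graph.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical aggregate table: one row per class (not data from this post).
rows = [("Freshman", 310), ("Sophomore", 295), ("Junior", 280), ("Senior", 260)]

def bar_heights(rows, summary):
    """Group enrollments by class name and apply a summary function per group."""
    groups = defaultdict(list)
    for name, enrollment in rows:
        groups[name].append(enrollment)
    return {name: summary(values) for name, values in groups.items()}

counts = bar_heights(rows, len)   # like count( ): every bar has height 1
means = bar_heights(rows, mean)   # like mean(enrollment): bar equals the enrollment

for name in counts:
    print(f"{name:10s} count={counts[name]}  mean={means[name]}")
```

With one row per class, any summary that returns the single value unchanged would serve just as well as the mean.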

Posted in: How-to, work-arounds