In the apstat community thingy, Rudy Medina posted a problem, and Paul Myers posted a solution. I’ll show another solution, and maybe veer off into philosophy. Here’s the problem (from 16 August 2012):

A bus stop has 7 stops and 4 passengers. If every passenger is equally likely to get off at any stop, what is the probability that [exactly] 2 will get off at the same stop?

How do you simulate this? First we simulate the bus. We make a collection with four cases (one for each passenger) and give it one attribute, **stop**. This thing gets a formula such as **randomInteger(1,7)** or **randomPick(1,2,3,4,5,6,7)**. So we have four random integers representing the stops where the passengers got off.
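If you want to see the same idea outside Fathom, here is a minimal Python sketch of one bus. The names `simulate_bus`, `NUM_STOPS`, and `NUM_PASSENGERS` are mine, not anything from Fathom; `random.randint(1, 7)` stands in for **randomInteger(1,7)**.

```python
import random

NUM_STOPS = 7        # stops on the route (from the problem statement)
NUM_PASSENGERS = 4   # passengers on the bus (from the problem statement)

def simulate_bus(rng=random):
    """Return one list of stops, one entry per passenger.

    Each passenger independently picks a stop from 1..NUM_STOPS,
    every stop being equally likely.
    """
    return [rng.randint(1, NUM_STOPS) for _ in range(NUM_PASSENGERS)]

bus = simulate_bus()
print(bus)  # four integers between 1 and 7, different on each run
```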

Now we need to figure out whether two got off at the same stop, that is, whether (exactly) two of the numbers are the same.

There are a number of approaches, but it can be tricky. You might want to use measures and **uniqueValues( )** as we did when we did the Birthday Problem. But that will cause trouble: whenever **uniqueValues( stop ) = 3**, it means two people got off at the same stop, as in { 1, 2, 2, 6 }. But {2, 2, 5, 5} and {2, 2, 2, 5} both have **uniqueValues( stop ) = 2**, and the former (says Rudy) counts as a “success” in this probability problem. So you can’t just count up the number of times **uniqueValues** is 2 or 3.
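To see the trouble concretely, here is a small Python sketch run on the three example buses above. The helpers `unique_values` and `has_exact_pair` are hypothetical names of my own; the first only imitates what **uniqueValues( )** reports.

```python
def unique_values(stops):
    """Number of distinct stops used, like uniqueValues(stop)."""
    return len(set(stops))

def has_exact_pair(stops):
    """True if at least one stop has exactly two passengers getting off."""
    return any(stops.count(s) == 2 for s in set(stops))

for bus in ([1, 2, 2, 6], [2, 2, 5, 5], [2, 2, 2, 5]):
    print(bus, unique_values(bus), has_exact_pair(bus))
```

The last two buses have the same number of unique values but different answers to the question we care about, so the unique-value count alone cannot settle it.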

Paul’s solution was to make a longish Boolean expression. It works, but here is another solution that uses a technique that you may like: analyzing the contents of a summary table.

- Using the “bus” collection with four cases, drag **stop** to a summary table, holding down shift to force it to be categorical.
- Right-click the table and choose **Create Collection from Cells**. A new collection appears, **Cells from bus Table**.
- Make a (case) table for that collection so you can see that it contains all the information from the summary table. Weird, huh? The number of people who got off at each stop is in the column **S1**.
- Make a measure in that new collection. Call it **anyTwos**. Its formula is **count(S1=2) > 0**.
- Collect measures many times, and look at the distribution of **anyTwos**.
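The steps above can be sketched in Python too, if that helps you see the chain. This is only an analogy, not how Fathom does it internally: a `Counter` plays the role of the summary table, its values play the role of the **S1** column, and `collect_measures` (my name) imitates collecting measures. The seed and the number of trials are arbitrary choices.

```python
import random
from collections import Counter

def any_twos(stops):
    """Mimic the measure count(S1=2) > 0 on the cells of the summary table."""
    cells = Counter(stops)  # stop -> number of passengers who got off there
    return sum(1 for s1 in cells.values() if s1 == 2) > 0

def collect_measures(trials, seed=None):
    """Rerandomize the bus `trials` times and record any_twos each time."""
    rng = random.Random(seed)
    results = []
    for _ in range(trials):
        bus = [rng.randint(1, 7) for _ in range(4)]  # 4 passengers, 7 stops
        results.append(any_twos(bus))
    return results

measures = collect_measures(10_000, seed=1)
print(sum(measures) / len(measures))  # estimated probability of "success"
```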

There are several cool things about this. One is that, OMG, you can collect measures made from a table made from some other collection? Yes. And the whole chain knows to rerandomize the original collection when you collect another measure.

Another is that the logic of using the table might be more straightforward to kids. We’re asking, are there any stops where exactly two people got off? And the formula we write is, **count(S1=2) > 0**. That is, how many stops are there where two people got off? Is that number greater than zero?
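Since there are only 7^4 = 2401 equally likely ways for four passengers to get off, you can also check a simulation against an exact count. Here is a hedged Python sketch that applies the same "is the number of stops with exactly two people greater than zero?" test to every possible bus. The variable names are mine.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

successes = 0
total = 0
# Every possible assignment of 4 passengers to stops 1..7.
for bus in product(range(1, 8), repeat=4):
    total += 1
    counts = Counter(bus).values()  # passengers per stop, like column S1
    if sum(1 for s1 in counts if s1 == 2) > 0:
        successes += 1

exact = Fraction(successes, total)
print(successes, total, float(exact))
```

By my count that comes to 1386 of the 2401 outcomes, or about 0.577, which is what the distribution of **anyTwos** should hover around.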

Still, it’s troubling that this is so hard. How would you know to do such a thing? If I get the energy (maybe after lunch) I will write about it on that other blog.

*How-to, measures, random-number functions, simulation*

August 17th, 2012 → 3:43 am
