Sampling until you get all six

Posted on June 4, 2013


A common problem goes like this:

There are six different Famous Statistician cards, one (randomly) in each box of Chocolate Sugar Bombs. How many boxes do you need to buy to get all six?

This makes a good three-collection simulation in Fathom, and it uses two little-used features: “sample until” and the uniqueValues( ) function.

  • Make a new collection (we’ll call it Cards), give it six cases and one attribute (card, say) with six different values, e.g., A, B, C, D, E, and F.
  • Sample with replacement from that source collection (Sample of Cards).
  • In the inspector that appears, set it up to “sample until.” Then give the sampling this formula: until uniqueValues( card ) = 6. That is, keep sampling until there are six different values in the sample collection.

Setup for "sample until"

  • Create a measure in the sample collection that counts how many cases it contains. Typically, this is called N and has the formula count( ).
  • Collect measures to find the distribution of counts. Now you can find the mean, or the 95th percentile, or whatever you want depending on how sure you want to be of having a complete set.
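
For readers without Fathom at hand, the steps above can be sketched in Python. This is only a rough analogue, not how Fathom works internally: the function names boxes_until_complete and collect_measures are made up for illustration, the while loop stands in for the “sample until” formula, and the percentile uses a simple nearest-rank rule.

```python
import math
import random

def boxes_until_complete(cards="ABCDEF", rng=random):
    """Sample cards with replacement until every card has shown up.

    Returns the number of boxes opened (the N = count( ) measure).
    """
    seen = set()
    n = 0
    # Stand-in for: until uniqueValues( card ) = 6
    while len(seen) < len(cards):
        seen.add(rng.choice(cards))
        n += 1
    return n

def collect_measures(trials=200, seed=None):
    """Repeat the experiment and collect one N per trial."""
    rng = random.Random(seed)
    return [boxes_until_complete(rng=rng) for _ in range(trials)]

counts = collect_measures(trials=200, seed=1)
mean_n = sum(counts) / len(counts)

# Nearest-rank 95th percentile: the smallest N that covers 95% of trials.
ordered = sorted(counts)
p95 = ordered[math.ceil(95 * len(ordered) / 100) - 1]

print("mean boxes:", mean_n)
print("95th percentile:", p95)
```

Changing the seed (or leaving it as None) gives a fresh set of trials, much like collecting measures again.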

Here is a typical result:

Sample Until "Measures" graph

So if you buy 26 boxes, you have a really good chance of getting all 6. The average in this set of 200 trials is 15-ish.
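
As a cross-check on that simulated mean (this is the standard “coupon collector” argument, not something the Fathom setup computes): once you hold k − 1 different cards, each new box gives a new one with probability (6 − k + 1)/6, so the expected wait for the k-th new card is 6/(6 − k + 1) boxes. Adding up the six waits:

```latex
E[N] \;=\; \sum_{k=1}^{6} \frac{6}{6-k+1}
     \;=\; 6\left(1 + \tfrac12 + \tfrac13 + \tfrac14 + \tfrac15 + \tfrac16\right)
     \;=\; 6 \times 2.45
     \;=\; 14.7
```

That agrees with the 15-ish average from the 200 trials.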

(Hint: collecting measures can go slowly if you have the sample-collection table open. Get rid of it or iconize it so Fathom doesn’t have to redraw it so frequently.)

Of course you don’t have to use 6. And you don’t have to collect all of them. Notice the similarity between this setup and the birthday problem.
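
Here is a hedged Python sketch of those variations. The names boxes_until and draws_until_repeat are hypothetical, and the birthday version reflects one reading of the similarity: keep the same sampling setup but stop at the first repeat, that is, as soon as the number of unique values falls below the count.

```python
import random

def boxes_until(n_cards=6, n_needed=None, rng=random):
    """Boxes opened until n_needed different cards (out of n_cards) are in hand."""
    if n_needed is None:
        n_needed = n_cards          # default: collect the whole set
    seen = set()
    n = 0
    while len(seen) < n_needed:
        seen.add(rng.randrange(n_cards))
        n += 1
    return n

def draws_until_repeat(n_values=365, rng=random):
    """Birthday-style stopping rule: draw until some value shows up twice."""
    seen = set()
    n = 0
    while True:
        value = rng.randrange(n_values)
        n += 1
        if value in seen:           # unique values < count: stop here
            return n
        seen.add(value)

rng = random.Random(2013)
print("boxes for 4 of 6 cards:", boxes_until(6, 4, rng))
print("boxes for all 10 of 10 cards:", boxes_until(10, rng=rng))
print("people until a shared birthday:", draws_until_repeat(365, rng))
```

Wrapping either function in a loop over many trials gives the same kind of distribution that collecting measures does.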