Sampling Without Replacement: Lottery Problem

Posted on June 14, 2012


Yay! A request! From Ben Ceyanes on the APstat listserve:

Ok, So I am brand new to fathom and to AP Stats this year and I am getting frustrated trying to figure out how to simulate a problem on fathom.  The problem says that a person buys 5 lottery tickets with 6 numbers ranging 1 to 49 on each but is surprised to find out that the winning 6 numbers are not on any of the bought tickets.

My problem is when I use Randominteger (1,49) I get repeats on numbers. I don’t think you pick the same numbers twice on a lottery ticket do you? How can I get the cases to give me 6 random numbers without replacement to carry out this simulation?  I can do it on the measures, but not the cases.  It’s probably quite easy but I have tried for two days and I am about to just give up!   HELP!!

We need to know a little more; how does the person pick his (or her) five tickets?

Do they do a quick pick from the machine for their tickets? If so, their tickets might share numbers with each other. (And it will be more likely that there are no winners on any tickets.)

Or do they buy the tickets systematically, without overlaps? Then you know the tickets cover 30 of the 49 numbers.

Systematic Picks

Let’s do the latter case, which is easier. First we have to agree that it doesn’t matter which 30 numbers we pick for the tickets as long as they don’t overlap, so we might as well pick convenient ones. Let’s pick 1–30! We need to simulate the lottery company sampling 6 without replacement; we’ll find the probability that all six of the numbers are over 30.

The inspector for the sample collection. We’ve un-checked with replacement. We would probably also un-check animation.

  • Make a collection with one attribute, pick, and 49 cases. Give pick the values 1–49, systematically. There are two great ways to do this: you can type in the numbers, or you can make a formula using the special variable caseIndex. The formula will work fine, but for extra safety, you might also want to clear it after you make the numbers. That way, the values are just plain numbers, as if you had typed them.
  • Sample from the collection. The sample panel in the new collection’s inspector appears. Change it so you’re sampling 6, and un-check with replacement. See the illustration. (But turn Animation off.)

Now the sample collection will have 6 cases, and none of the picks will be the same. Next we want to see whether all of the numbers are over 30. The strategy is to count how many are over 30; later we’ll see how many of the samples have a count of 6.

This is a job for measures.

  • In that same sample collection’s inspector, go to the Measures panel (the second tab, reading Mea… in the picture).
  • Make a new measure, let’s call it losers.
  • Give it the formula: count( pick > 30 ).
  • Now Collect Measures from the sample collection. The new collection will be called Measures from Sample of Collection1 (or whatever the original collection was called).

Here is what I got from 1000 measures:

Number of “losers” — that is, numbers over 30 — for each of 1000 lottery picks.

None of our 1000 trials has six “losers.” That is, it’s really surprising that of the six numbers, none of them hit any of the 30 numbers we picked.

Your results will vary; when I expanded it to 10,000 cases, I got 20 sixes, for an empirical probability of 0.0020. This is not too far from the theoretical probability, which is (19/49)(18/48)(17/47)(16/46)(15/45)(14/44), which is about 0.0019.
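
For anyone who wants to check the numbers outside Fathom, here is a rough Python sketch of the same idea. It is not what Fathom does internally; the function names (count_losers, estimate_all_miss) are made up for this illustration, and it assumes, as above, that the tickets cover 1–30.

```python
import random
from math import comb


def count_losers(rng):
    """Draw 6 winning numbers from 1-49 without replacement
    and count how many are over 30 (the 'losers' measure)."""
    winning = rng.sample(range(1, 50), 6)
    return sum(1 for pick in winning if pick > 30)


def estimate_all_miss(trials, seed=None):
    """Fraction of simulated drawings in which all six numbers are over 30."""
    rng = random.Random(seed)
    sixes = sum(1 for _ in range(trials) if count_losers(rng) == 6)
    return sixes / trials


def exact_all_miss():
    """Same value as (19/49)(18/48)(17/47)(16/46)(15/45)(14/44):
    choose all 6 winners from the 19 uncovered numbers."""
    return comb(19, 6) / comb(49, 6)


if __name__ == "__main__":
    print("simulated:", estimate_all_miss(10_000))
    print("exact:    ", exact_all_miss())
```

As in Fathom, the simulated value will bounce around from run to run; passing a seed makes a run repeatable.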

Quick Pick Case

(I’ll describe this more quickly even though it’s more complicated.)

If the player uses Quick Pick for their five tickets, we have to be sneakier. Instead of sampling without replacement to determine the winning lottery numbers, we reverse the process. We assume that the winning numbers are 1–6, and sample 6 numbers without replacement to make one of the player’s tickets. For that ticket, we count how many winners there are, and that’s a measure (e.g., nWinners = count( pick < 7 )). We collect five measures, one for each ticket.

Then we define a measure for the collection of measures (tickets), which might be the sum of the number of winners ( bigSum = sum(nWinners) ). Collecting a large number of those, you look to see how many times you got zero for bigSum.
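
The reversed process can also be sketched in Python. Again, this is only an illustration and not Fathom code: the function names are invented here, and it assumes each Quick Pick ticket is an independent draw of 6 numbers from 1–49 without replacement.

```python
import random


def winners_on_ticket(rng):
    """One Quick Pick ticket: 6 numbers without replacement.
    Counts how many fall in 1-6, like nWinners = count( pick < 7 )."""
    ticket = rng.sample(range(1, 50), 6)
    return sum(1 for pick in ticket if pick < 7)


def big_sum(rng, n_tickets=5):
    """Total winners across the player's tickets, like bigSum = sum(nWinners)."""
    return sum(winners_on_ticket(rng) for _ in range(n_tickets))


def estimate_no_winners(trials, seed=None):
    """Fraction of simulated five-ticket purchases where bigSum is zero."""
    rng = random.Random(seed)
    zeros = sum(1 for _ in range(trials) if big_sum(rng) == 0)
    return zeros / trials


if __name__ == "__main__":
    print("simulated P(no winners on any ticket):", estimate_no_winners(10_000))
```

The three functions line up with the sample, the measures, and the measures of measures.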

The key here is to note that in the first, easier case, we have three collections, a source, a sample, and measures.

In the second case, there are four: source, sample, measures, and measures of measures.