It’s a famous probability problem: *given N people in a room, what’s the probability that at least two of them share a birthday*? It’s famous because the answer is often surprising: a remarkably small number (23) will give you a probability over 50%.

How do you simulate it in Fathom? The key is going to be in a function you might not be aware of: **uniqueValues( )**. We’ll make a collection to represent the people in the room, and make a measure that will tell us if the birthdays are all different. Then we’ll collect measures to find the probability.

- Let’s try
*N*= 20. Make a new table with 20 cases, and one attribute,**birthday**. - Give it this formula:
**randomInteger( 1, 365 )**. - Make a measure called
**allDifferent**, with this formula:

**count( ) = uniqueValues( birthday )**. - Don’t believe me? Rerandomize until
**allDifferent**is**false**, then plot**birthday**. It won’t take long. You’ll see the duplicate. - Onward: Collect 1000 measures and plot
**allDifferent**. And/or make a summary table. The (empirical) probability will stare you in the face. See the picture below. - Add a case at a time and re-collect (replacing the measures of course). You’ll see that the proportion drops below 0.5 at about N = 23.

**What Just Happened?**

First of all, **randomInteger( 1, 365 )** gives you, well, a random integer between 1 and 365. If I had used **randomInteger( 365 )** I could have gotten zeros. No need to use actual dates, by the way; all that matters is that there are 365 different values to choose from. And we are ignoring leap years.

Now let’s look at the measure, **allDifferent**.

Its formula is **count( ) = uniqueValues( birthday )**.

What does this do? Well: **count( )** is the number of cases in the collection (20). And **uniqueValues( birthday )** is the number of distinct values there are in the column. So if there’s a duplicate, the number of unique values will be less than 20. In that case, the expression ( **count( ) = uniqueValues( birthday ) **) will be **false**. But if there are no duplicates, the number of distinct values will be 20 as well, and **allDifferent** will be **true**.

Notice that the formula is not a usual numerical computation. It’s a *Boolean* expression, which means it can have one of two values: **true** or **false**. Special Fathom gotcha: if you put these in expressions, they *do not get quotes*! That’s because **true** and **false** are legit values, not strings. So I can write **if (allDifferent = false)** … without quotes. But when you write **count( coin = “heads” )** you *do* need the quotes. “Heads” is a string.

**Extension**

Use the techniques described in the random walk post to record the probability as a function of the number of people. This uses an additional measures collection. Sample result:

Special hint, building on the Boolean comment above: one elegant expression you can use in a measure in the measures collection is **proportion( allDifferent )**.

*How-to, measures, random-number functions, simulation*

August 17th, 2012 → 1:34 am

[…] of approaches, but it can be tricky. You might want to use measures and uniqueValues( ) as we did when we did the Birthday Problem. But that will cause trouble: whenever uniqueValues( stop ) = 3, it means two people got off at the […]

February 9th, 2015 → 3:09 am

[…] Of course you don’t have to use 6. And you don’t have to collect all of them. Notice the similarity between this setup and the birthday problem. […]