Random Walk

Posted on May 26, 2012


The random walk is a thought experiment (we hope) that underpins a lot of formal statistics and error analysis. The idea is that for every step, you flip a coin. Heads, you move forwards; tails, you move back. You take a total of n steps. How far are you likely to travel?

If you’re experienced with these things, you know that the (net) distance is proportional to the square root of the number of steps. Let’s do a simulation to show, empirically, that this is true. We’ll see that this requires not only measures, but measures of measures. (There’s another way that uses measures of measures of measures, but let’s not go there!)

Our underlying plan is to make a source collection that represents a single walk. It will have one case for each step. The steps will have values of +1 or –1 for forward or back; that way, your position at the end of the entire walk is simply the sum of these steps. We’ll collect measures to get 1000 end-positions for that length of walk, then collect measure from that to get typical distances for each length. This diagram gives you the overview:

The key Fathom-Jedi move you’ll see is figuring out how to get the size of the walk into the final collection. That’s using the measures N and NN. I have forgotten to do this many times, grasshopper; when you forget you’ll just have to go back and put them in.

  1. Make a collection with ten cases. (This is for a random walk with ten steps). Make step random, either +1 or –1. One good formula is
    randomPick( 1, –1 ).
  2. Now make some measures:
    One of them is end; make this the position at the end of the walk. That’s sum( step ).
    Another is N, the total number of steps. Jedi step one. You’ll see why we need this. Formula: count( ).
  3. Collect 1000 measures. Plot end. Notice that you have two attributes in that measures collection, N and end. (Note: of course, set animation off. I set the number of measures to 1000 and set it to “Replace existing cases.”)

1000 runs of a 10-step random walk.

This is the distribution of where you wind up after ten steps. This is, of course, a binomial distribution. You could use randomBinomial( ) to shortcut this, but simulating the individual steps helps students understand what’s going on.

So far, straightforward. Now for the new stuff.

  1. In the measures collection, make a measure, spread, which is a measure of spread such as the standard deviation of end: a good formula is s( end ).
    In my class, we used MAD, the mean absolute deviation, which is mean( | end | ).
  2. The Jedi move, part 2: Make another measure NN, which should be the same as all those values of N. We need it so that the spread we get will “know” what the original number of steps was.
    A good formula: mean( N )
  3. Collect measures. (Note: be sure to select Collect Measures and not Collect More Measures! The latter will just replace the thousand measures you got before.) Now you should have a collection bizarrely named Measures from Measures from Collection1.
  4. Set this new, third collection to collect one measure, no animation, do not replace existing cases.

Let’s think about this. The spread—whether you use standard deviation, MAD, or some other measure of spread such as IQR—is a way of describing how far end is from zero, which in turn is a typical value for how far the random walker is from where he or she started.

Now we’re ready to make random walks with more steps, and record how far from the origin we typically get.

  1. Increase the size of the original collection (the number of steps) to 20 (i.e., add 10).
  2. Collect More Measures” in the newest, third collection. This records the spread for 1000 runs of a 20-step walk, along with NN, which is 20, the number of steps.
  3. Plot spread against NN.
  4. Continue to increase the size of the random walk by adding cases to the original collection, collecting measures, and looking at the graph. You’ll see a relationship between the number of steps (NN), the “spread,” spread. Work you way up past N=1000. More if you can!

SD of ending position for various walk lengths.

So the graph shows that the more steps you take (NN) the farther you’re likely to wind up from your start. But it’s not linear; there’s a kind of diminishing returns that takes effect. High-school students in precalculus generally recognize this as the “lazy parabola” or, if you’re lucky, the square root function.

Because we used standard deviation in this example, you don’t really need a coefficient. If you use MAD like me, you should use K * sqrt( NN ) to model the function, where K is the name of a slider which you use to adjust that coefficient and make the function fit. The illustration shows how the coefficient from our data is very close to 1.00.