The Drunkard's Walk


    Imagine sitting in a café and watching a long stretch of straight, level sidewalk across the street. You focus your attention upon Sopwith Osbourne Sozzledwitz III, who leans against a lamp-post directly across the street from your table. You need little observation to discern that Soppy has gotten himself thoroughly schnockered by imbibing some truly potent liquid refreshment. The curtain rises on this little scenario when Clancy the Cop nudges Soppy and tells him to move along. Completely blotto, Soppy can't remember from one instant to the next in which direction he took his last step; thus, any given step that he takes has an equal likelihood of taking him to the left (as seen from your perspective) as to the right. Soused beyond belief, Soppy must use the fence bordering the sidewalk for support and thereby restricts himself to motion in one dimension, along the sidewalk. With each step that he staggers he covers one cubit (about the length of your forearm and outstretched hand). How far will Soppy stand from the lamp-post after he takes N steps?

    We can't answer that question with a straightforward calculation, as we could, say, with regard to a young man striding down the sidewalk on his way to meet his girlfriend (at two cubits per step, N steps after passing the lamp-post he stands 2N cubits from the lamp-post). Soppy takes us out of the deterministic realm and into the world of stochastic dynamics, a world in which the sharp lines of certainty fluff out into the foggy paths of probability.

    Let's start by defining the basic measure of probability. We apply probability to events, assigning a value of one to an event that certainly will happen and a value of zero to an event that certainly will not happen. If we have an event (Soppy taking a step) that has multiple, mutually exclusive possible outcomes (moving one cubit to the right or moving one cubit to the left in Soppy's case), then we assign probabilities to those possible outcomes in accordance with the relative likelihood of their occurrence and such that the probabilities of all of the possible outcomes add up to one. In Soppy's case we have said that he has an equal likelihood of stepping to the left or to the right, so we assign a probability of one-half to each outcome.

    In terms of actual observation the probabilities tell us nothing useful if we observe Soppy only once. But if we come to the café often enough and observe Soppy on each occasion, then the probabilities tell us that we will soon discern a pattern in Soppy's movements. Suppose that we have observed Soppy on one hundred occasions. If we have kept a record of Soppy's movements, we find that on fifty occasions (give or take a small number) he has taken his first step to the left and that on the other fifty occasions (give or take a small number) he has taken his first step to the right. We could have calculated that result by multiplying the number of occasions by the probabilities of the two possibilities and found that our calculation differed only slightly from our observations. Indeed, the greater the number of occasions, the smaller the relative difference between calculation and observation.
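    We can check that pattern numerically. The following sketch (in Python, using the fifty-fifty step rule we assumed above; the function name and the choice of random seed are my own) tallies Soppy's first step over many occasions:

```python
import random

def first_steps(occasions, seed=1):
    """Tally how often Soppy's first step goes to the left vs. to the right."""
    rng = random.Random(seed)
    lefts = sum(1 for _ in range(occasions) if rng.random() < 0.5)
    return lefts, occasions - lefts

# On 100 occasions: roughly fifty of each, give or take a small number.
print(first_steps(100))

# On a million occasions the relative deviation from one-half shrinks.
lefts, rights = first_steps(1_000_000)
print(lefts / 1_000_000)
```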

    For the taking of two steps (N=2) Soppy has four possibilities: 1. two steps to the left, 2. one step to the left and then one step to the right, 3. one step to the right and then one step to the left, and 4. two steps to the right. To each of those possibilities we assign a probability of one quarter (obtained by multiplying together the probabilities of the steps that comprise it). But two of those possibilities bring Soppy back to the lamp-post, so out of one hundred observations of Soppy we find him two cubits to the left of the lamp-post twenty-five times (give or take a few, of course), at the lamp-post fifty times, and two cubits to the right of the lamp-post twenty-five times. Now we make a distinction between possibilities and outcomes: each of the possibilities takes a probability of twenty-five percent, but for the outcomes we have a probability of twenty-five percent for two cubits to the left, fifty percent for zero cubits of net movement away from the lamp-post, and twenty-five percent for two cubits to the right. In that fact we can see two aspects to the calculation of probabilities.

    First, for possibilities compounded of other, more fundamental, possibilities we calculate the probabilities by multiplying together the probabilities of the component possibilities. As a clarifying example, imagine that our little scenario takes place on a street that slopes gently down to our right, which gives Soppy a forty percent probability of taking a step to the left and a sixty percent probability of taking a step to the right. For the four possibilities of taking two steps we calculate the probabilities as: left-left=0.4x0.4=0.16; left-right=0.4x0.6=0.24; right-left=0.6x0.4=0.24; right-right=0.6x0.6=0.36. Thus we expect that out of one hundred occasions Soppy will take two steps to the left sixteen times (with the usual caveat), take one step to the left and one step to the right twenty-four times, take one step to the right and one step to the left twenty-four times, and take two steps to the right thirty-six times.
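    A short calculation, sketched here in Python (the forty/sixty split comes from the sloped-street example above), multiplies out all four compound probabilities:

```python
from itertools import product
from math import isclose

p, q = 0.4, 0.6  # probability of a step to the left, to the right

# The probability of each two-step possibility: multiply the probabilities
# of its component steps.
prob = {steps: (p if steps[0] == 'L' else q) * (p if steps[1] == 'L' else q)
        for steps in product('LR', repeat=2)}

for steps, value in prob.items():
    print(''.join(steps), round(value, 2))  # LL 0.16, LR 0.24, RL 0.24, RR 0.36

# The four possibilities exhaust what can happen, so they sum to one.
assert isclose(sum(prob.values()), 1.0)
```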

    Second, we can see for the outcomes of our modified imaginary experiment that Soppy ends up two cubits to the left of the lamp-post sixteen times, ends up back at the lamp-post forty-eight times, and ends up two cubits to the right of the lamp-post thirty-six times. So we calculate the probability of a given outcome by adding together the probabilities of all the possibilities that yield that outcome. If we represent the probability of Soppy's taking a step to the left with p and the probability of his taking a step to the right with q, then we have the array of probabilities for N=2 as

(Eq'n 1)

    Now let N=3 (Soppy taking three steps). That gives us eight possibilities and four outcomes, which we can represent as

(Eq'n 2)

For N=4 (Soppy taking four steps) Soppy has sixteen possibilities and five outcomes, which we sum up as

(Eq'n 3)

If we continue that process, we find that the coefficients in each array of outcome probabilities coincide with the elements of the N-th row of Pascal's triangle:

Pascal's Triangle

For any value of N we can calculate those coefficients from

(Eq'n 4)

in which r represents the exponent that we put on p when we calculate the probability of any one of the possibilities that make up the outcome associated with C(N,r). On any given line of the triangle, then, we start at the far left end with r=N and then proceed rightward with r=N-1, r=N-2, and so on until we reach r=0 at the far right end. Thus we calculate the probability of a given outcome as

(Eq'n 5)
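In Python those coefficients and probabilities take only a few lines; math.comb computes the standard binomial coefficient C(N,r) = N!/(r!(N-r)!) directly (the function name outcome_probability is my own):

```python
from math import comb

def outcome_probability(N, r, p, q):
    """Eq'n 5: C(N,r) possibilities, each with probability p**r * q**(N-r)."""
    return comb(N, r) * p**r * q**(N - r)

# Row N = 4 of Pascal's triangle:
print([comb(4, r) for r in range(5)])  # [1, 4, 6, 4, 1]

# With p = q = 1/2 the five outcomes for N = 4 have probabilities
# 1/16, 4/16, 6/16, 4/16, 1/16, which sum to one.
probs = [outcome_probability(4, r, 0.5, 0.5) for r in range(5)]
print(probs, sum(probs))
```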

    If we give N a large value, that calculation becomes very difficult to carry out. But in order to see the ends of the distribution we must move back away from the lamp-post, and that makes Soppy's steps appear smaller. Make N large enough, oblige us to move back far enough, and the steps appear so small that the distribution begins to look more like a continuum. That understanding makes us think that, perhaps, we can replace Equation 5 with some smooth, continuous function that will give us a suitably close approximation to what Equation 5 would give us. To that end let us put the lamp-post at x=0 on a line that we call the x-axis (which runs along the sidewalk). For now the variable x can only take integer values and we define x = N - 2r, Soppy's net displacement from the lamp-post in cubits. With that we can combine Equations 4 and 5 straightforwardly and get

(Eq'n 6)

If only because of the sheer difficulty of calculating factorials, we will want to replace that function with a smoother (and, dare we say, nicer) function of x.
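    To see how good such a replacement can be, we can compare the exact probabilities with a Gaussian curve matched to the distribution's own mean and variance. This is a sketch under my own assumptions: I take x = N - 2r as the net displacement, p = q = 1/2, and I fit the Gaussian from the exact distribution itself rather than from any particular derivation:

```python
from math import comb, exp, pi, sqrt

def walk_probability(N, x, p=0.5, q=0.5):
    """Exact probability that an N-step walk ends at net displacement x."""
    r = (N - x) // 2  # number of steps taken against the net displacement
    if (N - x) % 2 or not 0 <= r <= N:
        return 0.0
    return comb(N, r) * p**r * q**(N - r)

N = 100
probs = {x: walk_probability(N, x) for x in range(-N, N + 1, 2)}

# Match a Gaussian to the distribution's own mean and variance.
mean = sum(x * P for x, P in probs.items())
var = sum((x - mean)**2 * P for x, P in probs.items())

def gauss(x):
    # Probability per allowed site; the sites sit two cubits apart.
    return 2 * exp(-(x - mean)**2 / (2 * var)) / sqrt(2 * pi * var)

for x in (0, 6, 12):
    print(x, probs[x], gauss(x))  # the two columns agree closely
```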

    We could replace the factorials with their equivalents in Stirling's formula, but I want to try something different. I want to start by differentiating Equation 6 with respect to x. (See Appendix II for a discussion of differentiating factorials.) For simplicity let's start by assuming, as we did when we began this analysis, that p=q. We have, then,

(Eq'n 7)

We bear in mind, of course, that the derivatives of the reciprocal factorials stand true to mathematics only so long as N represents an extremely large number. We divide that equation by P, multiply it by dx, and integrate the result, getting

(Eq'n 8)

(with alpha representing the constant of integration) which gives us

(Eq'n 9)

(with A = e^α), which describes Gauss's curve, the familiar bell-shaped curve.

    Before we can evaluate A in that equation we need to take a second look at what P(x) represents. In Equation 5 we define P(N,r) as the probability of Soppy reaching a certain outcome in his addled travel. Because Soppy must reach one of the outcomes available to him, the probabilities of all of the possible outcomes must add up to one; that is,

(Eq'n 10)

When we replace r with x that fact does not change, so we must also have

(Eq'n 11)

We can represent that sum on paper as a bar graph, with the rectangular bar at any value of x having the height P(N,x) and the width extending from x-1/2 to x+1/2. Each bar thus covers an area equal to P(N,x) × 1, and Equation 11 represents the total area enclosed between the x-axis of our graph and the "curve" comprising the tops of the bars and the vertical line segments connecting them to each other. In that picture we can reconceive P(N,x) as a probability density, the probability per unit of distance measured along the x-axis that after taking N steps Soppy will end up at or near some value of x. As N grows vastly larger than one, the relative width of the bars appears to shrink toward the differential dx as a limit, so we can rewrite the sum of Equation 11 as an integral

(Eq'n 12)

Instead of taking the limit as dx approaches zero in that integral (our dx = 1 obviously cannot approach zero), we take the limit as N tends toward infinity. Now we can substitute from Equation 9 to get

(Eq'n 13)

which gives us

(Eq'n 14)

    Though interesting in itself, probability also gives us the means to calculate averages and variances of quantities subject to it. For example, we can calculate the average location that Soppy occupies relative to the lamp-post as

(Eq'n 15)

So even after taking billions, even trillions, of steps or more, Soppy makes no net progress in getting away from the lamp-post, much to Clancy the Cop's irritation. To calculate the variance or dispersion in Soppy's location we have

(Eq'n 16)

Thus the root-mean-square progress of Soppy's travels equals a little over half a step.
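    The zero-average result invites a direct check. This little simulation (fair unit steps; the seed and sample sizes are my own choices) averages the final positions of many independent walks:

```python
import random

def final_position(N, rng):
    """Soppy's net displacement after N fair unit steps."""
    return sum(rng.choice((-1, 1)) for _ in range(N))

rng = random.Random(42)
walks = [final_position(100, rng) for _ in range(10_000)]
average = sum(walks) / len(walks)
print(average)  # hovers near zero: no net progress away from the lamp-post
```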

    If x represents a measurement instead of a simple count, such as counting Soppy's motions in feet or meters rather than in steps taken, then we must divide x by some expression of the unit of measurement that we use in order to convert the measurement that x represents into a pure dimensionless number for the sake of calculating the exponential. Dividing by twice the variance works well because in many physical applications we can measure the variance of the measured quantity more or less directly. Representing the variance as σ², we rewrite Equation 16 as

(Eq'n 17)

So now we have, in general, the probability density of the Gaussian distribution as

(Eq'n 18)

which we can apply to any suitable physical situation; to wit, any situation that we can relate to Soppy's stumblings through an appropriate analogy.

    Any system that has two equally likely states available to it and has a process that periodically kicks it out of its current state will conform to the distribution of outcomes that we inferred from our analysis of the Drunkard's Walk. Flipping a coin gives us a ready example. If you flip a coin N times, you now have the means to calculate the probability that it will come up "heads" M times (the same probability as that of Soppy taking M steps to the left or to the right out of N steps total).
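    For instance, the probability of exactly M heads in N fair flips equals C(N,M)/2^N, which a couple of lines of Python will evaluate (the function name is my own):

```python
from math import comb

def heads_probability(N, M):
    """Probability of exactly M heads in N fair coin flips."""
    return comb(N, M) / 2**N

print(heads_probability(10, 5))  # 252/1024 = 0.24609375
print(sum(heads_probability(10, M) for M in range(11)))  # the outcomes sum to 1.0
```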

    As a final example consider the result that emanates (literally) from the mutual annihilation of electrons and positrons. When an electron and a positron come together under their mutual attraction their particle properties cancel, each set negating the other, and the energy that the particles carried re-emerges manifest in two gamma photons, each carrying 511 kilo-electron-volts. Suppose that we have aimed our telescope at a bright remnant of a recent supernova in order to observe the gamma photons coming from the annihilations of positrons that collisions among high-energy particles in the nebula create in relatively thick swarms. As the photons enter our detector, that device measures their polarizations according to the following scheme: if a photon's electric-field vector points in a direction less than forty-five degrees from our x-axis, we call the photon horizontally polarized, and if a photon's electric-field vector points in a direction less than forty-five degrees away from our y-axis, we call the photon vertically polarized. Now we know that if our telescope and others observe enough sets of N photons each, we can expect that the numbers of photons that come in vertically polarized will match the distribution that we would calculate from the formula for a Gaussian curve. Of course, we hope that the observations don't match the calculation, because in the difference between observations and our calculations, in the biasing of the polarizations of the emitted photons, we may discover the existence and, possibly, the form of a new phenomenon and thereby discover something new about the nature of Reality.

    Again we discern something wonderful. Starting with an analysis of the motions of a staggering drunk we have obtained a mathematical description that we can apply with equal validity to the radiation emitted by a fundamental process of Nature. By using the logic of mathematics to shape our concepts, then, we obtain concepts that match the deep structure of Reality.


Appendix I:

The Formal Mathematics of Probability

    Contemplate a standard gambler's die. Ideally it consists of a perfect, uniform cube with slightly rounded edges and corners. Its center of mass coincides precisely with its geometric center and its moment of inertia has the same value when measured about any of the three axes that pass through the geometric centers of the cube and of opposite faces. And on each face we have spots painted, thinly enough that the paint will have negligible effect on the mechanical properties of the die, representing the numbers one through six. The die gives us perfect six-fold physical symmetry broken only by minuscule dabs of paint.

    In standard practice someone gives the die a light toss, with primarily horizontal motion, onto a table covered with green felt. The die bounces off that surface, tumbling and beating the air as it spins, and comes down on the surface again. Each strike of the die against the table takes some kinetic energy out of the die and converts it into heat. The die's motion decays until the die strikes the table and loses so much energy that it doesn't have enough left to bounce off the surface. The die tips over or tips back, one face hits down flat on the felt and dissipates the last of the die's kinetic energy, and the die comes to rest. How many spots decorate the face on the top side of the die?

    In concept we believe that we can take a complete description of the die's position, orientation, and motions as it comes off the thrower's hand, then apply Newtonian dynamics to calculating its position, orientation, and motions at any subsequent time, and thereby determine which face comes out on top when the die stops moving. But in conceiving that vision we have misled ourselves. The quantum theory tells us that we cannot know the initial conditions of the die toss with absolutely perfect precision: Heisenberg's Indeterminacy Principle encodes the fact that the finer the precision with which we know a particle's position, the coarser the precision with which we know the particle's linear momentum. The phenomenon of sensitivity to initial conditions, upon which we found chaos theory, ensures that any imprecision in our determination of the initial conditions of the event will grow rapidly into a significant difference between what we calculate and what actually happens. We have no hope in concept, therefore, and certainly none in practice, of predicting how the die will end up.

    However, we can calculate how many times a given number will come up on the die in the course of many throws. Probability denotes the concept that lets us make such calculations, and we have two kinds of probability: mathematical (a priori) probability and empirical (a posteriori) probability.

    Empirical probability comes from the data that we gather from actual experiments. Imagine a physical system that, when we disturb it, settles into one of a number of states. If we disturb that system N times and find that it settles into the state A on nA of those trials, then we calculate the empirical probability of the system settling into the state A consequent to future disturbances as PA=nA/N. In making that calculation we assume that N is substantially greater than the number of states available to the system and that we have no information relevant to the system's settling into one state or another other than the data from previous trials. In the case of a die we want N much greater than six. We can gather our data by throwing a single die N times or by throwing N dice once if we ensure that they don't interfere with each other. Empirical probability simply measures the relative frequency at which an event occurs.
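    In place of N throws of a physical die, we can sketch the measurement in Python (the seed and sample size are my own choices):

```python
import random

def empirical_probabilities(N, seed=7):
    """Throw a fair die N times and return P_A = n_A / N for each face A."""
    rng = random.Random(seed)
    counts = {face: 0 for face in range(1, 7)}
    for _ in range(N):
        counts[rng.randint(1, 6)] += 1
    return {face: n / N for face, n in counts.items()}

# With N much greater than six, every face's relative frequency
# settles near 1/6 = 0.1667.
print(empirical_probabilities(60_000))
```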

    Mathematical probability gets a little more complicated, but ultimately it must match empirical probability because we intend it to describe stochastic reality. We start by contemplating a system that responds to an appropriate disturbance by settling into one of an exhaustive set of mutually exclusive, equally likely states. In the case of a die we exhaust the possible states with the different ways in which a given number of spots can come out on top and we can denote the subsets of that set by those numbers: {(1), (2), (3), (4), (5), (6)}. Those six states satisfy the criterion of mutual exclusivity because only one side of the die can come out on top at any one time. As for the equal likelihood of the states occurring after the die is tossed, we base that on the physical symmetry of the die: the perfect uniformity and symmetry of the die give us no basis for asserting that any one face is more or less likely to come out on top than is any other face. In the absence of distinction we must assert equality in the relative possibility of the system manifesting any given state.

    We make the connection between mathematical probability and empirical probability through a limiting process, such as the one we use in defining derivatives in the calculus. If we have a number S of equally likely states that a system can manifest after we disturb it and if we disturb that system N times and the state A is manifested nA times, then the probability of the state being manifest as a consequence of any given disturbance of the system conforms to

(Eq'n I-1)

    More formally we say that a definitely occurring event T (the toss of a die) has a probability function P that covers a domain that includes all possible outcomes of the event,

T={(1), (2), (3), (4), (5), (6)}

(Eq'n I-2)

in the present example, and a range contained in the closed interval [0, 1]. We subject that probability function to the conditions that

P(T) = 1

(Eq'n I-3)

and, if for events/possible outcomes Ai and Aj we have Ai ∩ Aj = ∅ (the empty set) whenever i ≠ j, then we have

P(A1 ∪ A2 ∪ ... ∪ An) = Σ P(Ai).

(Eq'n I-4)

    Consider the set T2 that we get when we roll two dice once. We have

(Eq'n I-5)

That matrix provides the mathematical foundation supporting the game of Craps. The matrix presents us with thirty-six possibilities, none of which has a greater or lesser likelihood than any other of coming manifest on any given throw of the dice, so the probability that we assign to any given possibility equals one thirty-sixth.

    Craps is scored not on possibilities, but on specific outcomes. In a given round of play the player bets a certain amount of money, the house matches the bet, and the winner of the round takes all of the money. The player tosses the dice and the round is scored in accordance with the number of spots displayed on top of the dice when they come to rest. If the spots add up to seven (six possibilities with a probability of 1/36 for each yields a probability of one sixth, in accordance with Equation I-4), the player wins. If the spots add up to two ("snake eyes", P(1,1)=1/36) or twelve ("box cars", P(6,6)=1/36) the player loses. If the spots add up to one of the other eight possibilities, their sum becomes the player's point and the player must toss the dice again until either they roll their point again (and the player wins) or they roll a seven (and the player loses). What probability do we assign to the player's possibility of winning any given round (in accordance with Equation I-1)?

    Suppose that our player throws a three on their first toss of the dice. To calculate the probability of the player's winning the round we must determine the probability that the player, in subsequent throws, will throw a three before throwing a seven. None of the other numbers that the dice can display counts toward winning or losing the round, so we ignore them. We thus have to consider only the two ways of making a three and the six ways of making a seven: out of eight possibilities of scoring, two give the player a win, so the probability of winning the point equals 1/4. Since the probability of throwing a three on the first toss of a round equals 2/36, the compound probability of throwing a three and then winning the point equals (2/36)(1/4)=1/72.

    We can make the same calculation for the other possible points and thereby determine the probability of setting and then winning a given point on the first toss of the dice. If we add those numbers together and add in the probability of winning by throwing a seven (1/6) or by throwing a two or a twelve (0), we obtain the probability that the player will win any given round, P=0.465151515....
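    We can let Python grind through that sum exactly, following the scoring rules exactly as described above (note that in casino craps an eleven also wins and a three also loses on the first toss; here we keep to the simpler rules given in the text):

```python
from fractions import Fraction
from itertools import product

# Number of ways two dice can show each sum.
ways = {s: 0 for s in range(2, 13)}
for a, b in product(range(1, 7), repeat=2):
    ways[a + b] += 1

win = Fraction(ways[7], 36)               # a seven on the first toss wins
for point in (3, 4, 5, 6, 8, 9, 10, 11):  # two and twelve lose; the rest set a point
    set_point = Fraction(ways[point], 36)
    make_point = Fraction(ways[point], ways[point] + ways[7])
    win += set_point * make_point

print(win, float(win))  # 307/660 = 0.46515151...
```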

    If that probability had equaled 0.50000..., then the player would win as often as lose and gain a dollar for every dollar bet and lost. But a probability of 0.465... means that out of one thousand rounds the player wins 465 of them, which means in turn that if the player bets one dollar every round, they lose seventy dollars over the one thousand rounds. Thus we calculate that on average the player loses seven cents from every dollar bet. If we multiply that figure by the number of dollars bet per round, then by the number of rounds played every day, add in side bets and other furbelows and then take into account the observation that few people fully understand probability, then we can see why Las Vegas, Nevada, did not remain a former railroad tank town and short slow-down zone on U.S. Highway 95.


Appendix II:

Differentiating Factorials

    We don't usually think of the factorial as a differentiable function: indeed, we don't think of the factorial as a function in the usual algebraic sense, though it certainly satisfies the definition of a function; that is, with any positive integer N we associate the product of all of the integers up to and including N (written as N!). But how would we differentiate something like that? It doesn't seem to give us the usual algebraic handle.

    Assume that G represents an integer constant and that x represents a variable that can take only integer values. If G represents a very large number, then we can treat the difference between successive integers as equivalent to a differential for the purpose of calculating derivatives. Now we differentiate (G+x)! and (G-x)! to demonstrate how we can apply both incremental and decremental differentiation to factorials.

    Recall that the standard incremental derivative of some function f(x) with respect to x conforms to the statement that

(Eq'n II-1)

In differentiating (G+x)! we take dx=1 and get

(Eq'n II-2)

Note that in this case we don't take the limit explicitly, because we properly cannot. Our differential, dx=1, cannot diminish toward zero, though it grows relatively small as G grows larger. In essence we have reversed the limit process, taking the limit as G approaches infinity rather than as dx approaches zero.

    To differentiate (G-x)! we use the decremental derivative,

(Eq'n II-3)

In this case we decrement x by dx rather than incrementing it by dx. Again, taking dx=1 and taking the limit implicitly, we have

(Eq'n II-4)

If we had incremented x, we would have decremented the factorial, which would lead to a division by the factorial's ultimate factor. But we don't want to use a division (other than the one by dx) in calculating a differential, so we calculate the decremental derivative instead of the usual incremental derivative.
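    With dx = 1 the incremental and decremental differences reduce to exact integer identities, which we can verify directly. The closed forms (G+x)·(G+x)! and -(G-x)·(G-x)! below are my own statement of the differences; I assume they match what Equations II-2 and II-4 display:

```python
from math import factorial

G, x = 1000, 5  # G very large, so that dx = 1 behaves like a differential

# Incremental difference of (G+x)! with dx = 1:
# (G+x+1)! - (G+x)! = (G+x) * (G+x)!
assert factorial(G + x + 1) - factorial(G + x) == (G + x) * factorial(G + x)

# Decremental difference of (G-x)! with dx = 1:
# (G-x)! - (G-x+1)! = -(G-x) * (G-x)!
assert factorial(G - x) - factorial(G - x + 1) == -(G - x) * factorial(G - x)

print("both identities hold exactly")
```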

    Now we want to differentiate the reciprocals of those factorials. We can't do it directly, so we will use the product rule and the fact that the derivative of one equals zero. For our general function we have

(Eq'n II-5)

For our first reciprocal factorial we have

(Eq'n II-6)

which gives us readily

(Eq'n II-7)

And for our second reciprocal factorial we have, of course,

(Eq'n II-8)

which gives us

(Eq'n II-9)

With those derivatives we have the means to transform the formula of the binomial theorem into that of Gauss's curve.


Appendix III:

Evaluating the Gaussian Integral and its Moments

    In calculating probabilities, averages, and variances we must find the solution of an integral of the exponential of a negative square of the variable in question. We start with the basic Gaussian integral.

    In order to solve this integral we must put it into a form that we can solve with techniques that we already know. For convenience we define

(Eq'n III-1)

If we plot the exponential function

(Eq'n III-2)

along the x-axis from minus infinity to plus infinity, we get the famous bell-shaped Gaussian curve centered on the origin of the axis. With the integral I we want to calculate the area between that curve and the x-axis.

    Let's extend that exponential function over an x-y plane. We have then

(Eq'n III-3)

and now we want to calculate the volume enclosed between the x-y plane and the bell-shaped surface defined by that equation. For that calculation we have

(Eq'n III-4)

In the plane we can convert Cartesian coordinates into polar coordinates. To make that conversion we use

(Eq'n III-5)


(Eq'n III-6)

which transforms Equation III-4 into

(Eq'n III-7)

Of course, we can only justify that move by integrating over the entire plane and not merely part of it. That fact stands true to mathematics because in the Cartesian case we integrate over a square and in the polar case we integrate over a disc. The results necessarily differ from each other, but "at infinity" the difference becomes negligible. From Equation III-4 we thus obtain

(Eq'n III-8)

(Reference: Reif, F., "Fundamentals of Statistical and Thermal Physics", McGraw-Hill Book Company, New York, 1965 (LCCCN 63-22730), pp. 606-607.)
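    A brute-force numerical check confirms the result, assuming Equation III-1 defines I as the integral of exp(-x²) over the whole real line, which evaluates to sqrt(pi) (and, more generally, exp(-a·x²) integrates to sqrt(pi/a)). The midpoint rule, finite cutoff, and step count below are my own choices:

```python
from math import exp, pi, sqrt

def gaussian_integral(a=1.0, half_width=10.0, steps=200_000):
    """Midpoint-rule estimate of the integral of exp(-a*x**2) over the real line."""
    dx = 2 * half_width / steps
    return sum(exp(-a * (-half_width + (k + 0.5) * dx)**2)
               for k in range(steps)) * dx

print(gaussian_integral(), sqrt(pi))         # both approximately 1.7724539
print(gaussian_integral(2.0), sqrt(pi / 2))  # both approximately 1.2533141
```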

    If we multiply the integrand of the above integral by x, we get the integral whose solution yields the average value of x. We can actually integrate this one directly to get

(Eq'n III-9)

That gives us what we, given the skew symmetry of the integrand, would expect.

    Solving this integral gives us the means to calculate the variance (or dispersion) in x and its square root, the root-mean-square value of x. Given that we have already solved the first integral in this sequence, we can solve this one easily: we merely integrate it by parts. We have, then,

(Eq'n III-10)

In that equation we know that the first term on the right side of the equality sign zeroes out because y = exp[x²] increases faster than y = x does, which means that y = exp[-x²] decreases to zero faster than y = x increases, so y = x·exp[-x²] goes to zero in the limit as the value of x tends toward infinity.
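    The same numerical machinery confirms both moments: the first moment vanishes by the odd symmetry of the integrand, and the second moment comes out to sqrt(pi)/2, matching the integration by parts above (the interval cutoff and step count are again my own choices):

```python
from math import exp, pi, sqrt

def moment_integral(n, half_width=10.0, steps=200_000):
    """Midpoint-rule estimate of the integral of x**n * exp(-x**2) over the real line."""
    dx = 2 * half_width / steps
    total = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * dx
        total += x**n * exp(-x * x)
    return total * dx

print(moment_integral(1))                # essentially zero, by odd symmetry
print(moment_integral(2), sqrt(pi) / 2)  # both approximately 0.8862269
```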

