Wave Packets

We know that when a field, such as an electric field or the probability field of the quantum theory, changes with the elapse of time, it must conform to the wave equation

(Eq’n 1)

and, thus, it must have the description

(Eq’n 2)

in the one-dimensional case. That equation describes a plane wave that propagates to the left or to the right parallel to the x-axis of our coordinate frame with a speed c such that kc=ω, with k representing the wave number (equal to 2π multiplied by the reciprocal of the wavelength), ω representing the angular frequency of the vibrating field, and A representing the amplitude of the wave. That amplitude does not depend upon the variables in the equation (it comes from the constant of integration in the basic solution), so it represents a constant that extends uniformly to infinity in both directions.

A purely monochromatic wave spanning an infinite distance does not represent a real situation, though laser beams come close (compared to the wavelength of the light, the length of a typical laser beam approaches the effectively infinite). We want to describe something more realistic, such as a pulse of radiation or an amplitude-modulated wave, so we use a linear superposition of functions such as that in Equation 2, but with different wavelengths. The terms in that sum satisfy the wave equation separately, so their sum also satisfies it. In the superposition constructive and destructive interference shape the component waves into a wave packet in accordance with Fourier’s theorem.

We can see how that process works if we consider the phenomenon of beats. If we have two waves of slightly different wavelengths propagating together in the same direction, the amplitude of the combination will rise and fall as the waves interfere constructively and then destructively. We can modify that pulsation by adding in more waves. As the set of the wavelengths segues from the discrete to a continuum we can, according to Fourier, alter the shape of the pulsation in any way we want.
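The beat phenomenon is easy to demonstrate numerically. The following sketch (Python with numpy; the wave numbers and speed are illustrative choices, not values from the text) superposes two waves of slightly different wave number and checks that the sum equals a slowly varying envelope multiplying a fast carrier:

```python
import numpy as np

# Two waves with slightly different wave numbers, both moving at speed c
# (k1, k2, and c are illustrative choices).
k1, k2, c = 10.0, 11.0, 1.0
x = np.linspace(0.0, 20.0, 4000)
t = 0.0

# Superposition of the two unit-amplitude (real) waves.
psi = np.cos(k1 * (x - c * t)) + np.cos(k2 * (x - c * t))

# The sum-to-product identity predicts a slow envelope modulating a fast
# carrier; the envelope's long period is what we hear as beats in sound.
envelope = 2.0 * np.cos(0.5 * (k1 - k2) * (x - c * t))
carrier = np.cos(0.5 * (k1 + k2) * (x - c * t))
assert np.allclose(psi, envelope * carrier)
```

The identity cos A + cos B = 2 cos[(A-B)/2] cos[(A+B)/2] does all the work here; adding more component waves refines the envelope in just the way Fourier's theorem describes.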

Substituting Equation 2 into Equation 1 tells us that applying the differential operators to the field function gives us the same result as we get from multiplying the field function by the square of the wave number or the square of the angular frequency. That fact enables us to transform Equation 1 into the dispersion relation,

(Eq’n 3)

which, given that the wave propagates in the direction pointed by the wave vector, we can write as

(Eq’n 4)
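Since the equation images do not reproduce here, a sketch of that substitution in standard notation (assuming, as the text indicates, that Equation 1 is the one-dimensional wave equation and Equation 2 its plane-wave solution):

```latex
% Plane wave substituted into the wave equation:
\psi = A e^{i(kx - \omega t)}, \qquad
\frac{\partial^2 \psi}{\partial x^2} = \frac{1}{c^2}\,\frac{\partial^2 \psi}{\partial t^2}
% Each derivative pulls down a factor of ik or -i\omega, so
(ik)^2 \psi = \frac{1}{c^2}\,(-i\omega)^2 \psi
\quad\Longrightarrow\quad
\omega^2 = k^2 c^2 .
```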

If we subtract the angular frequency from that equation and multiply the result by elapsed time, we get the phase relation that we see in Equation 2,

(Eq’n 5)

We thus have the phase relation as the vector dot product of two four-vectors, [**k**,ω] and [**x**,t]. Those four-vectors specify a point in an eight-dimensional phase space, the spatial point drifting through its three-dimensional realm as the time increases. In more or less perfect symmetry, the wave vector, the vectorized reciprocal of wavelength, would drift through its three-dimensional realm if the frequency, the reciprocal of the period, happened to change.

So, let’s assume that we have a propagating entity, a wave packet, comprising a continuum of component waves. With an amplitude density a(**k**) in the wave number dimension (assuming for simplicity that we have directed all spatial movement parallel to our x-axis) the wave gains an increment of amplitude a(**k**)d**k** from the waves having wave numbers in the minuscule range d**k** and we have our field function as

(Eq’n 6)

To obtain a description of the amplitude density we can invert that equation through the field function at some instant of time, ψ(x,0) for typical example, and integrate over space;

(Eq’n 7)

Alternatively, we could invert Equation 6 through the field function at some point in space, ψ(0,t) for example, and integrate over time,

(Eq’n 8)

in which we use the dispersion relation to convert the frequency into the corresponding wave number in the argument of the exponential function.
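That synthesis-and-inversion pair can be checked numerically. The sketch below (assuming the usual convention in which the inversion carries the 1/2π factor; the Gaussian amplitude density and all parameter values are illustrative) builds the field function by direct summation and then recovers the amplitude density at a sample wave number:

```python
import numpy as np

# Illustrative Gaussian amplitude density a(k) centered on k0.
k0, sigma = 20.0, 2.0
k, dk = np.linspace(0.0, 40.0, 2001, retstep=True)
a = np.exp(-(k - k0) ** 2 / (2.0 * sigma ** 2))

# Synthesis at t = 0: psi(x,0) = integral of a(k) exp(ikx) dk, as a sum.
x, dx = np.linspace(-4.0, 4.0, 801, retstep=True)
psi = (a[None, :] * np.exp(1j * np.outer(x, k))).sum(axis=1) * dk

# Inversion in the spirit of Eq'n 7 (with the conventional 1/2pi factor):
# recover the amplitude density at one sample wave number.
k_test = k0 + sigma                      # a(k_test) should equal exp(-1/2)
a_rec = (psi * np.exp(-1j * k_test * x)).sum() * dx / (2.0 * np.pi)
assert abs(a_rec - np.exp(-0.5)) < 1e-3
```

The x-window must extend far enough that the packet has decayed at its edges; otherwise the truncated inversion integral misses part of the field function.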

If the amplitude density differs substantially from zero only in a relatively narrow neighborhood around the point [k_{0},ω_{0}], then we can write the field function as

(Eq’n 9)

in which we have the amplitude

(Eq’n 10)

If it turns out that A varies slowly with the elapse of time, then we can treat the packet as a localized wave of frequency ω_{0} propagating with amplitude A, just as in Equation 2.

Assume that at the instant t=0 the amplitude function displays the shape of a Gaussian hill centered on the point x=0. We express that assumption algebraically as

(Eq’n 11)

Note that g has units of reciprocal distance squared. But that only describes a stationary pattern. We want the pattern to move at the speed c=ω_{0}/k_{0}, so we must write

(Eq’n 12)

which makes x increase as t increases, thereby making the pattern move in the positive x-direction as time elapses. Thus we have the field function as

(Eq’n 13)

If we calculate the Fourier transform of that function, we get

(Eq’n 14)

in which y=x-ct for convenience. Take the argument of the combined exponentials and complete the square to get

(Eq’n 15)

in which we define, for convenience,

(Eq’n 16)

Since we also have dy=dβ, we can rewrite Equation 14 as

(Eq’n 17)

in which I multiplied and divided the equation by the square root of g in order to put the differential into the proper form to calculate the definite integral, which equals the square root of pi. So we have

(Eq’n 18)

which represents a Gaussian function in the k-coordinate (which we have assumed runs parallel to the x-axis in the spatial coordinates) centered on k_{0}. This formula, of course, necessitates that our wave packet consist of a set of waves with different values of k and ω propagating together.
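That transform is easy to verify numerically. In this sketch (illustrative parameter values; the transform is computed as a direct sum, without a 1/2π factor) the packet exp(-gx²)exp(ik₀x) of Equation 13 at t=0 transforms to √(π/g)·exp[-(k-k₀)²/4g], a Gaussian in k centered on k₀:

```python
import numpy as np

# Gaussian packet at t = 0: envelope exp(-g x^2) times carrier exp(i k0 x);
# g and k0 are illustrative values.
g, k0 = 1.0, 10.0
x, dx = np.linspace(-8.0, 8.0, 3201, retstep=True)
psi = np.exp(-g * x ** 2) * np.exp(1j * k0 * x)

# Transform at a few probe wave numbers; completing the square predicts
# sqrt(pi/g) * exp(-(k - k0)^2 / (4 g)), a Gaussian centered on k0.
for k_probe in (k0, k0 + 1.0, k0 - 2.0):
    ft = (psi * np.exp(-1j * k_probe * x)).sum() * dx
    predicted = np.sqrt(np.pi / g) * np.exp(-(k_probe - k0) ** 2 / (4.0 * g))
    assert abs(ft - predicted) < 1e-6
```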

Does the Fourier representation of a pulse of radiation, as described above, give us merely a useful fiction or does it express a truth about the nature of Reality? Sometime in the 1980s, while visiting an aunt in Irvine, California, I conducted an impromptu experiment whose result supports the latter possibility: in that experiment I experienced the multifrequency structure of a wave packet directly. For a reason I can’t remember, I created a sharp pulse of pressure, which Fourier analysis represents as a sonic wave packet, by clapping my hands. I got back an echo, not as a clap, but as a chirp: the echo had returned to me as a series of waves, the lowest frequencies reaching me first and the higher frequencies reaching me progressively later. I had been standing on a small greenbelt that ran behind the homes and saw that the fence that returned the echo was made of wooden planks that overlapped each other, making the fence the acoustical analogue of a mirrored diffraction grating. The fence scattered the different frequencies in the clap so that they came back to me from different directions (from different parts of the fence) and, thus, at different times, giving me a little sonic rainbow. As a purely passive device, the fence could only separate waves that already inhered in the clap and not generate waves from the pressure pulse, so I infer that the Fourier representation of radiation gives us an authentic description of Reality.

So far all of our wave packets move at the same speed as do the waves that constitute them. But not all wave packets do that. In the quantum theory, for relevant example, wave packets that represent matter must move at a variety of speeds and yet the waves that constitute the packets, in order to satisfy the requirement of Lorentz invariance, must propagate at the speed of light. We thus define a new term: the wave packet, the amplitude envelope, moves at what we call the group velocity.

If an observer measures the amplitude of a wave packet passing through their apparatus, they will measure the rate at which the amplitude changes as a number that matches the sum of two components – the rate at which the amplitude changes inherently with the elapse of time and the product of the group velocity with the rate at which the amplitude changes inherently over distance. Thus, we have the measurement as

(Eq’n 19)

in which v_{g} represents the group velocity. The usual algebraic turnaround dissolves that equation into

(Eq’n 20)
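Read in standard notation (a reconstruction consistent with the surrounding text, since the equation images do not reproduce here), the measured rate and its algebraic turnaround would be:

```latex
% Total (measured) rate of change of the envelope amplitude:
\frac{dA}{dt} = \frac{\partial A}{\partial t} + v_g\,\frac{\partial A}{\partial x}
% For an envelope of permanent form dA/dt = 0, so the turnaround gives
v_g = -\,\frac{\partial A/\partial t}{\partial A/\partial x} .
```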

Imagine that we have two waves propagating in opposite directions at the speed of light. At our given point we measure the field intensity of each wave as

(Eq’ns 21)

For simplicity assume that A_{1}=A_{2} and define A_{0}=A_{1}=A_{2}.
At the given point we don’t measure the component waves; we can only measure
their combination,

(Eq’n 22)

Using Euler’s formula, exp[iα]=cosα+isinα, we express the sum of two imaginary exponentials as

(Eq’n 23)

In the present case we have

(Eq’ns 24)

so we have Equation 22 as

(Eq’n 25)

Because the cosine always yields a real number, we identify it with the envelope of the combined waves and we identify the imaginary exponential with what a radio engineer would call the carrier wave. We have for the component waves separately ω_{1}=k_{1}c and ω_{2}=k_{2}c, so we can calculate the two propagation speeds that we have in their superposition: for the cosine we have the group velocity

(Eq’n 26)

and for the imaginary exponential we have what physicists call the phase velocity

(Eq’n 27)

The group velocity will always come out less than the speed of light, which we require because we want the envelope of the compound wave to represent a real particle or body. But the phase velocity always comes out greater than the speed of light and that fact seems more than a little problematic. We usually save our work here by emphasizing that we associate the mass and energy of the relevant body with the envelope of the compound wave and, thus, with the group velocity. The phase aspect of the compound wave carries no physical properties and, thus, does not violate relativistic dynamics in moving faster than light. But then we must ask whether anything actually moves faster than light in that case. Suppose that we have two horizontal straight lines that cross each other at a very small angle and that one line moves vertically past the other. The lines move with a vertical velocity smaller than the speed of light, yet nonetheless, if we use a small enough angle, the point where the two lines cross each other will move faster than light. Of course, nothing in the system actually moves faster than light: the superluminal motion of the crossing point cannot convey any information, much less any mass or energy, so we classify it as an illusion. The superluminal motion of the phase of our carrier wave shows us the same kind of illusion.
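A sketch of that two-wave superposition (numpy; the wave numbers are arbitrary illustrative choices) confirms the envelope-carrier factorization and the two velocities. Note that with ω=kc for both components the group and phase velocities multiply out to c²:

```python
import numpy as np

# Two unit-amplitude waves propagating in opposite directions at speed c
# (the situation of Eq'ns 21-25); k1 > k2, both positive, are illustrative.
c, k1, k2 = 1.0, 12.0, 10.0
x = np.linspace(0.0, 10.0, 2000)
t = 0.7

alpha = k1 * x - k1 * c * t              # phase of the rightward wave
beta = -k2 * x - k2 * c * t              # phase of the leftward wave
psi = np.exp(1j * alpha) + np.exp(1j * beta)

# Euler's formula factors the sum into a real envelope times a carrier.
envelope = 2.0 * np.cos(0.5 * (alpha - beta))
carrier = np.exp(0.5j * (alpha + beta))
assert np.allclose(psi, envelope * carrier)

# Velocities of the two factors: the envelope moves slower than c, the
# carrier phase faster, and the two multiply to c squared.
v_g = c * (k1 - k2) / (k1 + k2)
v_p = c * (k1 + k2) / (k1 - k2)
assert v_g < c < v_p and np.isclose(v_g * v_p, c ** 2)
```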

Now we can differentiate the amplitude in Equation 25 as required by Equations 19 and 20. We obtain

(Eq’ns 28)

Substituting those equations into Equation 20 tells us that v_{g}=dx/dt, which doesn’t give us anything very interesting, but it does confirm that our description of the wave packet conforms to the description of the group velocity.

Looking ahead a bit, take the field function as a continuous superposition of plane waves propagating parallel to the x-axis,

(Eq’n 29)

in which we have represented the dispersion relation by writing the angular frequency of each wave as a function of that wave’s propagation number. We require that the amplitude function f(k) have a single maximum at the propagation number designated by k_{0} with a width (standard deviation) of Δk. Because of the symmetry of the envelope in k-space we can assert that the wave packet consists of a continuous set of pairs of waves with propagation numbers k_{0}±δk, so we can represent the field function as a sum of contributions that we represent algebraically as

(Eq’n 30)

Thus we have the equivalent of a single plane wave (the exponential function) modulated by an amplitude function (the cosine; the factor f(k_{0}±δk) acts as a constant for each pair of component waves). The plane wave has exactly the same description for all pairs of component waves, so it acts as a carrier wave of the packet. If we have the dispersion relation δω=v_{g}δk for all values of δω and δk, then the maximum of the amplitude function will move at the group velocity v_{g}. In general we have
In general we have

(Eq’n 31)

In this case the terms of higher order in δk zero out, leaving only the first-degree term; thus, for all values of δω and δk the maximum of the cosine function moves through location space with the elapse of time in the same way for all of the cosines in the wave packet. The motion of the maximum of the cosine function corresponds to the phase of the cosine remaining constant as time elapses. For simplicity we take that constant as zero, so we have the equation of motion as

(Eq’n 32)

We thus have the group velocity as

(Eq’n 33)

for the whole wave packet. That result stands true for all wave packets that do not disperse as they propagate.
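In standard notation the pair-sum argument runs as follows (a reconstruction under the stated assumption δω=v_{g}δk, since the equation images do not reproduce here):

```latex
% A pair of components k_0 \pm \delta k with \delta\omega = v_g\,\delta k:
f\,e^{i[(k_0+\delta k)x - (\omega_0+\delta\omega)t]}
+ f\,e^{i[(k_0-\delta k)x - (\omega_0-\delta\omega)t]}
  = 2f\cos[\delta k\,(x - v_g t)]\;e^{i(k_0 x - \omega_0 t)}
% The envelope maximum sits where \delta k\,(x - v_g t) = 0, so x = v_g t with
v_g = \frac{d\omega}{dk}\bigg|_{k_0} .
```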

To return to our derivation, we add more waves to the array described by Equation 25 to localize it and form it into a wave packet. Plotting the contributions in k-space, we assume that their distribution function follows a continuous, symmetric function that has nonzero values on both the positive and negative halves of the k-axis. If the description of the wave packet did not conform to that latter requirement, then all of the components would travel in the same direction at the speed of light and so would the wave packet: the description would conform to that of a pulse of light and not to that of any material particle or body. As a simple example assume that the amplitudes of the wave packet’s propagation numbers conform to a Gaussian distribution and then calculate the associated group velocity. To make the derivation easier we exploit one other criterion to which the wave packet must conform.

As noted above, we want our wave packets to possess the property of Lorentz invariance. That conformity to the requirements of Relativity necessitates that the component waves all propagate at the speed of light, the fundamental invariant of Relativity. To ensure that the wave packet itself moves at speeds less than the speed of light, we must assert that some of the component waves propagate in one direction and that the rest of the component waves propagate in the opposite direction. In order to see how that works we want to construct a description that appears to describe a real wave packet; to wit, one that actually represents a real particle.

Start with the simplest case, a wave packet that remains stationary and centered on the point x=0. In k-space we must have a function describing the propagation numbers of the component waves that remains symmetrical about the mean, which, in this case, equals zero. For the simplest function that satisfies those requirements we have the normal (or Gaussian) distribution function,

(Eq’n 34)

in which σ^{2} represents the variance of the distribution. Note that this function has inflection points at k=±σ (the standard deviation), which means that at those points the graph of the function, moving away from its maximum, stops flexing downward and starts flexing upward (more technically, the second derivative of the function goes to zero). We thus have the field function as the Fourier transform

(Eq’n 35)

Again we prepare the integral by completing the square in the argument of the integrand,

(Eq’n 36)

in which

(Eq’n 37)

Using the corresponding differential substitution, we evaluate the integral and get

(Eq’n 38)

The inflection points of that function occur at x=±1/σ, so we have the variance of x as μ^{2}=1/σ^{2} and we now have the field function as

(Eq’n 39)

But that description gives us only the envelope of the packet: it does not show that the packet gives us a dynamic entity that consists of a very large set of waves. For our stationary wave packet we use pairs of waves identical to each other except that they propagate in opposite directions: in mathematical form, we have the description of each pair of those waves as

(Eq’n 40)

By adding up (or integrating) such pairs we get a stationary pattern that pulsates. Integration of the cosine, with respect to k or to x, gives us the Gaussian envelope, but what should we use for the angular frequency in our description?

In this case we don’t have to worry about assigning a propagation number to the packet, because the propagation numbers of each pair of component waves cancel each other out, leaving the packet motionless. But angular frequency can only take positive values, so what single value can we assign to the packet as its "beat" frequency? It may help us in answering that question to note that, if our wave packet represents the quantum-mechanical description of a particle, then multiplying the beat frequency by Dirac’s constant (ħ) gives us the value of the particle’s rest mass-energy.

Simply put, we use the average value of the angular frequencies of the component waves. Each pair of component waves, one with positive wave number and the other with its negative, carries a certain value of the angular frequency, which corresponds to an energy for the particle. The field function correlates with the probability of finding the particle in a state defined by each component pair. The Heisenberg principle tells us that the particle will not definitely occupy any given state until its properties get measured, so the field function must reflect that indeterminacy: for the angular frequency in the field function we must use the expectation value, the average value, of that number in the distribution of the component waves.

We can exploit the dispersion relation (ω=kc) and the symmetry of the distribution function to calculate the average value of the angular frequency,

(Eq’n 41)

In that calculation I have used only half of each integral, exploiting symmetry to make the calculation more transparent. Using that result, we can now rewrite Equation 39 as

(Eq’n 41)
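That average can be checked numerically. With ω=|k|c over the Gaussian distribution of Equation 34, the closed form comes out as ⟨ω⟩=cσ√(2/π) (a sketch, assuming that is what the placeholder equation expresses; σ and c are illustrative values):

```python
import numpy as np

# Average angular frequency omega = |k| c over the Gaussian k-distribution.
c, sigma = 1.0, 3.0
k, dk = np.linspace(-12.0 * sigma, 12.0 * sigma, 20001, retstep=True)
p = np.exp(-k ** 2 / (2.0 * sigma ** 2))
p /= p.sum() * dk                        # normalize to unit total probability

# The symmetry of p(k) lets us fold the two halves of the integral together.
omega_avg = (np.abs(k) * c * p).sum() * dk
assert np.isclose(omega_avg, c * sigma * np.sqrt(2.0 / np.pi), rtol=1e-6)
```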

Now we want to describe a wave packet that moves; specifically, one that moves in the positive x-direction at some speed V. To get our description we apply the Principle of Relativity and consider how our stationary wave looks to an observer moving past us in the negative x-direction at the speed V.

In that observer’s frame the wave packet appears to move in the positive x-direction at the speed V. The field function must reflect that motion and it must conform to the Principle of Relativity, so the field function that the moving observer describes must have the same mathematical form as does the field function of Equation 41. That fact means that the field functions only differ by the Lorentz Transformation of the coordinates in them.

The moving observer uses coordinates marked with a prime, so we have

(Eq’n 42)

and

(Eq’n 43)

in which

(Eq’n 44)

the Lorentz factor between the two coordinate frames. Thus we have the moving observer’s field function as

(Eq’n 45)

Note that I have not assumed that the distribution has the same standard deviation in both frames. At this stage we don’t yet know how the standard deviation differs between two inertial frames.

We gain additional perspective by examining the component waves that make up the wave packet. Each pair of waves as seen in the stationary frame would appear to observers in the moving frame to have undergone a Doppler shift: as we go from the stationary frame to the moving frame the wave propagating in the negative x-direction gets Doppler downshifted and the wave propagating in the positive x-direction gets Doppler upshifted. Thus Equation 40 becomes

(Eq’n 46)

because

(Eq’n 47)

so that

(Eq’n 48)

and

(Eq’n 49)

Using the dispersion relation we can rewrite Equation 46 as

(Eq’n 50)

Analyzing that result through Equations 41 and 45 tells us that σ’=γσ, which mimics the relativistic change in a particle’s mass-energy, as we expect.
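In standard notation the Doppler shifts of a counter-propagating pair follow from the Lorentz transformation of the [**k**,ω] four-vector (a reconstruction, with β=V/c):

```latex
% Pair with \omega = kc propagating in opposite directions, boosted by \beta:
k'_{+} = \gamma\,(k + \beta k) = \gamma k\,(1+\beta), \qquad
\omega'_{+} = \gamma\,(\omega + \beta k c) = \gamma\omega\,(1+\beta)
k'_{-} = \gamma\,(-k + \beta k) = -\gamma k\,(1-\beta), \qquad
\omega'_{-} = \gamma\,(\omega - \beta k c) = \gamma\omega\,(1-\beta)
% The pair's mean frequency thus grows by the Lorentz factor:
\tfrac{1}{2}\,(\omega'_{+} + \omega'_{-}) = \gamma\,\omega .
```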

The fact that the standard deviation increases for the moving wave packet tells us that the wave packet spreads out in k-space. We see this in Equation 34, where we can see that a bigger σ requires a bigger k to bring the exponential function to the inflection point. Likewise, Equation 45 shows us that in x-space the wave packet shrinks in proportion to the Lorentz factor, which corresponds to the Lorentz-Fitzgerald contraction of a body.

Finally, the envelope of the wave packet represents a probability distribution of the particle’s existence, shrinking in x-space as it expands in k-space. It thus represents an indeterminacy in those two realms. We now want to determine whether the Gaussian function minimizes the indeterminacy or whether we can find a function with even less indeterminacy.

We start with the purest abstraction of the wave packet and set up the calculations of the mean-square deviations, the variances, of the x-coordinate and of the k-coordinate;

(Eq’n 51)

and

(Eq’n 52)

For convenience we set the mean values of the coordinates equal to zero (<x>=0 and <k>=0), essentially centering our system in phase space. Under that proviso, we multiply those two equations together to get

(Eq’n 53)

in which the first integral on the right side of the second line of that equation comes from integrating the modified version of Equation 52 by parts. We can’t carry out the integrations without knowing the specific algebraic form of the field function, so to make the calculation tractable we invoke the Schwarz inequality,

(Eq'n 54)

and thereby obtain

(Eq’n 55)

In the last step in that calculation we integrated by parts and exploited the fact that the integral of ψ*ψ over the entire x-axis equals one by definition (the normalization of the field function). The xψ*ψ boundary term from the integration by parts yields zero because the normalizable field function must vanish at infinity.

We define the uncertainty or indeterminacy in the particle’s properties as the standard deviations in the descriptions of those properties, the root-mean-squares (the square roots of the variances):

(Eq’ns 56)

So extracting the square root of Equation 55 gives us

(Eq’n 57)

Multiplying that result by Dirac’s constant (ħ) to convert the propagation number into the particle’s linear momentum turns it into the standard form of Werner Heisenberg’s indeterminacy principle.
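A numerical sketch (illustrative width parameter; the transform computed by direct summation) shows a Gaussian field function saturating that bound, with Δx·Δk coming out to 1/2:

```python
import numpy as np

# Gaussian field function of width s; its position and wave-number spreads
# should saturate the bound dx * dk = 1/2.
s = 0.7
x, dx = np.linspace(-6.0, 6.0, 2401, retstep=True)
psi = np.exp(-x ** 2 / (4.0 * s ** 2))
psi = psi / np.sqrt((np.abs(psi) ** 2).sum() * dx)   # normalize |psi|^2

# With <x> = 0 by symmetry the variance reduces to <x^2>.
var_x = (x ** 2 * np.abs(psi) ** 2).sum() * dx

# Direct-sum Fourier transform to k-space, then the same moment there.
k, dk = np.linspace(-8.0, 8.0, 1601, retstep=True)
phi = (psi[None, :] * np.exp(-1j * np.outer(k, x))).sum(axis=1) * dx
phi = phi / np.sqrt((np.abs(phi) ** 2).sum() * dk)
var_k = (k ** 2 * np.abs(phi) ** 2).sum() * dk

assert np.isclose(np.sqrt(var_x * var_k), 0.5, atol=1e-4)
```

Any non-Gaussian field function run through the same moments would give a product strictly greater than 1/2, in line with the Schwarz-inequality argument above.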

Equality in that last equation gives us the condition of least indeterminacy. We know that the Schwarz inequality yields an equality if and only if the functions f(x) and g(x) have a relation f(x)=Cg(x), in which C represents a constant. Then we have

(Eq’n 58)

which we solve readily (divide by ψ, multiply by dx, integrate, and extract the antilogarithm) to get

(Eq’n 59)

In that equation N represents the antilogarithm of the constant of integration. The requirement that the field function have such a form that integrating its square over the entire number line yield the number one necessitates that C represent a negative number. We then calculate

(Eq’n 60)

which yields

(Eq’n 61)

We can then calculate the variance of the particle’s position to determine the value of C; we get from Equation 51

(Eq’n 62)

So we have

(Eq’n 63)

and

(Eq’n 64)

That statement gives us the algebraic form of the minimum uncertainty wave packet representing a particle flying free in space.

Appendix: Fourier Series

Between 1807 and 1822 Jean Baptiste Joseph Fourier (1768 Mar 21 – 1830 May 16) devised an infinite series that he used in solving the differential equation that he had devised in his work to create a theory of heat flow. Fourier discovered that he could represent any periodically repeating function as the sum of a sine, a cosine, and all of their overtones; that is, he discovered that

(Eq’n 1)

for all values of x lying in the domain [-π,π]. More importantly, he discovered how to calculate the coefficients of that series from f(x): without such a calculation the series would have remained more or less useless. To calculate the m-th coefficient of the series we need only multiply the function f(x) by (cos(mx)+sin(mx))dx and integrate the product over the whole domain,

(Eq’n 2)

To evaluate the expression on the right side of that equation we use the facts that

(Eq’n 3)

and

(Eq’n 4)

along with the trigonometric identities

(Eq’n 5)

(Eq’n 6)

and

(Eq’n 7)

We know immediately that all of the sine-cosine cross terms drop out, because for all values of m and n we have

(Eq’n 8)

The remaining terms conform to

(Eq’n 9)

and

(Eq’n 10)

in which δ_{mn}
represents the Kronecker delta. Because only the terms in which m=n survive the
integration with nonzero values, we can match terms that have the same value of
n across the equality in Equation 2 and get for all values of n>0

(Eq’n 11)

(Eq’n 12)

and

(Eq’n 13)

With those equations we can convert any function into a Fourier series, subject to some conditions. The first few terms in the series produce a crude approximation to f(x) and the addition of more terms refines the approximation, conforming precisely to f(x) (converging to f(x)) in the limit as the number of terms approaches infinity, provided f(x) is continuous at x and if the following conditions are satisfied: we require that f(x) be a bounded function, have only a finite number of maxima and minima, and have only a finite number of finite discontinuities on the domain [-π,π]. Note that the phrase "finite discontinuity" denotes a discontinuity about which we find an interval in which the function is bounded (e.g. sin(1/x) at x=0).
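A sketch of those convergence claims for a concrete case, the odd square wave f(x)=sign(x) on [-π,π] (coefficients computed by discrete integration; only the sine terms survive, with the closed form b_n=4/nπ for odd n and zero for even n):

```python
import numpy as np

# Fourier series of the odd square wave f(x) = sign(x) on [-pi, pi];
# b_n = (1/pi) * integral of f(x) sin(nx) dx, and all a_n vanish.
x, dx = np.linspace(-np.pi, np.pi, 20001, retstep=True)
f = np.sign(x)

def b_n(n):
    # coefficient computed by discrete integration over the domain
    return (f * np.sin(n * x)).sum() * dx / np.pi

assert abs(b_n(1) - 4.0 / np.pi) < 1e-3   # closed form 4/(n pi), odd n
assert abs(b_n(2)) < 1e-3                 # even-n coefficients vanish

# A partial sum converges to f away from the jump and to the mean value
# of the two one-sided limits (here zero) at the discontinuity itself.
series = sum(b_n(n) * np.sin(n * x) for n in range(1, 200))
assert abs(series[len(x) // 2]) < 1e-6    # the jump at x = 0
assert abs(series[15000] - 1.0) < 0.02    # the point x = pi/2
```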

To gain a clearer picture of what that means let f(x) represent a periodic function, of period 2π, defined and bounded on the interval [-π,π]. Assume that we can divide the interval into finitely many subintervals and that on each and every one of those subintervals f(x) remains continuous and monotonic. Then the Fourier series representing f(x) converges at each point of continuity x_{c} to f(x_{c}) and at each point of discontinuity x_{d} it converges to the mean value of its left and right limiting values; specifically,

(Eq’n 14)

in which x→x_{d}-0 refers to the value of x approaching x_{d} from the minus side and x→x_{d}+0 refers to the value of x approaching x_{d} from the plus side. The requirement that the domain of definition of f(x) get split into only finitely many subintervals on each of which the function is continuous and monotonic entails that the function have only finitely many discontinuities and only finitely many extrema. Mathematicians call this the Dirichlet condition.

We have implicitly assumed that f(x) has a range confined to the real number line. If the range of f(x) spreads out onto the complex plane, then we must modify Equations 2, 11, 12, and 13. We rewrite Equation 2 as

(Eq’n 15)

using Euler’s theorem,

(Eq’n 16)

and the coefficient relations,

(Eq’n 17)

(Eq’n 18)

and

(Eq’n 19)

We have the equivalent of Equations 8, 9, and 10 in the statement that

(Eq’n 20)

That leads, as above, to

(Eq’n 21)

We may notice in passing that Equation 20 bears a vague resemblance to the dot-product calculation of unit vectors on one another. That resemblance then suggests that Equation 21 is analogous to dot-multiplying a vector by a unit vector to tease out the components of that vector with respect to the axis to which the unit vector belongs. We can then say, as mathematicians do, that the function represents a vector in an infinite-dimensional space.

Before we take the next step I want to point out that mathematicians and physicists usually write Equations 15 and 21 in a more symmetrical form. It makes no difference in our Fourier series if we multiply one equation by a constant and divide the other equation by the same constant, so we "split the difference" in the coefficient on Equation 21 and write

(Eq’n 22)

and

(Eq’n 23)

Finally, we want to expand the domain of our function from [-π,π] to [-∞,∞] and devise a Fourier series that will describe a non-periodic function on that domain. Assume that we have f(x) defined on the domain [-D,D] and that f(x)=0 everywhere else on the x-axis. If we create a Fourier series representation of that function, we would simply get a repetition of f(x) over every domain 2D wide in the continuous chain of such domains. That ploy doesn’t give us what we want, so we try a slightly different approach.

On the domain [-D-L,D+L], in which L represents an arbitrary positive number, we define

(Eq’n 24)

The Fourier series that represents that function comes out as

(Eq’n 25)

in which we have

(Eq’n 26)

This still gives us a periodic function, but it separates the repetitions of the original function, f(x), with 2L-wide gaps. If we let L increase toward infinity, we move the repetitions out of existence, leaving f(x) alone as we wanted.

How do Equations 25 and 26 change in response to that expansion of L? The quantity 2nπ/(2D+2L) becomes continuous in the limit as L approaches infinity. That means that we can gather terms in Equation 25 into j intervals of width Δn_{j}, thereby transforming the equation into

(Eq’n 27)

in which we have defined

(Eq’n 28)

and

(Eq’n 29)

Equation 28 gives us

(Eq’n 30)

which we substitute into Equation 27 to obtain

(Eq’n 31)

in which

(Eq’n 32)

In the limit as L→∞ the sum approaches an integral and G_{L}(x)→f(x), so we have

(Eq’n 33)

in which we have

(Eq’n 34)

Equations 33 and 34 thus give us the analogues of Equations 25 and 26 on an infinite domain with the non-periodic function f(x). Those equations show perfect symmetry in an x-k space and constitute the Fourier transforms, respectively, of g(k) and f(x). These are the equations we use in physics.

Appendix: Average Values

When we have a distribution of some kind, in which we assign numbers to the elements in a set of indexed entities, we can’t always use the whole distribution at once, so we use average values of some of its parameters. Think, for example, of the calculation of a body’s center of inertia (also called its center of gravity or center of mass) as the average location of the mass that constitutes the body’s total inertia. Or think of the problem of the Drunkard’s Walk with a large number of drunkards staggering about a lamppost and of trying to calculate the average location of the drunkards after they take a certain number of steps.

In the latter example we take the number of steps that a drunkard takes to the left or to the right as our indexed parameter, using the negative and positive integers on the number line as the indices. We then assign to each index the number of drunkards that we expect to find on the spot on the sidewalk associated with that index after all of the drunkards have taken a certain number of steps (at random, of course, because these guys have gotten themselves totally soused). We calculate the average location of the drunkards by saying that the product of the average location and the total number of drunkards equals the sum of the products that we obtain by multiplying the index of each location on the sidewalk by the number of drunkards standing on that spot. Dividing the sum by the total number N of drunkards gives the average location of the drunkards,

(Eq’n 1)

in which m represents the number of steps the drunkards have taken and n(x) represents the number of drunkards standing on the location with the index x.

That calculation gives us <x>=0 (in which the angle brackets indicate the average value of the variable they enclose), regardless of how big m grows. But the drunkards start out all concentrated at the lamppost (x=0) and then spread out as m increases. We then want something to augment our average value of the indexed parameter, something that will tell us a little about how our distribution spreads out around its average location. Having considered the first power of the indexed parameter, we naturally consider next the second power.
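A quick Monte Carlo sketch of the walk (illustrative values of N and m, with a fixed seed for repeatability) shows both features: the average location stays at the lamppost while the spread grows with the number of steps.

```python
import numpy as np

# N drunkards each take m random unit steps left or right.
rng = np.random.default_rng(42)
N, m = 50_000, 100
steps = rng.choice([-1, 1], size=(N, m))
x = steps.sum(axis=1)                  # each drunkard's final location index

# <x> stays near zero, while for unit steps the variance of the final
# position equals m itself.
assert abs(x.mean()) < 0.2
assert abs(x.var() / m - 1.0) < 0.05
```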

We define the variance of a distribution as the average of (x-<x>)^{2}, so if we can characterize our distribution by a probability function (p(x)=n(x)/N in the example in Equation 1), then we have the variance as

(Eq’n 2)

We represent the variance as the square of sigma because sigma itself, the root-mean-square of x-<x>, represents the standard deviation of the distribution. If the indexed parameter begins to resemble a continuum, then we replace the probability function by a probability density ρ(x) and the calculation of the variance becomes an integral,

σ^{2} = ∫ (x - <x>)^{2} ρ(x) dx   (Eq’n 3)

We thus get a good measure of the width of our distribution. Better yet, the variance gives us the minimum value of the second moment of the indexed parameter. To prove that claim, we expand Equation 2 to get

σ^{2} = <(x - <x>)^{2}> = <x^{2} - 2x<x> + <x>^{2}> = <x^{2}> - 2<x<x>> + <x>^{2}   (Eq’n 4)

in the last step of which I have made use of the fact that the average of a sum equals the sum of the averages. Because an average acts as a constant in the calculation of another average, the second term in the last line of that equation goes like this,

<x<x>> = <x><x> = <x>^{2}   (Eq’n 5)

so we get

σ^{2} = <x^{2}> - 2<x>^{2} + <x>^{2} = <x^{2}> - <x>^{2}   (Eq’n 6)

The first term on the right side of that equation represents the average (also known as the expectation value) of x^{2} and the second term simply gives us the square of the average value of x. Rearranging that equation yields

<x^{2}> = σ^{2} + <x>^{2}   (Eq’n 7)

This equation describes how to calculate the average of the second moment of x and it clearly reaches a minimum value when <x>=0, which happens when the distribution achieves perfect symmetry about a point that we can identify as the origin of the relevant coordinate system.
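As a quick numerical check of the chain of reasoning from Equation 1 through Equation 7, we can run the calculation on a made-up discrete distribution of sixteen drunkards (the counts below are arbitrary):

```python
# Made-up discrete distribution: location index -> number of drunkards there
counts = {-2: 1, -1: 4, 0: 6, 1: 4, 2: 1}
N = sum(counts.values())

mean = sum(x * n for x, n in counts.items()) / N                # <x>, Eq'n 1
var = sum((x - mean) ** 2 * n for x, n in counts.items()) / N   # sigma^2, Eq'n 2
second_moment = sum(x ** 2 * n for x, n in counts.items()) / N  # <x^2>

# <x^2> = sigma^2 + <x>^2; with <x> = 0 the second moment equals the variance
assert abs(second_moment - (var + mean ** 2)) < 1e-12
```

Because this distribution sits symmetrically about x=0, the second moment takes its minimum value and coincides with the variance.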

For physical clarity, Equation 7 stands as an analogue of the calculation of a body’s moment of inertia. If we conceive of a body as a distribution of particles (as we do in the atomic theory of matter), then we can calculate its moment of inertia relative to an axis running through its center of mass using a version of Equation 2 or Equation 3. If we have a line running parallel to that axis and passing at a distance D from it, then we can calculate the body’s moment of inertia about that line as

I_{D} = I + MD^{2}   (Eq’n 8)

in which M represents the total mass of the body and I represents the moment of inertia about the body’s center of mass. We can see that if we want to apply a torque to the body, we encounter the minimum moment of inertia when D=0.
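Equation 8 is what mechanics texts call the parallel-axis theorem, and we can verify it numerically on a toy one-dimensional body of point masses (the masses, positions, and offset D below are arbitrary choices of mine):

```python
def moment_of_inertia(masses, positions, axis):
    """Moment of inertia of point masses on a line about the point `axis`."""
    return sum(m * (x - axis) ** 2 for m, x in zip(masses, positions))

# Arbitrary one-dimensional body of three point masses
masses = [1.0, 2.0, 3.0]
points = [-1.0, 0.0, 2.0]
M = sum(masses)
x_cm = sum(m * x for m, x in zip(masses, points)) / M

I_cm = moment_of_inertia(masses, points, x_cm)        # about the center of mass
D = 1.5
I_line = moment_of_inertia(masses, points, x_cm + D)  # about the parallel line

assert abs(I_line - (I_cm + M * D ** 2)) < 1e-9       # Equation 8
```

The moment about any displaced line exceeds the moment about the center of mass by exactly MD^{2}, so D=0 gives the minimum, just as <x>=0 minimizes the second moment.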

Appendix: The Schwarz Inequality

Augustin Cauchy first presented the inequality for sums in 1821. In 1859 his student Viktor Yakovlevich Bunyakovsky noticed that he could obtain the integral form of the inequality by taking the appropriate limits. And in 1885 Karl Hermann Amandus Schwarz (1843 Jan 25 – 1921 Nov 30) obtained the general result for vectors on an inner product space.

The theorem applies to entities in an inner product space, so we need to see what such a space comprises. An inner product space consists of a vector space on which we define an operation called the inner product (or scalar product or dot product), whose domain is the set of all ordered pairs of elements of the vector space and whose range is the set of scalars (real or complex numbers). If we express a vector **x** in terms of its components relative to an orthonormal basis of the vector space,

**x** = Σ_{i} x_{i} **e**_{i}   (Eq’n 1)

then we have the inner product of vectors **x** and **y** as

(**x**, **y**) = Σ_{i} x_{i} y_{i}   (Eq’n 2)

The Schwarz inequality then tells us that

(**x**, **y**)^{2} ≤ (**x**, **x**)(**y**, **y**)   (Eq’n 3)

or, in terms of components,

(Σ_{i} x_{i} y_{i})^{2} ≤ (Σ_{i} x_{i}^{2})(Σ_{i} y_{i}^{2})   (Eq’n 4)

To prove that statement, we introduce a real scalar N and note that, for vectors **x** and **y**, the inner product of **x**-N**y** with itself cannot come out negative,

(**x** - N**y**, **x** - N**y**) ≥ 0   (Eq’n 5)

By the rules of inner multiplication we have

(**x**, **x**) - 2N(**x**, **y**) + N^{2}(**y**, **y**) ≥ 0   (Eq’n 6)

We want to use a value of N that minimizes that expression, so we apply the operator d/dN to the equation and set the result equal to zero,

-2(**x**, **y**) + 2N(**y**, **y**) = 0   (Eq’n 7)

which we then solve for

N = (**x**, **y**)/(**y**, **y**)   (Eq’n 8)

Substituting that result into Equation 5 by way of Equation 6 yields

(**x**, **x**) - (**x**, **y**)^{2}/(**y**, **y**) ≥ 0   (Eq’n 9)

which stands true to mathematics if and only if Equation 3 does also.
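The whole argument can be checked numerically for a pair of component vectors. In this sketch of mine the components are arbitrary, and the scalar N of Equation 8 is computed explicitly to confirm that the residual **x**-N**y** keeps a non-negative norm:

```python
def inner(x, y):
    """Inner product of two real vectors given by their components (Eq'n 2)."""
    return sum(a * b for a, b in zip(x, y))

x = [1.0, -2.0, 3.0]
y = [4.0, 0.5, -1.0]

# Schwarz inequality in component form (Eq'n 4)
lhs = inner(x, y) ** 2
rhs = inner(x, x) * inner(y, y)
assert lhs <= rhs

# The minimizing scalar N leaves a residual x - N*y with non-negative norm
N = inner(x, y) / inner(y, y)
residual = [a - N * b for a, b in zip(x, y)]
assert inner(residual, residual) >= 0
```

Geometrically, N**y** is the projection of **x** onto **y**, and the non-negative length of what remains is exactly the content of the inequality.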

We don’t always want to use a discrete array of vectors. As the indices in Equation 4 evolve from a set of discrete points on the number line to a continuum, the sums must evolve into integrals. If we plot the indices, designated by q, on the number line, then in the limit Equation 4 becomes

[∫ x(q) y(q) dq]^{2} ≤ [∫ x^{2}(q) dq][∫ y^{2}(q) dq]   (Eq’n 10)

If the integrable functions are complex-valued, then we must rewrite that equation as

|∫ x*(q) y(q) dq|^{2} ≤ [∫ |x(q)|^{2} dq][∫ |y(q)|^{2} dq]   (Eq’n 11)

We use that form in deducing Heisenberg’s indeterminacy principle.
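The integral form can also be illustrated numerically. In this sketch the two real functions and the interval are arbitrary choices, and the integrals are approximated with a simple midpoint rule (which, having equal positive weights, preserves the inequality exactly):

```python
import math

def integrate(f, a, b, n=2000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) for i in range(n)) * h

# Two arbitrary real functions on the interval [0, 1]
f = math.sin
g = math.exp

lhs = integrate(lambda q: f(q) * g(q), 0.0, 1.0) ** 2
rhs = integrate(lambda q: f(q) ** 2, 0.0, 1.0) * integrate(lambda q: g(q) ** 2, 0.0, 1.0)
assert lhs <= rhs  # Equation 10
```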
