A Rederivation of the Quantum Theory

Isaac Newton’s deterministic mechanics may seem an unpromising start for a discussion of probabilistic dynamics, but the correspondence principle requires the existence of a connection between the two different kinds of physics. We can find that connection by considering a question that Newton sidestepped.

In Newton’s dynamic geometry, if someone knows the force exerted upon a particle over some elapse of time, then that person can calculate a description of the path that the particle follows as a consequence of coming under that force. The procedure consists of dividing the force by the forced particle’s mass to obtain a description of the particle’s acceleration and then integrating that acceleration twice with respect to the elapse of time in order to obtain descriptions of the particle’s velocity and position at any given instant. The two constants of integration correspond to the particle’s velocity and position at the instant when the integration begins, the system’s initial conditions.
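We can sketch that procedure numerically. In the Python fragment below (the constant force, mass, and initial conditions are assumptions chosen purely for illustration), dividing the force by the mass gives the acceleration, and two cumulative integrations, with the initial velocity and position serving as the constants of integration, recover the trajectory:

```python
import numpy as np

# Newton's procedure, numerically: divide the force by the mass to get the
# acceleration, then integrate twice over the elapse of time to get the
# velocity and the position. Assumed example: constant force F = -9.8 N on
# m = 1 kg, initial conditions v0 = 20 m/s and x0 = 0 m.
m, F = 1.0, -9.8
v0, x0 = 20.0, 0.0

t = np.linspace(0.0, 4.0, 4001)
a = np.full_like(t, F / m)          # acceleration a = F/m
dt = t[1] - t[0]

# cumulative trapezoidal integration; the constants of integration are the
# system's initial conditions
v = v0 + np.concatenate(([0.0], np.cumsum((a[1:] + a[:-1]) / 2 * dt)))
x = x0 + np.concatenate(([0.0], np.cumsum((v[1:] + v[:-1]) / 2 * dt)))

# compare with the exact deterministic path x = x0 + v0*t + (F/m)*t^2/2
x_exact = x0 + v0 * t + 0.5 * (F / m) * t**2
print(np.max(np.abs(x - x_exact)))  # negligible discretization error
```

The two constants of integration appear explicitly as `v0` and `x0`; changing them changes the path, exactly as the text describes.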

That procedure yields a description of the particle’s trajectory as a set of points. With zero width, the path provides a perfectly deterministic description of whither the particle will go. Insofar as human-scale experiments can tell, that description matches Reality. But now we come to the question that Newton never asked.

We have assumed that we can describe the force acting on a particle as a function of time, and that statement implies that we can also describe the applied force as a function of position, as in Newton’s law of gravity. But, taking the risk of slipping into the pathetic fallacy, we must ask how the particle knows how much force acts upon it at a given instant. Of course, we must answer that the particle can’t know anything: it doesn’t know what position in space it occupies, it doesn’t know how far away the source of the force lies, it certainly can’t calculate the strength of the force, and it has no mechanism for adjusting its acceleration in response to any calculation. We have no alternative: we must assert the existence of something that spreads out from the source and imposes something upon the forced particle that determines the magnitude and direction of the force acting upon that particle.

Physicists call that something a forcefield. Various symmetries interpreted through Newton’s third law of motion give us a description of the magnitude and shape of the forcefield emanating from a point-like source. Most importantly here, the function that describes the forcefield has a value at every point in space and the values at neighboring points differ from each other in a way that makes the function smooth and continuous. If we confuse the forcefield with its algebraic description, then we can say that the properties of continuity and smoothness make the forcefield differentiable and integrable.

Integrating the force acting on a particle over some elapse of time yields the change in the linear momentum that the particle acquires between the two instants that serve as the limits of the integration: from that result we can calculate the particle’s velocity. Integrating the force acting on a particle over some distance that the particle travels parallel to the applied force yields a number that we call the change in the energy that the particle gains or loses between the two points on its path that serve as the endpoints of the integration. That number doesn’t seem to have much value in basic dynamics beyond its use in working out descriptions of collisions. But we don’t actually need to have a particle in the forcefield to calculate an energy: we can integrate the potential force in the field over any path we desire and thereby obtain a description of the potential change in the energy that the forcefield would impose on the particle as it moves between the endpoints of the path.

The forcefield takes us from Newton’s dynamics in the indicative mood into physics in the subjunctive mood. If we can define a set of points (usually taken to lie "at infinity") at which the potential energy equals zero, then we can calculate a definite potential energy at every point in the field, just as the field gives us a potential force at each and every point. Newton’s third law of motion necessitates that at any given point in the forcefield, with respect to the location of the source, the strength and orientation of the potential force must remain the same for all time; therefore, the value of the potential energy at each point can never change. That fact means that in going from point A to point B the forced particle gains as much realized (or kinetic) energy from the forcefield as it lost to the forcefield in going from point B to point A. That fact, in turn, means that for particles moving in a forcefield the sum of their kinetic and potential energies remains constant with the elapse of time: total energy obeys a conservation law similar to the one that Newton’s third law of motion represents, the conservation of linear momentum.
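We can watch that conservation law emerge numerically. The Python sketch below (the harmonic force law and all numbers are assumptions for illustration) integrates Newton’s second law in a static forcefield and confirms that the sum T + U holds constant along the trajectory:

```python
import numpy as np

# Assumed example field: U(x) = k*x^2/2, so the potential force at each
# point is F(x) = -dU/dx = -k*x, the same for all time.
m, k = 1.0, 4.0
x, v = 1.0, 0.0
dt, steps = 1e-3, 20000

energies = []
for _ in range(steps):
    # velocity-Verlet step for dv/dt = F/m and dx/dt = v
    v_half = v + 0.5 * (-k * x / m) * dt
    x = x + v_half * dt
    v = v_half + 0.5 * (-k * x / m) * dt
    energies.append(0.5 * m * v**2 + 0.5 * k * x**2)   # T + U

energies = np.array(energies)
drift = np.max(np.abs(energies - energies[0]))
print(drift)   # stays tiny over many oscillation periods
```

Kinetic and potential energy trade back and forth as the body moves, but their sum never wanders: the conservation law in action.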

At a given point in the forcefield occupied by the forced body the force presents us with two aspects. First, we have the force that the field exerts upon the body, equal to the negative of the gradient of the potential energy. In the direction parallel to one of the axes of our coordinate grid we have

F_x = −∂U/∂x   (Eq’n 1)

and we have identical equations describing the force exerted in the directions parallel to the other two of the mutually perpendicular coordinate axes of our grid. And second, we have the force of the body’s inertial reaction, described by Newton’s second law of motion as equaling the rate at which the body’s linear momentum changes with the elapse of time. In the direction parallel to one of the axes of our coordinate grid we thus have

F_x = dp_x/dt = (d/dt)(∂T/∂v)   (Eq’n 2)

in which T represents the body’s kinetic energy and v=dx/dt, the body’s velocity. Of course, we have an identical equation for each of the other two axes of the coordinate grid. If we combine those two equations, we get for the x-direction

(d/dt)(∂T/∂v) + ∂U/∂x = 0   (Eq’n 3)

All three versions of that equation remain valid at every point on the path that the body actually follows as it goes from point A to point B between two particular instants t_{0} and t_{1}. That statement does not remain valid for points off the actual trajectory: we would have to include an additional applied force term on the left side of Equation 3 in order to make the body follow a deviant path. But Newton’s zeroth law requires that the functions in Equation 3 remain smooth and continuous with respect to the coordinates, so if the body follows a path that differs from the actual trajectory by a vector amount δ**x**, an amount that goes to zero at point A and point B, the equation valid on that path must differ from Equation 3 by an amount that starts at zero and then grows smoothly and continuously as δ**x** grows.

If we multiply Equation 3 by dx, integrate the product over the body’s actual trajectory from point A to point B, and add the result to the corresponding integrals for the other two coordinate axes, we obtain a constant, the body’s total energy. If we carry out the same calculation on a deviant path, we get a bigger constant, which means that the constant that we got from integrating along the actual trajectory represents a minimum value. Thus, if we multiply Equation 3 by the deviation δx and let the value of δx approach the infinitesimal, we get a function that resembles the differential of a function at its minimum. In the x-direction we get

[(d/dt)(∂T/∂v) + ∂U/∂x] δx = 0   (Eq’n 4)

If we multiply that equation by dt and integrate the product over the elapse of time between the instants t_{0} and t_{1}, we get

∫_{t0}^{t1} (d/dt)[(∂T/∂v) δx] dt + ∫_{t0}^{t1} (δU − δT) dt + C = 0   (Eq’n 5)

In that calculation the first term in the first integral zeroes out because δx=0 at the endpoints of the trajectory and, therefore, at the limits of the integration. And, because we have a definite integral, the constant of integration must equal zero (C=0). The total energy of the body, T+U, must remain constant (to satisfy the conservation law), so we know that

δ(T + U) = 0   (Eq’n 6)

stands true to physics in any situation. If we subtract that equation from the parenthesized factor in the second integral of Equation 5, we get

−2 ∫_{t0}^{t1} δT dt = 0   (Eq’n 7)

Multiplying that equation by minus one and commuting the variation operator and the integration gives us

δ ∫_{t0}^{t1} 2T dt = 0   (Eq’n 8)

one form of the principle of least action. We can verify that statement by differentiating Equation 8 with respect to the velocity of the body and reintegrating the resulting derivative with respect to the same variable:

δ ∫_{A}^{B} p dx = 0   (Eq’n 9)

in which A and B represent the points that the body occupies, respectively, at the instants t_{0} and t_{1}. That equation expresses Maupertuis’ version of the principle of least action. That result gives us two consequences.
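Before taking up those consequences, we can check the minimum property numerically. The Python sketch below (a free particle with assumed endpoints; U = 0, so 2T is the whole integrand of Equation 8) compares the action along the true trajectory with the action along deviant paths that share its endpoints:

```python
import numpy as np

# Assumed example: a free particle goes from x=0 at t=0 to x=1 at t=1.
# The true path has constant velocity; deviations that vanish at both
# endpoints can only raise the integral of 2T over the elapse of time.
m = 1.0
t = np.linspace(0.0, 1.0, 10001)

def action(x):
    """Trapezoidal integral of 2T = m*v^2 over the elapse of time."""
    v = np.gradient(x, t)
    f = m * v**2
    return np.sum((f[1:] + f[:-1]) / 2 * np.diff(t))

S_true = action(t)                    # true path: x = t, constant velocity

# deviant paths: the deviation goes to zero at point A and point B
for eps in (0.05, 0.2, 0.5):
    x_dev = t + eps * np.sin(np.pi * t)
    assert action(x_dev) > S_true     # every deviation raises the action

print(S_true)                         # 1.0 on the straight-line path
```

Every deviant path yields a bigger number than the actual trajectory does, which is exactly the minimum property the derivation above exploits.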

First, Equations 8 and 9 have the form of conservation laws; each states that the difference between the values of some quantity calculated at certain endpoints necessarily equals zero. That fact brings the finite-value theorem into play, necessitating that no body can ever enact an infinite quantity of action. Because a continuous distribution is infinitely divisible, a proper description of action must conform to a discontinuous distribution, which fact necessitates the existence of a finite minimum value for the increment of action. We designate that value with Planck’s constant and we thus have

dE·dt ≥ h  and  dp·dx ≥ h   (Eq’ns 10)

which equations express Heisenberg’s Indeterminacy Principle. Because energy and linear momentum both obey conservation laws, dT (or dE) and dp can never have infinite value, which means, in light of Equations 10, that neither dt nor dx can approach the infinitesimal, much less zero. That fact means that no one can know with perfect precision the location of the body at a specific instant or the precise instant when the body arrives at a given point.

Second, the derivation of Equations 8 and 9 necessitated the use of a fantasy in which a body feigns following paths other than the one that it actually follows between two events in spacetime (leaving point A at time t_{0} and arriving at point B at time t_{1}). We assume that the body follows the actual trajectory, but then we must ask how the properties of the deviant paths ensure that the body follows the true path. Again, we must assert that something spreads out from the body, something that fills space like a forcefield but that interacts with a forcefield as a body would do. Mathematically we can represent that something as a flock of imaginary particles identical to the body that they represent, a swarm of ghosts that move with the body and guide it along its trajectory.

Another factor necessitates the existence of that swarm of ghost particles. The versions of Equation 10 in the directions perpendicular to the body’s true trajectory put a halo of indeterminacy around the body. We must not confuse indeterminacy with engineering uncertainty, the give or take that comes from rounding off to a tolerance. True indeterminacy means that there exists no possibility of perfect precision. In this aspect we can say that we can never know where the body actually exists, so we must think of a body as having only a fractional existence at each point.

In either case, a swarm of ghost particles or a fog of fractional existence, we must represent the body with a distribution function, a probability density for finding the body at any point at any given instant. One constraint on that function necessitates that integrating it with respect to volume over all space available to the particle yields one, the number that represents certainty:

∫ ρ dV = 1   (Eq’n 11)
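Numerically, that constraint amounts to dividing any candidate density by its own integral. The Python sketch below (the density profile is an assumed example) enforces the normalization of Equation 11 on a one-dimensional grid:

```python
import numpy as np

# Take any nonnegative, integrable profile (assumed example) and scale it
# so that its integral over all available space equals one, certainty.
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]

rho_raw = np.exp(-np.abs(x))                            # unnormalized density
norm = np.sum((rho_raw[1:] + rho_raw[:-1]) / 2) * dx    # trapezoidal integral
rho = rho_raw / norm                                    # divide out the total

total = np.sum((rho[1:] + rho[:-1]) / 2) * dx
print(total)   # 1.0 to within floating-point rounding
```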

Another constraint on the probability distribution ρ, the correspondence principle, tells us that if we could make Planck’s constant go to zero, then the distribution function must represent the classical description of a particle, which means that, in that limit, it must conform to a Dirac Delta,

lim_{h→0} ρ(x, t) = δ(x − x(t))   (Eq’n 12)

Note that we must take care not to confuse the Dirac Delta with the delta used to represent the operation of variation in the calculus of variations.
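A standard Gaussian example illustrates the indeterminacy floor behind Equations 10. The Python sketch below works in natural units with ħ = 1 (an assumption of this sketch) and uses the conventional Gaussian bound σx·σp ≥ ħ/2 rather than the increment h of the text:

```python
import numpy as np

hbar = 1.0   # natural units, an assumption of this sketch

# Build a Gaussian wavepacket, then measure its spreads in position and in
# momentum (via the discrete Fourier transform). For a Gaussian the product
# sigma_x * sigma_p sits exactly at the floor hbar/2; no state goes below it.
sigma = 0.7
x = np.linspace(-20.0, 20.0, 4096)
dx = x[1] - x[0]
psi = np.exp(-x**2 / (4.0 * sigma**2))          # width sigma in position

prob_x = np.abs(psi) ** 2
prob_x /= prob_x.sum()
sigma_x = np.sqrt(np.sum(prob_x * x**2))        # <x> = 0 by symmetry

p = 2.0 * np.pi * hbar * np.fft.fftfreq(x.size, d=dx)
prob_p = np.abs(np.fft.fft(psi)) ** 2
prob_p /= prob_p.sum()
sigma_p = np.sqrt(np.sum(prob_p * p**2))

print(sigma_x * sigma_p)   # ~0.5 = hbar/2, the minimum the principle allows
```

Squeezing the packet in position (smaller `sigma`) fattens it in momentum, and vice versa; the product never dips below the floor.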

Because the body’s motion must conform to Hamilton’s Equations,

dx/dt = ∂H/∂p  and  dp/dt = −∂H/∂x   (Eq’ns 13)

the evolution of the distribution function must conform to Liouville’s theorem. Recall that Liouville’s theorem states that the distribution function remains constant along any actual trajectory in phase space. Mathematically Liouville’s theorem states

∂ρ/∂t + (dx/dt)(∂ρ/∂x) + (dp/dt)(∂ρ/∂p) + ρ[∂/∂x(dx/dt) + ∂/∂p(dp/dt)] = 0,
so  dρ/dt = ∂ρ/∂t + (dx/dt)(∂ρ/∂x) + (dp/dt)(∂ρ/∂p) = 0   (Eq’n 14)

In that equation I used Hamilton’s Equations to eliminate the term in square brackets. We can also use Hamilton’s Equations, as many physicists do, to rewrite the first line of that equation as

∂ρ/∂t = −{ρ, H}   (Eq’n 15)

in which we define the Liouvillian operator as

iL̂ = {_, H}   (Eq’n 16)

in which {_,H} represents a blank Poisson bracket. Equation 15 can then become

∂ρ/∂t = −iL̂ρ   (Eq’n 17)

We can solve Equation 15 by exploiting Taylor’s theorem in a special way. We assume that the particle moves with a velocity **v** and write

ρ(x, t) = ρ(x − vt, 0) = Q(vt) ρ(x, 0)   (Eq’n 18)

in which Q(vt) represents an operator that transforms the distribution function calculated at a time t’=0 into the distribution function describing the same particle or system at the time t’=t. Taylor’s theorem tells us that we can represent the function on the left side of that equation as an infinite series,

ρ(x − vt, 0) = Σ_{n=0}^{∞} [(−vt)^n/n!] ∂^n ρ(x, 0)/∂x^n = exp(−vt ∂/∂x) ρ(x, 0)   (Eq’n 19)

Differentiating that equation with respect only to time gives us

∂ρ/∂t = −v ∂ρ/∂x   (Eq’n 20)

That corresponds to Equation 15, so we accept Equation 19 as correctly describing the evolution of the distribution function.
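We can verify the shifting action of the operator in Equation 19 directly. For a cubic profile the Taylor series terminates after the third derivative, so the Python sketch below (the profile and numbers are assumed for illustration) sums the series exactly and compares it with the shifted function:

```python
import numpy as np

# Taylor's theorem as a shift operator: summing (-vt)^n / n! times the
# n-th derivative of rho(x) reproduces rho(x - vt). For rho = x^3 the
# series terminates, so no truncation error enters.
v, t = 2.0, 0.75
s = v * t

x = np.linspace(-5.0, 5.0, 101)
rho = x**3                      # assumed example profile
d1 = 3 * x**2                   # successive derivatives of rho
d2 = 6 * x
d3 = np.full_like(x, 6.0)

shifted_series = rho + (-s) * d1 + (-s) ** 2 / 2 * d2 + (-s) ** 3 / 6 * d3
shifted_direct = (x - s) ** 3

print(np.max(np.abs(shifted_series - shifted_direct)))   # zero: the series shifts rho
```

The operator really does slide the whole distribution downstream by vt, which is just what a particle moving with velocity v requires of it.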

Consider the fundamental property of the distribution function. Because the function represents a probability density it can take only nonnegative real values. But such a tightly restricted range comes from a function of coordinates and dynamic variables whose domains may spread as wide as from minus infinity to plus infinity. Therefore, there must exist some function, ψ, of the coordinates and dynamic variables, which state function encodes all knowledge about the system to which it applies, such that the probability density corresponds to an even power of that function. We assert that

ρ = ψ²   (Eq’n 21)

because we can always represent any higher even power as a square of something. And because the square of ψ maps an infinitely wide, positive and negative domain onto a narrow, positive range, we infer that it exists primarily as the exponential function of a complex number. Thus we infer the **first postulate of quantum mechanics**.

The probability density mediates the conversion of the coordinates and dynamic variables pertaining to a particle into the probability of the particle actually manifesting a given value of those variables, to within the precision allowed by Heisenberg’s Indeterminacy Principle. The state function carries that same property. Suppose that at a given instant we want to measure the value of some quantity q pertaining to the particle. An appropriate value of q exists encoded within the state function, so we can make that value explicit by applying a suitable mathematical operator Q to the state function, thereby enacting the mathematical analogue of making an actual measurement.

As an example, consider a calculation of a particle’s angular momentum relative to some point, the origin of our coordinate system in this example:

L̂ψ = r̂(p̂ψ)   (Eq’n 22)

In that equation I have expressed the fact that angular momentum corresponds to the product of two other observable quantities (properly I should use the vector cross product **L** = **r** × **p**). We apply the operators corresponding to the two observable quantities sequentially to the state function, extracting the linear momentum information first and then the radial distance information. For that procedure to yield a proper value for the angular momentum the application of the linear momentum operator to the state function must yield a description of the linear momentum that the radial distance operator will not act upon and also leave the state function unaffected so that the radial distance operator can extract the proper information from it. We can only get that result with the appropriate guarantee of validity if the action of the linear momentum operator on the state function corresponds to multiplying the state function by a constant. That statement must stand true to the quantum theory for all operators that act on the state function, so we have in general
Qφ = qφ   (Eq’n 23)

In that equation Q represents the operator and q represents the constant that it extracts from the state function. Thus the constant serves as an eigenvalue of the eigenfunction φ with respect to the operator Q.

So now we know that the state function has such a nature that a mathematical operator exists for every observable quantity encoded in that state function. In turn, the state function acts as an eigenfunction for that operator and all others. Thus we have the **second postulate of quantum mechanics**.

Because we must use an eigenvalue equation to describe a quantum system, we seem to have a perfectly deterministic description of an inherently indeterminate situation. That contradiction dissolves when we understand that any given quantum system has associated with it a large set, perhaps even an infinite set, of eigenfunctions. The complete state function describing the system must thus comprise a linear superposition of those eigenfunctions,

ψ = Σ_{i} c_{i} φ_{i}   (Eq’n 24)

That statement gives us the **fourth postulate of quantum mechanics**.

If we apply an operator Q to that state function, we get

Qψ = Σ_{i} c_{i} Qφ_{i} = Σ_{i} c_{i} q_{i} φ_{i}   (Eq’n 25)

That equation doesn’t give us an actual number that we can ascribe to our quantum system. To get something we can use we exploit the fact that the square of the state function gives us a probability density and the fact that we must add the probabilities of mutually exclusive outcomes to each other to get the probability of one or another of those outcomes coming true to Reality. If we associate with our system a set of eigenstates, then we have the average value of the eigenvalues q_{i}, what we call the expectation value, as

⟨q⟩ = Σ_{i} |c_{i}|² q_{i}   (Eq’n 26)

That statement, that we can only calculate average values from our theory,
gives us the **fifth postulate of the quantum theory**.

That equation necessitates that the eigenfunctions form an orthonormal set; in other words, the state function of Equation 24 must multiply by itself in the manner of a vector dot product, in which any component yields a nonzero result only when multiplied by itself. The orthonormality of the state function allows us to write Equation 26 as

⟨q⟩ = ∫ ψ* Q ψ dV   (Eq’n 27)

In that equation I have represented the first manifestation of the state function as its complex conjugate. That comes about because the square of the state function must compress a wide domain into a narrow range, which necessitates that the state function come to us as a complex number, typically a complex exponential function, and the square of a complex number comes from multiplying the number by its complex conjugate. Here we see a reflection of the first postulate.
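The chain from Equation 24 to Equation 27 can be checked in a finite number of dimensions, where the state function becomes a vector and the operator a Hermitian matrix. In the Python sketch below (the matrix and state are randomly generated assumptions), the average of the eigenvalues weighted by |c_i|² agrees with ψ*Qψ:

```python
import numpy as np

# Expand a normalized state in the orthonormal eigenvectors of a Hermitian
# operator and compare the two routes to the expectation value.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
Q = (A + A.conj().T) / 2                 # Hermitian operator (assumed example)

q, phi = np.linalg.eigh(Q)               # real eigenvalues, orthonormal eigenvectors

psi = rng.normal(size=4) + 1j * rng.normal(size=4)
psi /= np.linalg.norm(psi)               # normalized state

c = phi.conj().T @ psi                   # expansion coefficients c_i
expval_sum = np.sum(np.abs(c) ** 2 * q)          # weighted average of eigenvalues
expval_direct = (psi.conj() @ Q @ psi).real      # psi* Q psi

print(expval_sum, expval_direct)         # the two routes agree
```

The agreement depends on the eigenvectors forming an orthonormal set, which is exactly the requirement stated above.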

Because the operator applied to the state function must yield that same state function multiplied by a number that represents the measurement of a physical quantity, the operator must be linear; in other words, if the operator has components, those components combine only through addition. Further, the eigenvalues that come out of the state function must be real numbers, because they represent measurements of real physical quantities, so the operators representing physical quantities must be Hermitian, which means that the operators must have a self-adjoint mathematical form: if the operator involves a matrix, that matrix must equal its own conjugate transpose. Thus we obtain the **third postulate of quantum mechanics**.
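A small matrix example makes the Hermitian requirement concrete. In the Python sketch below (both matrices are assumed examples), the self-adjoint matrix yields purely real eigenvalues while a non-self-adjoint one does not:

```python
import numpy as np

# A Hermitian operator equals its own conjugate transpose and therefore
# delivers real eigenvalues, as measurements of physical quantities require.
H = np.array([[2.0, 1 - 1j],
              [1 + 1j, 3.0]])
assert np.allclose(H, H.conj().T)       # self-adjoint

print(np.linalg.eigvals(H))             # both eigenvalues real (4 and 1)

# a non-Hermitian counterexample picks up complex eigenvalues
N = np.array([[0.0, 1.0],
              [-1.0, 0.0]])             # skew matrix, not self-adjoint
print(np.linalg.eigvals(N))             # purely imaginary pair
```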

The coordinates and dynamic variables pertaining to a quantum system come together in the function that represents the action enacted by the system. The state function pertaining to that system must thus contain that action function as a means of encoding its factors. Thus we must have ψ=ψ(px) and ψ=ψ(Et) as immediate examples.

For an operator that represents a quantity that we can measure more or less directly (such as distance spanned or time elapsed) we use a simple multiplication of the state function by the relevant number. For an operator that represents a quantity that we do not measure directly (such as linear momentum or energy) we must use something more complicated mathematically, such as the process of differentiation. For example, we can extract the linear momentum from the state function by differentiating it with respect to location, so we have the eigenvalue equation as

dφ/dx = pφ   (Eq’n 28)

We can solve that equation if we multiply it by dx, divide it by φ, and integrate the result. We get

ln φ = px + C,  so  φ = A e^{px}   (Eq’ns 29)

That solution still has a few pieces missing: the argument of any exponential function must be a pure number and the state function cannot increase or decrease monotonically. We can satisfy the second requirement simply by making the argument of the exponential imaginary, thereby making the exponential a complex number that evolves as a sinusoidal oscillation.

Figuring out how to satisfy the first requirement takes a little more thought. We can see immediately that we must divide the argument by something that carries the units of action. When we calculate the linear momentum we will use a multiplication by some constant to eliminate that conversion factor from our calculation, so it doesn’t matter what unit of action we use, as long as we remain consistent. As the obvious choice, we select the Planck unit of action (denoted by Planck’s constant); however, we also want to introduce a symmetry in which the state function returns to its initial value every time the system enacts one Planck unit of action, so we use the Dirac constant (ħ=h/2π) instead of Planck’s constant. Thus we rewrite Equation 29 as

φ = A e^{ipx/ħ}   (Eq’n 30)

and rewrite the eigenvalue equation (Equation 28) as

−iħ ∂φ/∂x = pφ   (Eq’n 31)
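We can confirm that eigenvalue equation numerically: applying −iħ d/dx to a sampled plane wave simply multiplies it by the eigenvalue p. The Python sketch below works in natural units with ħ = 1 and an assumed momentum:

```python
import numpy as np

hbar = 1.0   # natural units, an assumption of this sketch

# Sample the plane-wave state function and apply -i*hbar*d/dx to it; away
# from the grid edges the operation returns p times the state function.
p = 2.5
x = np.linspace(0.0, 10.0, 100001)
phi = np.exp(1j * p * x / hbar)

dphi_dx = np.gradient(phi, x)                     # central finite differences
p_extracted = (-1j * hbar * dphi_dx / phi).real

print(p_extracted[len(x) // 2])   # ~2.5 everywhere away from the grid edges
```

The operator hands back the same state function scaled by a constant, which is exactly the property demanded of operators in the angular-momentum argument above.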

The same reasoning tells us that we can extract a value for the total energy of a system from the system’s state function through the eigenvalue equation

iħ ∂ψ/∂t = Eψ   (Eq’n 32)

Unlike the case with linear momentum, we can take that equation further. In classical physics we know that the Hamiltonian function, when evaluated, yields a description of a system’s total energy. Mathematically the two functions differ from each other only in that in the total energy we express the kinetic energy as a function of the velocities of the system’s component particles and in the Hamiltonian function we express the kinetic energy as a function of the linear momenta of the system’s component particles;

E = (1/2)mv² + U  and  H = p²/2m + U   (Eq’ns 33)

If we express the Hamiltonian function in terms of the linear momentum operator and make the appropriate substitution in Equation 32, we get

iħ ∂ψ/∂t = −(ħ²/2m) ∂²ψ/∂x² + Uψ   (Eq’n 34)

which we recognize as Schrödinger’s Equation, the fundamental equation of quantum mechanics, the equation to which the state function must conform as it evolves. And that’s how we obtain the **sixth postulate of quantum mechanics**.
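As a closing illustration, we can discretize that equation for an assumed harmonic potential, U = mω²x²/2, and diagonalize the resulting Hamiltonian matrix; the lowest eigenvalues land on the familiar evenly spaced ladder (natural units and all numbers are assumptions of this Python sketch):

```python
import numpy as np

hbar = m = omega = 1.0   # natural units, an assumption of this sketch

# Discretize H = -hbar^2/(2m) d^2/dx^2 + m*omega^2*x^2/2 on a grid and
# diagonalize; the eigenvalues approximate the allowed total energies.
n = 1000
x = np.linspace(-10.0, 10.0, n)
dx = x[1] - x[0]

# second derivative by central finite differences
lap = (np.diag(np.ones(n - 1), -1) - 2.0 * np.eye(n)
       + np.diag(np.ones(n - 1), 1)) / dx**2

H = -hbar**2 / (2.0 * m) * lap + np.diag(0.5 * m * omega**2 * x**2)
E = np.linalg.eigvalsh(H)

print(E[:3])   # close to [0.5, 1.5, 2.5] in units of hbar*omega
```

The gaps between adjacent energies equal one quantum, ħω, echoing the finite minimum increment of action that started the whole argument.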

Thus we devise a mathematical structure whose logic mimics the quantum nature of Reality. If we put into that structure certain numbers representing the initial conditions of a quantum system, then the structure’s equations will transform those numbers into a description of a subsequent state of the system, expressing it in numbers that match the numbers obtained from experiments on that system.

Finally I will note that the postulates are slightly out of order because I approached them in a way different from the way in which Robert H. Dicke and James P. Wittke listed them in the sixth chapter of their 1960 textbook "Introduction to Quantum Mechanics".
