The Principle of Least Action

Energy has two aspects: kinetic and potential. Kinetic energy represents a numerical relationship between a body and an observer based on the relative velocity between the body and the observer. We thus expect to express a body's kinetic energy as some function, T(v), of that relative velocity. Potential energy represents a numerical relationship between a body and a forcefield based on the position of the body relative to the source of the forcefield. We thus expect to express a body's potential energy as some function, U(x), of the body's position relative to some other body.

If we add a body's kinetic and potential energies together, we get the body's total energy, $E = T + U$. That number conforms to a conservation law, which means that the number assigned to a given body does not change unless the body exchanges energy with another body in a collision. As a scalar conforming to a conservation law, energy does not depend upon how we describe it, so we can use generalized coordinates (such as angular displacement and angular velocity) in the formulae that we use to calculate its value. In terms of the generalized location coordinates $q_i$ we have the corresponding generalized velocities $w_i = dq_i/dt$.

A body whose location we describe with generalized coordinates also has a generalized momentum associated with it in accordance with

$$p_i = \frac{\partial T}{\partial w_i} \tag{1}$$

and it is subject to a generalized force in accordance with

$$F_i = -\frac{\partial U}{\partial q_i} \tag{2}$$

Newton's second law of motion also generalizes, equating the generalized force applied to a body to the rate at which the body's generalized momentum changes with the elapse of time,

$$\frac{d}{dt}\frac{\partial T}{\partial w_i} = -\frac{\partial U}{\partial q_i} \tag{3}$$

If the body's kinetic energy is also a function of $q_i$ (as, for example, in the case of an electrically charged body in an electrotonic field), then we must add $\partial T/\partial q_i$ to the exerted force, so, with a little algebraic rearrangement, we can rewrite Equation 3 as

$$\frac{d}{dt}\frac{\partial L}{\partial w_i} - \frac{\partial L}{\partial q_i} = 0 \tag{4}$$

in which equation we have defined the Lagrangian function as $L = T - U$ and we have tacitly exploited the premise that true velocity-dependent potentials do not exist, so that $\partial U/\partial w_i = 0$ and $\partial L/\partial w_i = \partial T/\partial w_i$. Thus we obtain the Euler-Lagrange equations, one for each value of the index $i$.
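We can check numerically the claim that Equation 4 holds on the true trajectory and fails off it. The sketch below is illustrative only. It assumes a mass on a spring, $L = \tfrac{1}{2}mw^2 - \tfrac{1}{2}kq^2$, with made-up values of $m$ and $k$. It evaluates the left side of Equation 4 on the true motion and on a deliberately wrong path, taking the total time derivative by a central difference.

```python
import math

# Illustrative system (an assumption, not from the text): a mass m on a spring
# of stiffness k, so that L(q, w) = T - U = m*w**2/2 - k*q**2/2.
m, k = 2.0, 8.0
omega = math.sqrt(k / m)   # natural angular frequency: 2.0 rad/s

def dL_dq(q, w):
    return -k * q           # partial derivative of L with respect to q

def dL_dw(q, w):
    return m * w            # partial derivative of L with respect to w

def el_residual(path, velocity, t, h=1e-5):
    """d/dt(dL/dw) - dL/dq on a trial path at time t (left side of Eq. 4).
    The total time derivative is taken by a central difference."""
    ddt = (dL_dw(path(t + h), velocity(t + h))
           - dL_dw(path(t - h), velocity(t - h))) / (2 * h)
    return ddt - dL_dq(path(t), velocity(t))

# True motion q = cos(omega t) versus a wrong trial path q = cos(2 omega t).
true_q = lambda t: math.cos(omega * t)
true_w = lambda t: -omega * math.sin(omega * t)
wrong_q = lambda t: math.cos(2 * omega * t)
wrong_w = lambda t: -2 * omega * math.sin(2 * omega * t)

for t in (0.0, 0.3, 1.1):
    print(f"t = {t}: true path {el_residual(true_q, true_w, t):+.1e}, "
          f"wrong path {el_residual(wrong_q, wrong_w, t):+.1e}")
```

On the true path the residual sits at the level of finite-difference round-off. On the wrong path it equals $-3k\cos(2\omega t)$, far from zero.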

At every point on a body's trajectory Equation 4 stands true to Reality and for points off the trajectory it does not. Consider the points that lie a minuscule distance $\delta\mathbf{q}$ from the body's trajectory such that at the trajectory's endpoints, $\mathbf{q}_1$ and $\mathbf{q}_2$, we have $\delta\mathbf{q}_1 = \delta\mathbf{q}_2 = 0$: the set of all values of $\delta\mathbf{q}$ describes a path that lies arbitrarily close to the body's true trajectory. If we multiply Equation 4 by the components of $\delta\mathbf{q}$ and sum over the index, we get

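$$0 = \frac{d}{dt}\left(\sum_i \frac{\partial L}{\partial w_i}\,\delta q_i\right) - \sum_i\left(\frac{\partial L}{\partial q_i}\,\delta q_i + \frac{\partial L}{\partial w_i}\,\delta w_i\right), \qquad \delta w_i = \frac{d\,\delta q_i}{dt} \tag{5}$$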

which represents the calculation of an energy-like area between the body's true path and the variation that $\delta\mathbf{q}$ represents. We can eliminate the first term on the right side of that equation if we integrate the whole equation with respect to the elapse of time between the instants $t_1$ and $t_2$, when the body occupies the points $\mathbf{q}_1$ and $\mathbf{q}_2$. At those instants, we assert, the variation $\delta\mathbf{q}$ goes to zero, and thus we are left with

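$$0 = -\int_{t_1}^{t_2}\sum_i\left(\frac{\partial L}{\partial q_i}\,\delta q_i + \frac{\partial L}{\partial w_i}\,\delta w_i\right)dt = -\int_{t_1}^{t_2}\delta L\,dt \tag{6}$$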

In that equation I completed the variation of the Lagrangian function as I would complete its differential, by implicitly adding the term $(\partial L/\partial t)\,\delta t$, which vanishes because $\delta t = 0$. That latter equality represents the fact that the body in question cannot reach a given point on its trajectory earlier or later than it actually does, because such a variation in the timing would require the exertion of additional forces that we have not encoded in the Lagrangian. Finally I multiply Equation 6 by $-1$ (for convenience), exploit the fact that, like the operation of differentiation, the operation of variation commutes with the operation of integration, and obtain the mathematical expression of Hamilton's principle,

$$\delta\int_{t_1}^{t_2} L\,dt = 0 \tag{7}$$

the principle of least action. In that equation the integral of the Lagrangian function with respect to time, evaluated between two specific instants, represents the amount of action that the body enacts through its motions between those instants. That equation sums up the reformulation of Newtonian mechanics that Joseph Louis Lagrange introduced in 1788.
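We can also probe Hamilton's principle directly. Discretize the action integral and evaluate it on the true path and on varied paths whose variation vanishes at $t_1$ and $t_2$. The change in the action should then be second order in the size of the variation. This is a minimal sketch under stated assumptions: a mass-on-a-spring Lagrangian with made-up $m$ and $k$, a midpoint-rule quadrature, and a sinusoidal variation. The time interval is deliberately shorter than half an oscillation period, which is the assumption that makes the stationary value a minimum as well.

```python
import math

# Illustrative system (an assumption): mass on a spring, m = 2, k = 8.
m, k = 2.0, 8.0
omega = math.sqrt(k / m)
t1, t2, N = 0.0, 1.0, 2000   # t2 - t1 is shorter than half a period (pi/2 s)

def action(path):
    """S = integral of L dt from t1 to t2, midpoint rule on N steps."""
    dt = (t2 - t1) / N
    S = 0.0
    for n in range(N):
        qa, qb = path(t1 + n * dt), path(t1 + (n + 1) * dt)
        w = (qb - qa) / dt          # velocity over this step
        q = 0.5 * (qa + qb)         # position at the step midpoint
        S += (0.5 * m * w**2 - 0.5 * k * q**2) * dt
    return S

def true_path(t):
    return math.cos(omega * t)

def varied(eps):
    """True path plus eps * sin(...), a variation that vanishes at t1 and t2."""
    return lambda t: true_path(t) + eps * math.sin(math.pi * (t - t1) / (t2 - t1))

S0 = action(true_path)
for eps in (0.02, 0.01, -0.01):
    print(f"eps = {eps:+.2f}:  S - S0 = {action(varied(eps)) - S0:.4e}")
```

Halving the variation roughly quarters $S - S_0$. Reversing its sign leaves $S - S_0$ essentially unchanged, which is what a vanishing first variation looks like in numbers.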

In 1833 William Rowan Hamilton devised and introduced an alternative version of least-action dynamics based on a Legendre transformation (see the appendix) of the Lagrangian function. He did so in order to express the least-action principle in terms of a quantity that we control rather than a quantity that we measure.

At its most fundamental, the Lagrangian comes to us as a function of a body's location and velocity, both quantities that we can measure (we measure velocity indirectly as the ratio of distance moved to time elapsed, both of which we measure directly). But we don't actually control the velocity of a body; we control its linear momentum through the forces that we impose upon the body in accordance with Newton's second law of motion. Rewriting Equation 1 in terms of the Lagrangian, we get the conjugate momentum as

$$p_i = \frac{\partial L}{\partial w_i} \tag{8}$$

the partial derivative of the Lagrangian with respect to the corresponding generalized velocity. That fact gives us a very simple Legendre transformation of the Lagrangian as a function of velocity into the Hamiltonian as a function of momentum:

$$H(q_i, p_i) = \sum_i w_i\,p_i - L(q_i, w_i) \tag{9}$$

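When the kinetic energy is a homogeneous quadratic function of the generalized velocities (as it is for $T = \tfrac{1}{2}mw^2$), the sum of $w_i p_i$ equals twice the kinetic energy of the system that we have described, so the Hamiltonian equals the sum of the system's kinetic energy and its potential energy ($H = T + U = E$), which equals the system's total energy.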
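Here is a small numerical check of Equations 8 and 9. It assumes one degree of freedom and a body of mass $m$ at height $q$ in uniform gravity; the Lagrangian and the numbers are illustrative choices, not taken from the text. The sketch estimates $p = \partial L/\partial w$ by a central difference, forms $H = wp - L$, and compares the result with $T + U$. Because $T$ is quadratic in $w$ here, $wp$ should also equal $2T$.

```python
# Illustrative Lagrangian (an assumption): a body of mass m at height q in
# uniform gravity g, L(q, w) = m*w**2/2 - m*g*q.
m, g = 1.5, 9.81

def L(q, w):
    return 0.5 * m * w**2 - m * g * q

def conjugate_momentum(q, w, h=1e-6):
    """p = dL/dw (Equation 8), estimated by a central difference."""
    return (L(q, w + h) - L(q, w - h)) / (2 * h)

def hamiltonian(q, w):
    """H = w*p - L (Equation 9) for a single degree of freedom."""
    return w * conjugate_momentum(q, w) - L(q, w)

q, w = 2.0, 3.0
T = 0.5 * m * w**2        # kinetic energy
U = m * g * q             # potential energy
p = conjugate_momentum(q, w)
print(f"w*p = {w * p:.6f}   2T    = {2 * T:.6f}")
print(f"H   = {hamiltonian(q, w):.6f}   T + U = {T + U:.6f}")
```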

Now calculate the differential of Equation 9. We get

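$$dH = w_i\,dp_i + p_i\,dw_i - \frac{\partial L}{\partial q_i}\,dq_i - \frac{\partial L}{\partial w_i}\,dw_i = w_i\,dp_i - \frac{\partial L}{\partial q_i}\,dq_i \tag{10}$$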

in which I have implicitly summed over the index $i$. To simplify that equation I have used Equation 8 to eliminate the second and fourth terms in the expression on the right side of the equality sign. Because the differentials in that equation have arbitrary values, and because Equations 4 and 8 together tell us that $\partial L/\partial q_i = dp_i/dt$, that equation then gives us Hamilton's Equations:

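$$\frac{dq_i}{dt} = w_i = \frac{\partial H}{\partial p_i}, \qquad \frac{dp_i}{dt} = \frac{\partial L}{\partial q_i} = -\frac{\partial H}{\partial q_i} \tag{11}$$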
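Once $H$ is known, we can step Hamilton's Equations forward in time. The sketch below assumes a harmonic-oscillator Hamiltonian, $H = p^2/2m + kq^2/2$, with made-up constants, and estimates the partial derivatives of $H$ by central differences. It uses the semi-implicit (symplectic) Euler method, an integrator chosen for this illustration rather than one the text prescribes.

```python
import math

# Illustrative Hamiltonian (an assumption): H = p**2/(2m) + k*q**2/2.
m, k = 2.0, 8.0

def H(q, p):
    return p**2 / (2 * m) + 0.5 * k * q**2

def dH_dq(q, p, h=1e-6):
    return (H(q + h, p) - H(q - h, p)) / (2 * h)

def dH_dp(q, p, h=1e-6):
    return (H(q, p + h) - H(q, p - h)) / (2 * h)

def evolve(q, p, dt, steps):
    """Step dq/dt = dH/dp and dp/dt = -dH/dq with the semi-implicit
    Euler method: update p first, then q using the new p."""
    for _ in range(steps):
        p -= dH_dq(q, p) * dt
        q += dH_dp(q, p) * dt
    return q, p

q, p = evolve(1.0, 0.0, dt=1e-4, steps=10_000)   # one second of motion
omega = math.sqrt(k / m)
print(f"q(1) numerical = {q:.5f}, exact cos(omega) = {math.cos(omega):.5f}")
print(f"H before = {H(1.0, 0.0):.5f}, H after = {H(q, p):.5f}")
```

The numerical $q(1)$ agrees with the exact $\cos\omega$ to within the method's first-order error, and $H$ stays close to its initial value of 4.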

It's easy to convince ourselves that there's effectively no difference between the Lagrangian and Hamiltonian approaches to dynamics. That opinion comes to us because in Newtonian dynamics a body's linear momentum and the body's velocity are related through a constant, the body's mass. But in relativistic dynamics the relation between momentum and velocity does not remain constant, and in the quantum theory the body does not have a well-defined velocity, so we must use a conserved quantity equivalent to it, the linear momentum. In both of those modern theories physicists have built upon Hamiltonian dynamics.

Now we want to describe the total energy carried by a system of bodies purely in terms of that system's Lagrangian function. To obtain such a description we merely subtract the Lagrangian from twice the system's kinetic energy:

$$E = \sum_j m_j\,\mathbf{v}_j\cdot\mathbf{v}_j - L = 2T - L = T + U \tag{12}$$

We can, of course, replace the Cartesian velocities in that expression with generalized velocities (writing the first term as $\sum_i w_i\,\partial L/\partial w_i$) without affecting the equality with the system's energy. We can then differentiate that equation with respect to time to describe how the system's total energy changes with the elapse of time. We get

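$$\frac{dE}{dt} = \frac{d}{dt}\left(\sum_i w_i\frac{\partial L}{\partial w_i} - L\right) = \sum_i w_i\left(\frac{d}{dt}\frac{\partial L}{\partial w_i} - \frac{\partial L}{\partial q_i}\right) = 0 \tag{13}$$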

That equation equals zero because the expression in parentheses reproduces the left side of the Euler-Lagrange equations (Equation 4), which vanishes on the true path; here we also assume that the Lagrangian does not depend explicitly on time. Thus Equation 13 expresses the law of conservation of energy, the statement that the system's total energy does not change as time elapses.
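We can watch Equation 13 at work numerically. The sketch assumes a simple pendulum, $L = \tfrac{1}{2}ml^2w^2 + mgl\cos q$ (so that $U = -mgl\cos q$), with arbitrary parameters. It integrates the pendulum's Euler-Lagrange equation with the velocity-Verlet method and tracks the generalized form of Equation 12, $E = w\,\partial L/\partial w - L$. The small deviation it reports comes from the finite time step, not from the physics.

```python
import math

# Illustrative system (an assumption): a simple pendulum with q the angle
# from the vertical and w = dq/dt, L = m*l**2*w**2/2 + m*g*l*cos(q).
m, l, g = 1.0, 0.5, 9.81

def L(q, w):
    return 0.5 * m * l**2 * w**2 + m * g * l * math.cos(q)

def energy(q, w):
    """Generalized form of Equation 12: E = w * dL/dw - L."""
    return w * (m * l**2 * w) - L(q, w)

def accel(q):
    """dw/dt from the pendulum's Euler-Lagrange equation."""
    return -(g / l) * math.sin(q)

q, w, dt = 1.0, 0.0, 1e-3
E0 = energy(q, w)
max_drift = 0.0
for _ in range(20_000):               # 20 s of motion, velocity-Verlet steps
    a = accel(q)
    q += w * dt + 0.5 * a * dt**2
    w += 0.5 * (a + accel(q)) * dt
    max_drift = max(max_drift, abs(energy(q, w) - E0))
print(f"E0 = {E0:.6f} J, largest deviation over 20 s = {max_drift:.2e} J")
```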

In Equation 7 we can add an arbitrary constant to the Lagrangian function without changing the content or consequences of Hamilton's principle. If we choose the system's total energy as that constant, then we must restrict the varied paths in the derivation of the equations of motion to only those paths that represent the same total energy as does the true path the system follows. We can certainly do that, so, without changing the substance of Hamilton's principle, we get

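$$\begin{aligned}
\delta\int_{t_1}^{t_2}(L + E)\,dt &= \delta\int_{t_1}^{t_2} 2T\,dt = \delta\int_{t_1}^{t_2}\sum_i p_i\,w_i\,dt = 0\\
\delta\int_{A}^{B}\sum_i p_i\,dq_i &= 0\\
\delta S_0 = \delta\int_{A}^{B}\mathbf{p}\cdot d\mathbf{q} &= 0\\
\delta\int_{A}^{B} m\,v\,ds &= 0
\end{aligned} \tag{14}$$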

In making the transition to the second line in that equation I had to change the limits on the integration from two instants of time to the two points, $A = A(t_1)$ and $B = B(t_2)$, that the body in question occupies at those instants of time. The third line gives us the abbreviated action functional and the fourth line gives us the oldest form of the principle of least action, Maupertuis' principle.
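The change of integration variable in Equation 14 rests on the identity $2T\,dt = mv^2\,dt = mv\,ds$. A minimal check, assuming a projectile in uniform gravity with made-up launch values, integrates both forms along the same trajectory with the midpoint rule:

```python
import math

# Illustrative trajectory (an assumption): a 1 kg projectile launched from the
# origin with velocity (vx, vy) in uniform gravity g, followed for t_end s.
m, g = 1.0, 9.81
vx, vy, t_end, N = 3.0, 4.0, 0.6, 20_000

def position(t):
    return vx * t, vy * t - 0.5 * g * t**2

def speed(t):
    return math.hypot(vx, vy - g * t)

dt = t_end / N
time_form = 0.0    # integral of 2T dt over the time interval
path_form = 0.0    # integral of m v ds along the path from A to B
for n in range(N):
    (xa, ya), (xb, yb) = position(n * dt), position((n + 1) * dt)
    v = speed((n + 0.5) * dt)                  # speed at the step midpoint
    time_form += m * v**2 * dt                 # 2T = m v^2
    path_form += m * v * math.hypot(xb - xa, yb - ya)
print(f"integral of 2T dt  = {time_form:.8f}")
print(f"integral of m v ds = {path_form:.8f}")
```

Both sums agree with each other and with the exact value of about 7.8026 for these inputs.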

In 1744 Pierre-Louis Moreau de Maupertuis (1698 Jul 17 – 1759 Jul 27) and Leonhard Euler (1707 Apr 15 – 1783 Sep 18 [Sep 07 Old Style]) discovered a version of the least action principle displayed in the fourth line of Equation 14. Both men produced the same equation, differing only in the fact that Maupertuis did not include the mass factor in his version. In devising their equation they perfected a discovery that began coming to light shortly after the beginning of the Christian era.

In the First Century Anno Domini, Heron (or Hero) of Alexandria stated that the law of reflection described in Euclid's Catoptrica corresponds to the statement that light follows the shortest path, relative to other possible paths, from its source (e.g. a candle flame) to a mirror or pool of still water and thence to the observer's eye. But that astounding result did not account for the other way of changing the course of light: refraction. In 1621 Willebrord Snel van Royen (1580? – 1626 Oct 30) worked out a proper mathematical description of refraction, and in 1657 Pierre de Fermat (1608 Aug 17 – 1665 Jan 12) found that he could account for both reflection and refraction through the postulate that "light travels between two given points along the path of shortest time."
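We can test Fermat's postulate on refraction directly. The sketch assumes a source and a receiver on opposite sides of a flat interface, with hypothetical speeds $v_1$ and $v_2$ in the two media. It minimizes the travel time over the crossing point by golden-section search and compares the result with Snel's law, $\sin\theta_1/\sin\theta_2 = v_1/v_2$.

```python
import math

# Illustrative geometry (an assumption): source at (0, a) in medium 1, where
# light moves at speed v1; receiver at (d, -b) in medium 2 (speed v2); the
# interface is the line y = 0.
a, b, d = 1.0, 1.0, 2.0
v1, v2 = 1.0, 0.75

def travel_time(x):
    """Time along two straight segments that meet the interface at (x, 0)."""
    return math.hypot(x, a) / v1 + math.hypot(d - x, b) / v2

def golden_min(f, lo, hi, tol=1e-10):
    """Golden-section search for the minimum of a unimodal function on [lo, hi]."""
    r = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        x1, x2 = hi - r * (hi - lo), lo + r * (hi - lo)
        if f(x1) < f(x2):
            hi = x2
        else:
            lo = x1
    return 0.5 * (lo + hi)

x = golden_min(travel_time, 0.0, d)
sin1 = x / math.hypot(x, a)              # sine of the angle of incidence
sin2 = (d - x) / math.hypot(d - x, b)    # sine of the angle of refraction
print(f"sin1/sin2 = {sin1 / sin2:.6f},  v1/v2 = {v1 / v2:.6f}")
```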

Maupertuis found Fermat's principle unsatisfactory; he felt that distance and duration should stand on something like an equal footing in physics (almost anticipating Minkowski by over a century and a half). He knew that refracted light does not follow the path of minimum length, but discovered, on further analysis, that it follows a path of minimum weighted length, the weighting factor being the speed at which the light moves, thereby obtaining his version of Equation 14. Maupertuis founded his hypothesis upon the statement that "Nature is thrifty in all its actions," a form of Ockham's Razor, a term usually credited to the Scottish philosopher Sir William Hamilton (not the mathematician William Rowan Hamilton) as a name for the statement "entities are not to be multiplied beyond necessity," attributed to William of Ockham (ca. 1285 – 1349).

Where Maupertuis devised his principle of least action to describe the motions of light, Euler devised the same principle to describe the motions of mass-bearing bodies. On the assumption that the masses of bodies did not change, Euler was able to use the same form of the equation that Maupertuis used,

$$\delta\int_{A}^{B} v\,ds = 0 \tag{15}$$

but primarily he used that equation multiplied by the bodies' masses and thus minimized the abbreviated (or reduced) action of Equation 14. By using the bodies' momenta, rather than their velocities, as the weighting factor on the element of path length, Euler was able to discern that the principle encoded in Equation 15 only stands true to Reality when the system's energy is conserved, a fact that comes from the assumption that a body's speed depends only upon that body's location in space.
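We can check Maupertuis' principle in Euler's mass-weighted form for a projectile. The sketch assumes uniform gravity and a family of trial paths built by adding a bump that vanishes at $A$ and $B$. Each trial path is held to the same total energy $E$ by computing the momentum as $\sqrt{2m(E - U)}$ at every point, which is exactly the restriction the principle imposes. The launch angle below 45 degrees is a deliberate assumption. The stationary path is guaranteed to be a minimum only when no conjugate point lies between $A$ and $B$, and for steeper launches one does.

```python
import math

# Illustrative setup (an assumption): m = 1 kg in uniform gravity, launched
# from A = (0, 0) with velocity (vx, vy) = (4, 3) m/s, i.e. below 45 degrees.
m, g = 1.0, 9.81
vx, vy = 4.0, 3.0
E = 0.5 * m * (vx**2 + vy**2)      # total energy, with U = m g y
X = 2 * vx * vy / g                # the body lands at B = (X, 0)
N = 20_000

def true_y(x):
    return (vy / vx) * x - g * x**2 / (2 * vx**2)

def abbreviated_action(path):
    """Sum of p ds from A to B, with p = sqrt(2m(E - U)) so that every
    trial path carries the same total energy E (midpoint rule in x)."""
    dx, W = X / N, 0.0
    for n in range(N):
        ya, yb = path(n * dx), path((n + 1) * dx)
        p = math.sqrt(2 * m * (E - m * g * 0.5 * (ya + yb)))
        W += p * math.hypot(dx, yb - ya)
    return W

def varied(eps):
    """True path plus a bump that vanishes at both endpoints A and B."""
    return lambda x: true_y(x) + eps * math.sin(math.pi * x / X)

W0 = abbreviated_action(true_y)
for eps in (0.02, 0.01, -0.01):
    print(f"eps = {eps:+.2f}:  W - W0 = {abbreviated_action(varied(eps)) - W0:.3e}")
```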

Joseph Louis Lagrange (Giuseppe Lodovico Lagrangia; 1736 Jan 25 – 1813 Apr 10) developed much of the calculus of variations around 1760 and then used it to derive general equations of motion for bodies subject to various forces, which work he published in his Mécanique Analytique (1788). In 1834 and 1835 William Rowan Hamilton (1805 Aug 04 – 1865 Sep 02) developed the principle of least action in its modern form by applying the calculus of variations to the time integral of the Lagrangian function ($L = T - U$) and obtaining thereby the Euler-Lagrange equations.

Though we describe the equations for Hamilton's principle (Equation 7) and for Maupertuis' principle (Equation 14) as both encoding the principle of least action, we can discern significant differences between them. We can see three fundamental ways in which the two expressions of the principle differ:

1. In Hamilton's principle we define the action as the integral with respect to time of the Lagrangian function, the difference between the bodies' kinetic and potential energies. In Maupertuis' principle we use the abbreviated action, the integral with respect to the generalized coordinates of the bodies' generalized momenta, which integral corresponds to twice the bodies' kinetic energies integrated with respect to time, the potential energies only coming into play implicitly through their effects on the bodies' momenta.

2. In our use of Hamilton's principle we must specify fixed endpoints where the varied paths and the actual path come into coincidence, both in time ($t_1$ and $t_2$) and in space ($\mathbf{q}_1$ and $\mathbf{q}_2$), but the varied paths need not conform strictly to the law of conservation of energy. In our use of Maupertuis' principle we need only specify the spatial endpoints, $\mathbf{q}_1$ and $\mathbf{q}_2$, and compensate for the lack of fixed temporal endpoints by requiring that all paths used in the calculation conform to the law of conservation of energy.

3. Application of Hamilton's principle gives us a description of a body's trajectory as an explicit function of time, $\mathbf{q}(t)$; that is, it gives us a function that describes the body's location in space at any given time. On the other hand, application of Maupertuis' principle gives us a description only of the shape of the body's path without telling us when the body will occupy any given point on that path.

Finally, the principle of least action raises an issue that has vexed physicists for some time, the issue of teleology. In classical Newtonian physics bodies move in a deaf, dumb, and blind universe, each body's motion affected only by what exists at the point the body occupies at the instant it occupies that point. Variational physics, on the other hand, seems to require that the body anticipate what it will experience at distant points at future instants. The principle of least action thus seems to require the spooky actions at a distance that Einstein disliked and equally spooky final causes, their temporal analogue (thereby making spookiness something like Lorentz invariant). Consider one expression of the problem, Feynman's Question.

Richard Phillips Feynman (1918 May 11 – 1988 Feb 15) once asked: why does any given body move in a way that minimizes the average difference between its kinetic energy and its potential energy? Over a given elapse of time, what determines the path that a body must follow to ensure that the average value of its Lagrangian function goes to its minimum? Can the body, in some sense, look ahead and plot its path accordingly? Or do we have a much simpler answer available to us?

Newton's laws of motion do not raise any philosophical problems in this area because we express them through differential equations. Those equations tell us what happens to certain quantities localized to single points in space and/or to single instants of time, how those quantities change with changes in location or time. Newton's second law of motion, for example, equates an applied force to a body's inertial reaction and thus tells us that a force applied to a body at a given point in space produces an acceleration of the body at the instant it occupies that point. Given a suitable description of the body's initial state (i.e. its position and velocity at some initial instant), we can integrate the equations of motion to obtain a description of the path the body follows as time elapses, in essence following the body as it enacts its motion on the path element by minuscule element.

On the other hand, our use of integrals and the calculus of variations associated with the principle of least action raises suspicions of a teleology, an acceptance of the existence of final causes and their associated actions at a distance. In particular, the fact that we must specify the final state of the system as the upper limit of an integral seems to give the principle of least action a teleological character: in essence we must determine the body's destination and arrival time before the body departs its initial state (the lower limit on the action integral). In devising the principle of least action we seem to have postulated that the body looks ahead, figures out which path to the final endpoint involves the least action, and makes something like a conscious decision to follow that path.

Of course, nothing like that actually happens. Inanimate objects don't calculate or make conscious decisions. So why do they act as if they do?

At this point we should remind ourselves of what physicists call Newton's zeroth law, the postulate that Isaac Newton tacitly assumed into his premises when he devised the Principia: that we can describe Reality with mathematics; in particular, the mathematics of smooth, continuous functions. That postulate necessitates that mathematical operations that alter those functions also correspond to physical phenomena, that the equations containing those operations correspond to laws of nature. But we also need to remind ourselves of Korzybski's Axiom from General Semantics, which I paraphrase as "the word does not correspond identically to the Platonic Form of the thing it denotes." In like manner, we can say that the equation does not coincide identically with the physical law that it represents. Nonetheless, the concepts of theory mimic the percepts of measurement and we want an explanation.

In Newtonian dynamics we have a perfect here-and-now relationship between applied force and inertial reaction. But we can also represent the applied force as the gradient of potential energy, and we know that calculating the relevant derivatives obliges us to know the value of the potential energy at neighboring points near the one occupied by the body whose motion we want to track. Do those facts oblige us to hypothesize that the body puts out ætherial feelers to determine the steepest descent "downhill"? We may feel a temptation here to invoke Werner Heisenberg's indeterminacy principle as representing a smearing of a particle over space and time in a way that satisfies the above requirement, but Heisenberg's principle gives us a smearing that is too localized to apply to the calculation of least action (though we will, later, deduce Heisenberg's principle from the principle of least action). We can resist that temptation by noticing that the calculation of the gradient of the potential energy does not represent an actual physical process: we calculated the potential energy in the first place by integrating the force acting over a region with respect to a feigned motion through that region, so calculating the gradient of the potential energy merely reverses that integration. The reality here comes manifest in the force and not in the potential energy (which simply gives us a convenient bookkeeping device).

Look again at the difference between the Newtonian determination of a trajectory and the equivalent Hamiltonian determination. In the Newtonian calculation we begin with initial conditions, the body's position and velocity at the beginning of the trajectory. We then use first and second integrals with respect to the elapse of time to add increments to the body's position and velocity in temporal order, in that way mimicking with our mathematics the body's actual motion as it traces out its path to its destination. That fact subtly conditions us to expect our mathematical descriptions of Reality to act out what they describe. Thus, when we find that we must specify both the beginning and ending points of a body's trajectory when we apply Hamilton's principle to describe the body's path, we intuit a process that looks at the body's departure from Point A and arrival at Point B simultaneously. In the Newtonian case we ask "Whither does this body go from Point A in this forcefield?" In the Hamiltonian case we ask "What path does this body follow in this forcefield to go from Point A to Point B?" In the former case we want to devise a series of events and see whither they take us, and in the latter case we lay out the path before the body even leaves Point A. In the former case we have a body adjusting its motion in response to a here-and-now force, and in the latter case we have a line drawn in space for the body to follow. The former we accept without hesitation and the latter bugs us mightily. Like Professor Feynman, we ask, with some exasperation: why does a body follow a path that makes the average difference between the body's kinetic and potential energies a minimum?

When we first looked at the standard deduction of the principle of least action, we simply integrated the applied force and inertial reaction in Newton's second law of motion with respect to the distance that the body moves to obtain the equivalent potential and kinetic energies that the body possesses. Thus we found that the Lagrangian form of the principle of least action comes directly from Newton's law. That the average value of the Lagrangian function that we calculate over the time interval between the body's passing through Point A and its passing through Point B takes a minimum value comes to us as nothing more than an artifact of the integration of Newton's law, the fact that the difference between the applied force and the body's inertial reaction equals zero.

Look again at the Newtonian and Hamiltonian calculations. If we can vary the initial velocity of the body as it passes through Point A, we know that we can send the body through any other point in space. That fact stands true to Reality because every initial velocity sends the body onto a unique path and we have available to us an infinite set of initial velocities. A body can reach any point in space from any other point, and between any two points in space there exists a path of least action. By specifying the initial velocity of the body in the Newtonian calculation, we determine the Point B through which the body will eventually pass. By specifying the Point B in the principle of least action, we merely select the path and thus the initial velocity that will take the body through that point from Point A. That the body will follow such a path we have already deduced from Newton's laws, so we can dismiss the alleged teleology in the principle of least action as an illusion.
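The argument above is, in computational terms, the shooting method. We choose an initial velocity, integrate forward in Newtonian fashion, and adjust the choice until the body passes through Point B at the required time. A minimal sketch, assuming a pendulum and hypothetical endpoint values, bisects on the initial angular velocity:

```python
import math

# Hypothetical problem (an assumption, not from the text): a pendulum with
# g/l = 19.62 s^-2 starts at Point A (angle 0) and must be at Point B
# (angle 0.8 rad) at time T = 0.4 s.
G_OVER_L, T, Q_A, Q_B = 19.62, 0.4, 0.0, 0.8

def final_angle(w0, steps=4000):
    """Forward (Newtonian) integration with velocity Verlet from angle Q_A
    and initial angular velocity w0; returns the angle at time T."""
    dt, q, w = T / steps, Q_A, w0
    for _ in range(steps):
        a = -G_OVER_L * math.sin(q)
        q += w * dt + 0.5 * a * dt**2
        w += 0.5 * (a - G_OVER_L * math.sin(q)) * dt
    return q

def shoot(lo=0.0, hi=10.0, tol=1e-10):
    """Bisect on the initial angular velocity until the path passes through
    Q_B at time T (final_angle increases with w0 over this bracket)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if final_angle(mid) < Q_B:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

w0 = shoot()
print(f"initial angular velocity {w0:.6f} rad/s reaches {final_angle(w0):.6f} rad")
```

Nothing in the loop looks ahead. Every trial is an ordinary forward integration from Point A, and the condition at Point B only decides which initial velocity we keep.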

Appendix: Legendre Transformations

We usually find these in classical thermodynamics, where physicists use them to convert the basic description of a thermodynamic system's internal energy into the enthalpy and the Helmholtz and Gibbs free energies, which express the energy available to do work when some variables in the system can change and others cannot. The name of the transformation comes from Adrien-Marie Legendre (1752 Sep 18 – 1833 Jan 10), the French mathematician who devised it. In dynamics we use a Legendre transformation to go from the Lagrangian to the Hamiltonian formulation of classical mechanics.

Mathematicians often find it desirable to express a function of some variable, f(x), as a different function, one that uses the derivative of f(x) with respect to x as its argument. If we define p=df/dx as the argument of the new function, then we write the new function as g(p) and call it the Legendre transform of the original function. We define the Legendre transform g(p) of the function f(x) as follows:

$$g(p) = \max_x\bigl(px - f(x)\bigr) \tag{A-1}$$

In that equation the notation $\max_x$ denotes the fact that we must maximize the expression in the parentheses with respect to $x$ while holding $p$ constant. We determine the criterion that we must satisfy in order to achieve that maximization by setting the derivative of the expression with respect to $x$ equal to zero:

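$$\frac{d}{dx}\bigl(px - f(x)\bigr) = p - \frac{df}{dx} = 0 \quad\Longrightarrow\quad p = \frac{df}{dx} \tag{A-2}$$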

We see that we have maximized the expression already when we defined $p$ as the derivative of $f(x)$ with respect to $x$. We know that we have achieved a maximum because the second derivative of the expression with respect to $x$, namely $-f''(x)$, yields a negative number whenever $f''(x) > 0$. That condition is why we have a well-behaved Legendre transform if and only if $f(x)$ represents a convex function.
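We can evaluate Equation A-1 by brute force. The sketch assumes the test function $f(x) = e^x$, whose transform is known in closed form, $g(p) = p\ln p - p$ for $p > 0$. It maximizes $px - f(x)$ over a uniform grid whose limits are chosen by hand to bracket the maximizer $x = \ln p$.

```python
import math

def legendre_transform(f, p, x_lo=-10.0, x_hi=5.0, n=15_001):
    """g(p) = max over x of (p*x - f(x)), by brute force on a uniform grid.
    The grid limits are assumptions; they must bracket the maximizer."""
    step = (x_hi - x_lo) / (n - 1)
    return max(p * x - f(x) for x in (x_lo + i * step for i in range(n)))

# Test function (an assumption): f(x) = exp(x), convex, with the known
# transform g(p) = p ln p - p for p > 0.
for p in (0.5, 1.0, 2.0, 5.0):
    exact = p * math.log(p) - p
    print(f"p = {p}: grid {legendre_transform(math.exp, p):.6f}, exact {exact:.6f}")
```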

Reference to the function as convex implies a geometric interpretation that we can extend to the Legendre transformation itself. Putting it a little too simply, we call a function convex if it has a positive second derivative. We have two ways in which we can plot a function as a curve on a graph: we can draw it as a set of points (the usual way of doing it) or we can draw it as the envelope of a set of straight lines, the set of tangents to the curve. In the first way we simply calculate the y-coordinate of the curve through a function that correlates it with the corresponding x-coordinate: $y = f(x)$. Through that relationship we can assert enough values of $x$ to calculate enough points $(x, y)$ to reveal the curve. We base the second way of drawing the curve upon the fact that for a straight line with a slope $m = dy/dx$ and y-intercept $b$ we have the equation $y = mx + b$ (note that this description gives us a point-by-point description of the line). We want a straight line to represent a tangent to the curve at the point $(x_i, f(x_i))$, so we write the straight-line equation as

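$$y = f(x_i) + m\,(x - x_i) = mx + \bigl(f(x_i) - m\,x_i\bigr), \qquad m = \left.\frac{df}{dx}\right|_{x_i} \tag{A-3}$$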

That has the form of a Legendre transformation, as we can see when we recall the definition of $m$: the y-intercept of the tangent, $b = f(x_i) - m\,x_i$, equals $-g(m)$. We can then invert the equation, determining the y-intercept for a given slope and reforming the equation as a function of the slope.

Of course, if we apply the transformation twice to the same convex curve, we get back the original description of the curve, so we recognize that the Legendre transform acts as its own inverse. Like the familiar Fourier transform, the Legendre transform takes a function of a coordinate, f(x), and converts it into a function of a different variable, g(p). However, while the Fourier transform consists of an integration with a kernel, the Legendre transform takes a simpler algebraic approach and uses maximization as the transformation procedure.
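We can check the self-inverse property on a grid. The sketch assumes the hypothetical convex test function $f(x) = x^2/2 + x$, transforms it once into $g(p)$ and a second time back into a function of $x$, and reports the largest discrepancy. The $p$ grid is chosen to cover every slope that $f$ takes on the $x$ grid, since the round trip cannot recover points whose slopes it never sees.

```python
def legendre_on_grid(values, grid, new_grid):
    """Discrete Legendre transform (Eq. A-1 on a grid): for each s in new_grid,
    return the maximum over the pairs (t, v) of s*t - v."""
    return [max(s * t - v for t, v in zip(grid, values)) for s in new_grid]

# Hypothetical convex test function f(x) = x**2/2 + x, sampled on [-3, 3].
xs = [-3 + 6 * i / 600 for i in range(601)]
f = [x**2 / 2 + x for x in xs]

# Slopes of f on [-3, 3] run from -2 to 4, so a p grid on [-4, 6] covers them.
ps = [-4 + 10 * i / 600 for i in range(601)]

g = legendre_on_grid(f, xs, ps)        # first transform:  f(x) -> g(p)
f_back = legendre_on_grid(g, ps, xs)   # second transform: g(p) -> f(x)
max_error = max(abs(a - b) for a, b in zip(f, f_back))
print(f"largest |f - f_back| on the grid: {max_error:.1e}")
```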

We can generalize the Legendre transformation to the mathematical process of convex conjugation, also known as the Legendre-Fenchel transformation. Physicists commonly use it in thermodynamics and in the Hamiltonian formulation of classical mechanics.

In thermodynamics we use the Legendre transformation to convert our basic description of the internal energy of a system into descriptions of the various kinds of free energy that become available when we change which properties of the system we hold fixed and which we allow to vary. We can get different results because in a system described by more than one variable we have the possibility of inexact differentials appearing in the calculation, which means that the result of integrating those differentials depends upon the path that our integration follows on a graph of the relevant variables. This fact tells us that the Legendre transformation is no mere mathematical sleight of hand; it has real consequences when we apply it to our description of Reality.

In classical dynamics we use the Legendre transformation to convert the Lagrangian function into the Hamiltonian function, as shown above.

Appendix: A Comment on Fermat's Principle

In his 1924 paper extending the quantum theory from light to matter, Louis de Broglie asks us to describe a wave in four dimensions through its phase number $\theta$. If we have a world-ray that passes through the four-points $P = (x_1, y_1, z_1, t_1)$ and $Q = (x_2, y_2, z_2, t_2)$, then Fermat's principle tells us that

$$\delta\int_{P}^{Q} d\theta = 0 \tag{B-1}$$

Taking the phase of the ray to be invariant, we can represent its differential as the dot product of two four-vectors,

$$\delta\int_{P}^{Q}\sum_{i=1}^{4}\Omega_i\,dx_i = 0 \tag{B-2}$$

in which $\Omega_i = \Omega_i(x_i)$ represents the world-wave. Note that the index takes the values from one to four in this relativistic analysis. If we let $r$ represent the direction in which the wave propagates in space, then we have

$$d\theta = \nu\,dt - \frac{\nu}{V}\,dr \tag{B-3}$$

in which $\nu$ represents the wave's frequency and $V$ represents the wave's speed of propagation. Reconciling that equation with Equation B-2 gives us

$$\Omega_4 = \nu, \qquad \Omega_j = -\frac{\nu}{V}\cos(x_j, r), \qquad (x_4 = t) \tag{B-4}$$

in which the index $j$ takes the values 1, 2, and 3, and $\cos(x_j, r)$ represents the direction cosine between the direction of the wave's propagation and the direction defined by the $x_j$-axis. In that equation $\Omega_j$ represents the wave number of the world-wave.

If the frequency of the world-wave remains constant, then Equation B-1 takes the form of Maupertuis' principle,

$$\delta\int_{A}^{B}\frac{\nu}{V}\,dr = 0 \tag{B-5}$$

in which the spatial points $A$ and $B$ relate to the previously defined four-points as $P = P(A, t_1)$ and $Q = Q(B, t_2)$. On that basis we can now write Planck's relation, $E = h\nu$, as $J_4 = h\Omega_4$, which implies that we must have $J_i = h\Omega_i$ for all four values of the index. We thus rewrite Equation B-2 as

$$\delta\int_{P}^{Q}\sum_{i=1}^{4}J_i\,dx_i = 0 \tag{B-6}$$

In this way we use Relativity to extend the original quantum theory from a representation in energy only to a representation that includes linear momentum.
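Read for its spatial components, $J_i = h\Omega_i$ ties the magnitude of a particle's momentum to the wavelength of its phase wave, $p = h/\lambda$. A short sketch computes that wavelength for an electron, assuming (for illustration only) a kinetic energy of 54 eV and the nonrelativistic relation $p = \sqrt{2mE}$:

```python
import math

# Constants in SI units: h and e are exact by definition since 2019;
# the electron mass is the CODATA 2018 recommended value.
h = 6.62607015e-34          # Planck constant, J s
e = 1.602176634e-19         # elementary charge, C (1 eV = e joules)
m_e = 9.1093837015e-31      # electron mass, kg

def de_broglie_wavelength(kinetic_energy_eV, mass=m_e):
    """lambda = h / p with the nonrelativistic momentum p = sqrt(2 m E)."""
    p = math.sqrt(2 * mass * kinetic_energy_eV * e)
    return h / p

lam = de_broglie_wavelength(54.0)   # 54 eV is an arbitrary illustrative energy
print(f"wavelength = {lam:.4e} m")
```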

From that analysis Louis de Broglie devised his quantum postulate: "Fermat's Principle applied to a phase wave is equivalent to Maupertuis' Principle applied to a particle in motion; the possible trajectories of the particle are identical to the rays of the phase wave." De Broglie added, "The hypothetical proportionality of J and Ω is a sort of extension of the quantum relation (Planck's hypothesis), which in its original form is manifestly insufficient because it involves energy but not its inseparable partner: momentum." Thus the principle of least action provided a foundation upon which physicists built the full quantum theory of both light and matter.
