The Ergodic Theorem


    Across the last third of the Nineteenth Century Ludwig Boltzmann developed much of the mathematical formalism of the statistical mechanics version of thermodynamics. In the course of that work he made an assumption that physicists and mathematicians came to call the ergodic hypothesis, the word ergodic coming from the Greek ergon (work) + hodos (road) and referring to systems that return to previously displayed states. In 1931 George David Birkhoff (1884 Mar 21 - 1944 Nov 12) published a proof that verified what then became the ergodic theorem. He based his proof on the recurrence theorem that Jules Henri Poincaré (1854 Apr 29 - 1912 Jul 17) had published in 1890. In this essay I want to look at Birkhoff's proof and consider its applicability to statistical mechanics, but first I want to look at Poincaré's recurrence theorem as a means of setting the stage for Birkhoff's proof.

    First let's take a quick glance at what we want to prove and verify. In its application to statistical thermodynamics the ergodic theorem tells us that for any uniform, extended stochastic system any average based on measurements of the system taken over the whole extent of the system at any instant in time equals the same average based on measurements of the system taken over a long elapse of time at a single point in the system. We need the truth of that theorem because our theories of statistical mechanics apply to the extent of a stochastic system at an instant in time, but our experiments measure the system over an extended time at a point in the system. The ergodic theorem, then, justifies our use of temporal averages to represent spatial averages (also known to physicists as phase averages).

Poincaré's Recurrence Theorem

    Briefly, we want to prove and verify as true to mathematics the following: given X as a bounded, open region of an n-dimensional Euclidean space and given T as a measure-preserving transformation of X onto itself, we know that there exists a set S of measure zero in X such that if we have a point x not lying in S and we have U as an open set in X which contains x, then infinitely many of the points x, T(x), T^2(x), etc. must belong to U. In that formulation T^n(x) represents the result of applying T to x successively n times. That means that, under continued repetition of T, x returns to U again and again.

    Put another way, we have a container X and a process T that rearranges the points within X in accordance with certain rules. Inside X we have soap film U and we know that if we apply the process T enough times any point originally lying on U will return to it. We actually infer that statement from the statement that we have a soap film S whose points do not return, no matter how many times we apply T, and the statement that S has zero size. Now let's prove it.

    We begin with the stage itself, our container, what mathematicians call a finite measure space and what they symbolize as (X, Σ, μ). Our measure space thus consists of three mathematical entities:

    1. X denotes the set of points that comprise the space in question. Those points, displayed in however many dimensions we need, make up the space in which we carry out actual geometry or imagine particles moving.

    2. Σ denotes a sigma algebra, a set whose elements comprise measurable sets. We have a sigma algebra over a set X when we have a subset of the power set of X (the set of all subsets of X) for which:

        a. Σ contains the set X as an element,

        b. if a subset E of X lies in Σ, then so does its complement (X\E, the part of X that does not include E), which means that the empty set, the complement of X itself, lies in Σ, and

        c. the union of countably many sets that all lie in Σ also lies in Σ.

    3. and μ designates the measure on Σ over the set X. It takes values in the extended interval [0, ∞] such that:

        a. the empty set has measure zero (μ(∅)=0) and

        b. if E_1, E_2, ... E_n, ... represent a countable sequence of pairwise disjoint (that is, non-overlapping) sets in Σ, then the measure of the union of all of the sets E_i equals the sum of the measures of all of the sets E_i;

$$\mu\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} \mu(E_i)$$ (Eq'n 1)

in which

$$E_i \cap E_j = \emptyset \quad \text{for all } i \neq j$$ (Eq'n 2)
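
    The three rules for Σ and the additivity rule for μ can be checked mechanically on a small example. The following Python sketch is only an illustration: the names is_sigma_algebra and counting_measure are hypothetical, and it assumes a finite set X, in which case closure under countable unions reduces to closure under pairwise unions.

```python
from itertools import combinations

def is_sigma_algebra(X, family):
    """Check the three defining rules on a finite set X (assumption: X is finite)."""
    X = frozenset(X)
    sigma = {frozenset(s) for s in family}
    # Rule a: the whole set X belongs to the family.
    if X not in sigma:
        return False
    # Rule b: the complement of every member belongs to the family.
    if any(X - E not in sigma for E in sigma):
        return False
    # Rule c: on a finite set, closure under pairwise unions implies
    # closure under every (necessarily finite) union.
    return all(A | B in sigma for A, B in combinations(sigma, 2))

def counting_measure(E):
    """A simple measure: the number of points in E."""
    return len(E)

X = {1, 2, 3, 4}
good = [set(), {1, 2}, {3, 4}, X]
bad = [set(), {1}, X]  # lacks the complement {2, 3, 4}

print(is_sigma_algebra(X, good))
print(is_sigma_algebra(X, bad))
# Additivity for two disjoint sets, as in Equation 1.
print(counting_measure({1, 2} | {3, 4})
      == counting_measure({1, 2}) + counting_measure({3, 4}))
```

    The family called bad fails rule b, which shows why we cannot take an arbitrary collection of subsets as Σ.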

    The ordered pair (X, Σ) constitutes a measurable (but not measured) space. The reason that we have defined it as we have done comes clear when we understand that the power set of X provides to Σ elements that conform to the following description:

    The subset E consists of an ordered sequence of singletons (points x in X) so chosen that each element lies adjacent to the next element in the sequence and the last element lies next to the first.

That set coincides with a closed curve in the topological space X. Poincaré wants us to focus our attention on a particular subset of all such curves, the one whose elements conform to an ordinary differential equation; that is, we select from all possible closed curves in X those that conform to an equation that does not admit multiple-valued solutions, which means that we have curves that do not intersect themselves or intersect other curves that conform to the equation.

    On the space (X, Σ, μ) we assert the existence of a measure-preserving transformation, T: X → X, a transformation that takes a point in X to another point in X. If each application of T takes a certain amount of time, then sequential applications of T cause a point to trace out a trajectory. We call the transformation measure-preserving if the measure of a set in the system under consideration remains invariant under the transformation; that is, the measure of a set does not change when we apply the transformation to that set. For example, a system obeying Hamilton's equations preserves volume in phase space in accordance with Liouville's theorem, so we can apply Poincaré's theorem to any system in which energy remains conserved, provided that the system stays within a region of phase space of finite volume.
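
    As a small check of what measure preservation means in phase space, the Python sketch below assumes the simplest Hamiltonian system, a free particle of mass m, whose flow carries the phase point (x, p) to (x + pt/m, p). The function names polygon_area and free_flow are hypothetical. Because that flow is a linear shear, it carries a polygon to a polygon, so we can compare the areas exactly with the shoelace formula.

```python
def polygon_area(points):
    """Area of a simple polygon from its vertices (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def free_flow(point, t, m=1.0):
    """Free-particle Hamiltonian flow: location advances, momentum stays fixed."""
    x, p = point
    return (x + p * t / m, p)

# A unit square of initial conditions in the (x, p) plane.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
sheared = [free_flow(pt, t=3.0) for pt in square]

print(polygon_area(square), polygon_area(sheared))
```

    The square deforms into a slanted parallelogram, but its area, the measure in this example, does not change.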

    Now we state Poincaré's recurrence theorem:

$$\mu\left(\{x \in E : \exists N \text{ such that } T^n(x) \notin E \text{ for all } n > N\}\right) = 0$$ (Eq'n 3)

In plain English we have that statement as: the measure of (the set of all points x in the set E for which there exists a number N such that the n-th transformation of x is not an element of E for all numbers n greater than N) equals zero. In other, clearer words, for any set E in the sigma algebra Σ on the realm X we feign creating a set of all of the points in E such that T^n(x) is not an element of E for any value of n greater than some number N, and we find that such a set has zero measure. That statement leads us to expect repetitions of the transformation T applied to a point x in E to bring that point back to E. Note that in his statement Poincaré has not said that recurrence must happen, but that he has said that recurrence cannot not happen. He put the statement into that double-negative form because he used reductio ad absurdum to prove and verify it, more or less in accordance with what Sherlock Holmes said to Doctor Watson in "The Sign of Four": "How often have I said to you that when you have eliminated the impossible, whatever remains, however improbable, must be the truth?" In mathematics we can, indeed, prove and verify a negative.

    We begin the proof and verification of Poincaré's theorem by defining a set A_n in accordance with the statement that

$$A_n = \bigcup_{k=n}^{\infty} T^{-k}E$$ (Eq'n 4)

Thus we define the set A_n as the union of all of the sets that we generate by applying the inverse of T to the set E a number of times equal to and greater than n. If we conceive the application of the transformation T as equivalent to advancing the evolution of E forward in time, then A_n consists of the union of the past manifestations of E under the transformation. Of course we know that T^0E=E, so we know that E itself is a subset of A_0. We also know that because

$$A_i = T^{j-i}A_j$$ (Eq'n 5)

A_i must be a subset of A_j when j≤i.

    Because we have made T measure-preserving, we know that the measure of our sets is T-invariant, so Equation 5 tells us that we must have as true to mathematics the statement that

$$\mu(A_i) = \mu(A_j)$$ (Eq'n 6)

for all i,j≥0. To exploit that fact we take the measure of a set that we create by removing all of the elements of A_n from the set A_0. Because A_n and its complement in A_0 do not overlap (by definition), we can apply Equation 1 to get

$$\mu(A_0 - A_n) = \mu(A_0) - \mu(A_n) = 0$$ (Eq'n 7)

with the zero coming to us by way of Equation 6.

    We know that E is a subset of A_0, so for any n>0 we know that E-A_n, the complement of A_n in E, is a subset of A_0-A_n. We thus infer that

$$\mu(E - A_n) \leq \mu(A_0 - A_n)$$ (Eq'n 8)

We cannot infer a strict equality in this step because A_0-A_n may contain points that do not lie in E. But we do know that measure must always take a non-negative number, so, in light of Equation 7, we must infer that

$$\mu(E - A_n) = 0$$ (Eq'n 9)

for all n>0. That fact means that the complement of A_n in E is a null set for all of the sets A_n, so the union of those countably many complements is also a null set,

$$\mu\left(\bigcup_{n=1}^{\infty} (E - A_n)\right) = 0$$ (Eq'n 10)

    De Morgan's rules of set theory tell us that the union of the complements of sets equals the complement of the intersection of the sets (see Appendix I):

$$\bigcup_{n=1}^{\infty} (E - A_n) = E - \bigcap_{n=1}^{\infty} A_n$$ (Eq'n 11)
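
    De Morgan's rule can be spot-checked with Python's built-in sets. The sketch below uses a small finite E and three made-up sets standing in for the A_n; the particular sets are assumptions chosen only for illustration, and a finite check of this kind illustrates the rule rather than proving it.

```python
E = set(range(12))
# Hypothetical stand-ins for A_1, A_2, A_3.
A = [set(range(0, 12, 2)), set(range(0, 12, 3)), {0, 1, 6, 7}]

# Left side: the union of the complements of each A_n in E.
union_of_complements = set().union(*(E - A_n for A_n in A))
# Right side: the complement in E of the intersection of all the A_n.
complement_of_intersection = E - set.intersection(*A)

print(union_of_complements == complement_of_intersection)
```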

Applying that rule to Equation 10 gives us

$$\mu\left(E - \bigcap_{n=1}^{\infty} A_n\right) = 0$$ (Eq'n 12)

Thus all of the elements of E that do not have membership in all of the sets A_n comprise a null set. That statement verifies Poincaré's theorem.
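
    The construction of the sets A_n can be traced by hand on a toy example. The Python sketch below assumes a finite set X with the counting measure and a transformation T given as a permutation of X (stored as a dictionary), which is automatically measure-preserving. The names preimage and A_n are hypothetical. For a permutation the sequence of sets T^{-k}E repeats itself, so the infinite union in the definition of A_n reduces to a finite one.

```python
def preimage(T, S):
    """T^{-1}(S): all points that T carries into S."""
    return {x for x in T if T[x] in S}

def A_n(T, E, n):
    """Union of T^{-k}(E) over all k >= n, for a permutation T of a finite set."""
    S = set(E)
    for _ in range(n):
        S = preimage(T, S)          # S is now T^{-n}(E)
    start = frozenset(S)
    union = set(S)
    S = preimage(T, S)
    # For a permutation the preimages cycle, so stop once they repeat.
    while frozenset(S) != start:
        union |= S
        S = preimage(T, S)
    return union

# Two cycles: 0 -> 1 -> 2 -> 0 and 3 -> 4 -> 3.
T = {0: 1, 1: 2, 2: 0, 3: 4, 4: 3}
E = {0}

for n in range(4):
    print(n, sorted(A_n(T, E, n)), E - A_n(T, E, n))
```

    In this finite setting every A_n contains E, so the exceptional set E - A_n is empty and every point of E returns to E.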

    But what does that statement have to do with recurrence? The set E comprises elements that we label x. We can apply the transformation T to those elements individually, causing them to trace out trajectories. A point x lies in A_n precisely when some T^k(x) with k≥n lies in E, so a point that belongs to every A_n comes back into E at arbitrarily late steps. Poincaré's theorem thus tells us that the transformation cannot take the trajectory out of the set E for good; that is, for all x in E outside a null set, T^n(x) can never cease yielding elements of E. Because multiple applications of T produce trajectories that do not intersect themselves, every trajectory must, sooner or later, return to its beginning: we guarantee that statement by noting that, at worst, eventually the convoluted trajectory will exhaust all of the elements of E and must, perforce, return to its beginning, the one point that it can touch without forming an intersection.

    Consider a simple example - a one-dimensional harmonic oscillator. Connected to a spring, a body oscillates to and fro along a straight line in location space and thus does not seem to conform to our requirement that its trajectory not cross itself. But in phase space, the union of location space and velocity space (or linear-momentum space in physics), the oscillator traces out an ellipse. If we imagine attaching more springs to the body and to other springs in the system, we can imagine the body's trajectory in phase space becoming more curvaceous, in accordance with Fourier's theorem, taking the body through more twists and turns, but always coming back to any arbitrarily defined beginning point. Eventually, if we add enough springs, we may put the body on a trajectory that sweeps out the entire volume of a chunk of phase space, but the body will, nonetheless according to Poincaré, return to its point of beginning.
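
    The single-spring case can be checked numerically from the standard closed-form solution of the harmonic oscillator. In the Python sketch below the function names oscillator_state and energy, and the values chosen for the mass m, the spring constant k, and the initial state, are assumptions made only for illustration. Constant energy means that the phase point stays on one ellipse, and evaluating the state after one period shows the return to the starting point.

```python
import math

def oscillator_state(x0, p0, t, m=1.0, k=1.0):
    """Exact location and momentum at time t for a mass m on a spring of constant k."""
    w = math.sqrt(k / m)
    x = x0 * math.cos(w * t) + (p0 / (m * w)) * math.sin(w * t)
    p = p0 * math.cos(w * t) - m * w * x0 * math.sin(w * t)
    return x, p

def energy(x, p, m=1.0, k=1.0):
    """Kinetic plus potential energy; level sets are ellipses in the (x, p) plane."""
    return p * p / (2.0 * m) + k * x * x / 2.0

m, k = 2.0, 8.0
x0, p0 = 0.5, 1.0
period = 2.0 * math.pi / math.sqrt(k / m)

# Sample the trajectory at several times within one period.
energies = [energy(*oscillator_state(x0, p0, period * i / 10.0, m, k), m, k)
            for i in range(11)]
x_end, p_end = oscillator_state(x0, p0, period, m, k)

print(min(energies), max(energies))
print((x0, p0), (x_end, p_end))
```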

    That theorem does not go unchallenged. Two criticisms come readily to mind, especially as the theorem applies to physics and its description of Reality. The resolution of the first will add force to the second.

    We obtain Poincaré's theorem by analyzing, in the most abstract way, trajectories within a set, ordered arrays of mutually adjacent elements of the set organized in accordance with a simple differential equation (D(x,t)=0, in which x represents the elements of the set, t represents the ordering index, and D represents a differential operator). If we choose a volume in location space as our set, then we perforce choose the mathematical point, the most fundamental atom of geometry, as its element. And therein we may discern a problem.

    Since the time of the Ancient Greek geometers, two thousand years ago and more, mathematicians have defined the geometric point as a thing that has zero extent. If you imagine putting two such points adjacent to each other, then you find that they must necessarily coincide: if that statement did not stand true to logic, then either the points would require some extent that could non-overlap (which goes against our premise) or some other phenomenon would have to put some extent between them (and they would no longer lie adjacent to each other). Geometrically, then, an infinite set of adjacent points differs not at all from a single point (an infinite quantity of nothing still equals nothing). That array does not create a space with a unique, well-defined distance between points as a property. We need to redefine the point.

    A proper geometric point cannot have zero extent and it cannot give us something that we can divide into parts. We thus define the proper geometric/mathematical point as the infinitesimal of extent. That entity approaches zero extent, but never actually reaches it. Thus two adjacent points do not coincide and an infinite set of such points spans a finite, nonzero distance, thereby providing us with the conceptual basis for understanding space. That change gives us a necessary amendment of our concept of space, but it does not solve our problem with recurrence.

    The question we face asks how long it takes a given element of the set to traverse its trajectory under the transformation. If we define an instant as the infinitesimal of temporal elapse, then applying the transformation carries the element over a finite distance in a finite time. Because every trajectory has infinitesimal width, most of the trajectories that carry the element through a finite volume have infinite length and thus take an infinite time to traverse completely. If the maximum time to the recurrence of a trajectory goes to infinity, then the concept of recurrence dissolves into meaninglessness.

    In physics we have a straightforward restoration of meaning to recurrence. We carry out our physics in phase space, where Liouville's theorem ensures that an element of volume remains invariant under all transformations consistent with Hamilton's equations, thereby satisfying the criterion of measure-preserving transformation. The fundamental theorem of quantum dynamics tells us that, even though location space and momentum space separately consist of infinitesimal-based continua, their union in phase space consists of non-overlapping pixels of action the size of Planck's constant. That statement entails that every trajectory in phase space have a minuscule, but nonetheless finite, cross section and that fact, in its turn, means that even the trajectory that sweeps out the entire volume of a relevant region of phase space has finite length. Our phase element traverses any path of finite length in finite time, so quantum dynamics restores a finite maximum time of recurrence to phase space and thereby restores meaning to the concept of recurrence as presented in Poincaré's theorem.

    That restoration exacerbates the second criticism of Poincaré's theorem, which criticism accuses any theory that includes Poincaré's theorem of violating the law of entropy. That accusation includes statistical thermodynamics, the theory based on applying Newtonian dynamics to vast numbers of minuscule particles.

    In the last quarter of the Nineteenth Century, when they first developed statistical dynamics, physicists faced the task of correlating the averages and variances from their calculations with the concepts of classical thermodynamics. When they came to entropy, which relates to Rudolf Clausius' statement that heat does not of itself move from a cooler to a hotter body, they found that it correlates with the number of ways to organize particles into a particular manifestation of their system: the necessary increases of a system's entropy consequent to the removal of internal constraints from the system thus correspond to an increase in that number, which we commonly interpret as an increase in the disorder of the system's component particles.

    Poincaré's theorem implies that such a system may evolve toward greater disorder, but eventually it will spontaneously evolve back to a more orderly state. That implication stands as a strong criticism of a theory that otherwise matches classical thermodynamics perfectly. But that implication also stands on a tacit assumption, that all of the trajectories within the given set have the same recurrence time.

    Imagine that we have divided a box with a barrier. On one side of the barrier we put a single atom. We accept Poincaré's theorem as it applies to that atom: after the elapse of a certain interval of time, the particle will return to its original location in the box and its original momentum. Put a second atom in with the first: it will have its own recurrence time, which may or may not equal that of the first atom, and the two atoms together will have a system recurrence time. No matter how many atoms we add to this system, it will have a recurrence time. As the atoms fly about in their part of the box they will enact every possible arrangement available to them, the number of which arrangements correlates with the system's entropy.

    Now remove the barrier. The atoms fill the entire box and their recurrence times increase, because they now have available to them longer paths to follow. The system's entropy also increases, as we expect. We can thus infer that recurrence time correlates with entropy. As the recurrence time in a system increases, so the system's entropy increases. We may thus infer that Poincaré's theorem does not conflict with the second law of thermodynamics.
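
    A crude arithmetic model shows how quickly a system recurrence time can grow as we add atoms. The Python sketch below rests on strong simplifying assumptions that the text does not make: each atom moves independently of the others and returns to its starting state after a whole number of time steps, so that the whole system first returns when every atom returns at once, at the least common multiple of the individual times. The function name system_recurrence_time and the sample periods are hypothetical.

```python
from functools import reduce
from math import gcd

def system_recurrence_time(periods):
    """Least common multiple of the individual (integer) recurrence times."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)

# Hypothetical individual recurrence times, in arbitrary time steps.
atoms = [3, 4, 5, 7]
for count in range(1, len(atoms) + 1):
    print(count, system_recurrence_time(atoms[:count]))
```

    Even in this idealized model the recurrence time of the whole system outruns the recurrence time of any one atom as soon as the individual times differ.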

    On the cosmic scale Poincaré's theorem does not apply at all. The expansion of space negates Poincaré's fundamental premise about the set subject to recurrence, that it does not change its size (or measure). That means that the expansion of the Universe ensures that recurrence will never happen, except in the most localized systems. Thus we cannot properly apply the recurrence theorem or any theories based upon it to the Universe as a whole.

Birkhoff's Proof

    When we study theorems that concern the limits of probability and weighted means we encounter the ergodic theorem, which evolved out of Poincaré's recurrence theorem. We imagine that we have, within a fixed boundary, an open region of some multi-dimensional space and we assume the existence of some process T that rearranges the points within that region in accordance with certain rules, the primary one being that the process does not change any measure that we attribute to the points in that space (such as, for example, volume). In that circumstance we know that we can find somewhere in the defined region a subset of points, which we label M, that has measure zero and has the property that a point in the region not lying in M but, rather, in the neighborhood U will under suitable application of the process T return to U with a definite positive limiting frequency. We calculate that frequency by defining a function φ_k(x) as a function that equals +1 when T^k(x) belongs to U and that equals 0 when T^k(x) does not belong to U, adding up all the values of φ_k(x) for all values of k from one to n and dividing by n, then taking the limit as we let n go to infinity. We thus obtain a number between zero and one that tells us the relative amount of time that our point lies in U.
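
    The recipe for the limiting frequency translates directly into a short computation. The Python sketch below is an illustration under assumptions of my own choosing: the process T is the rotation of the unit interval T(x) = x + α (mod 1) with an irrational α, which preserves length, and U is the interval (0.2, 0.5). The names phi and visit_frequency are hypothetical. For this choice we expect the frequency to approach the length of U.

```python
import math

def phi(x, lo, hi):
    """Indicator of the neighborhood U = (lo, hi): 1 if x lies in U, otherwise 0."""
    return 1 if lo < x < hi else 0

def visit_frequency(x0, lo, hi, alpha, n):
    """Average of phi over the points T^k(x0), k = 1..n, for T(x) = x + alpha mod 1."""
    total = 0
    x = x0
    for _ in range(n):
        x = (x + alpha) % 1.0
        total += phi(x, lo, hi)
    return total / n

alpha = (math.sqrt(5.0) - 1.0) / 2.0   # an assumed irrational rotation step
for n in (100, 10_000, 100_000):
    print(n, visit_frequency(0.3, 0.2, 0.5, alpha, n))
```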

    Birkhoff's ergodic theorem applies to a measure-preserving transformation T(x) that operates on the points x lying in the interval between zero and one, thereby mapping that interval onto itself (that is, T(x) always has a value between zero and one). If we have a function f(x) that is Lebesgue integrable over the interval between zero and one, then the sum of the terms f(T^k(x))/(n+1) for the index k lying between zero and n converges, as n goes to infinity, for almost every x in the interval, and the limit defines a function that is also Lebesgue integrable on the interval between zero and one. That fact tells us that the probability described above exists on the domain we have chosen.
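
    We can watch that average settle down in a simple case. The Python sketch below again assumes, purely for illustration, the length-preserving rotation T(x) = x + α (mod 1) with an irrational α and takes f(x) = x^2; the names time_average and f are hypothetical. Because this particular T is ergodic, the time average should approach the integral of f over the interval, which equals 1/3.

```python
import math

def f(x):
    """An integrable test function on the unit interval."""
    return x * x

def time_average(func, x0, alpha, n):
    """Sum of func(T^k(x0)) / (n + 1) for k = 0..n, with T(x) = x + alpha mod 1."""
    total = 0.0
    x = x0
    for _ in range(n + 1):
        total += func(x)
        x = (x + alpha) % 1.0
    return total / (n + 1)

alpha = (math.sqrt(5.0) - 1.0) / 2.0   # an assumed irrational rotation step
for n in (100, 10_000, 100_000):
    print(n, time_average(f, 0.1, alpha, n))
```

    A different starting point x0 should lead to the same limit, which is the sense in which the temporal average stands in for the spatial average.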

    George Birkhoff began his proof of the ergodic theorem with a brief review of the recurrence theorem, whose proof his proof parallels. He then proved and verified a lemma, which he then used as his basis for the statement of the ergodic theorem, the statement that there necessarily exists a temporal probability that a general trajectory passes through a given subvolume within the larger volume of a closed analytic manifold. That theorem may seem intuitively obvious and we may take it as such so long as we remember that physicists use the phrase "intuitively obvious" to denote the possible presence of logical booby traps.

    Birkhoff asks us to imagine a closed analytic manifold of volume V. That manifold possesses an invariant volume integral and contains trajectories described by n differential equations,

$$\frac{dx_i}{dt} = X_i(x_1, x_2, \ldots, x_n)$$ (Eq'n 13)

for i=1,2,...n. As an aid to the imagination we assume that n identical particles exist in the manifold and that each follows one of the trajectories in accordance with Equation 13. For each of the n different trajectories X_i represents the time integral of the force acting upon the i-th particle due to that particle's location and the locations of the other particles. That integral should give us a linear momentum for each particle, but because the particles all have identical properties we can simply divide out their masses and use their velocities instead.

    Next Birkhoff invites us to imagine a set of points that comprise a surface σ within that manifold (imagine a soap film stretched across the inside of a bottle), a surface that all of the trajectories in the manifold cross. Poincaré's theorem guarantees that any trajectory that crosses that surface once will cross it again, if only because we can make the point at which the trajectory crosses the surface the beginning point of the trajectory, the point to which the trajectory absolutely must return. With ti(p) representing the time that elapses between successive crossings of the surface by the i-th trajectory, Birkhoff defines a mean time of crossing of the surface for the general trajectory as

(Eq'n 14)

That equation applies to trajectories that cross all points p in the surface, except possibly for some points comprising a subset of measure zero. Thus Birkhoff recalls the recurrence theorem in the form that he needs.

    For his own proof he begins by asserting the following lemma (a theorem proven for use in the proof of another theorem): On the surface defined above, which is invariant under the transformation T (except possibly for a subset of measure zero) we have a measurable set Sλ. If for any point p in that set we have

(Eq'n 15)


(Eq'n 16)

stands true to mathematics. (Note that sup = least upper bound and inf = greatest lower bound). The integration with respect to the points p over the set Sλ on the surface σ simply calculates the area of that set on the defined surface. We also have Sλ', the complement of Sλ on the surface σ, such that if for any point in that complementary set we have

(Eq'n 17)


(Eq'n 18)

stands true to mathematics. Birkhoff proved and verified only the first of those cases (Equations 15 and 16), arguing that "the proof of the second case is entirely similar."

    In that lemma Birkhoff has argued that the average crossing time for the point p in a certain region of the defined surface approaches a number λ as a least upper bound. He then asserts the statement that consequently there exists a time t(p) such that integrating that time over the area of the region yields a number greater than or equal to the result of multiplying λ by the area of the region. In the second part of the lemma he applies that reasoning to the complement of the region and finds λ as the greatest lower bound of the average crossing time for the points p in that complementary region. He then asserts the statement that consequently the same time t(p), when integrated over the area of the complementary region, yields a number less than or equal to the result of multiplying λ by the area of that complementary region.

    We define distinct, non-overlapping measurable sets Uj on the set Sλ such that for any point p in Un we have

(Eq'n 19)

for some number λ. That equation says that for every ti(p)<λ we must have in the set a ti(p)>λ by a greater amount, so that λ represents a kind of average less than the true average of ti(p). We note that p does not lie in any of the other sets Uj. For every point p in Sλ Equation 19 holds true for infinitely many values of n, so that all such points belong to at least one of the sets Uj. If we integrate the crossing times of the trajectories passing through Uj with respect to the points p comprising Uj, we get

(Eq'n 20)

But the union of the sets Uj gives us a measurable part of the set Sλ and it increases toward a limit in which the union contains every point in Sλ. Taking that ultimate union as a limit gives us Equation 20 as

(Eq'n 21)

Having thus proven and verified his lemma, Birkhoff asserts that the recurrence theorem that he wants results directly from it.

    We start by saying that the measurable invariant set of points on the above defined surface for which points

(Eq'n 22)

for infinitely many values of n constitutes the set Sλ to which the lemma applies. Likewise, the measurable invariant set of points on the surface for which

(Eq'n 23)

for infinitely many values of n constitutes the set Sλ' to which the lemma applies. Though we have assumed that Sλ' is the complement of Sλ on the surface σ, it may not necessarily be so. Nonetheless, Equations 22 and 23 necessitate that both of those sets taken together exhaust the surface σ.

    Among the trajectories passing through the surface σ we know that one has a least time of crossing (λmin) and Poincaré guarantees that one has a maximum time of crossing (λmax). If we allow λ to increase, then the set Sλ defined by Equation 22 grows smaller and the set Sλ' defined by Equation 23 grows larger. As the value of λ approaches that of λmax the measure of Sλ must go to zero and Sλ' must then coincide with the entire surface σ. If the value of λ decreases and approaches the value of λmin, the measure of Sλ' must go to zero and Sλ must then coincide with the entire surface σ.

    If Sλ' is not the complement of Sλ on our surface, then Sλ and Sλ' must have a non-empty intersection Sλ* of positive measure, which intersection must also be invariant to applications of the transformation T. But in accordance with Birkhoff's lemma we must have both

(Eq'n 24)


(Eq'n 25)

which can only stand true to mathematics if the equality holds true. That fact entails that

(Eq'n 26)

for all points in Sλ*. For different values of λ the corresponding sets Sλ* are distinct from each other (except for a set of measure zero), so there can exist on the surface σ only a numerable set of such sets, because each has positive measure. Except for those values of λi whose sets Sλi* are not empty, Sλ and Sλ' are complementary parts of the surface σ (except for a set of measure zero).

    We now select two values of λ not belonging to that numerable set, with the proviso that λ1<λ2, and consider the points in Sλ1 that do not belong to Sλ2. Those points comprise an invariant measurable set S1λ2 such that

(Eq'n 27)

Because S1λ2 is essentially identical to the part of the complement Sλ2' not lying in Sλ1, we also have

(Eq'n 28)

standing true to mathematics. We infer from those statements the fact that for all points in S1λ2, except for a set of measure zero, the sum of ti(p) divided by n oscillates between λ1 and λ2 as n tends toward infinity. We can make the difference between λ1 and λ2 arbitrarily small and thus infer that for all points on our surface σ, except for a set of measure zero, the oscillation as n goes to infinity is less than an arbitrary δ>0.

    Thus, Birkhoff tells us, the stated recurrence theorem stands true to mathematics.

    Birkhoff also notes that if the sum over the index of ti(p) denotes the time to the n-th crossing of the surface prior to a given crossing, the same result holds true to mathematics as n tends toward negative infinity with the same limit except for a set of points of measure zero. That statement follows from our ability to rewrite Equation 27 as

(Eq'n 29)

in which we replace the limit of positive infinity with negative infinity and replace the point p in S1λ2 by T^n(p). Of course, we have the same coming true of Equation 28.

    We can extend this theorem of recurrence in a straightforward way. Instead of using a single surface as a reference, we can use any measurable set σ* embedded in a numerable set of distinct ordinary surface elements. In that case t*(p) denotes the time elapsed from the departure from a point p on σ* to the trajectory's next crossing of σ*. To prove the ergodic theorem we need a set σ* that cuts every trajectory except those corresponding to equilibrium and others of total measure zero. If we have a numerable set of distinct ordinary surface elements σ1, σ2, etc., then the set that we want is the set σk as the limit as k grows endlessly, with

(Eq'n 30)

in which σ12 denotes the set of points of σ2 not on a trajectory cutting σ1, σ123 denotes the set of points of σ3 not on a trajectory cutting σ1 or σ2, and so on.

    If v denotes any measurable volume within the volume V of our analytic manifold and if {t(p)} denotes the interval during which the point on the trajectory that comes from the point p on the set σ* lies in v before the point T(p) in σ* is reached, then we have the following: In all cases we have

(Eq'n 31)

Also {t(p)} satisfies the same functional equation as does t(p);

(Eq'n 32)

Thus we can apply the same reasoning that we used before and assert that, except for a set of points of measure zero,

(Eq'n 33)

stands true to mathematics. At the same time we have

(Eq'n 34)

In accordance with Equation 31 we thus have as true to mathematics

(Eq'n 35)

    That equation gives us readily

(Eq'n 36)

If we substitute the equivalent expressions from Equations 33 and 34 into that equation and cancel out the factor of 1/n, we get

(Eq'n 37)

The sums in that quotient represent the total time elapsed measured from the trajectory's departure from a fixed point and the time elapsed while the trajectory is in the subvolume v. We can thus rewrite Equation 37 into the form that Birkhoff gave it, thereby getting

(Eq'n 38)

in which P represents the time probability that any moving point, except those comprising a set of measure zero, will lie in the region v. Birkhoff called that the ergodic theorem.

    Birkhoff concluded his proof by stating what he saw as an obvious fact, that τ(p) and {τ(p)} as we have defined them in Equations 33 and 34 satisfy functional relations of the following kind:

(Eq'n 39)

in which the integral on the left is a Stieltjes integral and μ(Sλ) represents the measure of the set Sλ. That equation is simply a form of Equations 16 and 18 with the terms reversed. The integral on the left gives us the average recurrence time integrated with respect to the measure of the set Sλ and the integral on the right tells us how much time the trajectory spends in the set Sλ. We translate that statement into more or less plain English by saying that for any ergodic system spatial averages equal temporal averages. For example, in a monatomic gas confined to a container the average of all the particles' velocities at a given instant equals the average of the velocity of a single particle over a long lapse of time. The importance of that equality to physics comes clear when we understand that our theories yield spatial averages while our experiments yield temporal averages: the ergodic theorem is necessary to enable proper comparisons of theory and measurement.

    We of course have a caveat we must consider. In our gas example we know that we cannot assert the ergodic equality if the pressure and temperature of the gas differ from place to place or change with the elapse of time: the equality might occur, but it would be entirely accidental and not necessary as the theorem requires. Thus, we can see that the ergodic theorem only applies to systems that have achieved thermodynamic equilibrium.

    We can extend that notion deeper and assert that the ergodic theorem only applies to systems in which the laws governing those systems do not change from place to place or from time to time. That statement brings Nöther's theorem into play and through it necessitates that any system in which the ergodic theorem holds true must necessarily conserve, respectively, linear momentum and energy. Any system that obeys Hamilton's equations also obeys those conservation laws, so any Hamiltonian system, represented in phase space, conforms to the ergodic theorem.

Appendix I: De Morgan's Rules

    Set theory gives us a mathematical discipline in which we apply Boolean algebra to collections of things that we may or may not conceive as properly mathematical. For a very common example, we identify all men as a subset of the set of all mortal beings, identify Sokrates as an element of that subset, and conclude that he is, therefore, an element of the set of mortal beings (All men are mortal; Sokrates is a man; therefore, Sokrates is mortal). We may also use some sets to make analogies with others, a primary example being the overlapping circles and other closed curves introduced in 1880 by John Venn (1834 Aug 04 - 1923 Apr 04).

    A set [S] consists of elements chosen in accordance with some rule or criterion. If we define a set to consist of the natural numbers, we know that the set will not contain geometric entities (such as circles, triangles, etc.) or physical objects (such as animals, stones, trees, etc.), but contains every positive integer. We can subdivide a set into subsets, [A], [B], and so on by specifying additional criteria for including some entities as members and excluding others. In the set of natural numbers we can create a subset [P] containing only prime numbers (those numbers greater than one that no one can divide evenly by any natural number other than one and the number itself). The act of identifying that subset also defines its complement in the set of natural numbers, the set of the not-prime numbers (the number one together with the composite numbers, those numbers that we can divide evenly by at least one natural number other than one and the number itself). We thus have the complement of [P] as

$$\overline{[P]} = [S] - [P]$$

(Eq'n A-1)

the set that we create by removing all of the elements of [P] from [S]. We see immediately that

$$[P] \cap \overline{[P]} = \emptyset$$

(Eq'n A-2)

the empty set.

    We also have two fundamental operations that we apply to sets: union and intersection. To form the union of two sets, [A]∪[B], we put the elements of both [A] and [B] into a new set. To form the intersection of two sets, [A]∩[B], we put into the new set only those elements that exist in both [A] and [B]. If we take a set comprising the square numbers lying between zero and thirty, [A]=[1, 4, 9, 16, 25], and a set comprising the multiples of five lying between thirteen and thirty-three, [B]=[15, 20, 25, 30], then we have, for example,

$$[A] \cup [B] = [1, 4, 9, 15, 16, 20, 25, 30]$$

(Eq'n A-3)

$$[A] \cap [B] = [25]$$

(Eq'n A-4)

Clearly those processes obey the commutative law;


$$[A] \cup [B] = [B] \cup [A]$$

(Eq'n A-5)

$$[A] \cap [B] = [B] \cap [A]$$

(Eq'n A-6)

    If we combine more than two sets, we find that union and intersection obey the associative law;


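$$([A] \cup [B]) \cup [C] = [A] \cup ([B] \cup [C])$$

(Eq'n A-7)

$$([A] \cap [B]) \cap [C] = [A] \cap ([B] \cap [C])$$

(Eq'n A-8)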

and the distributive law


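$$[A] \cup ([B] \cap [C]) = ([A] \cup [B]) \cap ([A] \cup [C])$$

(Eq'n A-9)

$$[A] \cap ([B] \cup [C]) = ([A] \cap [B]) \cup ([A] \cap [C])$$

(Eq'n A-10)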

If you take a set comprising all of the multiples of four lying between zero and twenty-two, [C]=[4, 8, 12, 16, 20] and the two sets [A] and [B] defined above, you only need to play with them for a few minutes to see how those laws work.
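    For readers who would rather let a machine do the playing, the following sketch checks the commutative, associative, and distributive laws on the three example sets. It relies on Python's built-in set type, in which | denotes union and & denotes intersection; the variable names are my own and carry no special meaning.

```python
# The three example sets from the text.
A = {1, 4, 9, 16, 25}       # square numbers between zero and thirty
B = {15, 20, 25, 30}        # multiples of five between thirteen and thirty-three
C = {4, 8, 12, 16, 20}      # multiples of four between zero and twenty-two

union_AB = A | B            # elements lying in [A] or in [B]
intersection_AB = A & B     # elements lying in both [A] and [B]

# Commutative law: the order of the two sets does not matter.
commutative = (A | B == B | A) and (A & B == B & A)

# Associative law: the grouping of three sets does not matter.
associative = ((A | B) | C == A | (B | C)) and ((A & B) & C == A & (B & C))

# Distributive law: each operation distributes over the other.
distributive = (A | (B & C) == (A | B) & (A | C)) and (
    A & (B | C) == (A & B) | (A & C)
)

print(sorted(union_AB), sorted(intersection_AB))
print(commutative, associative, distributive)
```

    A check on three particular sets proves nothing in general, of course; it merely shows the laws at work on a concrete case.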

    We also know that the subset relation is transitive. If the set [X] is a subset of [Y] and if [Y] is a subset of [Z], then [X] is a subset of [Z]. That rule coincides with the classic syllogism, as we can see by using the famous example again: Sokrates is the only element comprising [X], the set [Y] consists of all men, and the set [Z] consists of all mortal beings.

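    Now we come to de Morgan's rules, which Poincaré used in his proof of his recurrence theorem. We express them concisely as

$$\overline{[A] \cap [B]} = \overline{[A]} \cup \overline{[B]}$$

(Eq'n A-11)

$$\overline{[A] \cup [B]} = \overline{[A]} \cap \overline{[B]}$$

(Eq'n A-12)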

That is,

    1. the complement of the intersection of two sets coincides identically with the union of the sets' complements; and

    2. the complement of the union of two sets coincides identically with the intersection of the sets' complements.

In recapitulating Poincaré's proof I used Equation 11 to transform Equation 10 into Equation 12. But Equation 11 is merely Equation A-11 with more than two sets.

    First let's ask whether Equation A-11 makes sense. The right side instructs us to create a new set by combining all of the elements in [S] not in [A] with all of the elements of [S] not in [B]. The only elements that we do not include in that new set are those elements that lie in both [A] and [B]; that is, the elements that comprise the intersection of [A] and [B]. The set that consists of all of the elements of [S] except those elements that lie in both [A] and [B] is just the complement of that intersection, as the left side of Equation A-11 indicates. We thus prove and verify Equation A-11. And we can carry out a similar analysis to prove Equation A-12.

    We extend that analysis by considering many subsets [Ai] of the set [S] and their complements. Consider the union of those complements: we create that set by combining all of the elements of [S] not in [A1] with all of the elements of [S] not in [A2] with all of the elements of [S] not in [A3] and so on. The only elements that we do not include in that new set are those elements that lie in each and every set [Ai]; that is, the elements that comprise the intersection of [A1] with [A2] with [A3] and so on. Our new set is thus the complement of that intersection; that is,

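$$\overline{\bigcap_i [A_i]} = \bigcup_i \overline{[A_i]}$$

(Eq'n A-13)

In like manner we can extend Equation A-12 to more than two subsets;

$$\overline{\bigcup_i [A_i]} = \bigcap_i \overline{[A_i]}$$

(Eq'n A-14)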


Equation A-13 is what we used to convert Equation 10 into Equation 12 in our proof above.
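    We can also try de Morgan's rules, in both the two-set form and the many-set form, on a small concrete case. In the sketch below I assume, purely for illustration, that the parent set [S] consists of the natural numbers from one through thirty; the helper complement is a hypothetical name of my own.

```python
# Assumed parent set for the illustration: the natural numbers 1 through 30.
S = set(range(1, 31))
A = {1, 4, 9, 16, 25}
B = {15, 20, 25, 30}
C = {4, 8, 12, 16, 20}


def complement(subset, universe=S):
    """Everything in the parent set that does not lie in the given subset."""
    return universe - subset


# Two-set rules: complement of an intersection, complement of a union.
two_set_first = complement(A & B) == complement(A) | complement(B)
two_set_second = complement(A | B) == complement(A) & complement(B)

# Many-set rules, applied to the whole family of subsets at once.
family = [A, B, C]
many_first = complement(set.intersection(*family)) == set.union(
    *(complement(X) for X in family)
)
many_second = complement(set.union(*family)) == set.intersection(
    *(complement(X) for X in family)
)

print(two_set_first, two_set_second, many_first, many_second)
```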

Appendix II: The Stieltjes Integral

    The Stieltjes integral, also called a Riemann-Stieltjes integral, is a generalization of the basic integral that Bernhard Riemann (1826 Sep 17 - 1866 Jul 20) defined as the limit approached by a Riemann sum as the subdivisions of the domain of summation/integration shrink to arbitrary smallness. Thomas Joannes Stieltjes (1856 Dec 29 - 1894 Dec 31) extended that concept to an integral of one function with respect to some other function.

    Let f(x) and φ(x) denote two real-valued, bounded functions (that is, functions whose values do not exceed certain fixed finite limits; essentially, functions that do not go toward infinity). Let those functions exist on a closed interval [a, b] and partition that interval according to


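$$a = x_0 < x_1 < x_2 < \cdots < x_n = b$$

(Eq'n B-1)

The Stieltjes integral then corresponds to

$$\int_a^b f(x)\,d\varphi(x) = \lim_{\max(x_{i+1}-x_i) \to 0} \sum_{i=0}^{n-1} f(\xi_i)\,[\varphi(x_{i+1}) - \varphi(x_i)]$$

(Eq'n B-2)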

in which ξi denotes any point that lies in the subinterval [xi, xi+1]. We note the caveat that if f(x) and φ(x) each have a discontinuity at the same point x, the integral does not exist. If the integral exists and we have difficulty in solving it, we can resort to integration by parts,

$$\int_a^b f(x)\,d\varphi(x) = f(b)\varphi(b) - f(a)\varphi(a) - \int_a^b \varphi(x)\,df(x)$$

(Eq'n B-3)
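A small numerical experiment may help to fix the idea. The sketch below approximates a Stieltjes integral by the finite sum described above, taking each ξi at the midpoint of its subinterval, and then compares the result with the value obtained through integration by parts. The functions f(x) = x and φ(x) = x² on the interval [0, 1] are assumptions of mine, chosen because the exact value, 2/3, is easy to confirm by hand; stieltjes_sum is a hypothetical name.

```python
def stieltjes_sum(f, phi, a, b, n):
    """Approximate the Stieltjes integral of f with respect to phi on [a, b].

    The interval is cut into n equal subintervals and f is evaluated
    at the midpoint of each one.
    """
    total = 0.0
    for i in range(n):
        x_lo = a + (b - a) * i / n
        x_hi = a + (b - a) * (i + 1) / n
        xi = 0.5 * (x_lo + x_hi)            # a point inside [x_lo, x_hi]
        total += f(xi) * (phi(x_hi) - phi(x_lo))
    return total


def f(x):
    return x


def phi(x):
    return x * x


# Direct evaluation of the integral of f with respect to phi.
direct = stieltjes_sum(f, phi, 0.0, 1.0, 10_000)

# The same quantity through integration by parts:
# f(b)*phi(b) - f(a)*phi(a) - (integral of phi with respect to f).
by_parts = f(1.0) * phi(1.0) - f(0.0) * phi(0.0) - stieltjes_sum(phi, f, 0.0, 1.0, 10_000)

print(direct, by_parts)
```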

At the end of his proof of the ergodic theorem George Birkhoff equated a Stieltjes integral of λ with respect to the measure of the set Sλ to the integral of recurrence times over the set Sλ directly. From that integral we extract the conventional physicists' statement of the ergodic theorem: the average value of a measurable quantity over a system at any given instant equals the average value of that quantity measured at some small part of the system over a suitably long time.

(See also: Lebesgue Integration under "The Table of Integrals")

Appendix III: The Poincaré Manifold

    Both Henri Poincaré and George Birkhoff, in presenting the proofs discussed in this essay, ask their readers to contemplate a manifold on which simple differential equations describe the motions of points as time elapses. I want to devote some space here to looking at such a manifold in more detail.

    As Poincaré and Birkhoff used it, the word manifold denotes a set whose elements have a relationship of adjacency to each other that enables us to differentiate functions of those elements; that is, the elements are ordered in much the same way as points on a map. Conventionally we conceive those elements as points comprising at least part of an n-dimensional topological space. That space then coincides with the manifold. We want the ability to differentiate that space so that we can use simple differential equations to describe trajectories on it.

    To create a specific example let's begin with 3-dimensional location space, the space that we occupy. We impose upon that space our usual Cartesian coordinate grid in order that we may describe each and every point in that space with a trio of real numbers. We want to approach this topic purely as a problem in mathematics, not in physics, so we will not imagine any particles or bodies occupying this space. We will instead imagine mathematical ghosts of a relatively simple physical system (Isaac Newton's dynamic geometry in full abstraction): a mass on a well-anchored spring. In this case we have Hooke's law in the form

(Eq'n C-1)

But that differential equation is second order in the derivative and both Poincaré and Birkhoff want us to use first-order differential equations to describe trajectories on the manifold. We can repair that error because we already know the solution of that equation;


(Eq'n C-2)

in which A represents the amplitude of the oscillation (the maximum value of x that the system reaches), ω represents the angular frequency of the oscillation, and φ represents the phase that the oscillation manifests at t=0. Thus we have the description of a point that oscillates over a distance 2A parallel to the x-axis at a rate of ω/2π cycles per unit of elapsed time. To advance our cause we differentiate Equation C-2 with respect to time and get

(Eq'n C-3)

    If we add a second spring perpendicular to the first (and ignore the effect of pulling either spring sideways: we can imagine making the springs free to slide in any direction perpendicular to their respective axes), we get pairs of equations:

(Eq'n C-4)


(Eq'n C-5)

in which and . In general we have

(Eq'n C-6)

Thus we can describe the system with simple, first-order differential equations, as Poincaré and Birkhoff require, but only if we reconceive our manifold by adding velocity space to location space; that is, by reconceiving our manifold as a phase space. In that space we have the Hamiltonian function,

(Eq'n C-7)

which gives us Hamilton's equations,

(Eq'n C-8)

The truth of those equations to our system entails the system's conformity to Liouville's theorem, which tells us that if a set of points defines a volume in our phase space, that volume does not change as the points follow their trajectories. That invariance of volume elements satisfies one of the Poincaré-Birkhoff criteria for the manifold.
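    We can watch that invariance at work in the simplest case, a single mass on a single spring. The sketch below assumes the first-order description dx/dt = v and dv/dt = -ω²x in a phase space of location and velocity, moves the three corners of a small triangle along their exact trajectories, and compares the triangle's area before and after. Because the motion in this case is linear, the triangle stays a triangle. The names flow, triangle_area, and OMEGA, along with the numerical values, are hypothetical choices made only for the illustration.

```python
import math


def flow(x0, v0, omega, t):
    """Exact solution of dx/dt = v, dv/dt = -omega**2 * x after elapsed time t."""
    c = math.cos(omega * t)
    s = math.sin(omega * t)
    x = x0 * c + v0 * s / omega
    v = -x0 * omega * s + v0 * c
    return (x, v)


def triangle_area(p, q, r):
    """Area of the triangle with corners p, q, r in the (x, v) plane."""
    return 0.5 * abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))


OMEGA = 2.0  # assumed angular frequency
corners = [(1.0, 0.0), (1.5, 0.0), (1.0, 0.8)]  # a small patch of phase space

area_before = triangle_area(*corners)
moved = [flow(x, v, OMEGA, 0.37) for x, v in corners]
area_after = triangle_area(*moved)

print(area_before, area_after)
```

    The patch rotates and distorts as it travels, but the area that it encloses stays the same, as Liouville's theorem requires.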

    Let us now consider a four-dimensional phase space, a space comprising two dimensions of location space and the corresponding two dimensions of velocity space. We describe a single trajectory in that phase space with Equations C-5 and their time derivatives. Depending on the values of the phases and the angular frequencies, those equations describe a straight line, an ellipse, or, especially, a Lissajous figure in location space and similarly in velocity space. Nathaniel Bowditch (1773 Mar 26 - 1838 Mar 16) first investigated the Lissajous figures in 1815 and in 1857 Jules Antoine Lissajous (1822 Mar 04 - 1880 Jun 24) carried out a further investigation of those figures by using a thin beam of light reflected off two mirrors attached to tuning forks vibrating in mutually perpendicular planes to trace them on a blank wall. The feature upon which we want to focus our attention stands prominent in the observation that the curve expressing each Lissajous figure crosses itself at least once. At any such crossing point the derivatives (the velocities) of the line segments that cross each other differ, in direction if not in magnitude, so in phase space the curve does not actually intersect itself. We can see an analogy with the meeting of two freeways: on a map they appear to intersect each other, but an observer on the site sees one freeway passing over the other, the extra, third, dimension making the difference between crossing and intersecting. Thus we satisfy another of the Poincaré-Birkhoff criteria.

    We can apply that analysis to other curves conforming to simple differential equations. Again, if the curve crosses itself in either location space or velocity space, the derivatives of the segments that appear to cross differ from each other and thereby prevent the curve from intersecting itself. So our use of the rather small set of the relatively simple Lissajous curves as an example does not invalidate the generality of our result.

    If ω1 and ω2 have a rational number as their ratio (e.g. ω1=Nω2, in which N represents any rational number), then the Lissajous pattern repeats itself. The recurrence time for that repetition equals the minimum time required for both sinusoidal functions to pass through an integer number of complete individual cycles (that is, the time required for the functions to sweep through the angles 2πM1 and 2πM2, in which each M represents an integer). We can calculate the recurrence time (t') in much the way in which we calculate the conjunction times of planets: we take the difference between the planets' angular speeds, the relative rate at which one planet overtakes the other (as seen from that planet), and calculate how long it takes to bring the faster planet through 2π radians relative to the slower planet; that is,

(Eq'n C-9)

which gives us, in this case,

(Eq'n C-10)

As the value of N approaches unity the recurrence time gets longer, which means that the trajectory becomes more complex and touches more points in the manifold.
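    As a rough check on the idea of recurrence for a rational frequency ratio, the sketch below works directly from the definition given above, the smallest time at which both oscillations have completed whole numbers of cycles, rather than from the conjunction calculation. It assumes that we write the ratio N as a fraction in lowest terms; the function names and the sample values N = 3/2 and ω2 = 1 are hypothetical choices of mine.

```python
import math
from fractions import Fraction


def recurrence_time(ratio, omega2):
    """Smallest t > 0 at which both oscillations have completed whole cycles.

    ratio is omega1 / omega2, given as an exact fraction p/q in lowest terms.
    After q cycles of the second oscillation the first has completed p cycles.
    """
    ratio = Fraction(ratio)
    return 2.0 * math.pi * ratio.denominator / omega2


def cycles_completed(omega, t):
    """Number of cycles that an oscillation of angular frequency omega completes in time t."""
    return omega * t / (2.0 * math.pi)


OMEGA2 = 1.0             # assumed angular frequency of the slower oscillation
N = Fraction(3, 2)       # assumed rational ratio omega1 / omega2

t_rec = recurrence_time(N, OMEGA2)
m1 = cycles_completed(float(N) * OMEGA2, t_rec)   # cycles of the faster oscillation
m2 = cycles_completed(OMEGA2, t_rec)              # cycles of the slower oscillation

print(t_rec, m1, m2)
```

    With those sample values the faster oscillation completes three cycles while the slower completes two, at which moment the figure begins to retrace itself.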

    If the angular frequencies have an irrational number as their ratio, the trajectory never comes back to its beginning point (more precisely, the recurrence time goes to infinity). The trajectory will eventually touch each and every point in the manifold, but it will take an infinite elapse of time to do so. Thus we cannot properly speak of recurrence in this case: a process that cannot occur once certainly cannot recur. As we have seen, this fact does not necessarily invalidate Poincaré's theorem.

    If we attach more springs to the free ends of our primary springs such that the stiffness coefficient of the i-th spring has an integer-square ratio with the stiffness coefficient of the primary spring (that is, k2=4k1, k3=9k1, k4=16k1, etc.), then those springs will vibrate at frequencies that are integer multiples of the primary spring's frequency (that is, ω2=2ω1, ω3=3ω1, ω4=4ω1, etc.). In that statement we have tacitly assumed that the accelerations imposed upon one spring by the other spring do not affect that spring's frequency of oscillation. We can actually achieve that independence of frequencies by so mounting the springs independently that they move rheostats as they vibrate and then using the combined electrical outputs from the rheostats to move a stylus over a plane representing the location-space part of our manifold.

    In the case thus described the free end of the last spring will trace out a path that we describe, in accordance with Fourier's theorem, with a non-trigonometric function of time. Because all of the added springs have frequencies that are integer multiples of the primary spring's frequency, then after the elapse of one period of the primary spring the system will return to its initial state and the overall cycle then repeats. If the springs comprise an infinite set, then the free end of the last spring may touch all of the points in the manifold in one period of the primary spring's oscillation (bearing in mind, of course, that in an infinite set there is actually no last spring, so we must conceive that spring as a kind of limit).

    So now we have the function x=f(t), which gives us our T: X → X, the transformation that carries points in the manifold into other points in the manifold. However, we must notice that while Poincaré and Birkhoff asserted their transformation T as operating discretely, the function f(t) gives us a continuous transformation of our point, so we need to take into account the limiting nature of the transition of n going to infinity. If our point lies within the manifold and conforms to the description of f(t), then it will not go outside a region that in any given direction spans a distance equal to twice the sum of the amplitude coefficients of the sines and/or cosines comprising the parts of the function relevant to that dimension. We assume that the transformation T does not coincide with the period of any of the primary springs in our system or with the recurrence time that we calculate through Equation C-10 or its more-dimensional analogues, so we can state that for a given point x in the manifold Tx ≠ x.

    Select a set of points Pa within the manifold. Some of those points occupy and mark the vertices of an N-dimensional right prism that contains all of the points in the set. As the system evolves, as we apply T repeatedly to the points, the volume enclosed in that prism remains unchanged, in accordance with Liouville's theorem. We can thus imagine our set of points as acting very much like a school of fish on a reef: they change relative position within the school, the school elongates, shrinks, and distorts in various ways, but the fish nonetheless remain together as a group. After each application of T the points occupy new locations on the manifold and those locations comprise the set TmE that we used in proving Poincaré's theorem.

    We defined An as the union of all of the sets T-kE for all values of k lying on and between n and infinity. Thus An constitutes the set of all of the locations that our initial set of points E have occupied between negative eternity and n applications of T ago. That set comprises, of course, in accordance with Poincaré's theorem, all of the points that correspond to the energy encoded in the Hamiltonian function, the points that the system can reach without violating the conservation law pertaining to energy. In this case the energy acts as a container, defining the boundary of our set.

