Boltzmann’s H-Theorem

Part II

In Part I we considered a system of N weakly interacting
particles, in which analysis N represents a number very much smaller than
Avogadro’s number (6.022 x 10^{23}, the number of Carbon-12 atoms in 12
grams of pure carbon) so that we can use discrete mathematics of the kind we use
to analyze the Drunkard’s Walk. We distributed the particles among N boxes (such
as cells in phase space) in such a way that n_{i} particles went into the
i-th state, which is marked by each particle carrying an energy E_{i}.
Using a straightforward analysis, we then determined that the number W of microstates
that comprise any given macrostate of the system conforms to the equation

(Eq’n 1)

We then determined the natural logarithm of W as

(Eq’n 2)

which we used to define a form of Boltzmann’s H-function. Because a system left to itself will evolve toward a state with the maximum possible value of W, we have as true to physics the fact that

(Eq’n 3)

which is a simple form of Boltzmann’s H-theorem, which Boltzmann claimed is equivalent to the second law of thermodynamics, the law of entropy.
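As a reminder of what that count looks like in practice, here is a minimal sketch. It assumes that Equation 1 is the standard multinomial count, W = N!/(n_1! n_2! ...), and that Equation 2 follows from it by Stirling's approximation; the function names are illustrative and do not come from Part I.

```python
from math import lgamma, log

def ln_multiplicity(occupations):
    """Exact ln W for W = N! / (n_1! n_2! ...), using lgamma(n + 1) = ln(n!)."""
    total = sum(occupations)
    return lgamma(total + 1) - sum(lgamma(n + 1) for n in occupations)

def ln_multiplicity_stirling(occupations):
    """Leading-order Stirling estimate: N ln N - sum of n_i ln n_i."""
    total = sum(occupations)
    return total * log(total) - sum(n * log(n) for n in occupations if n > 0)

# Two macrostates of the same 100 particles spread over four states.
even = [25, 25, 25, 25]
skewed = [70, 10, 10, 10]

print(ln_multiplicity(even), ln_multiplicity(skewed))
print(ln_multiplicity_stirling(even), ln_multiplicity_stirling(skewed))
```

Under both the exact count and the Stirling estimate the evenly spread macrostate has the larger ln W, which is the sense in which a system left to itself drifts toward maximum W.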

The analysis in Part I bears only a slight resemblance to the conventional H-theorem, but we can take it as preparation to work on the real thing. Now we want to let N take on values so very much greater than Avogadro’s number that our system differs negligibly from a continuum. Thus we replace the discrete sums of Part I with continuous sums; that is, integrations.

In the discrete (small N) example we can calculate the difference between probabilities associated with adjacent energy states in accordance with

(Eq’n 4)

but we have no motive for wanting to do so. In the near-continuous (large N) example the differences between adjacent energy states become so small that those states blend into an array little different from a continuum. We must thus replace Equation 4 with an equivalent differential probability,

(Eq’n 5)

in which d^{3}**x**=dxdydz and d^{3}**v**=dudvdw, with u=dx/dt, v=dy/dt, and w=dz/dt.
In that expression f(**x**,**v**) represents a probability density for a
particle to occupy a certain minuscule element of volume in a semi-abstract
six-dimensional location-velocity space, the element being centered on the point
[x,y,z;u,v,w]. Unlike the case with Equation 4, we have a motive for using
Equation 5: it enables us to calculate the density of particles occupying a
particular substate,

(Eq’n 6)

The number of ways to organize particles in a near continuum reflects an infinite set, so now we must use Boltzmann’s transport equation to carry us further in this analysis. When the system exists in its equilibrium state the distribution of particles within it no longer changes with the elapse of time; that is, the total derivative

(Eq’n 7)

in which **v**=[u, v, w], the acceleration **a**=[du/dt, dv/dt, dw/dt],
and the velocity derivative ∇_{v}=[∂/∂u, ∂/∂v, ∂/∂w].
Because the velocity and the acceleration do not represent differentiable
fields, we can rewrite that equation of continuity as

(Eq’n 8)

Having tacitly assumed that N represents a constant, we can also write that equation as

(Eq’n 9)

by dividing N out of Equation 8.

If the system does not exist in its equilibrium state, then the time derivative of the particle density does not equal zero. Instead, the derivative expresses the fact that more particles enter a given substate than leave it, or vice versa, as the system evolves toward equilibrium. That evolution proceeds through collisions, two particles coming to the same point in space at the same time and bouncing off each other. To describe how collisions make the system evolve, Boltzmann included in his version of Equation 9 a collision term,

(Eq’n 10)

In that equation:

1. **v**_{1} represents the initial velocity of
a particle that strikes a particle moving at velocity **v**, thereby giving
the two particles altered velocities **v**_{1}' and **v**'.

2. The upper-case omega represents the solid angle surrounding the collision zone. In a simple monatomic gas we expect all of the properties of the aggregate to conform to full isotropy, so we can replace the integration over solid angle with a simple multiplication by 4π steradians.

3. The lower-case gee represents the magnitude of **v**_{1}-**v**,
the relative speed at which a given collision takes place.

4. I(g,θ) represents the cross section of the collision. Naively we calculate that cross
section as the area of a circle whose radius equals twice the radius of one of
the particles; thus, if one particle’s center of mass follows a trajectory that
passes through the disc enclosed by that circle, the particles will collide. But
Reality is not quite that simple. The cross section must also account for the
angle (θ) through which the **v**_{1} particle gets deflected by the collision
(which automatically determines the angle through which the **v** particle
gets deflected, in accordance with the law of conservation of linear momentum).
We generally assume that the particles act like hard spheres, which means that
the angle of deflection depends upon how far the trajectory of one particle’s
center of mass passes from the other particle’s center of mass. If we represent
that impact parameter with r and the effective radius of one of the particles
with R, then we expect a collision to result in a deflection angle of
θ=2arccos(r/2R). Of course, the particles actually interact through forcefields, so they are
squishier than that implicit billiard-ball model implies. That fact means that
the greater the speed of the collision, the smaller the particles’ effective
radii become, so the cross section must also be a function of that relative
speed.

5. Once we have established the criteria by which
collisions occur, we need to calculate the number of opportunities in a unit
volume for those collisions to occur. To get a measure of those opportunities we
take the number of particles with velocity **v** in the unit volume (that is,
the density of the particles) jointly with the number of particles with velocity
**v**_{1} in that unit volume, expressed in the product f_{1}f,
as indicating the rate at which particles are knocked out of the **v**-state.
The gas also contains particles with velocities **v**_{1}' and **v**',
whose collisions knock one of the particles into the **v**-state; we can
understand these as the reverse of collisions that knock particles out of the
**v**-state. A proper description of those particles gives us the product f_{1}'f',
which indicates the rate at which particles get knocked into the **v**-state.
The difference between that rate and the previous one gives us the net rate at
which the density of particles increases or decreases in the **v**-state.
When that rate equals zero, the system has reached equilibrium.
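The hard-sphere geometry described in item 4 can be sketched in a few lines. This is only an illustration of the billiard-ball model, with the deflection angle taken in the frame of the two particles' center of mass; the function names are hypothetical and the speed dependence of the effective radius is deliberately left out.

```python
from math import acos, pi

def hard_sphere_deflection(impact_parameter, radius):
    """Deflection angle theta = 2*arccos(r / 2R) for two equal hard spheres.

    Returns 0.0 when the impact parameter r reaches or exceeds 2R,
    because the spheres then pass without touching.
    """
    if impact_parameter < 0 or radius <= 0:
        raise ValueError("impact parameter must be >= 0 and radius > 0")
    if impact_parameter >= 2 * radius:
        return 0.0
    return 2 * acos(impact_parameter / (2 * radius))

def naive_cross_section(radius):
    """Area of the disc of radius 2R inside which a collision occurs."""
    return pi * (2 * radius) ** 2

# Head-on, oblique, and grazing encounters for spheres of unit radius.
for r in (0.0, 2 ** 0.5, 2.0):
    print(r, hard_sphere_deflection(r, 1.0))
```

A head-on encounter (r=0) reverses the relative motion, giving θ=π, while a grazing encounter (r=2R) produces no deflection at all.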

Once we put all of that together and carry out the
integrations, we get a description of how the system evolves through a
description of how the particle densities in the system’s available states
change with the elapse of time. That the system __does__ evolve leads to a
problem called Loschmidt’s paradox.

In 1876, four years after Ludwig Boltzmann introduced his H-theorem, Johann Josef Loschmidt (1821 Mar 15 - 1895 Jul 08) offered a critique of that theorem. Loschmidt saw in the H-theorem a logical contradiction that we now call Loschmidt’s paradox or the reversibility paradox. In Loschmidt’s analysis Boltzmann had deduced a description of an irreversible process from reversible principles of physics. Loschmidt assumed, quite reasonably, that the time-symmetric laws of basic physics cannot yield irreversible phenomena. With regard to collisions, for example, we can imagine filming a collision between two billiard balls: if only the collision is visible in the film, then a person who had not seen the original collision would be unable to tell whether we were running the film forward or backward in the projector. Loschmidt put that fact into a grander perspective.

He suggested that we imagine contemplating the particles in a given parcel of gas as they evolve from a low-entropy state to a high-entropy state. Imagine that at some given instant we reverse the velocities of all the particles. We thus create a new, but perfectly legitimate, state of the gas: the particles could actually have the positions and velocities that we have specified in that reversed gas. But then the particles will go through their motions and collisions in a perfect reversal of the gas’s original evolution and will, thus, go from a high-entropy state to a low-entropy state, which is impossible according to the second law of thermodynamics.

Thus the theoretical irreversibility of statistical thermodynamics is actually an illusion, albeit a very good one, as the following imaginary experiment demonstrates. Imagine a board that we have subdivided with crisscrossing slats into N cells. Into each cell we place a die in such a way that all N dice display the same number X. Place a transparent sheet of plastic over the board so that the dice cannot come out of their cells and then give the board a vigorous shake. When the dice stop moving one sixth of them, give or take some small fraction, will display X and the other five-sixths will display the numbers on the dice’s other faces. Subsequent shakes of the board will not change that fact: the specific dice that display X will change and the give-or-take fraction will change by some minuscule amount, but after each shake a little fewer than 17 percent of the dice will display X. We appear to have a perfectly irreversible process.
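A quick simulation gives a feel for that experiment. The sketch below treats each shake as an independent fair roll of every die, which is an idealization of a vigorous shake; the function name, the seed, and the choice of 60,000 dice are arbitrary.

```python
import random

def fraction_showing(face, n_dice, n_shakes, seed=0):
    """Shake the board n_shakes times; return the fraction of dice showing `face` after each shake."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    fractions = []
    for _ in range(n_shakes):
        rolls = [rng.randint(1, 6) for _ in range(n_dice)]
        fractions.append(rolls.count(face) / n_dice)
    return fractions

fractions = fraction_showing(face=3, n_dice=60_000, n_shakes=5)
for shake, fraction in enumerate(fractions, start=1):
    print(shake, round(fraction, 4))
```

Every shake leaves the fraction close to one sixth, with the give-or-take shrinking as the number of dice grows.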

Consider, though, the case of N=2. In that case we know that both dice display X with sufficient frequency that casinos use the game of Craps to remove astounding amounts of money from large numbers of people (yeah, there’s an irreversible process for you). On average, once every thirty-six shakes of the board we expect to see the dice returned to their original configuration. If we use more dice, then the average number of shakes required to bring the board back to its initial state as described above equals the N-th power of six. If we shake the board once every second, then the average restoration time exceeds the current age of the Universe (13.7 billion years) if we have as few as 23 dice, because 6^{23} seconds comes to roughly 25 billion years. Thermodynamic systems contain vastly more particles than that and each particle has available to it far more than six states, so the restoration time for any state of such a system, even given that the system gets disturbed much more frequently than once per second, extends so far beyond any conceivable time frame that we have an effectively irreversible process when we allow that system to evolve out of some simple state.
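The arithmetic behind that restoration time is easy to check. The sketch below assumes one shake per second, a Julian year of 365.25 days, and the 13.7-billion-year age quoted above; the helper names are illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_S = 13.7e9 * SECONDS_PER_YEAR

def mean_shakes_to_restore(n_dice, faces=6):
    """Average number of shakes before all dice again show the starting face."""
    return faces ** n_dice

def dice_needed(target_seconds, seconds_per_shake=1.0, faces=6):
    """Smallest N for which faces**N shakes take longer than target_seconds."""
    n = 1
    while mean_shakes_to_restore(n, faces) * seconds_per_shake <= target_seconds:
        n += 1
    return n

n_min = dice_needed(AGE_OF_UNIVERSE_S)
years = mean_shakes_to_restore(n_min) / SECONDS_PER_YEAR
print(n_min, f"{years:.3g} years")
```

Because the restoration time grows as the N-th power of six, each added die multiplies the wait by six, so the threshold is reached with a surprisingly small board.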

That last statement indicates how Boltzmann introduced irreversibility, such as it is, into statistical physics. He assumed that his imaginary system begins in a macrostate that consists of relatively few microstates. Because all of the microstates of an ideal gas have the same probability of manifestation, the system would appear to evolve from the simple macrostate to a more chaotic macrostate because the more chaotic macrostate and others closely resembling it contain the vast majority of microstates. A gas with a large enough number of particles would act much like a continuum, manifesting the irreversibility of classical thermodynamics from perfectly reversible dynamics.

Of course, the laws of physics are not perfectly reversible, though Loschmidt could not have known. In order to have perfect reversibility we must have infinite precision and the quantum theory tells us that effectively such precision cannot exist. Even if we succeed in reversing the particles’ velocities with infinite precision, we cannot do so with their collisions: the quantum fuzziness that we see reflected in Heisenberg’s indeterminacy principle guarantees that none of the collisions will occur as a perfect reversal of its first occurrence. Thus the gas will not return to its original low-entropy state. But could such a thing happen anyway, not as Loschmidt suggested, but by some other spontaneous arrangement of particle velocities?

Suppose, for example, that in some small region of the gas particles begin coming together at a higher than normal rate. The density of the gas in that region begins to increase and, thus, the rate of collisions between particles occupying that region begins to increase in square proportion to the increase in density. As a consequence the rate at which particles get knocked out of the region increases while the rate at which particles enter the region does not change. The density declines back toward the average density of the gas, bringing the gas back to maximum entropy with the density change in the small region representing a minuscule fluctuation in the gas’s overall entropy.

In that imaginary experiment I have tacitly assumed, as is
the standard practice in considering the statistical thermodynamics of gases,
that the total volume occupied by the particles themselves is very much less
than the volume of space that the particles occupy (think of tennis balls with
kilometers of distance between them). As more particles enter the small region
of location-velocity space of that experiment the probability density of the
occupation of that region increases as more boxes of our abstract space get
occupied; that is, the probability of finding a particle in any given unit
volume of the space increases. At the same time the probability density of
particles being ejected from the region, h(**x**,**v**), also increases.
The differential change in h(**x**,**v**) equals the negative of the relative change in
the occupation probability density, dh(**x**,**v**)=-df(**x**,**v**)/f(**x**,**v**),
because of the fundamental nature of the probabilities.

Imagine a very large chamber containing a large number of ping-pong balls, one of which has a black dot painted on it. Screens constitute the upper and lower sides of the chamber, their mesh allowing air to flow through the chamber while preventing the balls from leaving the chamber. Air blown vertically through the chamber levitates the balls and makes them move much as particles in a gas do.

We can calculate the probability p that one of the balls occupies a small volume around a given point at some randomly chosen instant, getting a simple sum of the probabilities associated with each of the balls by itself. We can also calculate the probability q that we can, at some random instant, reach into the chamber and grab the black-dotted ball (call that a transition probability). We can interrelate those two probabilities through their differential increments and/or decrements.

If we add more unmarked balls to the chamber, we increase the probability p by adding new terms to the original sum. We increase the probability of finding a ball in the designated volume, but we also decrease the probability q of grabbing the dot-marked ball. Thus, when we use a positive number for dp we must use a negative number for dq. So for N balls in the chamber the change dp also changes the probability that any given ball is the dot-marked ball by dq=-dp/N. Because p is a linear function of N, we can rewrite that expression as dq=-Adp/p, in which the coefficient A represents the conversion of a reciprocal probability into a reciprocal unit-less number.

In terms of probability densities, then, we have the indefinite integral

(Eq’n 11)

Now we want to calculate the average value of the transition probability density over the whole abstract space occupied by the system, so we have

(Eq’n 12)

This is the H in Boltzmann’s H-theorem and now we want to calculate its time derivative. Because the time derivative and the integration commute with each other, we have

(Eq’n 13)

In the next step we replace df/dt with the bilinear collision operator from Equation 10 and get

(Eq’n 14)

in which we have the collision kernel

(Eq’n 15)

To simplify the matter slightly, we know that (lnf+1)=(lnf+lne)=ln(ef), in which e represents the base of the Naperian (or natural) logarithms. We also know that in an isotropic gas f and ln(ef) are not functions of omega. We now have four integrals that we must consider for inclusion in Equation 13:

(Eq’ns 16)

Because the natural logarithm does not explicitly contain any of the specific variables over which we integrate the bilinear collision operator, we can draw it under the double integral, thereby transforming the single integral that involves a double integral into a triple integral. And because each of Expressions 16 describes one aspect of the particle collisions, we must combine all of them and divide the result by four before substituting the result into Equation 13. Noting that we must give the first two of Expressions 16 negative signs (because they describe particles entering the collision region rather than being ejected from it), we thus get the dissipation functional,

(Eq’n 17)

In devising that equation I made use of the fact that

(Eq’n 18)

in which I was able to cancel the factors of e in the final step.

Substituting Equation 17 into Equation 13 then gives us

(Eq’n 19)

We know that (f_{1}'f'-f_{1}f) and the natural logarithm will
always have the same algebraic sign, so we know that D(f) can never be a
negative number; it vanishes only when f_{1}'f'=f_{1}f for every collision. Thus

(Eq’n 20)

always. Now let’s carry our analysis forward and try to reach Boltzmann’s actual goal.
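The sign argument rests on an elementary inequality: for any two positive numbers x and y, the difference (x-y) and the logarithm ln(x/y) share the same sign, so their product cannot be negative. The sketch below spot-checks that claim numerically, taking x to stand for f_{1}'f' and y for f_{1}f; that pairing is an assumption about the form of Equation 17, and the function name is hypothetical.

```python
import random
from math import log

def sign_term(gain, loss):
    """(gain - loss) * ln(gain / loss) for two positive numbers."""
    return (gain - loss) * log(gain / loss)

rng = random.Random(1)  # fixed seed so the check is repeatable
samples = [(rng.uniform(1e-6, 10.0), rng.uniform(1e-6, 10.0)) for _ in range(10_000)]
smallest = min(sign_term(x, y) for x, y in samples)

print(smallest)             # never below zero
print(sign_term(2.0, 2.0))  # zero when the two rates balance
```

The product drops to zero only when the two arguments are equal, which corresponds to the balance of gains and losses that characterizes equilibrium.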

When the system achieves equilibrium, dH/dt=0; that is,
the value of H can’t change, otherwise the system would not be in equilibrium.
In that state we must have ln(f_{1}'f')=ln(f_{1}f). We can
rewrite that latter statement as

(Eq’n 21)

That equation looks very much like the statement of a conservation law: the amount of some quality coming out of an interaction equals the amount of that quality going into the interaction. In this case the natural logarithm of the probability density of a particle occupying any given point in phase space is the conserved quantity.

The only other conservation law that’s relevant here is conservation of energy, which we write as

(Eq’n 22)

If every particle has the same mass m, then

(Eq’n 23)

We can add any multiple of Equation 22 to Equation 21 and get

(Eq’n 24)

In a monatomic gas of N particles at an absolute temperature T, the equation of state (pV=NkT) tells us that the characteristic thermal energy of a particle equals kT (the mean translational kinetic energy itself comes to 3kT/2). The state f(kT) also makes the greatest contribution, per constant increment of energy, to the total energy in the gas: there are more particles in the lower-energy states but they carry less energy and there are particles that carry more energy but there are fewer of them. It’s natural to use f(kT) as a base state.

Because Equation 21 gives us the equivalent of a conservation law, we can treat ln(f) as we do energy; specifically, we can define a ground state, so we set

(Eq’n 25)

In order to maintain consistency with Equation 24, we can describe any other state from the perspective of the state ln(f[kT])+bkT=K(T)+bkT by writing

(Eq’n 26)

in which I have represented the relative constant K(T) as the logarithm of another constant, C. That equation gives us

(Eq’n 27)

We determine the value of the constant C by normalizing the probability density; that is, by integrating the function over all of phase space and requiring that

(Eq’n 28)

When we carry out that calculation and make the appropriate substitution into Equation 27 we get

(Eq’n 29)

which describes the Maxwell distribution of velocities (or kinetic energies) in the gas. The dependence of the exponential function upon the square of velocity tells us that we could have obtained the same result by analyzing the Drunkard’s Walk in velocity space.
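Two standard properties of that distribution can be checked numerically. The sketch below assumes that Equation 29 has the usual Maxwell form, f=(m/2πkT)^{3/2}exp(-mv^{2}/2kT), and works in units where m=kT=1; the function names, the cutoff speed, and the step count are arbitrary choices for illustration.

```python
from math import exp, pi

def maxwell_density(speed, mass=1.0, kT=1.0):
    """Assumed Maxwell density in velocity space: (m/2*pi*kT)^(3/2) * exp(-m v^2 / 2kT)."""
    return (mass / (2 * pi * kT)) ** 1.5 * exp(-mass * speed ** 2 / (2 * kT))

def shell_integral(weight, v_max=12.0, steps=120_000):
    """Midpoint-rule integral of weight(v) * f(v) * 4*pi*v^2 dv from 0 to v_max (m = kT = 1)."""
    dv = v_max / steps
    total = 0.0
    for i in range(steps):
        v = (i + 0.5) * dv
        total += weight(v) * maxwell_density(v) * 4 * pi * v * v * dv
    return total

normalization = shell_integral(lambda v: 1.0)
mean_kinetic_energy = shell_integral(lambda v: 0.5 * v * v)  # in units of kT

print(normalization, mean_kinetic_energy)
```

With those assumptions the density integrates to one over velocity space and the mean kinetic energy comes out at the familiar 3kT/2.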

Now go back to Equation 11 and substitute into it from Equation 27, multiplying by N to account for all of the particles in the gas. We then get

(Eq’n 30)

That integral calculates the total energy (which exists as heat) in the system divided by kT, so we have

(Eq’n 31)

If we take the time derivative of that equation and multiply the result by minus one, we get

(Eq’n 32)

That’s just the law of entropy, the second law of thermodynamics, which is what Boltzmann wanted to demonstrate.

Appendix: A Review of Probability

If we have a physical system that has a number of possible manifestations available to it, probability gives us a measure of the likelihood of the system residing in one particular manifestation. We take the six-sided die, with spots on its faces representing the numbers one through six, as the classic example of such a system. The ideal die is a perfect and perfectly balanced cube; that is, it is perfectly symmetrical and it has a uniform composition throughout its volume. With no basis to infer otherwise, we assume that each side has the same probability as every other side of coming out on top when the rolled die comes to rest. We can determine probability by rolling the die N times and counting how many of those rolls leave one particular face on top. If we roll the die 600 times, we expect that the face showing the number X will come out on top 100 times give or take some small number. We assume that the give or take does not grow as rapidly as does N, so we can define the probability of a disturbed system coming to rest in its i-th state as

(Eq’n A-1)

in which n_{i} represents the number of times that the system comes
into the i-th state in the course of N disturbances that allow the system to
change its state (rolls of the die in this case). Alternatively, we can disturb
N identical systems and count the number of systems that manifest the i-th
state. As we infer from symmetry considerations, so we discover through
experiment: for a die each side has a probability of 1/6 of coming out on top.

In accordance with the ergodic theorem I have tacitly assumed in the above statements that ensemble averages equal time averages. If we roll one billion dice one time and also roll one die one billion times, we will not be able to distinguish the distributions of the readings of the dice one from the other. The averages calculated from those distributions, then, are certainly indistinguishable one from the other.

The definition of Equation A-1 makes probability a
fraction between zero and one and that fact makes good sense. If we have a
situation for which p_{i}=0, then we know that the i-th state of the
system effectively does not exist, because the system will never occupy that
state. If we have a situation for which p_{i}=1, then we know that the
system will occupy the i-th state with absolute certainty and that any other
states effectively do not exist. Both of those facts render negative
probabilities and probabilities greater than one as meaningless as Equation A-1
implies.

If we label two faces of a die with A and B, we know that when we roll the die it will not come to rest with both A and B on top. A coming out on top and B coming out on top occur as two mutually exclusive events: any given roll of the die can manifest one outcome or the other, but not both. We infer, then, that if we know the probabilities of our mutually exclusive events, p(A) and p(B), then we have

(Eq’n A-2)

For a single die we have p(A)=1/6 and p(B)=1/6, so we calculate p(A or B)=1/3, which means that on one third of our throws of the die, the die will come to rest with either A or B showing on top.

If we use two dice, we get the same results that we would
get from throwing one die twice. But we also get something else, something
that’s impossible with a single die. On one third of our throws each die will
show either A or B. On one sixth of our throws one of the dice will show A and
on one sixth of those throws the other die will show B; that is, on one
thirty-sixth of our throws the dice will show A __and__ B. So we infer the
rule

(Eq’n A-3)

Thus we have the basic arithmetic of probability.
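Both rules can be confirmed by brute-force enumeration of equally likely outcomes. The sketch below assumes that Equation A-2 is the addition rule for mutually exclusive events and Equation A-3 is the multiplication rule for independent events, as the worked numbers in the text suggest; the face labels and function name are illustrative.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)
A, B = 1, 2  # hypothetical labels for two faces of a die

def probability(event, n_dice):
    """Fraction of equally likely outcomes of n_dice dice that satisfy `event`."""
    outcomes = list(product(FACES, repeat=n_dice))
    favorable = sum(1 for outcome in outcomes if event(outcome))
    return Fraction(favorable, len(outcomes))

p_a = probability(lambda o: o[0] == A, 1)
p_b = probability(lambda o: o[0] == B, 1)
p_a_or_b = probability(lambda o: o[0] in (A, B), 1)               # one die, A or B
p_a_and_b = probability(lambda o: o[0] == A and o[1] == B, 2)     # first die A, second die B

print(p_a_or_b, p_a + p_b)
print(p_a_and_b, p_a * p_b)
```

Exact fractions are used so that the comparison with 1/3 and 1/36 involves no rounding.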

But those equations only tell us about occupation probabilities, the probabilities of the components of a complex system occupying certain states available to them. Now we want to consider transition probabilities, the probabilities of the system going from manifesting one state to manifesting another.

Imagine that we have a large, thin board that we have divided into cells by crisscrossing slats. Into each cell we put a single die and then we cover the cells with a rigid, transparent sheet that will keep each die in its cell and yet allow us to see the spots showing on the die’s top face. Under the board we have a solenoid that drives a small hammer against the bottom of the board, each strike making a single die jump and roll in its cell. The solenoid moves its hammer once every second as it enacts the Drunkard’s Walk in two dimensions, so it taps dice at random. We use this apparatus to explore transition probabilities.

On the board we have N dice with n_{x} of them
displaying the number X, rendered as the number of spots showing on the top
face. The probability that the hammer will strike any one of those dice in any
given strike thus equals

(Eq’n A-4)

in accordance with Equation A-1. But if one of the dice displaying X gets
tapped, that transition probability will change, due to the fact that the tap
decrements n_{x} by one. We might think that we could calculate that
change in the transition probability by simply prefixing deltas to p_{x}
and n_{x} in Equation A-4: in this example we will have Δn_{x}=-1.
But the fraction 1/N represents the probability that the hammer will tap any die in
the array, not only one displaying X, so Δn_{x}/N
does not give us the transition probability for the dice displaying X; it gives
us the transition probability for any die in the array. To get the transition
probability for the dice showing X we need to multiply that raw transition
probability by the probability that the die struck is one showing X. Thus we can
calculate the average rate per hammer strike at which the transition probability
of a die getting knocked out of the state X changes,

(Eq’n A-5)

If N represents an extremely large number, we can replace the delta
differences with small-dee differentials, then we can divide the equation by p_{x},
carry out the implicit integration, and get

(Eq’n A-6)

That equation looks like a blatant contradiction of Equation A-4, so we need to remember that Equation A-4 gives us the probability that the hammer will strike a die in the state X and Equation A-6 gives us the average rate at which that probability changes with each hammer strike. In fact I have mis-written Equation A-6: more properly we have

(Eq’n A-7)

in which lnp_{xb} and lnp_{xa} represent, respectively, the
rate of change in the transition probability between step b and step a in the
evolution of our diceboard and n_{xb} and n_{xa} represent,
respectively, the number of dice in state X at step b and at step a. Equation
A-6 appears to give us a nonsensical result until we complete the evaluation of
the integral.

We can also sum Equation A-6 over all six of the X states available to the dice and obtain

(Eq’n A-8)

which tells us that a decrease in the transition probability out of one state must accompany an equal increase in the transition probability out of another state of the system. Next we multiply Equation A-6 by Equation A-4 and sum the result to get

(Eq’n A-9)

We know that if the system has some property R_{x} for each of its
states, then

(Eq’n A-10)

gives us the average value (or expectation value) of that property over the
whole system, so the H-function of Equation A-9 represents the average value of
lnp_{x} over the system, the average of the raw rate of transition of
dice out of the states they occupy.

We know that if we subject our system of dice to random
perturbations, it will evolve toward a macrostate in which about one sixth of
the dice occupy any given state of display. Referring back to Equation A-4, we
see that the statement, n_{x}=N/W, stands true to mathematics and
physics because the number of different ways W that the dice have of displaying
their spots have equal probability, p_{x}=1/W. We thus identify that
macrostate as the one having maximum probability of being manifested. Now
suppose that we contrive to increase the probability of finding a die in the
state X,

(Eq’n A-11)

The physical system must conform to the fact that the statement

(Eq’n A-12)

always remains true to mathematics, so for some other state (or states) we must have in consequence

(Eq’n A-13)

If we substitute Equations A-11 and A-13 into Equation A-9 by way of n_{x}=Np_{x},
we get

(Eq’n A-14)

which means that any change from the system’s state of maximum probability increases the value of its H-function. That fact means that the system’s state of maximum probability corresponds to the system’s H-function having its minimum value. Thus, for the system in any state we have

(Eq’n A-15)

which expresses Boltzmann’s H-theorem as it applies to our array of dice and to any system analogous to it.
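The minimum property is easy to see numerically. The sketch below assumes that the H-function of Equation A-9 is the sum of p_{x}lnp_{x} over the six states, consistent with its description as the average value of lnp_{x}; the function names and the size of the shift are illustrative.

```python
from math import log

def h_function(probabilities):
    """H = sum of p * ln p; states with p = 0 contribute nothing."""
    return sum(p * log(p) for p in probabilities if p > 0)

def shifted(delta):
    """Move probability delta from one face to another, keeping the total at one."""
    probabilities = [1 / 6] * 6
    probabilities[0] += delta
    probabilities[1] -= delta
    return probabilities

uniform = [1 / 6] * 6
print(h_function(uniform))  # equals -ln 6
for delta in (0.01, 0.05, 0.10):
    print(delta, h_function(shifted(delta)))
```

Every shift away from equal probabilities raises H above -ln6, and larger shifts raise it further, so the equal-probability macrostate sits at the minimum of H.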
