A Little Tensor Geometry

Enough, I hope, to give the reader a feel for what these weird mathematical constructs mean and what they do for us.

Surveying Space and Time

We use rulers and clocks to measure space and time, distance and duration. But measurement by itself only gives us results analogous to an indeterminate metes and bounds description of a piece of land: it sits in isolation, referred only to on-site landmarks. In order to discern more general patterns (i.e. the laws of physics on a proper cosmic scale) we need to refer our measurements to a system of reference that, in concept at least, spans the cosmos and its history.

To fill that need we must devise a means of describing the warp and weft of a four-dimensional tapestry, one on which we can embroider our measurements. That means consists of conceiving an infinite set of imaginary lines occupying and marking the continuum of space and time. We construct the set from four subsets, each subset consisting of lines that do not cross each other but which cross lines in all three of the other subsets. On each line of a subset we use real numbers to denumerate the locations of points on the line relative to an arbitrarily chosen zero point and we call those numbers the coordinates of the subset: we call the difference between two coordinates on any given line the distance between the points bearing those coordinates (calling the difference duration if the line represents the elapse of time). We call the entire set a coordinate grid and, in concept, we can lay it out by surveying it with our rulers and clocks (each clock tracing a line through time).

Within the coordinate grid we identify geometric points (having only spatial coordinates) and instantaneous events (the spatio-temporal analogue of geometric points) and we designate them with three- or four-component vectors, **q**=(q_{1}, q_{2}, q_{3}, q_{4}), in which q_{1}, q_{2}, and q_{3} represent spatial coordinates and q_{4} represents a temporal coordinate. (Some physicists and mathematicians use 0, 1, 2, 3 as coordinate indices, with q_{0} representing the temporal coordinate. I prefer not to use that convention, having been brought up to think of time as the fourth, not the zeroth, dimension.)

In order to make our grid useful, to enable ourselves to
embroider our measurements onto it, we must give it a basis, a set of linearly
independent vectors of which we can represent any other vector on the grid as a
finite linear combination. Ideally we want to use a finite-dimensional
orthonormal basis; specifically, we want to use a finite set of mutually
orthogonal unit vectors as our basis. The space must also support an inner
product, the analogue of the vector dot product in three-space (which we
calculate as the product of the magnitudes of the two vectors and the cosine of
the angle between them). In pseudo-Euclidean four-space, then, we have the unit vectors
as ê_{1}=(1, 0, 0, 0), ê_{2}=(0, 1, 0, 0), ê_{3}=(0, 0,
1, 0), and ê_{4}=(0, 0, 0, 1). For those vectors we have the inner
product as

(Eq’n 1)

in which δ_{ik} represents the Kronecker delta and the minus sign applies only when i=k=4. Those unit vectors have magnitudes equal to one. Now we know that we can represent any vector in pseudo-Euclidean four-space (Hermann Minkowski’s spacetime) as a linear sum,

(Eq’n 2)

The process of extracting components from a vector at a
given point (or event) consists, in mathematical concept, of projecting the
vector onto the coordinate lines passing through the point and taking the
projections as the components. The inner product gives us the algebraic analogue
of that geometric process; in essence, we project the vector onto the basis
vectors. Because the basis vectors **b**_{i} differ from the unit
vectors only in not having a magnitude necessarily equal to one, we must
normalize the inner product and the associated basis vector in order to obtain a
correct description of the vector,

(Eq’n 3)
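As a minimal sketch of that projection process, the following Python fragment extracts components by dividing each inner product by the inner product of the basis vector with itself and then rebuilds the vector from them. The function names, the stretched basis, and the sample vector are all hypothetical, chosen only for illustration; the signature (+, +, +, -) follows the statement above that the minus sign attaches to the fourth index.

```python
# Sketch: recover a vector's components by projecting it onto an orthogonal,
# not necessarily normalized, basis (all names here are illustrative).

SIGNATURE = (1.0, 1.0, 1.0, -1.0)  # minus sign on the fourth (time) index


def inner(u, v):
    """Inner product of two four-vectors with the (+, +, +, -) signature."""
    return sum(s * a * b for s, a, b in zip(SIGNATURE, u, v))


def components(vector, basis):
    """Normalized projections <V, b_i> / <b_i, b_i> onto each basis vector."""
    return [inner(vector, b) / inner(b, b) for b in basis]


def rebuild(coeffs, basis):
    """Linear combination sum_i coeffs[i] * basis[i]."""
    return [sum(c * b[n] for c, b in zip(coeffs, basis)) for n in range(4)]


# Hypothetical basis: the unit vectors stretched by arbitrary factors.
basis = [(2.0, 0.0, 0.0, 0.0), (0.0, 3.0, 0.0, 0.0),
         (0.0, 0.0, 0.5, 0.0), (0.0, 0.0, 0.0, 4.0)]
V = (1.0, -2.0, 3.0, 5.0)

coeffs = components(V, basis)
print(coeffs)                  # components relative to the stretched basis
print(rebuild(coeffs, basis))  # should reproduce V
```

Dividing by the inner product of each basis vector with itself is what lets the same recipe work for the time-like basis vector, whose inner product with itself comes out negative.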

If, instead of directly measuring the vector itself, we measure its components, we can calculate the magnitude of the vector from the description in Equation 2. The first step in that calculation gives us the inner product of the vector with itself, the squared norm,

(Eq’n 4)

In going to the last step in that equation I made use of Equation 1. Further analysis of Equation 4 gives us the metric tensor.

The Metric Tensor

We define the metric tensor through an invariant quantity derived from a
generalized version of the Pythagorean theorem, which we see expressed in
Equation 4. Assume that we have two events arbitrarily close together. We
measure a differential vector d**s**= dq_{i}ê_{i} extending
straight from one event to the other with components dq_{i}=<d**s**,ê_{i}>.
In that case we get Equation 4 as

(Eq’n 5)

In the last step of that equation I have tacitly replaced the summation sigma with the Einstein convention of automatically summing over repeated indices. We now have the metric tensor defined as

(Eq’n 6)

Hermann Minkowski’s flat spacetime provides us with the simplest example of a metric tensor. For that four-dimensional analogue of Euclidean three-space we have

(Eq’n 7)

That, in turn, gives us Equation 5 as

(Eq’n 8)

In that formulation I have used the common Cartesian grid of rectangular coordinates, but I also have other options available to me.
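To make the flat-spacetime case concrete, here is a small Python sketch that evaluates the double sum g_{ik}dq_{i}dq_{k} for a few displacements. It assumes units in which the speed of light equals one, so that the time entry of the metric is simply -1; the names ETA and interval_squared are hypothetical and the sample displacements are made up for illustration.

```python
# Flat-spacetime metric with the time coordinate in the fourth slot.
# Assumption: units in which c = 1, so the time entry is simply -1.
ETA = [[1.0, 0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0, 0.0],
       [0.0, 0.0, 1.0, 0.0],
       [0.0, 0.0, 0.0, -1.0]]


def interval_squared(dq, g=ETA):
    """Double sum g_ik dq_i dq_k over all index pairs."""
    n = len(dq)
    return sum(g[i][k] * dq[i] * dq[k] for i in range(n) for k in range(n))


light_ray = (3.0, 4.0, 0.0, 5.0)    # travels 5 units of distance in 5 of time
pure_space = (1.0, 0.0, 0.0, 0.0)   # two simultaneous events
pure_time = (0.0, 0.0, 0.0, 2.0)    # two events at the same place

for dq in (light_ray, pure_space, pure_time):
    print(dq, interval_squared(dq))
```

The three displacements give a zero, a positive, and a negative squared interval respectively, which is the usual division into light-like, space-like, and time-like separations under this sign convention.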

If we choose, for example, to make our measurements in spherical coordinates, with θ representing the co-latitude and ϕ representing the longitude, we have the associated metric tensor as

(Eq’n 9)

so that we have Equation 5 as

(Eq’n 10)
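We can check numerically that a spherical metric of that kind really encodes the Pythagorean theorem. The sketch below assumes the standard spatial form diag(1, r², r² sin²θ) for the coordinates (r, θ, ϕ), leaves out the time coordinate, and compares the metric's squared length for a small displacement with the squared straight-line distance computed in Cartesian coordinates. The function names and the sample point are hypothetical.

```python
import math


def spherical_metric(r, theta):
    """Spatial metric for coordinates (r, theta, phi), theta = co-latitude."""
    return [[1.0, 0.0, 0.0],
            [0.0, r * r, 0.0],
            [0.0, 0.0, (r * math.sin(theta)) ** 2]]


def to_cartesian(r, theta, phi):
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))


def metric_length_squared(g, dq):
    """g_ik dq_i dq_k for a three-component displacement."""
    return sum(g[i][k] * dq[i] * dq[k] for i in range(3) for k in range(3))


def cartesian_length_squared(q, dq):
    """Squared straight-line distance between q and q + dq."""
    start = to_cartesian(*q)
    end = to_cartesian(*(a + b for a, b in zip(q, dq)))
    return sum((e - s) ** 2 for s, e in zip(start, end))


q = (2.0, 0.7, 1.1)            # an arbitrary point (r, theta, phi)
dq = (1e-6, 2e-6, -1e-6)       # a small displacement from it
from_metric = metric_length_squared(spherical_metric(q[0], q[1]), dq)
from_pythagoras = cartesian_length_squared(q, dq)
print(from_metric, from_pythagoras)
```

The two numbers agree only to first order in the displacement, which is why the metric equation holds for differentials rather than for finite coordinate differences.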

If we make a slight modification in that metric tensor, we get

(Eq’n 11)

which represents the Schwarzschild solution of Einstein’s field equation in the case of a region of space and time that has a uniform, simple, spherical body of mass M whose center occupies the origin of the coordinate grid.
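For readers who want numbers, the following sketch builds the Schwarzschild metric in its standard textbook form, which may be arranged differently from the equation above. It assumes coordinates ordered (r, θ, ϕ, t), units with c = 1 inside the metric, and the Schwarzschild radius r_s = 2GM/c²; the function names and the approximate physical constants are mine.

```python
import math


def schwarzschild_metric(r, theta, r_s):
    """Diagonal metric for coordinates (r, theta, phi, t) with c = 1.

    r_s = 2GM/c^2 is the Schwarzschild radius; valid only for r > r_s.
    """
    f = 1.0 - r_s / r
    return [[1.0 / f, 0.0, 0.0, 0.0],
            [0.0, r * r, 0.0, 0.0],
            [0.0, 0.0, (r * math.sin(theta)) ** 2, 0.0],
            [0.0, 0.0, 0.0, -f]]


def schwarzschild_radius(mass_kg):
    G = 6.674e-11        # m^3 kg^-1 s^-2 (approximate)
    c = 299792458.0      # m/s
    return 2.0 * G * mass_kg / c ** 2


r_s_sun = schwarzschild_radius(1.989e30)                  # approximate solar mass
g = schwarzschild_metric(1.496e11, math.pi / 2, r_s_sun)  # near Earth's orbit

# Departures of g_rr and g_tt from their flat-space values of +1 and -1.
print(r_s_sun, g[0][0] - 1.0, g[3][3] + 1.0)
```

Setting r_s to zero returns the flat metric in spherical coordinates, and near Earth's orbit the departures from flatness come out at a few parts in a hundred million, which gives some feel for how slight the modification is in ordinary circumstances.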

In Equations 9 and 11 I seem to have violated the definition of Equation 6. However, a comparison of Equations 9 and 10 shows us that I’ve merely shifted the coefficients of the squared coordinates in the metric equation into the metric tensor. Making that shift alters the metric tensor, not so much in Equation 9 but certainly in Equation 11 and others like it, in a way that automatically encodes Einstein’s equivalence principle into the metric tensor: to the extent that a metric tensor differs from the Minkowski tensor it represents a gravitational field. That fact, that space and time can be warped out of true, will necessitate the existence of other tensors and related entities.

We can see clearly now that the metric tensor, by taking the coefficients of the coordinates, tells us the shape of space and time. Also note that Equation 5 gives us a generalized version of the Pythagorean theorem. The version that I have written contains a subtle error: in order to sum over a repeated index in a product of tensors we must have that index appear once as covariant and once as contravariant. In Equation 5 I have written all of the tensors in the product as covariant. Mathematical propriety requires that we rewrite Equation 5 as

(Eq’n 12)

In that equation the subscripts mark the covariant tensor and the superscripts mark the contravariant tensors.
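A short sketch may help to show what the two kinds of index do in practice. Here the metric lowers the index of a contravariant displacement, and the squared interval then appears as the sum of products of matching covariant and contravariant components. The Minkowski metric (time fourth, c = 1), the sample displacement, and the function names are assumptions made only for illustration.

```python
def lower_index(g, v_up):
    """Covariant components v_i = g_ik v^k."""
    n = len(v_up)
    return [sum(g[i][k] * v_up[k] for k in range(n)) for i in range(n)]


def contract(a_down, b_up):
    """Scalar a_i b^i, one covariant and one contravariant index summed."""
    return sum(a * b for a, b in zip(a_down, b_up))


# Minkowski metric with time in the fourth slot and c = 1 (an assumption).
g = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0],
     [0.0, 0.0, 0.0, -1.0]]

dq_up = [3.0, 4.0, 0.0, 5.0]      # hypothetical contravariant displacement
dq_down = lower_index(g, dq_up)   # its covariant counterpart
print(dq_down, contract(dq_down, dq_up))
```

In flat spacetime with rectangular coordinates the two sets of components differ only in the sign of the time entry, which is why the distinction is easy to overlook until the metric becomes less trivial.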

The Christoffel Symbols

We have all heard that a straight line gives us the shortest distance between two fixed points. But if warped space lies between the points A and B, then the shortest path between those points will come manifest as a curve. We call that curve a geodesic and describe it through the statement

(Eq’n 13)

That statement looks like the principle of least action, so we can apply the calculus of variations to work out an explicit description of the geodesic.

Define a function v^{i}=dq^{i}/dτ in which τ represents a parameter, usually taken as representing elapsed time. We can then rewrite Equation 5 as

(Eq’n 14)

so we can rewrite Equation 13 as

(Eq’n 15)

As in the case of Lagrangian dynamics, we have Euler-Lagrange equations

(Eq’n 16)

In order to evaluate the derivatives directly we exploit the facts that

(Eq’n 17)

(in which n=i or k) and that

(Eq’n 18)

We thus get our Euler-Lagrange equation as

(Eq’n 19)

We can rewrite that equation into a simpler form if we multiply it by the square root and exploit the facts that dτ=dq^{m}/v^{m} and that the metric tensor is not a function of the velocities, which makes ∂g_{ik}/∂v^{m}=0. We then have the rewritten equation as

(Eq’n 20)

In going from the first line to the second line in that
equation I have exploited the fact that g_{ik}=g_{ki}, that the
metric tensor appears as a symmetric matrix, to combine the third and fourth
terms on the first line into the single first term on the second line. For the
first two terms on the first line we have

(Eq’n 21)

In going from the first line of that equation to the second line I exploited the fact that the indices merely represent dummy numbers and, thus, can be interchanged so long as we do so consistently.

Now Equation 20 gives us the mathematical description of a geodesic curve, the shortest distance between two points in distorted space. Writing it more simply, we have

(Eq’n 22)

in which

(Eq’n 23)

the Christoffel symbol of the first kind. (And though it’s tempting to pronounce it Christ-awful, it’s actually pronounced Krist-oh-FELL, named for Elwin Bruno Christoffel (1829 Nov 10 – 1900 Mar 15), the German mathematician who discovered it.)

If we multiply Equation 22 by g^{mp}, we get

(Eq’n 24)

in which

(Eq’n 25)

represents the Christoffel symbol of the second kind. In going from Equation
22 to Equation 24 I exploited the fact that g_{mk}g^{mp}=δ_{k}^{p},
the Kronecker delta, which, because it only has a non-zero value when k=p,
converts v^{k} into v^{p}.
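The Christoffel symbols of the second kind can be computed mechanically from any metric. The sketch below assumes the standard textbook formula, Γ^{p}_{ik} = ½ g^{pm}(∂_{i}g_{mk} + ∂_{k}g_{mi} - ∂_{m}g_{ik}), approximates the derivatives of the metric by central differences, and tries the result on plane polar coordinates, where the non-zero symbols are known to equal -r and 1/r. All function names are hypothetical.

```python
def inverse(m):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(m)
    a = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        p = a[col][col]
        a[col] = [x / p for x in a[col]]
        for r in range(n):
            if r != col:
                f = a[r][col]
                a[r] = [x - f * y for x, y in zip(a[r], a[col])]
    return [row[n:] for row in a]


def metric_derivatives(metric, q, h=1e-5):
    """dg[m][i][k] approximates d g_ik / d q^m by central differences."""
    n = len(q)
    dg = []
    for m in range(n):
        q_plus, q_minus = list(q), list(q)
        q_plus[m] += h
        q_minus[m] -= h
        g_plus, g_minus = metric(q_plus), metric(q_minus)
        dg.append([[(g_plus[i][k] - g_minus[i][k]) / (2 * h) for k in range(n)]
                   for i in range(n)])
    return dg


def christoffel_second_kind(metric, q):
    """gamma[p][i][k] = (1/2) g^{pm} (d_i g_mk + d_k g_mi - d_m g_ik)."""
    n = len(q)
    g_inv = inverse(metric(q))
    dg = metric_derivatives(metric, q)
    return [[[0.5 * sum(g_inv[p][m] * (dg[i][m][k] + dg[k][m][i] - dg[m][i][k])
                        for m in range(n))
              for k in range(n)] for i in range(n)] for p in range(n)]


def polar_metric(q):
    """Flat plane in polar coordinates (r, phi): ds^2 = dr^2 + r^2 dphi^2."""
    r, _phi = q
    return [[1.0, 0.0], [0.0, r * r]]


gamma = christoffel_second_kind(polar_metric, [2.0, 0.3])
print(gamma[0][1][1])  # Gamma^r_{phi phi}; the analytic value is -r
print(gamma[1][0][1])  # Gamma^phi_{r phi}; the analytic value is 1/r
```

Note that the symbols do not vanish here even though the plane is perfectly flat: they register the curvature of the coordinate lines as well as any curvature of the space itself.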

Equation 24 shows us that the Christoffel symbol relates
to the acceleration inherent in the metric g_{ik} and General
Relativity, via the equivalence principle, relates that acceleration to
gravitation. If we identify the metric tensor with the gravitational potential,
albeit in somewhat spread out form, then the Christoffel symbols correspond to
the gravitational force field.
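To see the geodesic equation at work, we can integrate it numerically in a case where we already know the answer. The sketch below assumes the form dv^{p}/dτ = -Γ^{p}_{ik}v^{i}v^{k} and uses the flat plane in polar coordinates, whose non-zero Christoffel symbols equal -r and 1/r, so that the geodesic should come out as a straight line. The fourth-order Runge-Kutta integrator and all of the names are my own choices for illustration.

```python
import math


def acceleration(q, v):
    """dv^p/dtau = -Gamma^p_{ik} v^i v^k for plane polar coordinates (r, phi)."""
    r, _phi = q
    vr, vphi = v
    return [r * vphi * vphi,        # from Gamma^r_{phi phi} = -r
            -2.0 * vr * vphi / r]   # from Gamma^phi_{r phi} = Gamma^phi_{phi r} = 1/r


def derivative(state):
    """state = [r, phi, vr, vphi]; returns its derivative with respect to tau."""
    q, v = state[:2], state[2:]
    return v + acceleration(q, v)


def rk4_step(state, dt):
    def shifted(slope, scale):
        return [s + scale * k for s, k in zip(state, slope)]
    k1 = derivative(state)
    k2 = derivative(shifted(k1, dt / 2))
    k3 = derivative(shifted(k2, dt / 2))
    k4 = derivative(shifted(k3, dt))
    return [s + dt / 6 * (a + 2 * b + 2 * c + d)
            for s, a, b, c, d in zip(state, k1, k2, k3, k4)]


def integrate(state, tau_end, steps=1000):
    dt = tau_end / steps
    for _ in range(steps):
        state = rk4_step(state, dt)
    return state


# Start at Cartesian (1, 0) moving in the +y direction with unit speed.
r, phi, _vr, _vphi = integrate([1.0, 0.0, 0.0, 1.0], 1.0)
x, y = r * math.cos(phi), r * math.sin(phi)
print(x, y)   # a straight line would arrive at (1, 1)
```

The apparent accelerations in r and ϕ conspire to keep the point on a straight line, which illustrates how the Christoffel symbols play the part of a force field that can nevertheless be transformed away by a change of coordinates.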

Thus we have raw mathematical manipulation. But can we find a more intuitive way to understand the Christoffel symbols?

Start by noting that the scalar product of two purely
arbitrary four-vectors, S=A^{i}B_{i}=A^{i}B^{k}g_{ik}
remains invariant when we subject the vectors to parallel transport (see below
under the Riemann Curvature Tensor). If we covariantly differentiate that scalar
invariant, we must get zero;

(Eq’n 26)

In this case the primed differentiation operator represents the covariant derivative. Because the only changes in the vectors come from parallel transport, the covariant derivatives of those vectors necessarily equal zero, which necessitates that the third term on the second line of Equation 26 also equal zero. And because we have chosen the two vectors arbitrarily, the covariant derivative of the metric tensor must necessarily equal zero,

(Eq’n 27)

in which the upper-case omegas represent the Levi-Civita connection coefficients. The unprimed differentiation operator represents the ordinary partial derivative, of course.

By permuting the indices of the derivative we can generate two other equations equivalent to Equation 27,

(Eq’ns 28)

Add those two equations together and subtract Equation 27 from their sum. Recognizing that the metric tensor and the connection coefficients are both symmetric under interchange of their two covariant indices allows us to simplify the result to

(Eq’n 29)

We can easily solve that equation for the connection coefficient if we remember that g_{rm}g^{mp}=δ^{p}_{r}. We get

(Eq’n 30)

which tells us that the connection coefficients coincide precisely with the Christoffel symbols of the second kind.

The Riemann Curvature Tensor

Given a vector field V_{i}, we want to differentiate it with respect to two different coordinates q^{m} and q^{n}, in essence differentiating the vector field with respect to a minuscule change in area. The covariant derivatives do not commute with each other (∂_{m}’∂_{n}’V_{i} ≠ ∂_{n}’∂_{m}’V_{i}), so we need to determine how the two double derivatives differ from each other. Let’s start by calculating out ∂_{m}’∂_{n}’V_{i}.

Because ∂_{n}’V_{i} represents a second-rank tensor, we have its covariant derivative ∂_{m}’ as

(Eq’n 31)

And because V_{i} represents a first-rank tensor (a vector), we have the covariant derivative ∂_{n}’ as

(Eq’n 32)

If we use that equation to make the appropriate substitutions into Equation 31, we get

(Eq’n 33)

To calculate ∂_{n}’∂_{m}’V_{i}
we merely interchange the indices m and n in that equation. If we subtract that
equation from the index-exchanged version, the terms in the square brackets drop
out (due to symmetry) and we get

(Eq’n 34)

in which we have the Riemann curvature tensor,

(Eq’n 35)

We thus have a fourth-rank tensor that is anti-symmetric with respect to the
indices m and n, which means that R^{p}_{imn}=-R^{p}_{inm}.
In four-dimensional space and time that tensor has 256 components organized into
block, matrix, row, and column, which we might represent as R^{b}_{mrc}
in order to remember how we organize those elements.
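A two-dimensional example keeps the bookkeeping manageable. The sketch below evaluates the Riemann tensor on the unit sphere, assuming the common convention R^{p}_{imn} = ∂_{m}Γ^{p}_{in} - ∂_{n}Γ^{p}_{im} + Γ^{p}_{mq}Γ^{q}_{in} - Γ^{p}_{nq}Γ^{q}_{im}, whose overall sign may differ from the one used above. It uses the known Christoffel symbols of the sphere in the coordinates (θ, ϕ) and differentiates them by central differences; the function names are hypothetical.

```python
import math


def christoffel(theta):
    """G[p][i][k] = Gamma^p_{ik} on the unit 2-sphere, coordinates (theta, phi)."""
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)   # Gamma^theta_{phi phi}
    cot = math.cos(theta) / math.sin(theta)
    G[1][0][1] = G[1][1][0] = cot                     # Gamma^phi_{theta phi}
    return G


def christoffel_derivatives(theta, h=1e-5):
    """dG[m][p][i][k] = d Gamma^p_{ik} / d q^m; nothing depends on phi."""
    plus, minus = christoffel(theta + h), christoffel(theta - h)
    d_theta = [[[(plus[p][i][k] - minus[p][i][k]) / (2 * h) for k in range(2)]
                for i in range(2)] for p in range(2)]
    d_phi = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    return [d_theta, d_phi]


def riemann(theta):
    """R[p][i][m][n] = R^p_{imn} in the convention named in the lead-in."""
    G, dG = christoffel(theta), christoffel_derivatives(theta)
    idx = range(2)
    return [[[[dG[m][p][i][n] - dG[n][p][i][m]
               + sum(G[p][m][q] * G[q][i][n] - G[p][n][q] * G[q][i][m]
                     for q in idx)
               for n in idx] for m in idx] for i in idx] for p in idx]


theta = 0.9
R = riemann(theta)
print(R[0][1][0][1], math.sin(theta) ** 2)   # R^theta_{phi theta phi} vs sin^2(theta)
print(R[0][1][0][1] + R[0][1][1][0])         # antisymmetry in the last two indices
```

In two dimensions the tensor has only sixteen components and, thanks to its symmetries, only one independent value, which is why the sphere makes a convenient test case before tackling the 256 components of spacetime.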

To gain some understanding of what the Riemann tensor does
we need to take another look at the double differentiation through which we
derived it. Establish a point P_{0} (and, yes, Euclidean elephants eat
pee-noughts) and extend from it two differential line segments dx^{m}
and dy^{m}, parallel to the appropriate axes, to thereby define points P_{1}
and P_{2}. From each of those latter points extend the alternate
differential line segment to a point P_{3}, thereby drawing a minuscule
parallelogram.

At the point P_{0} establish a constant
contravariant vector A^{i} and subject it to two parallel transports.
The first transport follows the path P_{0}P_{1}P_{3},
which consists of a displacement dx^{m} followed by a displacement dy^{m}.
The second transport follows the path P_{0}P_{2}P_{3},
which consists of a displacement dy^{n} followed by a displacement dx^{n}.

Parallel transport of A^{i} from P_{0} to
P_{1} produces the vector

(Eq’n 36)

Transport of that vector to P_{3} then produces the vector

(Eq’n 37)

The Christoffel symbol at P_{1} differs from the one at P_{0}
by a minuscule increment so that

(Eq’n 38)

which lets us rewrite Equation 37 as

(Eq’n 39)

In cobbling up that equation I have left off the term in the square of dx^{m}
as ignorably small. We obtain the equivalent equation for the parallel transport
of A^{i} along the path P_{0}P_{2}P_{3} by
interchanging the indices m and n.

We can calculate the net change that we would make in A^{i}
in taking it around the closed path P_{0}P_{1}P_{3}P_{2}P_{0}
by calculating the difference

(Eq’n 40)

in which we recognize that the remainder left behind by the subtraction
coincides with the Riemann curvature tensor (multiplied by the appropriate
factors). That gives us a result similar to the one of Equation 34. Because the
product B_{i}A^{i} yields a scalar invariant, we can use it to
work out the covariant equivalent of Equation 40. We write

(Eq’n 41)

Note that in going from the second line to the third I interchanged the
indices on B_{i} and A^{k}, justifying the change by noting that
because we sum over those indices the interchange does not change the value of
the expression. The third line in that equation only zeroes out necessarily when
the coefficient of A^{i} equals zero. That criterion necessitates that

(Eq’n 42)

Because we derive the Riemann curvature tensor (also known as the Riemann-Christoffel tensor) from displacing a vector around a closed loop enclosing a minuscule element of area on a surface, I conceive an analogy between the Riemann tensor and the curl operator of ordinary vector calculus. Of course a fundamental difference comes between the two operators: the curl involves a six-fold differentiation of the vector field while the Riemann tensor involves multiplication of the vector field by a 256-fold array of elements made by combining derivatives of the metric tensor. But both operators give us a measure of the curliness of the vector field, either due to the inherent curvature of the field itself or due to curvature induced by the curvature of the space and time in which the field exists.
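The closed-loop picture can be tried out on a loop of finite size. The following sketch parallel-transports a vector once around a circle of constant co-latitude on the unit sphere, assuming the transport rule dA^{p}/dϕ = -Γ^{p}_{iϕ}A^{i} with the sphere's standard Christoffel symbols, and compares the angle through which the vector has turned with the area that the circle encloses. The function names, the step count, and the choice of co-latitude are mine, and the sense of the rotation depends on orientation conventions that I have not tried to match to the equations above.

```python
import math


def transport_rate(theta0, A):
    """dA^p/dphi = -Gamma^p_{i phi} A^i along the circle theta = theta0."""
    a_theta, a_phi = A
    return [math.sin(theta0) * math.cos(theta0) * a_phi,
            -math.cos(theta0) / math.sin(theta0) * a_theta]


def transport_around_latitude(theta0, A, steps=2000):
    """Carry A once around the circle (phi from 0 to 2 pi) with RK4 steps."""
    h = 2.0 * math.pi / steps
    for _ in range(steps):
        k1 = transport_rate(theta0, A)
        k2 = transport_rate(theta0, [a + 0.5 * h * k for a, k in zip(A, k1)])
        k3 = transport_rate(theta0, [a + 0.5 * h * k for a, k in zip(A, k2)])
        k4 = transport_rate(theta0, [a + h * k for a, k in zip(A, k3)])
        A = [a + h / 6.0 * (p + 2 * q + 2 * r + s)
             for a, p, q, r, s in zip(A, k1, k2, k3, k4)]
    return A


theta0 = 1.0                                   # co-latitude of the loop, radians
a_theta, a_phi = transport_around_latitude(theta0, [1.0, 0.0])

# Convert to orthonormal components so that the angle has its usual meaning.
u, w = a_theta, math.sin(theta0) * a_phi
rotation = math.atan2(w, u) % (2.0 * math.pi)
enclosed_area = 2.0 * math.pi * (1.0 - math.cos(theta0))
print(rotation, enclosed_area)
```

On the unit sphere the curvature equals one everywhere, so the angle of rotation matches the enclosed area; on a flat surface the vector would come back unchanged no matter how large we made the loop.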

Let the constant vector A^{k}=dz^{k}, a
minuscule unit vector different from the minuscule unit vectors dx^{n}
and dy^{m}, in Equation 40. The unit vectors dz^{k}, dx^{n},
and dy^{m} mark the primary sides of the minuscule four-cube (tesseract)
that serves as a fundamental unit of the coordinate grid (the variables x, y, z
in this case do not represent the standard Cartesian coordinates: I use them
here to represent the generalized coordinates q in a manner that avoids
confusion). In describing the change in one unit vector Equation 40 also
describes the difference between the volume of the unit four-cube in the curved
space and time of the Riemann tensor and the volume of the corresponding unit
four-cube of Minkowski’s flat spacetime.

To calculate the total difference in the unit volume we
want to calculate the change in Equation 40 in all four directions in space and
time. But the calculation in Equation 40 automatically permutes the variables x,
y, and z, which makes the result six times as large as it should be. Thus we
calculate the element of volume dV_{R} in curved spacetime in proportion
to the corresponding volume element dV_{M} in Minkowski spacetime as

(Eq’n 43)

Now we want to look at another way of treating volumes in curved space.

The Ricci Tensor and Scalar

Here we get into the deepest curvature of space and time, the geometry of warped space. If we have a space described by a metric tensor, then in that space we have geodesic curves. Although we have already described a geodesic as the line that produces the minimum distance between two given points, the mathematical definition of a geodesic tells us that it is a curve that subjects any vector tangent to it to parallel transport automatically. That criterion necessitates that the covariant derivative of that vector equal zero,

(Eq’n 44)

in which **v** represents the tangent vector in question.

If we take elapsed time as a parameter and take v^{k}=dx^{k}/dt
as representing the velocity at which a point moves along the geodesic, then we
can convert the space derivatives into time derivatives by using the fact that v^{i}∂_{i}f=df/dt
for any function f defined in the space. If we multiply Equation 44 by v^{i},
then we get the geodesic equation,

(Eq’n 45)

which gives us what we derived above by minimizing the length of the curve.

Of course we can fill any given space with geodesics and
assign the same parameter t to all of them. For any given value of t, then, we
get a set of points that define a surface in our space. At least one line in
that surface lies normal to the geodesics. On that line we have a tangent vector
**y** and we define a parameter s that determines the location of a point on
the line. Again we get parallel transport, this time of the tangent vector **y**.

Pick two geodesics arbitrarily close to each other and
find two points (x^{m}(t) and *x^{m}(t)*) that have the
same value of the parameter t. Between those two points we can construct a
minuscule vector, y

(Eq’n 46)

Because we have made y^{m}(t) arbitrarily small, we can use a Taylor
series expansion, drop all terms beyond those of first order in y^{m}(t),
and thereby calculate the difference between the Christoffel symbols evaluated
at the two points as

(Eq’n 47)

In locating our two points arbitrarily close together we
have put the un-italicized point at the center of a local inertial frame. On
that minuscule patch of space the curvature of the space differs insignificantly
from zero (think of a small area of the San Joaquin Valley, where the spherical
Earth appears flat). At the point x^{a}(t), then, the elements of the
Christoffel symbol differ negligibly from zero, so we can treat the Christoffel
symbol at that point as if it had zeroed out. Thus we have the geodesic equation
at the two points as

(Eq’ns 48)

If we subtract the first of those equations from the second, we calculate the second parameter derivative of the deviation vector,

(Eq’n 49)

Next we increment the parameter by a minuscule amount,
thereby bringing our attention to two new points (x’^{a}(t) and *x’^{a}(t)*).
That move also gives us a new deviation vector, y’^{a}(t), extending
between the two new points. We can produce that new deviation vector by parallel
transporting the old deviation vector from the old pair of points to the new
one. In order to calculate a description of the new deviation vector from the
old one we need to know the second covariant parameter derivative of the old
deviation vector. Relating the covariant parameter derivative to the ordinary
covariant derivative by D_{v}=v^{q}∂’_{q},
we have

(Eq’n 50)

In devising that equation I have made several tacit algebraic moves. In writing the covariant parameter derivative operator as

(Eq’n 51)

I have made use of the fact that d/dt=v^{c}∂_{c}
and applied it to Equation 32. In going from the first line to the second and
from the second to the third I have exploited the fact that in our local
inertial frame the Christoffel symbols effectively zero out, even though their
derivatives don’t. And in going from the fourth line to the fifth I have
replaced the difference between the derivatives of the Christoffel symbols with
the corresponding Riemann curvature tensor as it exists in a local inertial
frame.
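A simple numerical experiment shows the kind of behavior that the deviation equation describes. On the unit sphere two meridians that leave the pole with a small difference in longitude are neighboring geodesics, and for unit speed the deviation equation reduces, as far as I can tell, to d²y/dt² = -y for the separation y between them. The sketch below measures the actual great-circle separation and takes its second derivative by finite differences; the names and step sizes are hypothetical.

```python
import math


def separation(t, delta):
    """Great-circle distance between the points at arc length t from the pole
    on two meridians of the unit sphere whose longitudes differ by delta."""
    return 2.0 * math.asin(math.sin(t) * math.sin(delta / 2.0))


def second_derivative(f, t, h=1e-3):
    """Central-difference estimate of the second derivative of f at t."""
    return (f(t + h) - 2.0 * f(t) + f(t - h)) / (h * h)


delta = 1e-3     # small longitude difference between the two geodesics
t = 0.8          # arc length travelled from the pole

y = separation(t, delta)
y_dd = second_derivative(lambda s: separation(s, delta), t)
print(y_dd / y)  # close to -1, the negative of the sphere's curvature
```

The negative ratio tells us that geodesics on the sphere converge after initially diverging, just as meridians that spread apart on leaving one pole come together again at the other.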

So now we know how to calculate the length of the
deviation vector extending between nearby geodesics, the paths that free
particles follow. Imagine now that a minuscule patch occupies the un-italicized
geodesic where the deviation vector meets it. The area vector associated with
the patch stands perpendicular to any line drawn across the patch and also lies
parallel to the deviation vector: the dot product of the area vector and the
deviation vector thus defines an arbitrarily small element of volume associated
with our geodesics. If we multiply Equation 50 by that area vector da_{a},
we get

(Eq’n 52)

In writing that equation I have invoked the rule that tells us to make one
covariant index and one contravariant index the same when multiplying tensors to
form an inner product. But y^{a} already has a matching covariant index
on the Riemann tensor, so to enable the proper multiplication of the area vector
we must change the contravariant index on the Riemann tensor to match it; thus,
we get

(Eq’n 53)

in which R_{bc} represents the Ricci tensor. In going from the first
line to the second line in that equation I used the fact that R^{m}_{bcm}=-R^{m}_{bmc}
and then applied the Einstein summation rule, in accordance with the convention
for calculating the trace of a fourth-rank tensor in a process called
contraction (see appendix).

The Ricci (REECH-chee) tensor, discovered by and named after Gregorio Ricci-Curbastro (1853 Jan 12 – 1925 Aug 06), thus tells us the degree to which the volume of a geodesic sphere differs from that of the same sphere in a Euclidean space. Because we can represent any geometric solid as a collection of cylinders, like the one we employed above, we can apply Equation 53 to any suitably small volume element.

Equation 53 also shows us that the Ricci tensor plays a
role analogous to the role played by the metric tensor in determining the length
of a minuscule line segment (Equation 5). Just as the metric tensor, multiplied
by paired differential coordinates, gives us the squared differential length
element, so the Ricci tensor, multiplied by paired velocities, gives us the
second parameter derivative of a ratio. And just as the product g^{ik}g_{ik}=δ^{i}_{i}
gives us a measure of the dimension of the space described by the metric tensor,
so the product

(Eq’n 54)

gives us a measure of the scalar curvature (associated with the Ricci scalar R) of the space that the metric tensor describes. More specifically, we say that the Ricci scalar represents the degree to which the volume of a given small geometric solid in a curved Riemannian space differs from the volume of the same figure in Euclidean space, in this case independent of any motion of the figure.
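The two contractions can be carried out explicitly for a sphere of radius a, for which the Ricci scalar should come out as 2/a². The sketch below assumes the conventions R^{p}_{imn} = ∂_{m}Γ^{p}_{in} - ∂_{n}Γ^{p}_{im} + Γ^{p}_{mq}Γ^{q}_{in} - Γ^{p}_{nq}Γ^{q}_{im} and R_{in} = R^{m}_{imn}, which may differ in sign or index placement from those used above, and it relies on the fact that a constant scale factor in the metric leaves the Christoffel symbols unchanged. All names are hypothetical.

```python
import math


def christoffel(theta):
    """Gamma^p_{ik} for a sphere of any radius, coordinates (theta, phi)."""
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)
    G[1][0][1] = G[1][1][0] = math.cos(theta) / math.sin(theta)
    return G


def riemann(theta, h=1e-5):
    """R[p][i][m][n] = R^p_{imn}; only theta-derivatives are non-zero."""
    G = christoffel(theta)
    plus, minus = christoffel(theta + h), christoffel(theta - h)
    idx = range(2)

    def dG(m, p, i, k):
        return (plus[p][i][k] - minus[p][i][k]) / (2 * h) if m == 0 else 0.0

    return [[[[dG(m, p, i, n) - dG(n, p, i, m)
               + sum(G[p][m][q] * G[q][i][n] - G[p][n][q] * G[q][i][m]
                     for q in idx)
               for n in idx] for m in idx] for i in idx] for p in idx]


def ricci(theta):
    """R_in = R^m_{imn}: contract the upper index with the second lower one."""
    R = riemann(theta)
    return [[sum(R[m][i][m][n] for m in range(2)) for n in range(2)]
            for i in range(2)]


def ricci_scalar(theta, radius):
    """R = g^{in} R_in with the diagonal metric a^2 diag(1, sin^2 theta)."""
    g_inv = [1.0 / radius ** 2, 1.0 / (radius * math.sin(theta)) ** 2]
    Ric = ricci(theta)
    return sum(g_inv[i] * Ric[i][i] for i in range(2))


print(ricci(0.9))               # roughly diag(1, sin^2 theta)
print(ricci_scalar(0.9, 3.0))   # roughly 2 / 9 for radius 3
```

The scalar falls off as the inverse square of the radius, in keeping with the intuition that a larger sphere looks flatter over any patch of given size.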

That latter fact leads to the Ricci scalar playing a central role in the Hilbert action, postulated in 1915 by David Hilbert and used by him to deduce Einstein’s field equation via a variational principle. A discussion of that feature of tensor geometry goes beyond the scope of this essay: I will address it in a suitable essay in the Map of Physics. Thus, we now have the basic facts of tensor geometry, the geometry of warped spaces.

Appendix: Tensor Contraction

Contraction of a tensor in index notation consists of
setting two of a tensor’s indices, one contravariant and the other covariant,
equal to each other and then applying the Einstein summation convention. The
resulting contracted tensor retains the other indices of the original tensor.
For example, we can contract a fourth-rank tensor T^{ab}_{cd} of
valence (2,2) on the second and third indices to create a new second-rank tensor
U^{a}_{d} of valence (1,1). We write that contraction as

(Eq’n A-1)

Note that we cannot perform a contraction on a pair of
indices that are either both contravariant or both covariant. In order to carry
out such a contraction we must first raise or lower an index through calculating
an inner product with an appropriate metric tensor, either covariant (g_{ab})
or contravariant (g^{ab}). We use the metric tensor, then, to raise or
lower one of the indices, as needed, and then apply the operation of
contraction. Mathematicians call that combined operation a metric contraction.
We can thus conceive tensor contraction as a generalization of the trace of a
matrix.
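As a minimal sketch of the contraction described above, the following fragment stores a tensor T^{ab}_{cd} as a nested list indexed T[a][b][c][d] and sums over the second and third indices. The test tensor is a hypothetical product A^{ab}B_{cd}, chosen because its contraction must reproduce the ordinary matrix product of A and B; the function names are mine.

```python
def contract_second_third(T):
    """U^a_d = T^{am}_{md}: sum over the second (upper) and third (lower) index."""
    n = len(T)
    return [[sum(T[a][m][m][d] for m in range(n)) for d in range(n)]
            for a in range(n)]


def trace(M):
    """Contraction of a (1,1) tensor: the ordinary trace of its matrix."""
    return sum(M[i][i] for i in range(len(M)))


def outer(A, B):
    """T^{ab}_{cd} = A^{ab} B_{cd}, a hypothetical product tensor."""
    n = len(A)
    return [[[[A[a][b] * B[c][d] for d in range(n)] for c in range(n)]
             for b in range(n)] for a in range(n)]


A = [[1, 2], [3, 4]]
B = [[0, 1], [5, -2]]

U = contract_second_third(outer(A, B))
print(U)          # the same entries as the matrix product of A and B
print(trace(U))   # a further contraction reduces U to a single number
```

Seen this way, matrix multiplication is itself a contraction of a fourth-rank product tensor, and the trace is the contraction that remains once only one upper and one lower index are left.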
