If Integral Kernels were Metrics

I have been thinking about how to interpret integrals in my formalism of exterior calculus. So far, I have not been very successful at getting a good analogue to Stokes' theorem, mainly because I am having trouble defining a boundary. In time that may change. For now, I am thinking that maybe I should be looking at describing an infinite dimensional inner product, and letting the definition of an integral follow from there. I talked about infinite dimensional vector spaces in a previous post, but I did so in a limiting sense; more to the point, I essentially used an infinite dimensional Euclidean metric. I want to generalize that to an arbitrary infinite dimensional metric in the hopes that I can equate any infinite dimensional metric with an appropriately curved space.

Say we redefine the infinite dimensional inner product between two functions as:

\langle f,h\rangle=\int f(x)G(x,y)h(y)dxdy

, where G(x,y)=G(y,x) is the infinite dimensional metric. Note that this metric can be any generalized function (like the Dirac delta). It is interesting to note that an integral transform could then be viewed as raising or lowering an index:

h^\dagger(x)=\int G(x,y)h(y)dy

At this point, I noticed that if we express f and h as the sum of a set of differentials at each point in a set, we would get something like this:

f=\sum\limits_{P\in S} f(P)\omega(P)\\h=\sum\limits_{P\in S} h(P)\omega(P)\\\langle f,h\rangle=\sum\limits_{\{P,Q\}\in S}f(P)\omega(P)\cdot\omega(Q) h(Q)

To me, that looks awfully similar to the generalized definition of an inner product between functions. So I think that the inner product of two functions with an arbitrary metric is always equivalent to the inner product of two differentials in an appropriately curved space. Note that any two directions at different points would have to be orthogonal for a Euclidean inner product to occur. According to this formalism, the Euclidean inner product would be written as:

\langle f,h\rangle=\int f(x)\delta(x-y)h(y)dxdy
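
Here is a quick numerical sketch of what I mean (the grid, the test functions, and the narrow Gaussian standing in for the Dirac delta are all toy choices of mine):

```python
import numpy as np

# Discretize <f,h> = \int f(x) G(x,y) h(y) dx dy on a grid.  With a narrow
# Gaussian standing in for the Dirac delta, the result should approach the
# ordinary (Euclidean) inner product \int f(x) h(x) dx.

x = np.linspace(0.0, 1.0, 400)
dx = x[1] - x[0]
f = np.sin(2 * np.pi * x)
h = np.exp(-x)

def inner(f, h, G):
    """<f,h> = sum_{x,y} f(x) G(x,y) h(y) dx dy."""
    return f @ G @ h * dx * dx

sigma = 0.005                                   # width of the delta surrogate
X, Y = np.meshgrid(x, x, indexing="ij")
G_delta = np.exp(-(X - Y) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

print(inner(f, h, G_delta))                     # close to the direct inner product
print(np.sum(f * h) * dx)                       # Euclidean inner product for comparison
```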

I will try to formalize all this stuff over the summer. For now, I am going on strong hunches and building more strong hunches. I suspect this notion of equivalence between metrics of function spaces and dot products between directions at different points could lead to some interesting results in the future.

Symmetries of the Metric

The metric may be symmetric about permutations of directions. As of the time of this writing, I have a hunch that symmetry transformations like rotations could be understood in terms of permutations of directions since they interchange directions while preserving the metric. If this were the case, there is a chance spinors could be understood in terms of the matrix representations of these permutations.

Since the rows and columns of the metric are spanned by the same set of directions at a given point, a permutation of directions must do the same thing to rows as it does to columns. That means the permutations can be represented by similarity transformations. Note that if g=hgh^{-1} , then gh=hg or [g,h]=0 . Furthermore, if symmetry transformations are to be represented as permutation matrices, they must be orthogonal. So symmetry transformations can be represented by orthogonal matrices that commute with the metric.

Now we may ask, if two permutations commute with the metric, will their composition commute with the metric? Consider two permutations, h_1 and h_2 . We find that:

[h_1h_2,g]=h_1h_2g-gh_1h_2=h_1h_2g-h_1gh_2=h_1h_2g-h_1h_2g=0

So the set of all symmetry transformations forms a group under composition. This makes sense. Since each symmetry transformation must preserve the metric, so should any composition of symmetry transformations.
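
As a quick sanity check, here is a small numerical example of my own: a circulant metric is invariant under cyclic permutations of the directions, and the corresponding permutation matrix commutes with it, as do its compositions.

```python
import numpy as np

# A circulant "metric" g is unchanged when the directions are cyclically
# permuted, so the cyclic permutation matrix h should satisfy [h, g] = 0,
# and so should any power of h.

n = 5
c = [1.0, 0.3, 0.1, 0.1, 0.3]                     # symmetric circulant pattern
g = np.array([[c[(j - i) % n] for j in range(n)] for i in range(n)])

h = np.roll(np.eye(n), 1, axis=0)                 # cyclic permutation matrix

print(np.allclose(h @ g @ h.T, g))                # h preserves the metric
print(np.allclose(h @ g - g @ h, 0))              # [h, g] = 0
print(np.allclose(h @ h @ g - g @ h @ h, 0))      # the composition commutes too
```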

Given the whole group of symmetry transformations, S , consider the cyclic subgroups generated by each element, C(h_n)\subset S . Clearly the union of the elements of all these subgroups will contain all the elements of S since every element in S is the generator of one of these subgroups. For every power of h_n , h_n^k , there is another cyclic subgroup, C(h_n^k)\subset C(h_n) . Therefore, one can always consider a set of "disjoint" cyclic subgroups (i.e. that only have the identity element in common) as a means of forming the original group, S .

I wonder whether these disjoint cyclic subgroups could be equated with rotations and reflections. I may edit this post after I have explored the concept more with friends and/or professors.

Notes on Directions

This post is just a bunch of notes on properties of directions and dual directions. I will probably refer back to this in future posts if I continue to use this formalism.

Consider k directions wedged together: \omega_{j_1}\wedge\dots\wedge\omega_{j_k} . I will call this a "simple k-direction" in accordance with the terminology used to describe multivectors. Since multiplying a direction by a number yields a vector, multiplying a simple k-direction by a number yields a simple k-vector. k-vectors represent oriented volumes, and their magnitude is given by:

|\boldsymbol v_1\wedge\dots\wedge\boldsymbol v_k|^2=\boldsymbol v_1\wedge\dots\wedge\boldsymbol v_k\lrcorner\boldsymbol v_k\wedge\dots\wedge\boldsymbol v_1

, where \lrcorner is the product described here. Since any simple k-vector is just a magnitude times a k-direction, we have:

|\boldsymbol v_1\wedge\dots\wedge\boldsymbol v_k|^2=\alpha^2|\omega_1\wedge\dots\wedge\omega_k|^2\\=\alpha^2\omega_1\wedge\dots\wedge\omega_k\lrcorner\omega_k\wedge\dots\wedge\omega_1=\alpha^2\epsilon^{j_1,\dots,j_k}\prod_n\omega_n\cdot\omega_{j_n}

, where \epsilon^{j_1,\dots,j_k} is the rank k Levi-Civita tensor. Therefore, the volume spanned by a simple k-vector is always proportional to the square root of the determinant of the metric tensor assigned to the k underlying directions. From here, we can describe a notion of the "dimension" described by a bunch of directions as the rank of the metric tensor assigned to them. The dimension of the kernel of the metric tensor gives you the number of "linearly dependent" directions. By linearly dependent, I mean there exists a nontrivial linear combination of directions that yields the zero vector.
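
A tiny numerical illustration of that claim (the vectors are my own toy numbers):

```python
import numpy as np

# The squared volume spanned by a simple k-vector is the determinant of the
# Gram matrix (the metric restricted to the k directions), and the rank of
# that matrix counts the independent directions.

v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([1.0, 1.0, 0.0])
v3 = 2 * v1 - v2                       # deliberately linearly dependent

G2 = np.array([[u @ w for w in (v1, v2)] for u in (v1, v2)])
print(np.linalg.det(G2))               # 1.0 = (area of the parallelogram)^2

G3 = np.array([[u @ w for w in (v1, v2, v3)] for u in (v1, v2, v3)])
print(np.linalg.det(G3))               # ~0: the three vectors only span a plane
print(np.linalg.matrix_rank(G3))       # 2 independent directions
```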

This imposes an interesting constraint on dual directions. Consider three directions, \omega_1 , \omega_2 and \omega_3 such that \alpha^1\omega_1+\alpha^2\omega_2+\alpha^3\omega_3=\boldsymbol 0 . Now consider the dual direction \omega^3 such that \omega^3\cdot\omega_i=\delta^3_i . We find that:

\omega^3\cdot(\alpha^1\omega_1+\alpha^2\omega_2+\alpha^3\omega_3)=\omega^3\cdot\boldsymbol 0=0\\\alpha^1\cdot 0+\alpha^2\cdot 0+\alpha^3=0\\\alpha^3=0

So either we give up on distributing the dot product (not likely) or we admit that dual directions cannot "single out" directions unless they are linearly independent.

We may then define the dual directions in the more general context of a projection operator:

\omega_i\cdot\omega^k\omega_k\cdot\omega_j=\omega_i\cdot\omega_j\\\omega^i\cdot\omega^k\omega_k\cdot\omega^j=\omega^i\cdot\omega^j\\\omega^i\cdot\omega^k\omega_k\cdot\omega_j=\omega^i\cdot\omega_j

We then find that:

g_{ik}\tilde g^{kp}g_{pj}=\omega_i\cdot\omega_k\omega^k\cdot\omega^p\omega_p\cdot\omega_j=\omega_i\cdot\omega_j=g_{ij}\\\tilde g^{ik} g_{kp}\tilde g^{pj}=\omega^i\cdot\omega^k\omega_k\cdot\omega_p\omega^p\cdot\omega^j=\omega^i\cdot\omega^j=\tilde g^{ij}

We therefore conclude that g and \tilde g are generalized inverses. If we further impose that:

\omega^i\cdot\omega_j=\omega^j\cdot\omega_i

, we find that:

\omega^i\cdot\omega^k\omega_k\cdot\omega_j=\tilde g^{ik}g_{kj}=\omega^i\cdot\omega_j=\omega^j\cdot\omega_i=\tilde g^{jk}g_{ki}

Since g and \tilde g are symmetric, we find that \tilde g^{jk}g_{ki}=g_{ik}\tilde g^{kj} . These results imply g and \tilde g are Moore-Penrose pseudoinverses. These are known to exist for any matrix. Therefore, I have not imposed any limiting conditions upon the dual directions.
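
This is easy to check numerically; here is a minimal sketch with a deliberately singular metric of my own choosing:

```python
import numpy as np

# A singular, symmetric "metric" g (opposite directions make it rank deficient)
# and its Moore-Penrose pseudoinverse satisfy exactly the generalized-inverse
# relations derived above, and their product is symmetric.

dirs = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0]])   # includes opposite directions
g = dirs @ dirs.T
g_tilde = np.linalg.pinv(g)

print(np.allclose(g @ g_tilde @ g, g))              # g_{ik} g~^{kp} g_{pj} = g_{ij}
print(np.allclose(g_tilde @ g @ g_tilde, g_tilde))  # g~^{ik} g_{kp} g~^{pj} = g~^{ij}
print(np.allclose(g @ g_tilde, (g @ g_tilde).T))    # the mixed tensor is symmetric
```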

As a final note, I would like to look at some basic constraints on the entries of the metric tensor. First, recall that in its most general sense, the metric tensor maps pairs of directions to the interval, [-1,1] . A direction dot itself need not be 1. Indeed, on a Minkowski metric, the "time" direction dot itself is -1. However, a direction dot itself should always be \pm 1 and the absolute value of the inner product of two different directions is always less than or equal to 1. It makes sense intuitively, and as far as I can tell, should be taken as an axiom. The motivation is that if the inner product is to be equated with some means of comparison, the closest thing to a given direction is the direction itself. Mathematically, it looks like this:

|\omega_i\cdot\omega_j|\leq1

, where equality holds if and only if the directions are parallel. Why not identical? Because north and south are different directions yet their inner product is -1. One might argue that I should only consider north or south a direction, then derive any southern pointing vector as a negative northern pointing vector. The reason I disagree lies in the motivation for defining a direction. I wanted something that could be uniquely assigned to pairs of neighboring points. Consider a simple square tiling. Clearly there would be two neighbors per direction if we considered vectors of equal magnitudes and opposite signs to be pointing in the same direction. Therefore, one needs to consider opposite directions distinct. The problem with this is that any metric that describes a system that has two points in opposite directions will not be invertible. For example, in the square tiling, there will be 4 directions with two pairs that are linearly dependent. However, this phenomenon is not particular to a square lattice. Indeed, a hexagonal tiling has three directions per point, and yet we would not complain about having a linearly dependent direction in that circumstance. So, if it is generally OK to have more neighbors than there are dimensions, I do not think it is a big deal to treat opposite directions as different.

--UPDATE--

I found that:

\omega_i\lrcorner(\omega_i\lrcorner\omega_{k_1}\wedge\dots\wedge\omega_{k_p})=0

To see why, first note that:

\omega_i\lrcorner\omega_{k_1}\wedge\dots\wedge\omega_{k_p}=\sum\limits_j \epsilon^j\omega_i\cdot\omega_{k_j}\omega_{k_1}\wedge\dots\check\omega_{k_j}\dots\wedge\omega_{k_p}

, where \epsilon^j is the rank 1 Levi-Civita tensor and \check\omega_{k_j} denotes the absence of \omega_{k_j} from the product. By the same logic,

\omega_i\lrcorner(\omega_i\lrcorner\omega_{k_1}\wedge\dots\wedge\omega_{k_p})=\omega_i\lrcorner(\sum\limits_j \epsilon^j\omega_i\cdot\omega_{k_j}\omega_{k_1}\wedge\dots\check\omega_{k_j}\dots\wedge\omega_{k_p})\\=\sum\limits_{jl}\epsilon^j\epsilon^l\operatorname{sgn}(j-l)\omega_i\cdot\omega_{k_l}\omega_i\cdot\omega_{k_j}\omega_{k_1}\wedge\dots\check\omega_{k_l}\dots\check\omega_{k_j}\dots\wedge\omega_{k_p}

The \operatorname{sgn}(j-l) factor comes from the fact that \omega_{k_j} is already missing from the product, so the sign you would expect flips when l>j . Since this factor is antisymmetric in l and j , while the other terms are symmetric, the sum goes to zero. I strongly suspect this will come in handy in the future.
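
Here is a brute-force numerical check of this identity (my own construction: the k-vector is represented by a fully antisymmetric component array, and the lowered components g_{0j} stand in for \omega_0\cdot\omega_{k_j} ):

```python
import numpy as np
from itertools import permutations

# Represent a 3-vector by a fully antisymmetric component array, contract it
# twice with the same lowered direction, and confirm the result vanishes.

rng = np.random.default_rng(0)
n = 4
T = rng.standard_normal((n, n, n))
B = np.zeros_like(T)
for p in permutations(range(3)):                 # antisymmetrize T
    sign = round(np.linalg.det(np.eye(3)[list(p)]))
    B += sign * np.transpose(T, p)

g = np.eye(n) + 0.2 * np.ones((n, n))            # some symmetric metric
a = g[0]                                         # omega_0 with its index lowered

once = np.einsum('i,ijk->jk', a, B)              # first interior product with omega_0
twice = np.einsum('j,jk->k', a, once)            # second interior product with omega_0
print(np.max(np.abs(twice)))                     # 0 up to round-off
```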

The Covariant Codifferential

The codifferential of a k-form is usually written in terms of the exterior derivative and the Hodge star as: \delta=(-1)^k\star^{-1} d\star . I described how the exterior covariant derivative works--at least within the framework described here. However, I made no mention of the Hodge star operator. I have seen it described in terms of non-orthogonal coordinates (e.g. the Wikipedia article), but I am not so sure it generalizes so readily to curved and/or discrete spacetime.

The standard definition of the codifferential is \delta=(-1)^k\star^{-1} d\star , where k is the degree of the form it is acting on. The Hodge star, \star , can be defined such that:

a\wedge\star b=\star(a\lrcorner b)

and \star^{-1} is defined such that \star^{-1}\star=\star\star^{-1}=1 , with 1 representing the identity operator. If a is a p-form and b is a q-form, a\lrcorner b is a (q-p)-form. It should be treated as the ordinary dot product except it goes to zero when p>q. On the left hand side, one has an exterior product between a p-form and an (n-q)-form--which gives a (p+n-q)-form (n being the dimension of the space at hand). The right hand side is the dual of a (q-p)-form, giving an (n-(q-p))-form. Since p+n-q=n-(q-p), the left and right hand sides are of the same degree.

Consider the exterior derivative acting in a particular direction, i:

\boldsymbol{d}_iT(P)=(T(P_i)-T(P))\wedge\omega^i(P)

, where T is a k-form. Imagine we took the Hodge star of T(P) first:

\boldsymbol{d}_i\star T(P)=(\star T(P_i)-\star T(P))\wedge\omega^i(P)\\=\star T(P_i)\wedge\omega^i(P)-\star T(P)\wedge\omega^i(P)\\=(-1)^k(\omega^i(P)\wedge\star T(P_i)-\omega^i(P)\wedge\star T(P))\\=(-1)^k\star(\omega^i(P)\lrcorner T(P_i)-\omega^i(P)\lrcorner T(P))\\=(-1)^k\star(\omega^i(P)\lrcorner(T(P_i)-T(P)))

Operating on both sides with \star^{-1} yields:

(-1)^k\star^{-1}\boldsymbol{d}_i\star T(P)=\omega^i(P)\lrcorner(T(P_i)-T(P))

Thus, the covariant codifferential is:

\boldsymbol{\delta} T(P)=\sum\limits_i\omega^i(P)\lrcorner(T(P_i)-T(P))

Notice how this transfers perfectly well to a discrete space, and I did not even define a discrete Hodge star! I simply used the desired properties of what a discrete Hodge star would look like to deduce what the discrete codifferential should be. I think I got lucky that everything worked out and I got a simple definition that is computationally tractable.
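
To convince myself, here is a rough numpy sketch of the whole construction on a periodic square grid (the grid, the test function, and the use of pseudoinverse duals are my own toy choices, not a general implementation): applying d and then \delta exactly as written above to a scalar reproduces a Laplacian.

```python
import numpy as np

# Directions point to the four nearest neighbours; opposite directions are kept
# as separate neighbours, so the Gram matrix is singular and the dual directions
# come from its Moore-Penrose pseudoinverse.

n = 64
h = 2 * np.pi / n                              # grid spacing
xs = np.arange(n) * h
X, Y = np.meshgrid(xs, xs, indexing="ij")
f = np.sin(X) * np.cos(Y)                      # test scalar field

dirs = np.array([[1, 0], [-1, 0], [0, 1], [0, -1]], float)   # omega_i
G = dirs @ dirs.T                              # metric of the directions
duals = np.linalg.pinv(G) @ dirs               # omega^i = G~^{ij} omega_j

def neighbour(a, d):
    """Field values at the neighbour reached by stepping along direction d."""
    return np.roll(a, shift=(-int(d[0]), -int(d[1])), axis=(0, 1))

# d f(P) = sum_i (f(P_i) - f(P)) omega^i(P); divide by h so differences become derivatives
df = sum((neighbour(f, d) - f)[..., None] * w for d, w in zip(dirs, duals)) / h

# delta T(P) = sum_i omega^i(P) . (T(P_i) - T(P)), applied here to T = d f
delta_df = sum((neighbour(df, d) - df) @ w for d, w in zip(dirs, duals)) / h

laplacian = -2.0 * np.sin(X) * np.cos(Y)       # analytic Laplacian of f
print(np.max(np.abs(delta_df - laplacian)))    # small: a wide-stencil Laplacian
```

The stencil comes out twice as wide as the usual one because opposite directions are kept as separate neighbors, so each dual direction is only half a unit vector.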

The Exterior Covariant Derivative

I did not know there was a name for it until I stumbled upon a link to a stub Wikipedia article on it. The exterior covariant derivative is the only operator necessary to develop the Einstein tensor, which is the key to understanding Einstein's field equations (general relativity). I am going to develop the concept on a discrete set of points, following the formalism of a previous post:

We can define the gradient (exterior derivative for a scalar), \boldsymbol{d} , of a parameter, x , that varies from point to point as:

\boldsymbol{d}x(P)=\sum\limits_i(x(P_{i})-x(P))\omega^i(P)

, where \omega^i(P) is the dual of the direction that points from point P to its ith neighbor, P_{i} .

The exterior derivative only operates on magnitudes, not directions. The exterior covariant derivative operates on both. The exterior covariant derivative is applied to an arbitrary tensor in the same manner as a scalar, namely:

\boldsymbol{d}T(P)=\sum\limits_i(T(P_{i})-T(P))\wedge\omega^i(P)

Notice how in this formalism, tensors gain a new [dual] directional component, not a new vector or 1-form component. The distinction here is that directions do not have magnitude and you cannot add them or subtract them without first giving them a magnitude.

When you apply the exterior covariant derivative to a basis vector, \boldsymbol{e}_i , you get:

\boldsymbol{d}\boldsymbol{e}_i=\Gamma_{ji}^k\omega_k\omega^j=\omega_k\omega^k_i

, where \omega^k_i is called the connection 1-form. Note that directions and dual directions do not get wedged, they just sort of clump together. This is to mimic the way tensors gain upper indices in a manner independent of the number and order of their lower indices and gain lower indices independent of the number and order of their upper indices.

In general, \boldsymbol{d}^2\neq 0 unless we are working in flat spacetime. Consider \boldsymbol{d}^2T :

\boldsymbol{d}^2T(P)=\sum\limits_{ij}((T(P_{ij})-T(P_{j}))\wedge\omega^i(P_{j})\\-(T(P_{i})-T(P))\wedge\omega^i(P))\wedge\omega^j(P)

When T is a basis vector, \boldsymbol{e}_\sigma , you get the curvature 2-form, R^j_\sigma :

\boldsymbol{d}^2\boldsymbol{e}_\sigma=\sum\limits_{ij}((\boldsymbol{e}_\sigma(P_{ij})-\boldsymbol{e}_\sigma(P_{j}))\wedge\omega^i(P_{j})\\-(\boldsymbol{e}_\sigma(P_{i})-\boldsymbol{e}_\sigma(P))\wedge\omega^i(P))\wedge\omega^j(P)\\=\sum\limits_{j}(\omega_k(P_{j})\omega_{\sigma}^k(P_{j})-\omega_k(P)\omega_{\sigma}^k(P))\wedge\omega^j(P)\\=\omega_k(P) R^k_\sigma(P)

The Riemann curvature tensor can be recovered from the curvature 2-form as follows:

R^k_\sigma=R^k_{\sigma ij}\omega^j\wedge\omega^i

Cramer's Rule and Exterior Algebra

There is an interesting way to formulate Cramer's rule using exterior algebra (the algebra of differential forms). Instead of writing a matrix of vector components, we write down a matrix of vectors--basis and all--like so:

A_i=v^{ij}\boldsymbol{e}_j=\boldsymbol{v}^i,\qquad A=\begin{pmatrix}\boldsymbol{v}^1\\\vdots\\\boldsymbol{v}^N\end{pmatrix}

, where A_i is the ith row of A, \boldsymbol{e}_j is the jth vector in the basis and v^{ij} is the jth component of the ith vector such that v^{ij}\boldsymbol{e}_j=\boldsymbol{v}^i . It turns out that the inverse of A can be written as:

(A^{-1})^i=(-1)^{i-1}\frac{(\boldsymbol{v}^1\wedge\dots\wedge\boldsymbol{v}^{i-1}\wedge\boldsymbol{v}^{i+1}\wedge\dots\wedge\boldsymbol{v}^N)\lrcorner(\boldsymbol{v}^1\wedge\dots\wedge\boldsymbol{v}^N)}{(\boldsymbol{v}^1\wedge\dots\wedge\boldsymbol{v}^N)\lrcorner(\boldsymbol{v}^1\wedge\dots\wedge\boldsymbol{v}^N)}

, where (A^{-1})^i is the ith column of the inverse of A. The glyph, \lrcorner , represents a kind of dot product for multivectors. It works like this:

\boldsymbol{a}\lrcorner(\boldsymbol{b}\wedge\boldsymbol{c})=(\boldsymbol{a}\cdot\boldsymbol{b})\boldsymbol{c}-(\boldsymbol{a}\cdot\boldsymbol{c})\boldsymbol{b}

, which can be deduced from defining the wedge product as the antisymmetric outer product. Furthermore, if the degree of the multivector to the left of the glyph is greater than the degree of the multivector to the right of it, the result is zero. This will come in handy later. Furthermore,

\boldsymbol{a}\lrcorner(\boldsymbol{b}^1\wedge\dots\wedge\boldsymbol{b}^N)=\epsilon_{i}(\boldsymbol{a}\cdot\boldsymbol{b}^i)\boldsymbol{b}^1\wedge\dots\wedge\boldsymbol{b}^{i-1}\wedge\boldsymbol{b}^{i+1}\wedge\dots\wedge\boldsymbol{b}^N

, where \epsilon_{i}=(-1)^{i-1} is the rank 1 Levi-Civita tensor. Generalizing the above expression to an N-form dotted with an N-form, the denominator would be:

(-1)^{N(N-1)/2}\epsilon_{k_1,\dots,k_N}\prod\limits_{n}\boldsymbol{v}^{n}\cdot\boldsymbol{v}^{k_n}

Similarly, the numerator would be:

(-1)^{N(N-1)/2+i-1}\epsilon_{k_1,\dots,k_N}\boldsymbol{v}^{k_i}\prod\limits_{n\neq i}\boldsymbol{v}^{n}\cdot\boldsymbol{v}^{k_n}

We can prove that this is the inverse of A as follows:

A_p\lrcorner(A^{-1})^i=\frac{\epsilon_{k_1,\dots,k_N}\boldsymbol{v}^p\cdot\boldsymbol{v}^{k_i}\prod\limits_{n\neq i}\boldsymbol{v}^{n}\cdot\boldsymbol{v}^{k_n}}{\epsilon_{k_1,\dots,k_N}\prod\limits_{n}\boldsymbol{v}^{n}\cdot\boldsymbol{v}^{k_n}}

Clearly when p=i, the numerator equals the denominator. However, when p\neq i , the numerator vanishes because the \boldsymbol{v}^p to the left of the product sign will match one of the \boldsymbol{v}^n terms to the right of the product sign. Since we are summing over every permutation of k_i , for every permutation we sum over, there will be another permutation that is identical except that k_i and k_n are switched. Since every time two indices are switched, the Levi-Civita tensor produces a negative sign, these two terms will exactly cancel out. Thus,

A_p\lrcorner(A^{-1})^i=\delta_p^i

, where \delta_p^i is the Kronecker delta--which equates to the identity matrix.

I would like to point out that this formalism automatically takes an arbitrary metric into account by expressing everything in terms of interior products.
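
For comparison, here is plain Cramer's rule in component form (this is not the wedge formalism above, just the classical cofactor statement it reduces to in an orthonormal basis), checked against numpy's inverse on a matrix of my own choosing:

```python
import numpy as np

# Classical Cramer's rule / adjugate formula: the (i, p) entry of A^{-1} is the
# cofactor C_{p i} divided by det A.

def cramer_inverse(A):
    n = A.shape[0]
    detA = np.linalg.det(A)
    inv = np.empty_like(A, dtype=float)
    for i in range(n):
        for p in range(n):
            minor = np.delete(np.delete(A, p, axis=0), i, axis=1)
            inv[i, p] = (-1) ** (i + p) * np.linalg.det(minor) / detA
    return inv

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
print(np.allclose(cramer_inverse(A), np.linalg.inv(A)))   # True
```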

Magnitudes and Directions

I have been rereading one of the introductory chapters of Misner, Thorne and Wheeler and I decided to try to come up with my own notion of a 1-form based on the concept of a differential. I suspect the result is equivalent to the standard definition, but from a new perspective that treats "magnitudes" and "directions" as separate entities that combine to form a vector.

Let us start with the most bare (but relevant) mathematical entity I can think of: a set of points. Picture a set of points scattered throughout space. We next need to introduce a set of directions. At each point, imagine there is an associated set of directions. What is a direction you ask? I would define it as something you can take the dot product of, but you cannot add together. You cannot add north and east because you do not know "how much north" and "how much east," but you can certainly say north dot east = 0 because they are perpendicular. You could define the dot product of two directions as the cosine of the angle between them. The difference between dotting two directions and dotting two vectors is that the former gives you a number with no units. You could then think of a dot product as a mapping from pairs of directions to numbers on the interval [-1,1]. So we have a set of unique directions at each point. The number of directions associated with each point does not have to be the same. Some points could have eight neighbors while others could have two.

I am going to define the basis as the set of directions associated with each point. Normally, the basis is thought of as being composed of vectors, but in this context, I am going to think of them as directions without magnitude. We can make vectors by arranging a set of directions in a row and a set of "magnitudes" (numbers with units) in a column and doing matrix multiplication. Call your magnitudes v^\alpha and your directions \omega_\alpha . We can represent a vector as v^\alpha \omega_\alpha by using the Einstein summation convention.

From here, we may define the metric as:

g_{ij}=\omega_i\cdot\omega_j

The diagonal entries of this notion of a metric will always describe flat spacetime. If the basis is not orthogonal, you will get off-diagonal entries. Notice how the metric naturally shows up when taking dot products of two vectors:

(v^i\omega_i)\cdot(u^j\omega_j)=v^iu^j(\omega_i\cdot\omega_j)=v^iu^jg_{ij}
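
A tiny numerical illustration of that identity (the directions and magnitudes are my own toy numbers):

```python
import numpy as np

# Directions as unit vectors, the metric as their table of dot products, and
# the identity (v^i omega_i) . (u^j omega_j) = v^i u^j g_ij.

dirs = np.array([[1.0, 0.0],
                 [np.cos(np.pi / 3), np.sin(np.pi / 3)]])   # non-orthogonal directions
g = dirs @ dirs.T                                           # g_ij = omega_i . omega_j

v = np.array([2.0, -1.0])      # magnitudes ("components") of the first vector
u = np.array([0.5, 3.0])       # magnitudes of the second vector

lhs = (v @ dirs) @ (u @ dirs)  # assemble the vectors, then take the ordinary dot
rhs = v @ g @ u                # v^i u^j g_ij
print(np.isclose(lhs, rhs))    # True
```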

Imagine that each basis direction at a given point pointed toward a unique neighboring point. Consider a parameter, x, that varies from point to point. We can define the gradient (exterior derivative for a scalar), \boldsymbol{d} , of that parameter as:

\boldsymbol{d}x(P)=\sum\limits_i(x(P_{i})-x(P))\omega^i(P)

, where \omega^i(P) is the dual of the direction, \omega_i(P) , that points from point, P , to its ith neighbor, P_{i} . The dual direction, \omega^i , is defined such that:

\omega^i\cdot\omega_j=\delta^i_j

, where \delta^i_j is the Kronecker delta. In this sense, a 1-form is just a vector of infinitesimal magnitude spanned by the dual of the basis directions at a point. Since coordinates are just a set of parameters that vary from point to point, these differentials are completely compatible with the usual differentials used in calculus. Since all but one of the N coordinates are held constant in this definition, one can imagine (or at least imagine the existence of) N-1 dimensional sheets in which all other coordinates are held constant. The differentials point in the direction normal to these sheets. The depth of the sheets approaches zero as the points approach a continuum. This is why Misner, Thorne, and Wheeler describe 1-forms as sheets. They do not mean infinitesimal sheets but "isosurfaces" in which all but one coordinate is held constant.

It is worth noting that unless we let our set of points approach a continuum, the product rule works like this:

\boldsymbol{d}(x(P)y(P))\\=\frac{1}{2}\sum\limits_i((x(P_{i})-x(P))(y(P_i)+y(P))+(x(P_i)+x(P))(y(P_{i})-y(P)))\omega^i(P)\\=\sum\limits_i(x(P_i)y(P_i)-x(P)y(P))\omega^i(P)

This does not generalize to arbitrary numbers of products in a simple way; one must iteratively apply the product rule.
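
Here is a short symbolic check that the symmetrized form above really does collapse like that:

```python
import sympy as sp

# Verify that the symmetrized discrete product rule equals x(P_i) y(P_i) - x(P) y(P).
x, xi, y, yi = sp.symbols('x x_i y y_i')          # values at P and at P_i
symmetrized = sp.Rational(1, 2) * ((xi - x) * (yi + y) + (xi + x) * (yi - y))
print(sp.simplify(symmetrized - (xi * yi - x * y)))   # 0
```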

Given a set of basis vectors, \boldsymbol{e}_j , we may derive a set of basis 1-forms and vice versa. Consider the quantity, \boldsymbol{g}_{ij}=\omega_i\cdot\boldsymbol{e}_j , its inverse, \tilde{\boldsymbol{g}}^{ij} , and the following quantities: \boldsymbol{e}^i=\tilde{\boldsymbol{g}}^{ij}\omega_j . Since \tilde{\boldsymbol{g}}^{ik}\boldsymbol{g}_{kj}=\delta^i_j (where \delta^i_j is the Kronecker delta),

\boldsymbol{e}^i\cdot\boldsymbol{e}_j=\tilde{\boldsymbol{g}}^{ik}\omega_k\cdot\boldsymbol{e}_j=\tilde{\boldsymbol{g}}^{ik}\boldsymbol{g}_{kj}=\delta^i_j

The quantities, \boldsymbol{e}^i , form the basis for what are called covectors and can be considered dual to the basis \boldsymbol{e}_i . Whatever units you give the vectors, covectors have the inverse of those units. These covectors are like 1-forms that do not have infinitesimal magnitude.
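
A short numerical sketch of this construction (the basis vectors and directions are my own toy choices):

```python
import numpy as np

# Basis vectors e_j, directions omega_i taken as the corresponding unit vectors,
# g_ij = omega_i . e_j, and covectors e^i = g~^{ij} omega_j with e^i . e_j = delta^i_j.

e = np.array([[2.0, 0.0],
              [1.0, 1.0]])                               # rows are the basis vectors e_j
omega = e / np.linalg.norm(e, axis=1, keepdims=True)     # unit "directions"

g = omega @ e.T                                          # g_ij = omega_i . e_j
g_tilde = np.linalg.inv(g)

e_dual = g_tilde @ omega                                 # rows are the covectors e^i
print(np.allclose(e_dual @ e.T, np.eye(2)))              # e^i . e_j = delta^i_j
```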

I later came to the conclusion that what I call "directions" are analogous to what most texts call "unit vectors" except for the fact that I say you cannot add or subtract directions. In practice, I cannot think of a single scenario in which you would add or subtract unit vectors without first assigning them a magnitude of some sorts. So I think, in practice, it is generally safe to think of unit vectors as directions and apply the formalism I outlined in this post to tensor analysis.

The Metric

The metric lies at the heart of general relativity. Every concept in general relativity can be derived from the metric. In fact, the metric is usually what you solve for. I discussed a heuristic approach to calculating the Christoffel symbols and the Riemann curvature tensor, but I think it is time to tie that heuristic reasoning to the modern mathematical formalism.

The simplest non-Euclidean metric is the Minkowski metric. Recall that in special relativity, time slows down (for you) as you move closer to the speed of light. Special relativity is really all about a formula for "proper time," \tau , which is the amount of time you think has gone by. The formula for proper time is:

d\tau^2=dt^2-\frac{dx^2}{c^2}-\frac{dy^2}{c^2}-\frac{dz^2}{c^2}

, where I used differentials to denote small increments of the variable. c is the speed of light, dt is an increment of time measured by someone else, and dx , dy and dz represent how much you moved. This way you can integrate over arbitrary functions of time. There are a lot of ways to interpret this--especially when you start flipping the cs (plural "c") around. The purpose of this post is not to give interesting interpretations of this fundamental law so much as to see how it generalizes to curved spacetime. Anyway, since squared lengths computed with the ordinary Euclidean dot product can never be negative, you have to come up with another way of describing dot products. The standard solution is to just make a table of the dot products of all the different components of the basis with each other. This table is called the metric tensor, and in four dimensional spacetime, it is a 4x4 matrix. The Minkowski metric used in special relativity is given by:

\nu=\begin{pmatrix}1&0&0&0\\0&-\frac{1}{c^2}&0&0\\0&0&-\frac{1}{c^2}&0\\0&0&0&-\frac{1}{c^2}\end{pmatrix}

When we want to find the dot product of a row vector and a column vector, we are used to just matrix multiplying them as is. That is all well and good for Euclidean space (in which the metric tensor is just the identity matrix) but does not work for the world we live in (unless you make time imaginary). So instead of just plopping your row vector next to your column vector and multiplying, you stick the metric tensor in between the two to compensate for non-Euclidean spacetime. But you should not think of the metric tensor as just something you plop between a row and column vector when you take a dot product, but as a table of dot products. This interpretation is much more fruitful.
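
Here is a small numerical example of that sandwiching (the numbers and units are mine):

```python
import numpy as np

# Proper time for a short step, computed by sandwiching the Minkowski metric
# between the displacement written as a row vector and as a column vector.

c = 3.0e8                                    # m/s
eta = np.diag([1.0, -1.0 / c**2, -1.0 / c**2, -1.0 / c**2])

d_step = np.array([1.0, 0.6 * c, 0.0, 0.0])  # dt = 1 s, dx = 0.6 c * 1 s
dtau2 = d_step @ eta @ d_step                # dtau^2 = dt^2 - (dx^2+dy^2+dz^2)/c^2
print(np.sqrt(dtau2))                        # 0.8 s: the familiar time dilation
```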

So if the metric tensor is just a table of dot products, we can take into account any anisotropy by having some diagonal elements larger than others. We can take shearing into account with off-diagonal elements. The metric tensor pretty much encodes all the information about spacetime, and with it, we can calculate things like Christoffel symbols and Riemann curvature tensors. Naturally, I will start with the former.

Recall the definition of the Christoffel symbols:

\nabla_ie_j=\Gamma_{ij}^ke_k

, where the exponent is not an exponent but another index--an "upper" index--and repeated upper and lower indices are summed over (Einstein summation). I will show you how to write this in terms of the inverse metric tensor:

g_{ij}=e_i\cdot e_j

, although the standard derivation calls this the metric tensor and g^{ij} its inverse. Since the Christoffel symbols are the derivatives of the basis, I will start by taking the derivative of the inverse metric tensor just to see what happens:

\nabla_k\left(e_i\cdot e_j\right)=\left(\nabla_k e_i\right)\cdot e_j+e_i\cdot\left(\nabla_k e_j\right)\\=\Gamma_{ki}^\lambda e_\lambda\cdot e_j+\Gamma_{kj}^\lambda e_\lambda\cdot e_i\\=\Gamma_{ki}^\lambda g_{\lambda j}+\Gamma_{kj}^\lambda g_{\lambda i}

Note what happens when we switch indices i and k:

\nabla_i\left(e_k\cdot e_j\right)=\Gamma_{ik}^\lambda g_{\lambda j}+\Gamma_{ij}^\lambda g_{\lambda k}

Compare this to the result of switching j and k:

\nabla_j\left(e_i\cdot e_k\right)=\Gamma_{ji}^\lambda g_{\lambda k}+\Gamma_{jk}^\lambda g_{\lambda i}

Note that:

\nabla_i\left(e_k\cdot e_j\right)+\nabla_j\left(e_i\cdot e_k\right)-\nabla_k\left(e_i\cdot e_j\right)\\=\Gamma_{ik}^\lambda g_{\lambda j}+\Gamma_{ij}^\lambda g_{\lambda k}+\Gamma_{ji}^\lambda g_{\lambda k}+\Gamma_{jk}^\lambda g_{\lambda i}-\left(\Gamma_{ki}^\lambda g_{\lambda j}+\Gamma_{kj}^\lambda g_{\lambda i}\right)\\=\left(\Gamma_{ik}^\lambda-\Gamma_{ki}^\lambda\right)g_{\lambda j}+\left(\Gamma_{ij}^\lambda+\Gamma_{ji}^\lambda\right)g_{\lambda k}+\left(\Gamma_{jk}^\lambda-\Gamma_{kj}^\lambda\right)g_{\lambda i}

Now if the Christoffel symbols were symmetric about their lower indices, this would reduce to:

2\Gamma_{ij}^\lambda g_{\lambda k}

From here, we could derive the alternate definition of the Christoffel symbols:

\Gamma_{ij}^k=\frac{1}{2}g^{k\lambda}\left(\nabla_ig_{\lambda j}+\nabla_jg_{i\lambda}-\nabla_\lambda g_{ij}\right)

, where g^{k\lambda}g_{\lambda i}=\delta_i^k ( \delta being the Kronecker delta). So the definition of the Christoffel symbols that I showed you in my previous post was more general than this one in that \nabla_ie_j=\nabla_je_i need not apply. However, since this latter definition is more common in practice, you should be careful to choose a basis such that the Christoffel symbols are symmetric about their lower indices.
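
Here is a sympy sketch of that symmetric formula (reading \nabla_i as an ordinary coordinate derivative), applied to a toy example of my own: the unit sphere in latitude/longitude coordinates.

```python
import sympy as sp

# Unit sphere with coordinates (latitude, longitude) and metric diag(1, cos^2(lat)).

lat, lon = sp.symbols('lat lon')
coords = [lat, lon]
g = sp.Matrix([[1, 0], [0, sp.cos(lat) ** 2]])
g_inv = g.inv()

def christoffel(k, i, j):
    """Gamma^k_{ij} = (1/2) g^{k l} (d_i g_{l j} + d_j g_{i l} - d_l g_{i j})."""
    return sp.simplify(sum(
        g_inv[k, l] * (sp.diff(g[l, j], coords[i])
                       + sp.diff(g[i, l], coords[j])
                       - sp.diff(g[i, j], coords[l])) / 2
        for l in range(2)))

print(christoffel(1, 1, 0))   # Gamma^lon_{lon lat} = -tan(lat)
print(christoffel(0, 1, 1))   # Gamma^lat_{lon lon} = sin(lat)*cos(lat) (may print as sin(2*lat)/2)
```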

The Riemann curvature tensor would be:

R_{\sigma kij}=\hat{e}_\sigma\cdot\left(\nabla_i\nabla_j-\nabla_j\nabla_i\right)e_k\\=\hat{e}_\sigma\cdot\left(\nabla_i\left(\Gamma_{jk}^\lambda e_\lambda\right)-\nabla_j\left(\Gamma_{ik}^\lambda e_\lambda\right)\right)\\=g_{m\sigma}\left(\nabla_i\Gamma_{jk}^m+\Gamma_{jk}^\lambda\Gamma_{i\lambda}^m-\nabla_j\Gamma_{ik}^m-\Gamma_{ik}^\lambda\Gamma_{j\lambda}^m\right)

Therefore,

R_{kij}^\nu=g^{\nu\sigma}R_{\sigma kij}=\nabla_i\Gamma_{jk}^\nu-\nabla_j\Gamma_{ik}^\nu+\Gamma_{jk}^\lambda\Gamma_{i\lambda}^\nu-\Gamma_{ik}^\lambda\Gamma_{j\lambda}^\nu
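
And here is a sympy sketch of this last expression (again my own toy example, the unit sphere), with the Christoffel symbols built from the metric as above and \nabla_i read as a coordinate derivative.

```python
import sympy as sp

lat, lon = sp.symbols('lat lon')
coords = [lat, lon]
g = sp.Matrix([[1, 0], [0, sp.cos(lat) ** 2]])   # unit sphere metric
g_inv = g.inv()

def Gamma(k, i, j):
    return sum(g_inv[k, l] * (sp.diff(g[l, j], coords[i])
                              + sp.diff(g[i, l], coords[j])
                              - sp.diff(g[i, j], coords[l])) / 2
               for l in range(2))

def Riemann(nu, k, i, j):
    """R^nu_{kij} with the index pattern written above."""
    expr = (sp.diff(Gamma(nu, j, k), coords[i]) - sp.diff(Gamma(nu, i, k), coords[j])
            + sum(Gamma(lam, j, k) * Gamma(nu, i, lam)
                  - Gamma(lam, i, k) * Gamma(nu, j, lam) for lam in range(2)))
    return sp.simplify(expr)

print(Riemann(1, 0, 1, 0))    # R^lon_{lat lon lat} = 1 on the unit sphere
print(Riemann(0, 1, 0, 1))    # R^lat_{lon lat lon} = cos(lat)**2
```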

The Riemann Curvature Tensor

Like the Christoffel symbols, I had a lot of trouble trying to make sense out of the Riemann curvature tensor. I was rereading the Wikipedia article, and I made the connection when it got to the explanation concerning parallel transport. Imagine you are standing at the equator facing north. From there, you travel to the north pole. Then, without turning, you start moving to your right until you are back on the equator. Now you move backwards until you are back to where you started. If you visualized this correctly, you should have turned 90 degrees clockwise. So in curved space, you can make a loop and end up changing direction. The Riemann curvature tensor measures the degree to which that happens.

To understand how this works mathematically, I am going to refer back to the Christoffel symbols. The Christoffel symbols tell you how much your orientation changes when you move in a straight line. We want to know how much it changes when you move around an infinitesimal loop around a point. Consider four points,

\{P_1,P_2,P_3,P_4\}=\{\{x,y\},\{x,y+dy\},\{x+dx,y\},\{x+dx,y+dy\}\}

such that we define an orientation that states going from x to x+dx , and going from y to y+dy describe traveling in a "positive" direction. We can express the square that these points make as the product of two intervals:

[x,x+dx]\times[y,y+dy]=\begin{pmatrix}\{x,y+dy\}&\{x+dx,y+dy\}\\\{x,y\}&\{x+dx,y\}\end{pmatrix}

The boundary of this area is defined as:

(\{x+dx\}-\{x\})\times(\{y+dy\}-\{y\})=\{x+dx,y+dy\}-\{x,y+dy\}+\{x,y\}-\{x+dx,y\}

Notice that each line bounding our infinitesimal square can be written as the sum of two terms in the right hand side of the above definition. Taking e_k at each of these points in this arrangement yields:

e_k(x+dx,y+dy)-e_k(x,y+dy)-(e_k(x+dx,y)-e_k(x,y))
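
As a quick sanity check on the stencil itself (with a smooth scalar of my own choosing standing in for e_k ), this loop combination is a discrete mixed second derivative:

```python
import numpy as np

# The loop combination above, applied to a smooth function, is approximately
# (d^2 f / dx dy) dx dy.

def f(x, y):
    return np.sin(x) * np.exp(y)

x0, y0, dx, dy = 0.7, -0.2, 1e-4, 1e-4
loop = f(x0 + dx, y0 + dy) - f(x0, y0 + dy) - (f(x0 + dx, y0) - f(x0, y0))
print(loop / (dx * dy))          # close to the mixed partial below
print(np.cos(x0) * np.exp(y0))   # analytic mixed partial of f at (x0, y0)
```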

I should probably do an example. How about moving along the surface of the earth? Say we want to know what happens to our north vector as we take a step, dx , north then a step, dy , east, then a step, -dx , backwards, then a step -dy backwards. Based on the previous example of parallel transport, we know that the north vector will be rotated by a small amount clockwise. Therefore, we should look at the Christoffel symbols that make the north vector lean east. Those would be the ones in which you move from west to east or vice versa. When we take a step west at the equator, the north vector stays the same, but if you were near the north pole and took a step east, your compass would rotate a little if you did not turn as you walked. The amount your compass moves east (per step you take east) is given by the Christoffel symbol, \Gamma^{\text{east}}_{\text{east},\text{north}}=\tan(\text{latitude}) , where "latitude" is the latitude you are standing at. We want to know how much further east our compass will move when we are a small step away from the equator than it would at the equator (per small step we take away from the equator). That is just the derivative! The derivative of tan is \sec^2(\text{latitude}) . So that is the Riemann curvature tensor, R_{\text{north},\text{east},\text{north}}^{\text{east}} . The first index is the vector you are keeping track of, second is the direction of the second step you take, the third is the direction of the first step you take, and the upper index is the component your original vector gained after moving around your closed loop.

Differential Forms and the Dirac Equation

In an earlier post, I talked about how Maxwell's equations can be written in terms of differential forms. I would like to do the Dirac equation a similar justice, but I am just guessing on how this would work.

Dirac derived the famous equation by coming up with a way to take the square root of the Klein-Gordon equation:

\sqrt{\partial^2-m^2}=0

Well, I know the \partial^2 equates to the Laplace–Beltrami operator. This is the negative of the Laplace-de Rham operator, d\delta+\delta d for a scalar function due to sign conventions. The Laplace-de Rham operator has an obvious square root, but the m^2 throws a wrench in the works. However, it would be equally valid to derive the Dirac equation from:

\sqrt{\partial^2}=m

This makes the Dirac equation:

d-\delta=m

It turns out someone by the name of Kähler came up with this idea in 1962.

Since d takes a p-form and returns a (p+1)-form, \delta takes a p-form and returns a (p-1)-form, and m takes a p-form and returns a p-form, nontrivial solutions to this form of the Dirac equation will have to be the sum of differential forms of different degrees.

Starting with:

(d-\delta-m)\Phi=0

, such that

\Phi=\phi^0+\phi^1+\phi^2+\phi^3+\phi^4

, where the exponent is an upper index denoting the degree of the differential form, you get:

  1. -\delta\phi^1=m\phi^0
  2. d\phi^0-\delta\phi^2=m\phi^1
  3. d\phi^1-\delta\phi^3=m\phi^2
  4. d\phi^2-\delta\phi^4=m\phi^3
  5. d\phi^3=m\phi^4

We may turn this into a matrix equation as follows:

\begin{pmatrix}0&-\delta_1&0&0&0\\d_0&0&-\delta_2&0&0\\0&d_1&0&-\delta_3&0\\0&0&d_2&0&-\delta_4\\0&0&0&d_3&0\end{pmatrix}\begin{pmatrix}\phi^0\\\phi^1\\\phi^2\\\phi^3\\\phi^4\end{pmatrix}=m\begin{pmatrix}\phi^0\\\phi^1\\\phi^2\\\phi^3\\\phi^4\end{pmatrix}

The subscripts denote the degree of the form the operator applies to. Now we may row reduce the matrix. I will start from the bottom and work my way up so as to end up with an equation for a scalar field. Start with:

\begin{pmatrix}-m&-\delta_1&0&0&0\\d_0&-m&-\delta_2&0&0\\0&d_1&-m&-\delta_3&0\\0&0&d_2&-m&-\delta_4\\0&0&0&d_3&-m\end{pmatrix}\begin{pmatrix}\phi^0\\\phi^1\\\phi^2\\\phi^3\\\phi^4\end{pmatrix}=0

First multiply the fourth row by -m, then add the codifferential of the fifth row:

\begin{pmatrix}-m&-\delta_1&0&0&0\\d_0&-m&-\delta_2&0&0\\0&d_1&-m&-\delta_3&0\\0&0&-md_2&\delta_4d_3+m^2&0\\0&0&0&\delta_4d_3&-m\delta_4\end{pmatrix}
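
Here is a sympy sketch of just that first row operation, with the d_k and \delta_k treated as non-commuting symbols and m as a commuting scalar (this only tracks the bookkeeping of the row reduction, not the analysis of the operators themselves):

```python
import sympy as sp

m = sp.Symbol('m')
d2, d3, delta4 = sp.symbols('d2 d3 delta4', commutative=False)

row4 = sp.Matrix([[0, 0, d2, -m, -delta4]])   # fourth row of the matrix above
row5 = sp.Matrix([[0, 0, 0, d3, -m]])         # fifth row

# multiply the fourth row by -m, then add delta_4 applied to the fifth row
new_row4 = sp.Matrix([[-m * a + delta4 * b for a, b in zip(row4, row5)]])
print(new_row4)   # [0, 0, -m*d2, delta4*d3 + m**2, 0], the fourth row above
```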

The next step would be to operate with \delta d+m^2 from the left on the third row, operate with \delta from the right on the fourth row, then add the resulting fourth row to the resulting third row. The matrix will look like this next:

\begin{pmatrix}-m&-\delta_1&0&0&0\\d_0&-m&-\delta_2&0&0\\0&\left(\delta_3d_2+m^2\right)d_1&-m\left(\delta_3d_2+m^2\right)-md_1\delta_2&0&0\\0&0&-md_1\delta_2&\left(\delta_3d_2+m^2\right)\delta_3&0\\0&0&0&\delta_4d_3&-m\delta_4\end{pmatrix}

Simplifying, you get:

\begin{pmatrix}-m&-\delta_1&0&0&0\\d_0&-m&-\delta_2&0&0\\0&m^2d_1&-m\left(\delta_3d_2+d_1\delta_2+m^2\right)&0&0\\0&0&-md_1\delta_2&\left(\delta_3d_2+m^2\right)\delta_3&0\\0&0&0&\delta_4d_3&-m\delta_4\end{pmatrix}

Repeating the process yields:

\begin{pmatrix}-m&-\delta_1&0&0&0\\md_0\left(\delta_1d_0+m^2\right)&-m^2\left(\delta_2d_1+d_0\delta_1+m^2\right)+m^2d_0\delta_1&0&0&0\\0&m^2d_0\delta_1&-m\left(\delta_2d_1+d_0\delta_1+m^2\right)\delta_2&0&0\\0&0&-md_1\delta_2&\left(\delta_3d_2+m^2\right)\delta_3&0\\0&0&0&\delta_4d_3&-m\delta_4\end{pmatrix}

After combining like terms, you get:

\begin{pmatrix}-m&-\delta_1&0&0&0\\md_0\left(\delta_1d_0+m^2\right)&-m^2\left(\delta_2d_1+m^2\right)&0&0&0\\0&m^2d_0\delta_1&-m\left(\delta_2d_1+d_0\delta_1+m^2\right)\delta_2&0&0\\0&0&-md_1\delta_2&\left(\delta_3d_2+m^2\right)\delta_3&0\\0&0&0&\delta_4d_3&-m\delta_4\end{pmatrix}

One more time!

\begin{pmatrix}m^3\left(\delta_1d_0+m^2\right)&0&0&0&0\\0&-m^2\left(\delta_1d_0+m^2\right)\delta_1&0&0&0\\0&m^2d_0\delta_1&-m\left(\delta_2d_1+d_0\delta_1+m^2\right)\delta_2&0&0\\0&0&-md_1\delta_2&\left(\delta_3d_2+m^2\right)\delta_3&0\\0&0&0&\delta_4d_3&-m\delta_4\end{pmatrix}

So you end up with:

\delta_1d_0+m^2

This is indeed the Klein-Gordon equation, here acting on the scalar part \phi^0 . Depending on which side you decided to operate on your rows from during that last row reduction, you could have also ended up with this:

\begin{pmatrix}m^3\left(\delta_1d_0+m^2\right)+m\delta_1d_0\left(\delta_1d_0+m^2\right)&0&0&0&0\\m\delta_1d_0\left(\delta_1d_0+m^2\right)&-m^2\delta_2\left(\delta_1d_0+m^2\right)&0&0&0\\0&m^2d_0\delta_1&-m\left(\delta_2d_1+d_0\delta_1+m^2\right)\delta_2&0&0\\0&0&-md_1\delta_2&\left(\delta_3d_2+m^2\right)\delta_3&0\\0&0&0&\delta_4d_3&-m\delta_4\end{pmatrix}

This would yield:

m^2\left(\delta_1d_0+m^2\right)+\delta_1d_0\left(\delta_1d_0+m^2\right)=0

, which factors into (trumpets please):

\left(\delta_1d_0+m^2\right)^2=0

So no matter which way you slice it, the Klein-Gordon equation is the characteristic equation of the Dirac equation.