Why Higher Geometric Quantization

The largest single presentation was a pair of talks on “The Motivation for Higher Geometric Quantum Field Theory” by Urs Schreiber, running to about two and a half hours, based on these notes. This was probably the clearest introduction I’ve seen so far to the motivation for the program he’s been developing for several years. Broadly, the idea is to develop a higher-categorical analog of geometric quantization (GQ for short).

One guiding idea behind this is that we should really be interested in quantization over (higher) stacks, rather than merely spaces. This leads inexorably to a higher-categorical version of GQ itself. The starting point, though, is that the defining features of stacks capture two crucial principles from physics: the gauge principle, and locality. The gauge principle means that we need to keep track not just of connections, but of gauge transformations, which form respectively the objects and morphisms of a groupoid. “Locality” means that the groupoid of configurations of a physical field on spacetime is determined by the local configurations on regions as small as you like (together with information about how to glue the data on small regions together into larger regions).

Some particularly simple cases can be described globally: a scalar field gives the space of all scalar functions, namely maps into $\mathbb{C}$; sigma models generalise this to the space of maps $\Sigma \rightarrow M$ for some other target space. These are determined by their values pointwise, so of course are local.

More generally, physicists think of a field theory as given by a fibre bundle $V \rightarrow \Sigma$ (the previous examples being described by trivial bundles $\pi : M \times \Sigma \rightarrow \Sigma$), where the fields are sections of the bundle. Lagrangian physics is then described by a form on the jet bundle of $V$, i.e. the bundle whose fibre over $p \in \Sigma$ describes the possible values of a section and its first $k$ derivatives at that point.
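Since jet bundles can feel abstract, here is a tiny symbolic sketch (my own illustration, not from the talks) of the simplest case: the 1-jet of a section of the trivial bundle $\mathbb{R} \times \mathbb{R} \rightarrow \mathbb{R}$, where a section is just a function $u(x)$ and the jet records its value and first derivative.

```python
import sympy as sp

# Toy illustration: a section of the trivial bundle R x R -> R is just a
# function u(x); its 1-jet over the point x is the fibre coordinate
# (u(x), u'(x)) of the first jet bundle.
x = sp.symbols('x')

def jet1(u):
    """The 1-jet prolongation of a section u(x): the pair (u, du/dx)."""
    return (u, sp.diff(u, x))

# A first-order Lagrangian density is then just a function of the jet
# coordinates -- this particular one is hypothetical, for illustration:
u0, u1 = jet1(sp.sin(x))
lagrangian = u1**2 / 2 - u0**2 / 2
```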

More generally, a field theory gives a procedure $F$ for taking some space with structure – say a (pseudo-)Riemannian manifold $\Sigma$ – and producing a moduli space $X = F(\Sigma)$ of fields. Sigma models happen to be given by representable functors: $F(\Sigma) = Maps(\Sigma,M)$ for some $M$, the representing object. A prestack is just any functor taking $\Sigma$ to a moduli space of fields. A stack is one which satisfies a “descent condition”, which amounts to the condition of locality: knowing values on small neighbourhoods and how to glue them together determines values on larger neighbourhoods.

The Yoneda lemma says that, for reasonable notions of “space”, the category $\mathbf{Spc}$ from which we picked target spaces $M$ (Riemannian manifolds, for instance) embeds into the category of stacks over $\mathbf{Spc}$, and that the embedding is fully faithful – so we should just think of stacks as a generalization of spaces. However, it’s a generalization we need, because gauge theories determine non-representable stacks. What’s more, the “space” of sections of one of these fibred stacks is also a stack, and this is what plays the role of the moduli space for gauge theory! For higher gauge theories, we will need higher stacks.

All of the above is the classical situation: the next issue is how to quantize such a theory, which involves a generalization of Geometric Quantization. Now, a physicist who actually uses GQ may find this perspective strange, but it flows from just the same logic as the usual method.

In ordinary GQ, you have some classical system described by a phase space: a manifold $X$ equipped with a pre-symplectic 2-form $\omega \in \Omega^2(X)$. Intuitively, $\omega$ describes how the space, locally, can be split into conjugate variables. In the phase space for a particle in $n$-space, these are “position” and “momentum” variables, and $\omega = \sum_i dx^i \wedge dp^i$; many other systems have analogous conjugate variables. But what really matters is the form $\omega$ itself, or rather its cohomology class.
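For concreteness, here is a small numerical sketch (my own, not from the talks) of the canonical symplectic form in coordinates: as a matrix, $\omega = \sum_i dx^i \wedge dp^i$ is the standard antisymmetric block matrix, and its nondegeneracy is what pairs each position direction with a conjugate momentum.

```python
import numpy as np

# The canonical symplectic form on the phase space of a particle in
# n-space, written as a matrix in coordinates (x^1..x^n, p^1..p^n):
# omega(v, w) = v^T J w.
n = 3
Z = np.zeros((n, n))
I = np.eye(n)
J = np.block([[Z, I],
              [-I, Z]])

assert np.allclose(J, -J.T)                    # antisymmetric, as a 2-form
assert np.isclose(abs(np.linalg.det(J)), 1.0)  # nondegenerate: symplectic
```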

Then one wants to build a Hilbert space describing the quantum analog of the system, but in fact you need a little more than $(X,\omega)$ to do this. The Hilbert space is a space of sections of a bundle whose fibres look like copies of the complex numbers, called the “prequantum line bundle”. It needs to be equipped with a connection $\nabla$ whose curvature is a 2-form in the class of $\omega$: in general, $F_{\nabla} = \omega$. (If $\omega$ is not symplectic, i.e. is degenerate, this implies there’s some symmetry on $X$, in which case the line bundle had better be equivariant, so that physically equivalent situations correspond to the same state.) The easy case is the trivial bundle, so that we get a space of functions, like $L^2(X)$ (for some measure compatible with $\omega$). In general, though, this function-space picture only makes sense locally in $X$: this is why the choice of prequantum line bundle is important to the interpretation of the quantized theory.

Since the crucial geometric thing here is a bundle over the moduli space, when that space is a stack – as in higher gauge theory – it’s natural to seek analogous constructions using higher bundles. This would involve, instead of a (pre-)symplectic 2-form $\omega$, an $(n+1)$-form called a (pre-)$n$-plectic form (for an introductory look at this, see Chris Rogers’ paper on the case $n=2$ over manifolds). This will give a higher analog of the Hilbert space.

Now, maps between Hilbert spaces in GQ come from Lagrangian correspondences – these might be maps of moduli spaces, but in general they consist of a “space of trajectories” equipped with maps into the spaces of incoming and outgoing configurations. This is a span of pre-symplectic spaces (equipped with pre-quantum line bundles) that satisfies some nice geometric conditions which make it possible to push a section of said line bundle through the correspondence. Since each prequantum line bundle can be seen as a map out of the configuration space into a classifying space (for $U(1)$, or in general an $n$-group of phases), we get a square. The action functional is a cell that fills this square (see the end of 2.1.3 in Urs’ notes). This is a diagrammatic way to describe the usual GQ construction: the advantage is that it can then be repeated in the more general setting without much change.

This much is about as far as Urs got in his talk, but the notes go further, talking about how to extend this to infinity-stacks, and how the Dold-Kan correspondence tells us nicer descriptions of what we get when linearizing – since quantization puts us into an Abelian category.

I enjoyed these talks, although they were long and Urs came out looking pretty exhausted. While I’ve seen several other talks on this program, this was the first time I’ve seen it discussed from the beginning, with a lot of motivation. This was presumably because we had a physically-minded part of the audience; the versions I’ve seen aimed at mathematicians usually come in somewhere in the middle and, being more time-limited, miss out some of the details and the motivation. The end result made it seem quite a natural development. Overall, very helpful!

Continuing from the previous post, we’ll take a detour in a different direction. The physics-oriented talks were by Martin Wolf, Sam Palmer, Thomas Strobl, and Patricia Ritter. Since my background in this subject isn’t particularly physics-y, I’ll do my best to summarize the ones that had obvious connections to other topics, but may be getting things wrong or unbalanced here…

Dirac Sigma Models

Thomas Strobl’s talk, “New Methods in Gauge Theory” (based on a whole series of papers linked to from the conference webpage), started with a discussion of generalizing Sigma Models. Strobl’s talk was pitched at a bit too high a physics level for me to do it justice, but I came away with the impression of a fairly large program that has several points of contact with more mathematical notions I’ll discuss later.

In particular, Sigma models are physical theories in which a field configuration on spacetime $\Sigma$ is a map $X : \Sigma \rightarrow M$ into some target manifold, or rather $(M,g)$, since we need a metric to integrate and find differentials. Given this, we can define the crucial physics ingredient, an action functional
$S[X] = \int_{\Sigma} g_{ij} dX^i \wedge (\star d X^j)$
where the $dX^i$ are the differentials of the map into $M$.

In string theory, $\Sigma$ is the world-sheet of a string and $M$ is ordinary spacetime. This generalizes the simpler example of a moving particle, where $\Sigma = \mathbb{R}$ is just its worldline. In that case, minimizing the action functional above says that the particle moves along geodesics.
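As a sanity check on the claim that minimizing the action picks out geodesics, here is a discretized toy version (my own, assuming the flat metric on $\mathbb{R}^2$, so geodesics are straight lines): the straight worldline has smaller discrete action than a wiggly path with the same endpoints.

```python
import numpy as np

# Discretized worldline action for a particle in flat R^2 (g_ij = delta_ij):
# S ~ sum over time steps of g_ij (dX^i/dt)(dX^j/dt) * dt.
def action(path, dt):
    v = np.diff(path, axis=0) / dt          # discrete velocities dX/dt
    return np.sum(v * v) * dt

t = np.linspace(0.0, 1.0, 101)
dt = t[1] - t[0]
straight = np.stack([t, t], axis=1)                          # X(t) = (t, t)
wiggly = np.stack([t, t + 0.1 * np.sin(np.pi * t)], axis=1)  # same endpoints

# The straight path (a geodesic of the flat metric) minimizes the action:
assert action(straight, dt) < action(wiggly, dt)
```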

The big generalization introduced is termed a “Dirac Sigma Model” or DSM (the paper that introduces them is this one).

In building up to these DSM, a different generalization notes that if there is a group action $G \rhd M$ that describes “rigid” symmetries of the theory (for Minkowski space we might pick the Poincaré group, or perhaps the Lorentz group if we want to fix an origin point), then the action functional on the space $Maps(\Sigma,M)$ is invariant in the direction of any of the symmetries. One can use this to reduce $(M,g)$, by “gauging out” the symmetries to get a quotient $(N,h)$, and get a corresponding $S_{gauged}$ to integrate over $N$.

To generalize this, note that there’s an action groupoid associated with $G \rhd M$, and replace this with some other (Poisson) groupoid instead. That is, one thinks of the real target for a gauge theory not as $M$, but as the action groupoid $M /\!\!/ G$, and then considers replacing this with some generic groupoid that doesn’t necessarily arise from a group of rigid symmetries on some underlying $M$. (In this regard, see the second post in this series, about Urs Schreiber’s talk, and stacks as classifying spaces for gauge theories.)

The point here seems to be that one wants to get a nice generalization of this situation – in particular, to be able to go backward from $N$ to $M$, to deal with the possibility that the quotient $N$ may be geometrically badly-behaved. Or rather, given $(N,h)$, to find some $(M,g)$ of which it is a reduction, but which is better behaved. That means needing to be able to treat a Sigma model with symmetry information attached.

There’s also an infinitesimal version of this: locally, invariance means that the Lie derivative of the metric along any of the generators of the Lie algebra of $G$ – so-called Killing vectors – is zero. This equation can then be generalized to vector fields satisfying a weaker condition, a so-called “generalized Killing equation”. These may not generate isometries, but can be treated similarly. What they do give, if you integrate these vectors, is a foliation of $M$. The space of leaves is the quotient $N$ mentioned above.
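The Killing condition is easy to check symbolically. A small sketch (my own illustration): on flat $\mathbb{R}^2$, the Killing equation $\partial_i \xi_j + \partial_j \xi_i = 0$ (the vanishing of the Lie derivative of the metric) holds for the rotation generator $\xi = (-y, x)$, but fails for, say, the dilation generator, which is no isometry.

```python
import sympy as sp

# On flat R^2 with g_ij = delta_ij, a vector field xi is Killing iff
#   d_i xi_j + d_j xi_i = 0  (equivalently, the Lie derivative of the
# metric along xi vanishes, so xi generates an isometry).
x, y = sp.symbols('x y')
coords = (x, y)

def killing_tensor(xi):
    # In flat coordinates xi_j = xi^j, so indices can be ignored here.
    return sp.Matrix(2, 2, lambda i, j:
                     sp.diff(xi[j], coords[i]) + sp.diff(xi[i], coords[j]))

rotation = (-y, x)  # generates rotations about the origin
assert killing_tensor(rotation) == sp.zeros(2, 2)

dilation = (x, y)   # generates scalings, which are not isometries
assert killing_tensor(dilation) != sp.zeros(2, 2)
```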

The most generic situation Thomas discussed is when one has a Dirac structure on $M$ – this is a certain kind of subbundle $D \subset TM \oplus T^*M$ of the tangent-plus-cotangent bundle over $M$.

Supersymmetric Field Theories

Another couple of physics-y talks related higher gauge theory to some particular physics models, namely $N=(2,0)$ and $N=(1,0)$ supersymmetric field theories.

The first, by Martin Wolf, was called “Self-Dual Higher Gauge Theory”, and was rooted in generalizing some ideas about twistor geometry – here are some lecture notes by the same author, about how twistor geometry relates to ordinary gauge theory.

The idea of twistor geometry is somewhat analogous to the idea of a Fourier transform, which is ultimately that the same space of fields can be described in two different ways. The Fourier transform goes from looking at functions on a position space, to functions on a frequency space, by way of an integral transform. The Penrose-Ward transform, analogously, transforms a space of fields on Minkowski spacetime, satisfying one set of equations, to a set of fields on “twistor space”, satisfying a different set of equations. The theories represented by those fields are then equivalent (as long as the PW transform is an isomorphism).
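The Fourier analogy can be seen in a few lines of numpy: the same “field” is described either by its values in position space or by its coefficients in frequency space, and the transform between the two descriptions is invertible, so the descriptions are equivalent.

```python
import numpy as np

# One field, two equivalent descriptions related by an invertible
# integral transform (here, the discrete Fourier transform).
rng = np.random.default_rng(0)
field = rng.standard_normal(64)          # "position space" description

frequency_picture = np.fft.fft(field)    # the other description...
recovered = np.fft.ifft(frequency_picture)  # ...and the way back

assert np.allclose(recovered.real, field)
assert np.allclose(recovered.imag, 0.0)
```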

The PW transform is described by a “correspondence”, or “double fibration” of spaces – what I would term a “span”, such that both maps are fibrations:

$P \stackrel{\pi_1}{\leftarrow} K \stackrel{\pi_2}{\rightarrow} M$

The general story of such correspondences is that one has some geometric data on $P$, which we call $Ob_P$ – a set of functions, differential forms, vector bundles, cohomology classes, etc. These are pulled back to $K$, and then “pushed forward” to $M$ by a direct image functor. In many cases, this is given by an integral along each fibre of the fibration $\pi_2$, so we have an integral transform. The image of $Ob_P$ we call $Ob_M$, and it consists of data satisfying, typically, some PDE’s. In the case of the PW transform, $P$ is twistor space $\mathbb{P}^3 \setminus \mathbb{P}^1$ (complex projective 3-space with a projective line removed) and $Ob_P$ is the set of holomorphic principal $G$-bundles for some group $G$; $M$ is complexified Minkowski space $\mathbb{C}^4$ and the fields are principal $G$-bundles with connection. The PDE they satisfy is $F = \star F$, where $F$ is the curvature of the bundle and $\star$ is the Hodge dual. This means cohomology on twistor space (which classifies the bundles) is related to self-dual fields on spacetime. One can also find that a point in $M$ corresponds to a projective line in $P$, while a point in $P$ corresponds to a null plane in $M$. (The correspondence space is $K = \mathbb{C}^4 \times \mathbb{P}^1$.)
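The pull-back/push-forward pattern can be sketched numerically. Here is a toy direct image (my own illustration, with nothing twistor-specific about it): data on a correspondence space $K = M \times F$ is pushed down to $M$ by integrating along each fibre $F$, which is the basic step of an integral transform.

```python
import numpy as np

# Toy "direct image": a function on the correspondence space K = M x F
# is integrated along each fibre F to give a function on M.
M_pts = np.linspace(0.0, 1.0, 5)      # sample points of the base M
F_pts = np.linspace(0.0, 1.0, 201)    # fibre coordinate
dF = F_pts[1] - F_pts[0]

def push_forward(f_on_K):
    # f_on_K has shape (len(M_pts), len(F_pts)); integrate out the fibre
    return f_on_K.sum(axis=1) * dF

f = np.outer(M_pts, F_pts)            # f(m, s) = m * s, a function on K
g = push_forward(f)                   # g(m) ~ m * int_0^1 s ds = m / 2

assert np.allclose(g, M_pts / 2, atol=1e-2)
```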

Then the issue is to generalize this to higher gauge theory: rather than principal $G$-bundles for a group, one is talking about a 2-group $\mathcal{G}$ with connection. Wolf’s talk explained how there is a Penrose-Ward transform between a certain class of higher gauge theories (on the one hand) and an $N=(2,0)$ supersymmetric field theory (on the other hand). Specifically, taking $M = \mathbb{C}^6$, and $P$ to be (a subspace of) the 6-dimensional twistor space $\mathbb{P}^7 \setminus \mathbb{P}^1$, there is a similar correspondence between certain holomorphic 2-bundles on $P$ and solutions to some self-dual field equations on $M$ (which can be seen as constraints on the curvature 3-form $F$ of a principal 2-bundle: the self-duality condition is why this only makes sense in 6 dimensions).

This picture generalizes to supermanifolds, where there are fermionic as well as bosonic fields. These turn out to correspond to a certain 6-dimensional $N = (2,0)$ supersymmetric field theory.

Then Sam Palmer gave a talk in which he described a somewhat similar picture for an $N = (1,0)$ supersymmetric theory. However, unlike the $N=(2,0)$ theory, this one gives, not a higher gauge theory, but something that superficially looks similar yet is in fact quite different. It ends up being a theory of a number of form-valued fields taking values in three linked vector spaces

$\mathfrak{g}^* \stackrel{g}{\rightarrow} \mathfrak{h} \stackrel{h}{\rightarrow} \mathfrak{g}$

equipped with a bunch of maps that give the whole setup some structure. There is a collection of seven fields in groups (“multiplets”, in physics jargon) valued in each of these spaces. They satisfy a large number of identities. It somewhat resembles the higher gauge theory that corresponds to the $N=(1,0)$ case, so this situation gets called a “$(1,0)$-gauge model”.

There are some special cases of such a setup, including Courant-Dorfman algebras and Lie 2-algebras. The talk gave quite a few examples of solutions to the equations that fall out. The overall conclusion is that, while there are some similarities between $(1,0)$-gauge models and the way Higher Gauge Theory appears at the level of algebra-valued forms and the equations they must satisfy, there are some significant differences. I won’t try to summarize this in more depth, because (a) I didn’t follow the nitty-gritty technical details very well, and (b) it turns out to be not HGT, but some new theory which is still less well understood.

To continue from the previous post…

Twisted Differential Cohomology

Ulrich Bunke gave a talk introducing differential cohomology theories, and Thomas Nikolaus gave one about a twisted version of such theories (unfortunately, perhaps, in the wrong order). The idea here is that cohomology can give a classification of field theories, and if we don’t want the theories to be purely topological, we need to refine this. A cohomology theory is a (contravariant) functorial way of assigning to any space $X$, which we take to be a manifold, a $\mathbb{Z}$-graded group: that is, a tower of groups of “cocycles”, one group for each $n$, with some coboundary maps linking them. (In some cases, the groups also form rings.) For example, the group of differential forms, graded by degree.

Cohomology theories satisfy some axioms – for example, the Mayer-Vietoris sequence has to apply whenever you cut a manifold into parts. Differential cohomology relaxes one axiom, the requirement that cohomology be a homotopy invariant of $X$. Given a differential cohomology theory, one can impose equivalence relations on the differential cocycles to get a theory that does satisfy this axiom – so we say the finer theory is a “differential refinement” of the coarser. So, in particular, ordinary cohomology theories are classified by spectra (this is related to the Brown representability theorem), whereas the differential ones are represented by sheaves of spectra – where the constant sheaves represent the cohomology theories which happen to be homotopy invariants.

The “twisting” part of this story can be applied to either an ordinary cohomology theory, or a differential refinement of one (though this needs similarly refined “twisting” data). The idea is that, if $R$ is a cohomology theory, it can be “twisted” over $X$ by a map $\tau: X \rightarrow Pic_R$ into the “Picard group” of $R$. This is the group of invertible $R$-modules (where an $R$-module means a module for the cohomology ring assigned to $X$) – essentially, tensoring with these modules is what defines the “twisting” of a cohomology element.

An example of all this is twisted differential K-theory. Here the groups consist of isomorphism classes of certain vector bundles over $X$, and the twisting is particularly simple (the Picard group in the topological case is just $\mathbb{Z}_2$). The main result is that, while topological twists are classified by appropriate gerbes on $X$ (for K-theory, $U(1)$-gerbes), the differential ones are classified by gerbes with connection.

Fusion Categories

Scott Morrison gave a talk about Classifying Fusion Categories, the point of which was just to collect together a bunch of results constructing particular examples. The talk opened with a quote by Rutherford: “All science is either physics or stamp collecting” – that is, either about systematizing data and finding simple principles which explain it, or about collecting lots of data. This talk was unabashed stamp-collecting, on the grounds that we just don’t have enough data to understand fusion categories systematically yet – and for that very reason I won’t try to summarize all the results, but the slides are well worth a look-over. The point is that fusion categories are very useful in constructing TQFT’s, and there are several different constructions that begin “given a fusion category $\mathcal{C}$”… and yet there aren’t all that many examples, and very few large ones, known.

Scott also makes the analogy that fusion categories are “noncommutative finite groups” – which is a little confusing, since not all finite groups are commutative anyway – but the idea is that the symmetric fusion categories are exactly the representation categories of finite groups. So general fusion categories are a non-symmetric generalization of such groups. Since classifying finite groups turned out to be difficult, and involve a laundry-list of sporadic groups, it shouldn’t be too surprising that understanding fusion categories (which, for the symmetric case, include the representation categories of all these examples) should be correspondingly tricky. Since, as he points out, we don’t have very many non-symmetric examples beyond rank 12 (analogous to knowing only finite groups with at most 12 elements), it’s likely that we don’t have a very good understanding of these categories in general yet.
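To make the “symmetric fusion categories are representation categories” point concrete, here is a standard character-theory computation (my own addition, not from the talk) of the fusion rules of $Rep(S_3)$, the representation category of the symmetric group on 3 letters, which has three irreducibles: trivial, sign, and a 2-dimensional standard representation.

```python
import numpy as np

# Irreducible characters of S_3 on the conjugacy classes {e},
# {transpositions}, {3-cycles}, which have sizes 1, 3, 2.
chars = np.array([[1,  1,  1],    # trivial
                  [1, -1,  1],    # sign
                  [2,  0, -1]])   # standard (2-dimensional)
class_size = np.array([1, 3, 2])
order = 6

def fuse(i, j):
    """Multiplicity of each irrep in irrep_i (x) irrep_j, by characters."""
    product = chars[i] * chars[j]          # character of the tensor product
    return (chars * class_size) @ product // order

# standard (x) standard = trivial + sign + standard
assert list(fuse(2, 2)) == [1, 1, 1]
# sign (x) standard = standard
assert list(fuse(1, 2)) == [0, 0, 1]
```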

There were a couple of talks – one during the workshop by Sonia Natale, and one the previous week by Sebastian Burciu, whom I also had the chance to talk with that week – about “Equivariantization” of fusion categories, and some fairly detailed descriptions of what results. The two of them have a paper on this which gives more details, which I won’t summarize – but I will say a bit about the construction.

An “equivariantization” of a category $C$ acted on by a group $G$ is supposed to be a generalization of the notion of the set of fixed points for a group acting on a set.  The category $C^G$ has objects which consist of an object $x \in C$ which is fixed by the action of $G$, together with an isomorphism $\mu_g : x \rightarrow x$ for each $g \in G$, satisfying a bunch of unsurprising conditions, like compatibility with the group operation.  The morphisms are maps in $C$ between the objects, which form commuting squares for each $g \in G$.  Their paper, and the talks, described how this works when $C$ is a fusion category – namely, $C^G$ is also a fusion category, and one can work out its fusion rules (i.e. monoidal structure).  In some cases, it’s a “group theoretical” fusion category (it looks like $Rep(H)$ for some group $H$) – or a weakened version of such a thing, up to Morita equivalence.

A nice special case of this is if the group action happens to be trivial, so that every object of $C$ is a fixed point. In this case, $C^G$ is just the category of objects of $C$ equipped with a $G$-action, and the intertwining maps between these. For example, if $C = Vect$, then $C^G = Rep(G)$ (in particular, a “group-theoretical fusion category”). What’s more, this construction is functorial in $G$ itself: given a subgroup $H \subset G$, we get an adjoint pair of functors between $C^G$ and $C^H$, which in our special case are just the induced-representation and restricted-representation functors for that subgroup inclusion. That is, we have a Mackey functor here. These generalize, however, to any fusion category $C$, and to nontrivial actions of $G$ on $C$. The point of their paper, then, is to give a good characterization of the categories that come out of these constructions.
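A decategorified toy version of this (my own illustration): for a group acting on a mere set, “equivariantization” reduces to taking fixed points, and under the trivial action everything is fixed – the shadow of the fact that $C^G$ for a trivial action contains every object of $C$, equipped with a $G$-action.

```python
# Fixed points of a group action on a set: the decategorified shadow of
# equivariantization.
def fixed_points(elements, group, act):
    return {x for x in elements if all(act(g, x) == x for g in group)}

Z2 = [0, 1]                              # the group Z/2, written additively
X = {-2, -1, 0, 1, 2}
negate = lambda g, x: -x if g == 1 else x

# Only 0 is fixed by negation:
assert fixed_points(X, Z2, negate) == {0}

# Under the trivial action, every element is a fixed point:
trivial = lambda g, x: x
assert fixed_points(X, Z2, trivial) == X
```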

Quantizing with Higher Categories

The last talk I’d like to describe was by Urs Schreiber, called Linear Homotopy Type Theory for Quantization. Urs has been giving evolving talks on this topic for some time, and it’s quite a big subject (see the long version of the notes above if there’s any doubt). However, I always try to get a handle on these talks, because it seems to be describing the most general framework that fits the general approach I use in my own work. This particular one borrows a lot from the language of logic (the “linear” in the title alludes to linear logic).

Basically, Urs’ motivation is to describe a good mathematical setting in which to construct field theories using ingredients familiar to the physics approach to “field theory”, namely… fields. (See the description of Kevin Walker’s talk.) Also, Lagrangian functionals – that is, the notion of a physical action. Constructing TQFT from modular tensor categories, for instance, is great, but the fields and the action seem to be hiding in this picture. There are many conceptual problems with field theories – like the mathematical meaning of path integrals, for instance. Part of the approach here is to find a good setting in which to locate the moduli spaces of fields (and the spaces in which path integrals are done). Then, one has to come up with a notion of quantization that makes sense in that context.

The first claim is that the category of such spaces should form a differentially cohesive infinity-topos which we’ll call $\mathbb{H}$. The “infinity” part means we allow morphisms between field configurations of all orders (2-morphisms, 3-morphisms, etc.). The “topos” part means that all sorts of reasonable constructions can be done – for example, pullbacks. The “differentially cohesive” part captures the sort of structure that ensures we can really treat these as spaces of the suitable kind: “cohesive” means that we have a notion of connected components around (it’s implemented by having a bunch of adjoint functors between spaces and points). The “differential” part is meant to allow for the sort of structures discussed above under “differential cohomology” – really, that we can capture geometric structure, as in gauge theories, and not just topological structure.

In this case, we take $\mathbb{H}$ to have objects which are spectral-valued infinity-stacks on manifolds. This may be unfamiliar, but the main point is that it’s a kind of generalization of a space. Now, the sort of situation where quantization makes sense is: we have a space (i.e. $\mathbb{H}$-object) of field configurations to start, then a space of paths (this is WHERE “path-integrals” are defined), and a space of field configurations in the final system where we observe the result. There are maps from the space of paths to identify starting and ending points. That is, we have a span:

$A \leftarrow X \rightarrow B$

Now, in fact, these may all lie over some manifold, such as $B^n(U(1))$, the classifying space for $U(1)$ $(n-1)$-gerbes. That is, we don’t just have these “spaces”, but these spaces equipped with one of those pieces of cohomological twisting data discussed up above. That enters the quantization like an action (it’s WHAT you integrate in a path integral).

Aside: To continue the parallel, quantization is playing the role of a cohomology theory, and the action is the twist. I really need to come back and complete an old post about motives, because there’s a close analogy here. If quantization is a cohomology theory, it should come by factoring through a universal one. In the world of motives, where “space” now means something like “scheme”, the target of this universal cohomology theory is a mild variation on the category of spans I just alluded to. Then all others come from some functor out of it.

Then the issue is what quantization looks like on this sort of scenario. The Atiyah-Singer viewpoint on TQFT isn’t completely lost here: quantization should be a functor into some monoidal category. This target needs properties which allow it to capture the basic “quantum” phenomena of superposition (i.e. some additivity property), and interference (some actual linearity over $\mathbb{C}$). The target category Urs talked about was the category of $E_{\infty}$-rings. The point is that these are just algebras that live in the world of spectra, which is where our spaces already lived. The appropriate target will depend on exactly what $\mathbb{H}$ is.

But what Urs did do was give a characterization of what the target category should be LIKE for a certain construction to work. It’s a “pull-push” construction: see the link way above on Mackey functors – restriction and induction of representations are an example. It’s what he calls a “(2-monoidal, Beck-Chevalley) Linear Homotopy-Type Theory”. Essentially, this is a list of conditions which ensure that, for the two morphisms in the span above, we have a “pull” operation, with left and right adjoints to it (which need to be related in a nice way – the jargon here is that we must be in a Wirthmüller context), satisfying some nice relations, and that everything is functorial.

The intuition is that if we have some way of getting a “linear gadget” out of one of our configuration spaces of fields (analogous to constructing a space of functions when we do canonical quantization over, let’s say, a symplectic manifold), then we should be able to lift it (the “pull” operation) to the space of paths. Then the “push” part of the operation is where the “path integral” part comes in: many paths might contribute to the value of a function (or functor, or whatever it may be) at the end-point of those paths, because there are many ways to get from A to B, and all of them contribute in a linear way.
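A finite toy model of this pull-push (my own sketch, with everything down-sized to finite sets and lists): “pull” lifts a state along the source map of a span, and “push” sums contributions over the fibres of the target map – each path contributing linearly, a baby version of a path integral.

```python
# A span A <- X -> B where X is a finite set of "paths", each recorded as
# a (source, target) pair.  Linearizing, states are lists of amplitudes.
A, B = [0, 1], [0, 1]
paths = [(0, 0), (0, 1), (0, 1), (1, 1)]   # one path from 0 to 1 is doubled

def pull(psi):
    # Lift a state on A to the space of paths, by precomposing with the
    # source map: each path inherits the amplitude of its starting point.
    return [psi[s] for (s, _) in paths]

def push(phi):
    # Sum over the fibre of the target map: every path ending at a given
    # point of B contributes linearly to the amplitude there.
    out = [0.0] * len(B)
    for (_, t), value in zip(paths, phi):
        out[t] += value
    return out

psi = [1.0, 0.0]                       # state supported at the point 0 of A
assert push(pull(psi)) == [1.0, 2.0]   # the doubled path 0 -> 1 counts twice
```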

So, if this all seems rather abstract, that’s because the point of it is to characterize very generally what has to be available for the ideas that appear in physics notions of path-integral quantization to make sense. Many of the particulars – spectra, $E_{\infty}$-rings, infinity-stacks, and so on – which showed up in the example are in a sense just placeholders for anything with the right formal properties. So at the same time as it moves into seemingly very abstract terrain, this approach is also supposed to get out of the toy-model realm of TQFT, and really address the trouble in rigorously defining what’s meant by some of the standard practice of physics in field theory by analyzing the logical structure of what this practice is really saying. If it turns out to involve some unexpected math – well, given the underlying issues, it would have been more surprising if it didn’t.

It’s not clear to me how far along this road this program gets us, as far as dealing with questions an actual physicist would like to ask (for the most part, if the standard practice works as an algorithm to produce results, physicists seldom need to ask what it means in rigorous math language), but it does seem like an interesting question.

This entry is a by-special-request blog, which Derek Wise invited me to write for the blog associated with the International Loop Quantum Gravity Seminar, and it will appear over there as well.  The ILQGS is a long-running regular seminar which runs as a teleconference, with people joining in from various countries, on various topics which are more or less closely related to Loop Quantum Gravity and the interests of people who work on it.  The custom is that when someone gives a talk, someone else writes up a description of the talk for the ILQGS blog, and Derek invited me to write up a description of his talk.  The audio file of the talk itself is available in .aiff and .wav formats, and the slides are here.

The talk that Derek gave was based on a project of his and Steffen Gielen’s, which has taken written form in a few papers (two shorter ones, “Spontaneously broken Lorentz symmetry for Hamiltonian gravity“, “Linking Covariant and Canonical General Relativity via Local Observers“, and a new, longer one called “Lifting General Relativity to Observer Space“).

The key idea behind this project is the notion of “observer space”, which is exactly what it sounds like: a space of all observers in a given universe.  This is easiest to picture when one has a spacetime – a manifold with a Lorentzian metric, $(M,g)$ – to begin with.  Then an observer can be specified by choosing a particular point $(x_0,x_1,x_2,x_3) = \mathbf{x}$ in spacetime, as well as a unit future-directed timelike vector $v$.  This vector is a tangent to the observer’s worldline at $\mathbf{x}$.  The observer space is therefore a bundle over $M$, the “future unit tangent bundle”.  However, using the notion of a “Cartan geometry”, one can give a general definition of observer space which makes sense even when there is no underlying $(M,g)$.

This leads to a surprising, relatively new physical intuition: that “spacetime” is a local and observer-dependent notion, which in some special cases can be extended so that all observers see the same spacetime.  This is somewhat related to the relativity of locality, which I’ve blogged about previously.  Geometrically, it is similar to the fact that a slicing of spacetime into space and time is not unique, and not respected by the full symmetries of the theory of Relativity, even for flat spacetime (much less for the case of General Relativity).  Similarly, we will see a notion of “observer space”, which can sometimes be turned into a bundle over an objective spacetime $M$, but not in all cases.

So, how is this described mathematically?  In particular, what did I mean up there by saying that spacetime becomes observer-dependent?

Cartan Geometry

The answer uses Cartan geometry, which is a framework for differential geometry that is slightly broader than what is commonly used in physics.  Roughly, one can say “Cartan geometry is to Klein geometry as Riemannian geometry is to Euclidean geometry”.  The more familiar direction of generalization here is the fact that, like Riemannian geometry, Cartan geometry is concerned with manifolds which have local models in terms of simple, “flat” geometries, but which may have curvature and fail to be homogeneous.  First let’s remember how Klein geometry works.

Klein’s Erlangen Program, carried out in the late 19th century, systematically brought abstract algebra, and specifically the theory of Lie groups, into geometry, by placing the idea of symmetry in the leading role.  It describes “homogeneous spaces”, which are geometries in which every point is indistinguishable from every other point.  This is expressed by the existence of a transitive action of some Lie group $G$ of all symmetries on an underlying space.  Any given point $x$ will be fixed by some symmetries, and not others, so one also has a subgroup $H = Stab(x) \subset G$.  This is the “stabilizer subgroup”, consisting of all symmetries which fix $x$.  That the space is homogeneous means that for any two points $x,y$, the subgroups $Stab(x)$ and $Stab(y)$ are conjugate (by a symmetry taking $x$ to $y$).  Then the homogeneous space, or Klein geometry, associated to $(G,H)$ is, up to isomorphism, just the same as the quotient space $G/H$ of the obvious action of $H$ on $G$.
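As a toy illustration of the conjugate-stabilizer fact (my own sketch, not from the talk), take the symmetric group $S_3$ acting transitively on three points:

```python
from itertools import permutations

# S3 acting on X = {0, 1, 2}: a small transitive group action
G = list(permutations(range(3)))

def compose(p, q):                 # (p ∘ q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(3))

def inverse(p):
    inv = [0] * 3
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

def stab(x):                       # stabilizer subgroup of the point x
    return {g for g in G if g[x] == x}

def conjugate(g, S):               # the subgroup g S g⁻¹
    return {compose(g, compose(s, inverse(g))) for s in S}

g = (1, 0, 2)                      # a symmetry taking the point 0 to 1
print(conjugate(g, stab(0)) == stab(1))   # True: the stabilizers are conjugate
```

The same bookkeeping works for any transitive action; the Klein-geometry examples below just replace this finite group by a Lie group.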

The advantage of this framework is that it admits a great many examples, but the most relevant ones for now are:

• $n$-dimensional Euclidean space.  The Euclidean group $ISO(n) = SO(n) \ltimes \mathbb{R}^n$ is precisely the group of transformations that leave the data of Euclidean geometry, lengths and angles, invariant.  It acts transitively on $\mathbb{R}^n$.  Any point will be fixed by the group of rotations centred at that point, which is a subgroup of $ISO(n)$ isomorphic to $SO(n)$.  Klein’s insight is to reverse this: we may define Euclidean space by $\mathbb{R}^n \cong ISO(n)/SO(n)$.
• $n$-dimensional Minkowski space.  Similarly, we can define this space to be $ISO(n-1,1)/SO(n-1,1)$.  The Euclidean group has been replaced by the Poincaré group, and rotations by the Lorentz group (of rotations and boosts), but otherwise the situation is essentially the same.
• de Sitter space.  As a Klein geometry, this is the quotient $SO(4,1)/SO(3,1)$.  That is, the stabilizer of any point is the Lorentz group – so things look locally rather similar to Minkowski space around any given point.  But the global symmetries of de Sitter space are different.  Even so, it looks like Minkowski space locally, in the sense that the quotients of Lie algebras $so(4,1)/so(3,1)$ and $iso(3,1)/so(3,1)$ are identical, seen as representations of $SO(3,1)$.  It’s natural to identify them with the tangent space at a point.  de Sitter space as a whole is easiest to visualize as a 4D hyperboloid in $\mathbb{R}^5$.  This is supposed to be seen as a local model of spacetime in a theory in which there is a positive cosmological constant that gives empty space a constant positive curvature.
• anti-de Sitter space. This is similar, but now the quotient is $SO(3,2)/SO(3,1)$ – in fact, this whole theory goes through for any of the last three examples: Minkowski; de Sitter; and anti-de Sitter, each of which acts as a “local model” for spacetime in General Relativity with the cosmological constant, respectively: zero; positive; and negative.
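A quick sanity check on these examples (my own bookkeeping, not from the talk): all four quotients have the right dimension for a 4-dimensional (space)time, since $\dim G/H = \dim G - \dim H$:

```python
def dim_SO(p, q):
    # dim SO(p,q) = n(n-1)/2, where n = p + q
    n = p + q
    return n * (n - 1) // 2

def dim_ISO(p, q):
    # the inhomogeneous group SO(p,q) ⋉ R^n adds n translations
    return dim_SO(p, q) + p + q

quotient_dims = {
    "Euclidean E^4 = ISO(4)/SO(4)":           dim_ISO(4, 0) - dim_SO(4, 0),
    "Minkowski M^4 = ISO(3,1)/SO(3,1)":       dim_ISO(3, 1) - dim_SO(3, 1),
    "de Sitter dS^4 = SO(4,1)/SO(3,1)":       dim_SO(4, 1) - dim_SO(3, 1),
    "anti-de Sitter AdS^4 = SO(3,2)/SO(3,1)": dim_SO(3, 2) - dim_SO(3, 1),
}
print(quotient_dims)   # every value is 4
```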

Now, what does it mean to say that a Cartan geometry has a local model?  Well, just as a Lorentzian or Riemannian manifold is “locally modelled” by Minkowski or Euclidean space, a Cartan geometry is locally modelled by some Klein geometry.  This is best described in terms of a connection on a principal $G$-bundle, and the associated $G/H$-bundle, over some manifold $M$.  The crucial bundle in a Riemannian or Lorentzian geometry is the frame bundle: the fibre over each point consists of all the ways to isometrically embed a standard Euclidean or Minkowski space into the tangent space.  A connection on this bundle specifies how this embedding should transform as one moves along a path.  It’s determined by a 1-form on $M$, valued in the Lie algebra of $G$.

Given a parametrized path, one can apply this form to the tangent vector at each point, and get a Lie algebra-valued answer.  Integrating along the path, we get a path in the Lie group $G$ (which is independent of the parametrization).  This is called a “development” of the path, and by applying the $G$-values to the model space $G/H$, we see that the connection tells us how to move through a copy of $G/H$ as we move along the path.  The image this suggests is of “rolling without slipping” – think of the case where the model space is a sphere.  The connection describes how the model space “rolls” over the surface of the manifold $M$.  Curvature of the connection measures the failure of rolling in two different directions to commute.  A connection with zero curvature describes a space which (locally at least) looks exactly like the model space: picture a sphere rolling against its mirror image, so that transporting the sphere-shaped fibre around any closed curve always brings it back to its starting position.  Since curvature is defined in terms of transports of these Klein-geometry fibres, measured by the development of curves, we can think of each homogeneous space as a flat Cartan geometry with itself as a local model.
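To make “rolling without slipping” and curvature-as-holonomy concrete, here is a small numerical sketch of my own (not from the talk), with the unit sphere as model space: parallel-transporting a tangent vector around the geodesic triangle bounding an octant rotates it by the enclosed area, $\pi/2$.

```python
import numpy as np

def leg(p, q, steps=2000):
    # geodesic arc on the unit sphere from p to q, by spherical interpolation
    p, q = np.asarray(p, float), np.asarray(q, float)
    ang = np.arccos(np.clip(np.dot(p, q), -1, 1))
    ts = np.linspace(0, 1, steps)
    return [(np.sin((1 - t) * ang) * p + np.sin(t * ang) * q) / np.sin(ang)
            for t in ts]

def transport(path, v):
    # discrete parallel transport: at each step, project v onto the new
    # tangent plane and restore its length
    v = np.asarray(v, float)
    for n in path:
        w = v - np.dot(v, n) * n
        v = w * (np.linalg.norm(v) / np.linalg.norm(w))
    return v

# closed path around one octant: three right-angled geodesic legs
e1, e2, e3 = np.eye(3)
path = leg(e1, e2) + leg(e2, e3) + leg(e3, e1)
v0 = np.array([0.0, 1.0, 0.0])        # a tangent vector at the start point e1
v1 = transport(path, v0)
angle = np.arccos(np.clip(np.dot(v0, v1), -1, 1))
print(angle)                          # ≈ π/2, the area enclosed by the octant
```

The nonzero holonomy angle is exactly the curvature detected by developing a closed curve; on a flat model the vector would come back unchanged.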

This idea, that the curvature of a manifold depends on the model geometry being used to measure it, shows up in the way we apply this geometry to physics.

Gravity and Cartan Geometry

MacDowell-Mansouri gravity can be understood as a theory in which General Relativity is modelled by a Cartan geometry.  Of course, a standard way of presenting GR is in terms of the geometry of a Lorentzian manifold.  In the Palatini formalism, the basic fields are a connection $\omega$ and a vierbein (coframe field) called $e$, with dynamics encoded in the Palatini action, which is the integral over $M$ of $R[\omega] \wedge e \wedge e$, where $R[\omega]$ is the curvature 2-form for $\omega$.

This can be derived from a Cartan geometry, whose model geometry is de Sitter space $SO(4,1)/SO(3,1)$.   Then MacDowell-Mansouri gravity gets $\omega$ and $e$ by splitting the Lie algebra as $so(4,1) = so(3,1) \oplus \mathbb{R}^4$.  This “breaks the full symmetry” at each point.  Then one has a fairly natural action on the $so(4,1)$-connection:

$\int_M tr(F_h \wedge \star F_h)$

Here, $F_h$ is the $so(3,1)$ part of the curvature of the big connection.  The splitting of the connection means that $F_h = R + e \wedge e$, and the action above is rewritten, up to a normalization, as the Palatini action for General Relativity (plus a topological term, which has no effect on the equations of motion we get from the action).  So General Relativity can be written as the theory of a Cartan geometry modelled on de Sitter space.
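One can verify the algebra behind this splitting directly (a sketch of my own, not from the talk): representing $so(4,1)$ by matrices $X$ with $X^T \eta + \eta X = 0$ for $\eta = diag(-1,1,1,1,1)$, the commutator of two “translation” generators lands back in the $so(3,1)$ block – which is precisely the source of the $e \wedge e$ term in $F_h$.

```python
import numpy as np

eta = np.diag([-1.0, 1, 1, 1, 1])        # invariant metric for so(4,1)

def M(A, B):
    # basis generator X = e_A (eta e_B)^T - e_B (eta e_A)^T;
    # indices 0..3 form the so(3,1) block, 4 is the extra direction
    X = np.zeros((5, 5))
    X[A] += eta[B]
    X[B] -= eta[A]
    return X

def in_so41(X):
    return np.allclose(X.T @ eta + eta @ X, 0)

P = [M(a, 4) for a in range(4)]          # the "translation" part, ~ R^4
C = P[0] @ P[1] - P[1] @ P[0]            # commutator of two translations

print(in_so41(C), np.allclose(C[4], 0) and np.allclose(C[:, 4], 0))
# True True: [P_0, P_1] lies in the so(3,1) block (here it equals -M(0, 1))
```

By contrast, for the Poincaré algebra $iso(3,1)$ the translations commute, which is why the $e \wedge e$ correction (and hence the cosmological constant) is special to the (anti-)de Sitter models.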

The cosmological constant in GR shows up because a “flat” connection for a Cartan geometry based on de Sitter space will look (if measured against Minkowski space) as if it has constant curvature, exactly that of the model Klein geometry.  The way to think of this is to take the fibre bundle of homogeneous model spaces as a replacement for the tangent bundle to the manifold.  The fibre at each point describes the local appearance of spacetime.  If empty spacetime is flat, this local model is Minkowski space, $ISO(3,1)/SO(3,1)$, and one can really speak of tangent “vectors”.  In the de Sitter and anti-de Sitter cases, however, the tangent homogeneous space is not linear: the fibres are not vector spaces, precisely because the group of symmetries doesn’t contain a group of translations, but they are Klein geometries constructed in just the same way as Minkowski space.  Thus, the local description of the connection in terms of $Lie(G)$-valued forms can be treated in the same way, regardless of which Klein geometry $G/H$ occurs in the fibres.  In particular, General Relativity, formulated in terms of Cartan geometry, always says that, in the absence of matter, the geometry of spacetime is flat, and the cosmological constant is included naturally by the choice of which Klein geometry is the local model of spacetime.

Observer Space

The idea in defining an observer space is to combine two symmetry reductions into one.  The reduction from $SO(4,1)$ to $SO(3,1)$ gives de Sitter space, $SO(4,1)/SO(3,1)$, as a model Klein geometry, which reflects the “symmetry breaking” that happens when choosing one particular point in spacetime, or event.  Then, the reduction of $SO(3,1)$ to $SO(3)$ similarly reflects the symmetry breaking that occurs when one chooses a specific time direction (a future-directed unit timelike vector).  These are the tangent vectors to the worldline of an observer at the chosen point, so $SO(3,1)/SO(3)$, the model Klein geometry, is the space of such possible observers.  The stabilizer subgroup for a point in this space consists of just the rotations of space around the corresponding observer – the boosts in $SO(3,1)$ translate between observers.  So locally, choosing an observer amounts to a splitting of the model spacetime at the point into a product of space and time.  If we combine both reductions at once, we get the 7-dimensional Klein geometry $SO(4,1)/SO(3)$.  This is just the future unit tangent bundle of de Sitter space, which we think of as a homogeneous model for the “space of observers”.

A general observer space $O$, however, is just a Cartan geometry modelled on $SO(4,1)/SO(3)$: a 7-dimensional manifold equipped with the structure of a Cartan geometry.  One class of examples consists of exactly the future unit tangent bundles to 4-dimensional Lorentzian spacetimes.  In these cases, observer space is naturally a contact manifold: that is, it’s an odd-dimensional manifold equipped with a 1-form $\alpha$, the contact form, such that the top-dimensional form $\alpha \wedge d \alpha \wedge \dots \wedge d \alpha$ is nowhere zero.  This is the odd-dimensional analog of a symplectic manifold.  Contact manifolds are, intuitively, configuration spaces of systems which involve “rolling without slipping” – for instance, a sphere rolling on a plane.  In this case, it’s better to think of the local model space of observers as “rolling without slipping” on a spacetime manifold $M$.
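For concreteness (my own example, not from the talk), the simplest contact manifold is $\mathbb{R}^3$ with $\alpha = dz - y\,dx$; a short symbolic check confirms that $\alpha \wedge d\alpha$ has a nowhere-vanishing coefficient:

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
coords = (x, y, z)
a = [-y, sp.Integer(0), sp.Integer(1)]   # components of α = -y dx + dz

# dα as the antisymmetric matrix B[i, j] = ∂_i a_j - ∂_j a_i
B = sp.Matrix(3, 3, lambda i, j:
              sp.diff(a[j], coords[i]) - sp.diff(a[i], coords[j]))

# coefficient of dx∧dy∧dz in α ∧ dα (up to an overall constant factor)
top = a[0] * B[1, 2] - a[1] * B[0, 2] + a[2] * B[0, 1]
print(sp.simplify(top))   # 1: nonzero everywhere, so α is a contact form
```

In 7 dimensions one would instead check that $\alpha \wedge (d\alpha)^{\wedge 3}$ is nowhere zero, but the bookkeeping is the same.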

Now, Minkowski space has a slicing into space and time – in fact, one for each observer, who defines the time direction – but no such slicing is respected by the symmetries of the theory, and different observers will choose different ones.  In just the same way, the homogeneous model of observer space can naturally be written as a bundle $SO(4,1)/SO(3) \rightarrow SO(4,1)/SO(3,1)$.  But a general observer space $O$ may or may not be a bundle over an ordinary spacetime manifold, $O \rightarrow M$.  Every Cartan geometry $M$ gives rise to an observer space $O$ as the bundle of future-directed unit timelike vectors, but not every Cartan geometry $O$ is of this form, in any natural way.  Indeed, without a further condition, we can’t even reconstruct observer space as such a bundle in an open neighborhood of a given observer.

This may be intuitively surprising: it gives a perfectly concrete geometric model in which “spacetime” is relative and observer-dependent, and perhaps only locally meaningful, in just the same way as the distinction between “space” and “time” in General Relativity. It may be impossible, that is, to determine objectively whether two observers are located at the same base event or not. This is a kind of “Relativity of Locality” which is geometrically much like the by-now more familiar Relativity of Simultaneity. Each observer will reach certain conclusions as to which observers share the same base event, but different observers may not agree.  The coincident observers according to a given observer are those reached by a good class of geodesics in $O$ moving only in directions that observer sees as boosts.

When one can reconstruct $O \rightarrow M$, two observers will agree whether or not they are coincident.  The extra condition which makes this possible is an integrability constraint on the action of the Lie algebra of $H$ (in our main example, $H = SO(3,1)$) on the observer space $O$.  In this case, the fibres of the bundle are the orbits of this action, and we have the familiar world of Relativity, where simultaneity may be relative, but locality is absolute.

Lifting Gravity to Observer Space

Apart from describing this model of relative spacetime, another motivation for describing observer space is that one can formulate canonical (Hamiltonian) GR locally near each point in such an observer space.  The goal is to make a link between covariant and canonical quantization of gravity.  Covariant quantization treats the geometry of spacetime all at once, by means of a Lagrangian action functional.  This is mathematically appealing, since it respects the symmetry of General Relativity, namely its diffeomorphism-invariance.  On the other hand, it is remote from the canonical (Hamiltonian) approach to quantization of physical systems, in which the concept of time is fundamental. In the canonical approach, one gets a Hilbert space by quantizing the space of states of a system at a given point in time, and the Hamiltonian for the theory describes its evolution.  This is problematic for diffeomorphism-, or even Lorentz-invariance, since coordinate time depends on a choice of observer.  The point of observer space is that we consider all these choices at once.  Describing GR in $O$ is both covariant, and based on (local) choices of time direction.

This is easiest to describe in the case of a bundle $O \rightarrow M$.  Then a “field of observers” is a section of the bundle: a choice, at each base event in $M$, of an observer based at that event.  A field of observers may or may not correspond to a particular decomposition of spacetime into space evolving in time, but locally, at each point in $O$, it always looks like one.  The resulting theory describes the dynamics of space-geometry over time, as seen locally by a given observer.  In this case, a Cartan connection on observer space is described by a $Lie(SO(4,1))$-valued form.  This decomposes into four Lie-algebra-valued forms, interpreted as infinitesimal transformations of the model observer by: (1) spatial rotations; (2) boosts; (3) spatial translations; (4) time translation.  The four-fold division is based on two distinctions: first, between the base event at which the observer lives, and the choice of observer (i.e. the reduction of $SO(4,1)$ to $SO(3,1)$, which symmetry breaking entails choosing a point); and second, between space and time (i.e. the reduction of $SO(3,1)$ to $SO(3)$, which symmetry breaking entails choosing a time direction).
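The bookkeeping of this four-fold decomposition can be sketched as follows (my own index conventions: 0 = time, 1–3 = space, 4 = the extra de Sitter direction).  The four pieces account for all 10 generators of $so(4,1)$, and quotienting by the 3 rotations leaves the 7 dimensions of observer space:

```python
# label each basis generator M(A, B), A < B, of so(4,1) by its role
rotations    = [(i, j) for i in (1, 2, 3) for j in (1, 2, 3) if i < j]
boosts       = [(0, i) for i in (1, 2, 3)]
space_transl = [(i, 4) for i in (1, 2, 3)]
time_transl  = [(0, 4)]

pieces = [rotations, boosts, space_transl, time_transl]
print([len(p) for p in pieces])                       # [3, 3, 3, 1], total 10
print(sum(len(p) for p in pieces) - len(rotations))   # 7 = dim observer space
```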

This splitting, along the same lines as the one in MacDowell-Mansouri gravity described above, suggests that one could lift GR to a theory on an observer space $O$.  This amounts to describing fields on $O$ and an action functional, so that the splitting of the fields gives back the usual fields of GR on spacetime, and the action gives back the usual action.  This part of the project is still under development, but the lifting itself has been described.  In the case when there is no “objective” spacetime, the result includes some surprising new fields which it’s not yet clear how to deal with, but when there is an objective spacetime, the resulting theory looks just like GR.

Well, as promised in the previous post, I’d like to give a summary of some of what was discussed at the conference I attended (quite a while ago now, late last year) in Erlangen, Germany.  I was there also to visit Derek Wise, talking about a project we’ve been working on for some time.

(I’ve also significantly revised this paper about Extended TQFT since then, and it now includes some stuff which was the basis of my talk at Erlangen on cohomological twisting of the category $Span(Gpd)$.  I’ll get to that in the next post.  Also coming up, I’ll be describing some new things I’ve given some talks about recently which relate the Baez-Dolan groupoidification program to Khovanov-Lauda categorification of algebras – at least in one example, hopefully in a way which will generalize nicely.)

In the meantime, there were a few themes at the conference which bear on the Extended TQFT project in various ways, so in this post I’ll describe some of them.  (This isn’t an exhaustive description of all the talks: just of a selection of illustrative ones.)

Categories with Structures

A few talks were mainly about facts regarding the sorts of categories which get used in field theory contexts.  One important type, for instance, is the fusion category: a monoidal category which is enriched in vector spaces, generated by simple objects, and has some other properties – essentially, a monoidal 2-vector space.  The basic examples would be categories of representations (of groups, quantum groups, algebras, etc.), but fusion categories are an abstraction of (some of) their properties.  Many of the standard properties are described and proved in this paper by Etingof, Nikshych, and Ostrik, which also poses one of the basic conjectures, the “ENO Conjecture”, which was referred to repeatedly in various talks.  This is the guess that every fusion category can be given a “pivotal” structure: an isomorphism from $Id$ to the double-dual functor $**$.  It generalizes the theorem that there’s always such an isomorphism to the quadruple dual $****$.  More on this below.

Hendryk Pfeiffer talked about a combinatorial way to classify fusion categories in terms of certain graphs (see this paper here).  One way I understand this idea is to ask how much this sort of category really does generalize categories of representations, or actually comodules.  One starting point for this is the theorem that there’s a pair of functors between certain monoidal categories and weak Hopf algebras.  Specifically, the monoidal categories are objects of $(Cat \downarrow Vect)^{\otimes}$, which consists of monoidal categories equipped with a forgetful functor into $Vect$.  From such a category one can get (via a coend) a weak Hopf algebra over the base field $k$ (in the category $WHA_k$).  From a weak Hopf algebra $H$, one can get back such a category by taking all the modules of $H$.  These two processes form an adjunction: they’re not inverses, but we have maps between the two composites and the identity functors.

The new result Hendryk gave is that if we restrict our categories over $Vect$ to be abelian, and the functors between them to be linear, faithful, and exact (that is, roughly, that we’re talking about concrete monoidal 2-vector spaces), then this adjunction is actually an equivalence: so essentially, all such categories $C$ may as well be module categories for weak Hopf algebras.  Then he gave a characterization of these in terms of the “dimension graph” (in fact a quiver) for $(C,M)$, where $M$ is one of the monoidal generators of $C$.  The vertices of $\mathcal{G} = \mathcal{G}_{(C,M)}$ are labelled by the irreducible representations $v_i$ (i.e. set of generators of the category), and there’s a set of edges $j \rightarrow l$ labelled by a basis of $Hom(v_j, v_l \otimes M)$.  Then one can carry on and build a big graded algebra $H[\mathcal{G}]$ whose $m$-graded part consists of length-$m$ paths in $\mathcal{G}$.  Then the point is that the weak Hopf algebra of which $C$ is (up to isomorphism) the module category will be a certain quotient of $H[\mathcal{G}]$ (after imposing some natural relations in a systematic way).
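The path-counting step can be sketched concretely (with a made-up adjacency matrix, since the actual dimension graphs depend on the pair $(C,M)$): the dimension of the $m$-graded piece of $H[\mathcal{G}]$ between two vertices is just an entry of the $m$-th power of the graph’s adjacency matrix.

```python
import numpy as np

# hypothetical dimension graph on 3 vertices: A[j, l] = number of edges
# j → l, i.e. dim Hom(v_j, v_l ⊗ M) for the chosen generator M
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

def path_counts(A, m):
    # (A^m)[j, l] = number of length-m paths j → l in the quiver
    return np.linalg.matrix_power(A, m)

print(path_counts(A, 2))
# [[1 0 1]
#  [0 2 0]
#  [1 0 1]]
```

The weak Hopf algebra itself is then a quotient of the algebra spanned by these paths, with multiplication by concatenation.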

The point, then, is that the sort of categories mostly used in this area can be taken to be representation categories, but in general only of these weak Hopf algebras: groups and ordinary algebras are special cases, but they show up naturally for certain kinds of field theory.

Tensor Categories and Field Theories

There were several talks about the relationship between tensor categories of various sorts and particular field theories.  The idea is that local field theories can be broken down in terms of some kind of $n$-category: $n$-dimensional regions get labelled by categories, $(n-1)$-D boundaries between regions, or “defects”, are labelled by functors between the categories (with the idea that this shows how two different kinds of field can couple together at the defect), and so on (I think the highest dimension that was discussed explicitly involved 3-categories, so one has junctions between defects, and junctions between junctions, which get assigned some higher-morphism data).  Alternatively, there’s the dual picture where categories are assigned to points, functors to 1-manifolds, and so on.  (This is just Poincaré duality in the case where the manifolds come with a decomposition into cells, which they often do, if only for convenience.)

Victor Ostrik gave a pair of talks giving an overview of the role tensor categories play in conformal field theory.  There’s too much material here to easily summarize, but the basics go like this: CFTs are field theories defined on cobordisms that have some conformal structure (i.e. a notion of angles, but not of distance), and on the algebraic side they are associated with vertex algebras (some useful discussion appears on mathoverflow, but in this context they can be understood as vector spaces equipped with exactly the algebraic operations needed to model cobordisms with some local holomorphic structure).

In particular, the irreducible representations of these VOA’s determine the “conformal blocks” of the theory, which tell us about possible correlations between observables (self-adjoint operators).  A VOA $V$ is “rational” if the category $Rep(V)$ is semisimple (i.e. generated as finite direct sums of these conformal blocks).  For good VOA’s, $Rep(V)$ will be a modular tensor category (MTC), which is a fusion category with a duality, braiding, and some other structure (see this for more).  So describing these gives us a lot of information about what CFT’s are possible.

The full data of a rational CFT are given by a vertex algebra and a module category $M$: that is, a fusion category is a sort of categorified ring, so it can act on $M$ as a ring acts on a module.  It turns out that choosing an $M$ is equivalent to finding a certain algebra (i.e. algebra object) $\mathcal{L}$, a “Lagrangian algebra”, inside the centre of $Rep(V)$.  The Drinfel’d centre $Z(C)$ of a monoidal category $C$ is a sort of free way to turn a monoidal category into a braided one, but concretely in this case it just looks like $Rep(V) \otimes Rep(V)^{\ast}$.  Knowing the isomorphism class of $\mathcal{L}$ determines a “modular invariant”.  It gets its “physics” meaning from how it’s equipped with an algebra structure (which can happen in more than one way), but in any case $\mathcal{L}$ has an underlying vector space, which becomes the Hilbert space of states for the conformal field theory, which the VOA acts on in the natural way.

Now, that was all conformal field theory.  Christopher Douglas described some work with Chris Schommer-Pries and Noah Snyder about fusion categories and structured topological field theories.  These are functors out of cobordism categories, the most important of which are $n$-categories, where the objects are points, morphisms are 1D cobordisms, and so on up to $n$-morphisms which are $n$-dimensional cobordisms.  To keep things under control, Chris Douglas talked about the case $Bord_0^3$, which is where $n=3$, and a “local” field theory is a 3-functor $Bord_0^3 \rightarrow \mathcal{C}$ for some 3-category $\mathcal{C}$.  Now, the (Baez-Dolan) Cobordism Hypothesis, which was proved by Jacob Lurie, says that $Bord_0^3$ is, in a suitable sense, the free symmetric monoidal 3-category with duals.  What this amounts to is that a local field theory whose target 3-category is $\mathcal{C}$ is “just” a dualizable object of $\mathcal{C}$.

The handy example which links this up to the above is when $\mathcal{C}$ has objects which are tensor categories, morphisms which are bimodule categories (i.e. categories with compatible actions by the tensor categories on either side), 2-morphisms which are functors, and 3-morphisms which are natural transformations.  Then the issue is to classify what kind of tensor categories these objects can be.

The story is trickier if we’re talking about, not just topological cobordisms, but ones equipped with some kind of structure regulated by a structure group $G$ (for instance, orientation by $G=SO(n)$, spin structure by its universal cover $G= Spin(n)$, and so on).  This means the cobordisms come equipped with a map into $BG$.  They take $O(n)$ as the starting point, and then consider groups $G$ with a map to $O(n)$, and require that the map into $BG$ is a lift of the map to $BO(n)$.  Then one gets that a structured local field theory amounts to a dualizable object of $\mathcal{C}$ with a homotopy-fixed point for some $G$-action – and this describes what gets assigned to the point by such a field theory.  What they then show is a correspondence between $G$ and classes of categories.  For instance, fusion categories are what one gets by imposing that the cobordisms be oriented.

Liang Kong talked about “Topological Orders and Tensor Categories”, which used the Levin-Wen models, from condensed matter physics.  (Benjamin Balsam also gave a nice talk describing these models and showing how they’re equivalent to the Turaev-Viro and Kitaev models in appropriate cases.  Ingo Runkel gave a related talk about topological field theories with “domain walls”.)  Here, the idea of a “defect” (and topological order) can be understood very graphically: we imagine a 2-dimensional crystal lattice (of atoms, say), and the defect is a 1-dimensional place where two lattices join together, with the internal symmetry of each breaking down at the boundary.  (For example, a square lattice glued where the edges on one side are offset and meet the squares on the other side in the middle of a face, as you typically see in a row of bricks – the slides linked above have some pictures.)  The Levin-Wen models are built using a hexagonal lattice, starting with a tensor category with several properties: spherical (there are dualities satisfying some relations), fusion, and unitary: in fact, historically, these defining properties were rediscovered independently here as the requirement for there to be excitations on the boundary which satisfy physically-inspired consistency conditions.

These abstract the properties of a category of representations.  A generalization of this to “topological orders” in 3D or higher is an extended TFT in the sense mentioned just above: they have a target 3-category of tensor categories, bimodule categories, functors and natural transformations.  The tensor categories (say, $\mathcal{C}$, $\mathcal{D}$, etc.) get assigned to the bulk regions; to “domain walls” between different regions, namely defects between lattices, we assign bimodule categories (but, for instance, to a line within a region, we get $\mathcal{C}$ understood as a $\mathcal{C}-\mathcal{C}$-bimodule); then to codimension 2 and 3 defects we attach functors and natural transformations.  The algebra for how these combine expresses the ways these topological defects can go together.  On a lattice, this is an abstraction of a spin network model, where typically we have just one tensor category $\mathcal{C}$ applied to the whole bulk, namely the representations of a Lie group (say, a unitary group).  Then we do calculations by breaking down into bases: on codimension-1 faces, these are simple objects of $\mathcal{C}$; to vertices we assign a Hom space (and label by a basis for intertwiners in the special case); and so on.

Thomas Nickolaus spoke about the same kind of $G$-equivariant Dijkgraaf-Witten models as at our workshop in Lisbon, so I’ll refer you back to my earlier post on that.  However, speaking of equivariance and group actions:

Michael Müger spoke about “Orbifolds of Rational CFT’s and Braided Crossed $G$-Categories” (see this paper for details).  This starts with that correspondence between rational CFT’s (strictly, rational chiral CFT’s) and modular categories $Rep(F)$.  (He takes $F$ to be the name of the CFT).  Then we consider what happens if some finite group $G$ acts on $F$ (if we understand $F$ as a functor, this is an action by natural transformations; if as an algebra, then by automorphisms).  This produces an “orbifold theory” $F^G$ (just like a finite group action on a manifold produces an orbifold), which is the “$G$-fixed subtheory” of $F$, obtained by taking $G$-fixed points for every object, and is also a rational CFT.  But that means it corresponds to some other modular category $Rep(F^G)$, so one would like to know what category this is.

A natural guess might be that it’s $Rep(F)^G$, where $C^G$ is a “weak fixed-point” category that comes from a weak group action on a category $C$.  Objects of $C^G$ are pairs $(c,f_g)$ where $c \in C$ and $f_g : g(c) \rightarrow c$ is a specified isomorphism.  (This is a weak analog of $S^G$, the set of fixed points for a group acting on a set).  But this guess is wrong – indeed, it turns out these categories have the wrong dimension (which is defined because the modular category has a trace, which we can sum over generating objects).  Instead, the right answer, denoted by $Rep(F^G) = G-Rep(F)^G$, is the $G$-fixed part of some other category.  It’s a braided crossed $G$-category: one with a grading by $G$, and a $G$-action that gets along with it.  The identity-graded part of $Rep(F^G)$ is just the original $Rep(F)$.

State Sum Models

This ties in somewhat with at least some of the models in the previous section.  Some of these talks were somewhat introductory, since many of the people at the conference were coming from a different background.  So, for instance, to begin the workshop, John Barrett gave a talk about categories and quantum gravity, which started by outlining the historical background, and the development of state-sum models.  He gave a second talk where he began to relate this to diagrams in Gray-categories (something he also talked about here in Lisbon in February, which I wrote about then).  He finished up with some discussion of spherical categories (and in particular the fact that there is a Gray-category of spherical categories, with a bunch of duals in the suitable sense).  This relates back to the kind of structures Chris Douglas spoke about (described above, but chronologically right after John).  Likewise, Winston Fairbairn gave a talk about state sum models in 3D quantum gravity – the Ponzano-Regge model and Turaev-Viro model being the focal point, describing how these work and how they’re constructed.  Part of the point is that one would like to see that these fit into the sort of framework described in the section above, which for PR and TV models makes sense, but for the fancier state-sum models in higher dimensions, this becomes more complicated.

Higher Gauge Theory

There wasn’t as much on this topic as at our own workshop in Lisbon (though I have more remarks on higher gauge theory in one post about it), but there were a few entries.  Roger Picken talked about some work with Joao Martins about a cubical formalism for parallel transport based on crossed modules, which consist of a group $G$ and a group $H$, with a map $\partial : H \rightarrow G$ and an action of $G$ on $H$ satisfying some axioms.  They can represent categorical groups, namely group objects in $Cat$ (equivalently, categories internal to $Grp$), and are “higher” analogs of groups with a set of elements.  Roger’s talk was about how to understand holonomies and parallel transports in this context.  So, a “connection” lets one transport things with $G$-symmetries along paths, and with $H$-symmetries along surfaces.  It’s natural to describe this with squares whose edges are labelled by $G$-elements, and faces labelled by $H$-elements (which are the holonomies).  Then the “cubical approach” means that we can describe gauge transformations, and higher gauge transformations (which in one sense are the point of higher gauge theory) in just the same way: a gauge transformation which assigns $H$-values to edges and $G$-values to vertices can be drawn via the holonomies of a connection on a cube which extends the original square into 3D (so the edges become squares, and so get $H$-values, and so on).  The higher gauge transformations work in a similar way.  This cubical picture gives a good way to understand the algebra of how gauge transformations etc. work: so for instance, gauge transformations look like “conjugation” of a square by four other squares – namely, relating the front and back faces of a cube by means of the remaining faces.  Higher gauge transformations can be described by means of a 4D hypercube in an analogous way, and their algebraic properties have to do with the 2D faces of the hypercube.
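A minimal concrete crossed module (my own toy example, not from Roger’s talk): take $G = S_3$ acting on its normal subgroup $H = A_3$ by conjugation, with $\partial$ the inclusion.  The two crossed-module axioms can then be checked by brute force:

```python
from itertools import permutations

def compose(p, q):                  # (p ∘ q)(i) = p(q(i))
    return tuple(p[q[i]] for i in range(3))

def inverse(p):
    inv = [0] * 3
    for i, pi in enumerate(p):
        inv[pi] = i
    return tuple(inv)

G = list(permutations(range(3)))              # S3
H = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]         # A3, a normal subgroup of S3

boundary = lambda h: h                        # ∂ : H → G is the inclusion
act = lambda g, h: compose(g, compose(h, inverse(g)))   # g ▷ h = g h g⁻¹

# axiom 1 (equivariance): ∂(g ▷ h) = g ∂(h) g⁻¹
equivariance = all(boundary(act(g, h)) == act(g, boundary(h))
                   for g in G for h in H)
# axiom 2 (Peiffer identity): ∂(h) ▷ h' = h h' h⁻¹
peiffer = all(act(boundary(h), k) == compose(h, compose(k, inverse(h)))
              for h in H for k in H)
print(equivariance, peiffer)   # True True
```

For an inclusion with conjugation action both axioms hold almost by definition; the interesting crossed modules are those where $\partial$ is not injective, but the axioms being checked are the same.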

Derek Wise gave a short talk outlining his recent paper with John Baez in which they show that it’s possible to construct a higher gauge theory based on the Poincaré 2-group which turns out to have fields, and dynamics, equivalent to those of teleparallel gravity – a slightly unusual theory which nevertheless looks in practice just like General Relativity.  I discussed this in a previous post.

So next time I’ll talk about the new additions to my paper on ETQFT which were the basis of my talk, which illustrates a few of the themes above.

So apparently the “Integral” gamma-ray observatory has put some pretty strong limits on predictions of a “grain size” for spacetime, as in Loop Quantum Gravity or other theories predicting similar violations of Lorentz invariance, which would show up as differences between higher- and lower-energy photons coming from distant sources.  (Original paper.)  I didn’t actually hear much about such predictions when I was at the conference “Quantum Theory and Gravitation” last month in Zurich, though partly that was because it was focused on bringing together people from a variety of different approaches, so the Loop QG and String Theory camps were smaller than at some other conferences on the same subject.  It was a pretty interesting conference, however (many of the slides and such material can be found here).  As one of the organizers, Jürg Fröhlich, observed in his concluding remarks, it gave grounds for optimism about physics, in that it was clear that we’re nowhere near understanding everything about the universe.  Which seems like a good attitude to take to the situation – and it informs good questions: he asked questions in many of the talks that went right to the heart of the most problematic things about each approach.

Often after attending a conference like that, I’d take the time to write a post about all the talks – which I was tempted to do, but I’ve been busy with things I missed while I was away, and now it’s been quite a while.  I will probably come back at some point to the subject of conformal nets: there were some interesting talks by Andre Henriques at one workshop I was at, and another by Roberto Longo at this one, which together got me interested in the subject.  But that’s not what I’m going to write about this time.

This time, I want to talk about a different kind of topic.  Talking in Zurich with various people – John Barrett, John Baez, Laurent Freidel, Derek Wise, and some others, on and off – we kept coming back to various seemingly strange algebraic structures.  One such structure is a “loop“, also known (maybe less confusingly) as a “quasigroup” (in fact, a loop is a quasigroup with a unit).  The name was especially confusing because we were talking about these gadgets in the context of gauge theory, where you might want to assign an element of one as the holonomy around a LOOP in spacetime.  Limitations of the written medium being what they are, I’ll just avoid the problem and say “quasigroup” henceforth, although I actually tend to use “loop” when I’m speaking.

The axioms for a quasigroup look just like the axioms for a group, except that the axiom of associativity is missing.  That is, it’s a set with a “multiplication” operation in which each element $x$ has a left and a right inverse, called $x^{\lambda}$ and $x^{\rho}$.  (I’m assuming the quasigroup is unital from here on.)  In a group – which is just a quasigroup where associativity holds – you can use associativity to prove that $x^{\lambda} = x^{\rho}$, but in a quasigroup we don’t assume this.  Of course, you can consider the special case where it IS true: this gives a “quasigroup with two-sided inverse”, which is a weaker assumption than associativity.
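To make the definition concrete, here’s a quick computational check (a made-up example, nothing canonical): the table below is a Latin square with identity $0$ – equivalently a unital quasigroup, since each row and column being a permutation is exactly the solvability of $ax = b$ and $xa = b$ – and it is nonassociative, with an element whose left and right inverses differ:

```python
# A unital quasigroup of order 5, given by its multiplication table.
table = [
    [0, 1, 2, 3, 4],
    [1, 0, 3, 4, 2],
    [2, 3, 4, 0, 1],
    [3, 4, 1, 2, 0],
    [4, 2, 0, 1, 3],
]

def op(a, b):
    return table[a][b]

n = len(table)

# Latin square property: every row and every column is a permutation,
# which is exactly the statement that left and right division always work.
assert all(sorted(row) == list(range(n)) for row in table)
assert all(sorted(table[i][j] for i in range(n)) == list(range(n))
           for j in range(n))

# 0 is a two-sided identity, so this is a unital quasigroup (a "loop").
assert all(op(0, a) == a and op(a, 0) == a for a in range(n))

# Not associative:
assert op(op(1, 1), 2) != op(1, op(1, 2))

# Left and right inverses exist (solve b·a = 0 and a·b = 0) but can differ:
left_inv = {a: next(b for b in range(n) if op(b, a) == 0) for a in range(n)}
right_inv = {a: next(b for b in range(n) if op(a, b) == 0) for a in range(n)}
assert left_inv[2] == 4 and right_inv[2] == 3   # they differ for element 2
```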

In fact, this is an example of a kind of question one often asks about quasigroups: what are some extra properties we can suppose which, if they hold for a quasigroup $Q$, make life easier?  Associativity is a strong condition to ask, and gives the special case of a group, which is a pretty well-understood area.  So mostly one looks for something weaker than associativity.  Probably the most well-known, among people who know about such things, is the Moufang axiom, named after Ruth Moufang, who did a lot of the pioneering work studying quasigroups.

There are several equivalent ways to state the Moufang axiom, but a nice one is:

$y(x(yz)) = ((yx)y)z$

You could derive this from the associative law if you had it, but it doesn’t imply associativity.   With associators, one can go from a fully-right-bracketed to a fully-left-bracketed product of four things: $w(x(yz)) \rightarrow (wx)(yz) \rightarrow ((wx)y)z$.  There’s no associator here (a quasigroup is a set, not a category – though categorifying this stuff might be a nice thing to try), but the Moufang axiom says this becomes an equation when $w=y$.  One might consider the stronger condition that it’s an equation for all $(w,x,y,z)$, but the Moufang axiom turns out to be the more handy one.

One way this is so is found in the division algebras.  A division algebra is a (say, real) vector space with a multiplication for which there’s an identity and a notion of division – that is, an inverse for nonzero elements.  We can generalize this enough to allow different left and right inverses, but even if we relax this (and the assumption of associativity), it’s a well-known theorem that there are still only four finite-dimensional normed ones.  Namely, they are $\mathbb{R}$, $\mathbb{C}$, $\mathbb{H}$, and $\mathbb{O}$: the real numbers, complex numbers, quaternions, and octonions, with real dimensions 1, 2, 4, and 8 respectively.

So the pattern goes like this.  The first two, $\mathbb{R}$ and $\mathbb{C}$, are commutative and associative.  The quaternions $\mathbb{H}$ are noncommutative, but still associative.  The octonions $\mathbb{O}$ are neither commutative nor associative.  They also don’t satisfy that stronger axiom $w(x(yz)) = ((wx)y)z$.  However, the octonions do satisfy the Moufang axiom.  In each case, you can get a quasigroup by taking the nonzero elements – or, using the fact that there’s a norm around in the usual way of presenting these algebras, the elements of unit norm.  The unit quaternions, in fact, form a group – specifically, the group $SU(2)$.  The unit reals and complexes form abelian groups (respectively, $\mathbb{Z}_2$, and $U(1)$).  These groups all have familiar names.  The quasigroup of unit octonions doesn’t have any other more familiar name.  If you believe in the fundamental importance of this sequence of four division algebras, though, it does suggest that a natural sequence in which to weaken axioms for “multiplication” goes: commutative-and-associative, associative, Moufang.

The Moufang axiom does imply some other commonly suggested weakenings of associativity, as well.  For instance, a quasigroup that satisfies the Moufang axiom must also be alternative (a restricted form of associativity when two copies of one element are next to each other: i.e. the left alternative law $x(xy) = (xx)y$, and right alternative law $x(yy) = (xy)y$).
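All of these claims about the octonions can be checked by direct computation.  Here’s a Python sketch that builds them as nested pairs of reals via the Cayley-Dickson doubling construction (with one standard choice of sign convention) and verifies nonassociativity, the Moufang identity, and the alternative laws; integer components keep the arithmetic exact:

```python
# An octonion is a pair of quaternions, a quaternion a pair of complex
# numbers, a complex number a pair of reals: nested tuples, bottoming
# out at plain numbers.

def conj(x):
    if isinstance(x, tuple):
        a, b = x
        return (conj(a), neg(b))
    return x                      # reals are self-conjugate

def neg(x):
    return (neg(x[0]), neg(x[1])) if isinstance(x, tuple) else -x

def add(x, y):
    if isinstance(x, tuple):
        return (add(x[0], y[0]), add(x[1], y[1]))
    return x + y

def mul(x, y):
    if isinstance(x, tuple):
        a, b = x
        c, d = y
        # Cayley-Dickson doubling: (a,b)(c,d) = (ac - d̄b, da + bc̄)
        return (add(mul(a, c), neg(mul(conj(d), b))),
                add(mul(d, a), mul(b, conj(c))))
    return x * y

def octo(cs):                     # pack 8 components into nested pairs
    if len(cs) == 1:
        return cs[0]
    m = len(cs) // 2
    return (octo(cs[:m]), octo(cs[m:]))

def e(n):                         # the n-th basis octonion
    return octo([1 if i == n else 0 for i in range(8)])

# Associativity fails already on basis elements: e1(e2 e4) = -((e1 e2) e4)
assert mul(e(1), mul(e(2), e(4))) == neg(mul(mul(e(1), e(2)), e(4)))

# ...but the Moufang identity y(x(yz)) = ((yx)y)z holds for arbitrary elements:
x = octo([1, 2, 0, -1, 3, 0, 1, 2])
y = octo([0, 1, -2, 1, 0, 2, -1, 0])
z = octo([2, 0, 1, 1, -1, 0, 0, 1])
assert mul(y, mul(x, mul(y, z))) == mul(mul(mul(y, x), y), z)

# ...as do the alternative laws x(xy) = (xx)y and x(yy) = (xy)y:
assert mul(x, mul(x, y)) == mul(mul(x, x), y)
assert mul(x, mul(y, y)) == mul(mul(x, y), y)
```

The same recursion with one fewer doubling gives the quaternions, where all of these identities collapse back into plain associativity.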

Now, there are various ways one could go with this; the one I’ll pick is toward physics.  The first three entries in that sequence of four division algebras – and the corresponding groups – all show up all over the place in physics.  $\mathbb{Z}_2$ is the simplest nontrivial group, so this could hardly fail to be true, but at any rate, it appears as, for instance, the symmetry group of the set of orientations on a manifold, or the grading in supersymmetry (hence plays a role distinguishing bosons and fermions), and so on.  $U(1)$ is, among any number of other things, the group in which action functionals take their values in Lagrangian quantum mechanics; in the Hamiltonian setup, it’s the group of phases that characterizes how wave functions evolve in time.  Then there’s $SU(2)$, which is the (double cover of the) group of rotations of 3-space; as a consequence, its representation theory classifies the “spins”, or angular momenta, that a quantum particle can have.

What about the octonions – or indeed the quasigroup of unit octonions?  This is a little less clear, but I will mention this: John Baez has been interested in octonions for a long time, and in Zurich, gave a talk about what kind of role they might play in physics.  This is supposed to partially explain what’s going on with the “special dimensions” that appear in string theory – these occur where the dimension of a division algebra (and a Clifford algebra that’s associated to it) is the same as the codimension of a string worldsheet.  J.B.’s student, John Huerta, has also been interested in this stuff, and spoke about it here in Lisbon in February – it’s the subject of his thesis, and a couple of papers they’ve written.  The role of the octonions here is not nearly so well understood as elsewhere, and of course whether this stuff is actually physics, or just some interesting math that resembles it, is open to experiment – unlike those other examples, which are definitely physics if anything is!

So at this point, the foregoing sets us up to wonder about two questions.  First: are there any quasigroups that are actually of some intrinsic interest which don’t satisfy the Moufang axiom?  (This might be the next step in that sequence of successively weaker axioms).  Second: are there quasigroups that appear in genuine, experimentally tested physics?  (Supposing you don’t happen to like the example from string theory).

Well, the answer is yes on both counts, with one common example – a non-Moufang quasigroup which is of interest precisely because it has a direct physical interpretation.  This example is the composition of velocities in Special Relativity, which Derek Wise pointed out to me as a nice physics-based example of nonassociativity.  It’s also non-Moufang, which isn’t too surprising once you start trying to check it by a direct calculation: in each case, the reason is that the interpretation of the composition is very non-symmetric.  So how does this work?

Well, if we take units where the speed of light is 1, then Special Relativity tells us that the relative velocities of two observers are vectors in the interior of $B_1(0) \subset \mathbb{R}^3$.  That is, they’re 3-vectors with length less than 1, since the magnitude of the relative velocity must be less than the speed of light.  In any elementary course on Relativity, you’d learn how to compose these velocities, using the “gamma factor” that describes such things as time dilation.  This can be derived from first principles, and it isn’t too complicated, but in any case the end result is a new “addition” for vectors:

$\mathbf{v} \oplus_E \mathbf{u} = \frac{ \mathbf{v} + \mathbf{u}_{\parallel} + \alpha_{\mathbf{v}} \mathbf{u}_{\perp}}{1 + \mathbf{v} \cdot \mathbf{u}}$

where $\alpha_{\mathbf{v}} = \sqrt{1 - \mathbf{v} \cdot \mathbf{v}}$  is the reciprocal of the aforementioned “gamma” factor.  The vectors $\mathbf{u}_{\parallel}$ and $\mathbf{u}_{\perp}$ are the components of the vector $\mathbf{u}$ which are parallel to, and perpendicular to, $\mathbf{v}$, respectively.

The way this is interpreted is: if $\mathbf{v}$ is the velocity of observer B as measured by observer A, and $\mathbf{u}$ is the velocity of observer C as measured by observer B, then $\mathbf{v} \oplus_E \mathbf{u}$ is the velocity of observer C as measured by observer A.

Clearly, there’s an asymmetry in how $\mathbf{v}$ and $\mathbf{u}$ are treated: the first vector, $\mathbf{v}$, is a velocity as seen by the same observer who sees the velocity in the final answer.  The second, $\mathbf{u}$, is a velocity as seen by an observer who’s vanished by the time we have $\mathbf{v} \oplus_E \mathbf{u}$ in hand.  Just looking at the formula, you can see this is an asymmetric operation that distinguishes the left and right inputs.  So the fact (slightly onerous, but not conceptually hard, to check) that it’s noncommutative – and indeed nonassociative, and even non-Moufang – shouldn’t come as a big shock.

The fact that it makes $B_1(0)$ into a quasigroup is a little less obvious, unless you’ve actually worked through the derivation – but from physical principles, $B_1(0)$ is closed under this operation because the final relative velocity will again be less than the speed of light.  The fact that this has “division” (i.e. cancellation), is again obvious enough from physical principles: if we have $\mathbf{v} \oplus _E \mathbf{u}$, the relative velocity of A and C, and we have one of $\mathbf{v}$ or $\mathbf{u}$ – the relative velocity of B to either A or C – then the relative velocity of B to the other one of these two must exist, and be findable using this formula.  That’s the “division” here.

So in fact this non-Moufang quasigroup, exotic-sounding algebraic terminology aside, is one that any undergraduate physics student will have learned about and calculated with.

One point that Derek was making in pointing this example out to me was as a comment on a surprising claim someone (I don’t know who) had made, that mathematical abstractions like “nonassociativity” don’t really appear in physics.  I find the above a pretty convincing case that this isn’t true.

In fact, physics is full of Lie algebras, and the Lie bracket is a nonassociative multiplication (except in trivial cases).  But I guess there is an argument against this: namely, people often think of a Lie algebra as living inside its universal enveloping algebra.  Then the Lie bracket is defined as $[x,y] = xy - yx$, using the underlying (associative!) multiplication.  So maybe one can claim that nonassociativity doesn’t “really” appear in physics because you can treat it as a derived concept.
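The bracket’s nonassociativity – and the Jacobi identity that stands in for associativity – is easy to see concretely.  A quick check with the standard basis $e, f, h$ of $\mathfrak{sl}(2)$ as $2 \times 2$ matrices:

```python
# The commutator [x, y] = xy - yx on 2x2 integer matrices, as plain lists.
def mmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def msub(a, b):
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def madd(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def bracket(a, b):
    return msub(mmul(a, b), mmul(b, a))

e = [[0, 1], [0, 0]]
f = [[0, 0], [1, 0]]
h = [[1, 0], [0, -1]]

# Nonassociative: [e, [f, h]] = 2h, but [[e, f], h] = [h, h] = 0.
assert bracket(e, bracket(f, h)) != bracket(bracket(e, f), h)

# What holds instead is the Jacobi identity:
# [e,[f,h]] + [f,[h,e]] + [h,[e,f]] = 0.
total = madd(bracket(e, bracket(f, h)),
             madd(bracket(f, bracket(h, e)), bracket(h, bracket(e, f))))
assert total == [[0, 0], [0, 0]]
```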

An even simpler example of this sort of phenomenon: the integers with subtraction (rather than addition) are nonassociative, in that $x-(y-z) \neq (x-y)-z$.  But this only suggests that subtraction is the wrong operation to use: it was derived from addition, which of course is commutative and associative.

In which case, the addition of velocities in relativity is also a derived concept.  Because, of course, in SR there really are no 3-space “velocities”: there are tangent vectors in Minkowski space, which is a 4-dimensional space.  Adding these vectors in $\mathbb{R}^4$ is again, of course, commutative and associative.  The concept of the “relative velocity” of two observers travelling along given vectors is a derived concept which gets its strange properties by treating the two arguments asymmetrically, just like “commutator” and “subtraction” do: you first use one vector to decide on a way of slicing Minkowski spacetime into space and time, and then use this to measure the velocity of the other.

Even the octonions, seemingly the obvious “true” example of nonassociativity, could be brushed aside by someone who really didn’t want to accept any example: they’re constructed from the quaternions by the Cayley-Dickson construction, so you can think of them as pairs of quaternions (or 4-tuples of complex numbers).  Then the nonassociative operation is built from associative ones, via that construction.

So are there any “real” examples of “true” nonassociativity (let alone non-Moufangness) that can’t simply be dismissed as not a fundamental operation by someone sufficiently determined?  Maybe, but none I know of right now.  It may well be possible to consistently hold that anything nonassociative can’t be fundamental (for that matter, elements of noncommutative groups can be represented by matrices of commuting real numbers).  Maybe it’s just my attitude to fundamentals, but somehow this doesn’t move me much.  Even if there are no “fundamental” examples, I think those given above suggest a different point: these derived operations have undeniable and genuine meaning – in some cases more immediate than the operations they’re derived from.  Whether or not subtraction, or the relative velocity measured by observers, or the bracket of (say) infinitesimal rotations, are “fundamental” ideas is less important than that they’re practical ones that come up all the time.

There is no abiding thing in what we know. We change from weaker to stronger lights, and each more powerful light pierces our hitherto opaque foundations and reveals fresh and different opacities below. We can never foretell which of our seemingly assured fundamentals the next change will not affect.

H.G. Wells, A Modern Utopia

So there’s a recent paper by some physicists, two of whom work just across the campus from me at IST, which purports to explain the Pioneer Anomaly, ultimately using a computer graphics technique, Phong shading. The point being that they use this to model more accurately than has been done before how much infrared radiation is radiating and reflecting off various parts of the Pioneer spacecraft. They claim that with the new, more accurate model, the net force from this radiation is just enough to explain the anomalous acceleration.

Well, plainly, any one paper needs to be rechecked before you can treat it as definitive, but this sort of result looks good for conventional General Relativity, when some people had suggested the anomaly was evidence that some other theory was needed.  Other anomalies in the predictions of GR – the rotational profiles of galaxies, or redshift data – have also suggested alternative theories.  In order to preserve GR exactly on large scales, you have to introduce things like Dark Matter and Dark Energy, and suppose that something like 97% of the mass-energy of the universe is otherwise invisible.  Such Dark entities might exist, of course, but I worry that it’s kind of circular to postulate them on the grounds that you need them to make GR explain observations, while also claiming this makes sense because GR is so well tested.

In any case, this refined calculation about Pioneer is a reminder that usually the more conservative extension of your model is better. It’s not so obvious to me whether a modified theory of gravity, or an unknown and invisible majority of the universe is more conservative.

And that’s the best segue I can think of into this next post, which is very different from recent ones.

Fundamentals

I was thinking recently about “fundamental” theories.  At the HGTQGR workshop we had several talks about the most popular physical ideas into which higher gauge theory and TQFT have been infiltrating themselves recently, namely string theory and (loop) quantum gravity.  These aren’t the only schools of thought about what a “quantum gravity” theory should look like – but they are two that have received a lot of attention and work.  Each has been described (occasionally) as a “fundamental” theory of physics, in the sense of one which explains everything else.  There has been a debate about this, since they are based on different principles.  The arguments against string theory are various, but a crucial one is that no existing form of string theory is “background independent” in the same way that General Relativity is. This might be because string theory came out of a community grounded in particle physics – it makes sense to perturb around some fixed background spacetime in that context, because no experiment with elementary particles is going to have a measurable effect on the universe at infinity. “M-theory” is supposed to correct this defect, but so far nobody can say just what it is.  String theorists criticize LQG on various grounds, but one of the more conceptually simple ones would be that it can’t be a unified theory of physics, since it doesn’t incorporate forces other than gravity.

There is, of course, some philosophical debate about whether either of these properties – background independence, or unification – is really crucial to a fundamental theory.   I don’t propose to answer that here (though for the record my hunch at the moment is that both of them are important and will hold up over time).  In fact, it’s “fundamental theory” itself that I’m thinking about here.

As I suggested in one of my first posts explaining the title of this blog, I expect that we’ll need lots of theories to get a grip on the world: a whole “atlas”, where each “map” is a theory, each dealing with a part of the whole picture, and overlapping somewhat with others. But theories are formal entities that involve symbols and our brain’s ability to manipulate symbols. Maybe such a construct could account for all the observable phenomena of the world – but a-priori it seems odd to assume that. The fact that they can provide various limits and approximations has made them useful, from an evolutionary point of view, and the tendency to confuse symbols and reality in some ways is a testament to that (it hasn’t hurt so much as to be selected out).

One little heuristic argument – not at all conclusive – against this idea involves Kolmogorov complexity: to explain all the observed data about the universe is, in some sense, to “compress” the data.  If we can account for the observations – say, with a short description of some physical laws and a bunch of initial conditions, which is what a “fundamental theory” suggests – then we’ve found an upper bound on their Kolmogorov complexity.  But if the universe actually contains such a description, then a complete description must also account for the description itself – which suggests that any complete description of the universe would have to be as big as the whole universe.

Well, as I said, this argument fails to be very convincing.  Partly because it assumes a certain form of the fundamental theory (in particular, a deterministic one), but mainly because it doesn’t rule out that there is indeed a very simple set of physical laws, but there are limits to the precision with which we could use them to simulate the whole world because we can’t encode the state of the universe perfectly.  We already knew that.  At most, that lack of precision puts some practical limits on our ability to confirm that a given set of physical laws we’ve written down is  empirically correct.  It doesn’t preclude there being one, or even our finding it (without necessarily being perfectly certain).  The way Einstein put it (in this address, by the way) was “As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality.”  But a lack of certainty doesn’t mean they aren’t there.

However, this got me thinking about fundamental theories from the point of view of epistemology, and how we handle knowledge.

Reduction

First, there’s a practical matter. The idea of a fundamental theory is the logical limit of one version of reductionism. This is the idea that the behaviour of things should be explained in terms of smaller, simpler things. I have no problem with this notion, unless you then conclude that once you’ve found a “more fundamental” theory, the old one should be discarded.

For example: we have a “theory of chemistry”, which says that the constituents of matter are those found on the periodic table of elements.  This theory comes in various degrees of sophistication: for instance, you can start to learn the periodic table without knowing that there are often different isotopes of a given element, and only knowing the 91 naturally occurring elements (everything up to Uranium, except Technetium). This gives something like Mendeleev’s early version of the table. You could come across these later refinements by finding a gap in the theory (Technetium, say), or a disagreement with experiment (discovering isotopes by measuring atomic weights). But even a fairly naive version of the periodic table, along with some concepts about atomic bonds, gives a good explanation of a huge range of chemical reactions under normal conditions. It can’t explain, for example, how the Sun shines – but it explains a lot within its proper scope.

Where this theory fits in a fuller picture of the world has at least two directions: more fundamental, and less fundamental, theories.  What I mean by less “fundamental” is that some things are supposed to be explained by this theory of chemistry: the great abundance of proteins and other organic chemicals, say. The behaviour of the huge variety of carbon compounds predicted by basic chemistry is supposed to explain all these substances and account for how they behave.  The millions of organic compounds that show up in nature, and their complicated behaviour, is supposed to be explained in terms of just a few elements that they’re made of – mostly carbon, hydrogen, oxygen, nitrogen, sulfur, phosphorus, plus the odd trace element.

By “more fundamental”, I mean that the periodic table itself can start to seem fairly complicated, especially once you start to get more sophisticated, including transuranic elements, isotopes, radioactive decay rates, and the like. So it was explained in terms of a theory of the atom. Again, there are refinements, but the Bohr model of the atom ought to do the job: a nucleus made of protons and neutrons, and surrounded by shells of electrons.  We can add that these are governed by the Dirac equation, and then the possible states for electrons bound to a nucleus ought to explain the rows and columns of the periodic table. Better yet, they’re supposed to explain exactly the spectral lines of each element – the frequencies of light atoms absorb and emit – by the differences of energy levels between the shells.

Well, this is great, but in practice it has limits. Hardly anyone disputes that the Bohr model is approximately right, and should explain the periodic table etc. The problem is that it’s largely an intractable problem to actually solve the Schroedinger equation for the atom and use the results to predict the emission spectrum, chemical properties, melting point, etc. of, say, Vanadium…  On the other hand, it’s equally hard to use a theory of chemistry to adequately predict how proteins will fold. Protein conformation prediction is a hard problem, and while it’s chugging along and making progress, the point is that a theory of chemistry alone isn’t enough: any successful method must rely on a whole extra body of knowledge.  This suggests our best bet at understanding all these phenomena is to have a whole toolbox of different theories, each one of which has its own body of relevant mathematics, its own domain-specific ontology, and some sense of how its concepts relate to those in other theories in the toolbox. (This suggests a view of how mathematics relates to the sciences which seems to me to reflect actual practice: it pervades all of them, in a different way than the way a “more fundamental” theory underlies a less fundamental one.  Which tends to spoil the otherwise funny XKCD comic on the subject…)

If one “explains” one theory in terms of another (or several others), then we may be able to put them into at least a partial order.  The mental image I have in mind is the “theoretical atlas” – a bunch of “charts” (the theories) which cover different parts of a globe (our experience, or the data we want to account for), and which overlap in places.  Some are subsets of others (are completely explained by them, in principle). Then we’d like to find a minimal (or is it maximal) element of this order: something which accounts for all the others, at least in principle.  In that mental image, it would be a map of the whole globe (or a dense subset of the surface, anyway).  Because, of course, the Bohr model, though in principle sufficient to account for chemistry, needs an explanation of its own: why are atoms made this way, instead of some other way? This ends up ramifying out into something like the Standard Model of particle physics.  Once we have that, we would still like to know why elementary particles work this way, instead of some other way…

An Explanatory Trilemma

There’s a problem here, which I think is unavoidable, and which rather ruins that nice mental image.  It has to do with a sort of explanatory version of Agrippa’s Trilemma, an observation in epistemology that goes back to Agrippa the Skeptic. It’s also sometimes called “Münchhausen’s Trilemma”, and it was originally made about justifying beliefs.  I think a slightly different form of it can be applied to explanations, where instead of “how do I know X is true?”, the question you repeatedly ask is “why does it happen like X?”

So, the Agrippa Trilemma as classically expressed might lead to a sequence of questions about observation.  Q: How do we know chemical substances are made of elements? A: Because of some huge body of evidence. Q: How do we know this evidence is valid? A: Because it was confirmed by a bunch of experimental data. Q: How do we know that our experiments were done correctly? And so on. In mathematics, it might ask a series of questions about why a certain theorem is true, which we chase back through a series of lemmas, down to a bunch of basic axioms and rules of inference. We could be asked to justify these, but typically we just posit them. The Trilemma says that there are three ways this sequence of justifications can end up:

1. we arrive at an endpoint of premises that don’t require any justification
2. we continue indefinitely in a chain of justifications that never ends
3. we continue in a chain of justifications that eventually becomes circular

None of these seems to be satisfactory for an experimental science, which is partly why we say that there’s no certainty about empirical knowledge. In mathematics, the first option is regarded as OK: all statements in mathematics are “really” of the form if axioms A, B, C etc. are assumed, then conclusions X, Y, Z etc. eventually follow. We might eventually find that some axioms don’t apply to the things we’re interested in, and cease to care about those statements, but they’ll remain true. They won’t be explanations of anything very much, though.  If we’re looking at reality, it’s not enough to assume axioms A, B, C… We also want to check them, test them, see if they’re true – and we can’t be completely sure with only a finite amount of evidence.

The explanatory variation on Agrippa’s Trilemma, which I have in mind, deals with a slightly different problem.  Supposing the axioms seem to be true, and accepting provisionally that they are, we also have another question, which if anything is even more basic to science: we want to know WHY they’re true – we look for an explanation.

This is about looking for coherence, rather than confidence, in our knowledge (or at any rate, our theories). But a similar problem appears. Suppose that elementary chemistry has explained organic chemistry; that atomic physics has explained why chemistry is how it is; and that the Standard Model explains why atomic physics is how it is.  We still want to know why the Standard Model is the way it is, and so on. Each new explanation accounts for one phenomenon in terms of different, more basic phenomena. The Trilemma suggests the following options:

1. we arrive at an endpoint of premises that don’t require any explanation
2. we continue indefinitely in a chain of explanations that never ends
3. we continue in a chain of explanations that eventually becomes circular

Unless we accept option 1, we don’t have room for a “fundamental theory”.

Here’s the key point: this isn’t even a position about physics – it’s about epistemology, and what explanations are like, or maybe rather what our behaviour is like with regard to explanations. The standard version of Agrippa’s Trilemma is usually taken as an argument for something like fallibilism: that our knowledge is always uncertain. This variation isn’t talking about the justification of beliefs, but the sufficiency of explanation. It says that the way our mind works is such that there can’t be one final summation of the universe, one principle, which accounts for everything – because it would either be unaccounted for itself, or because it would have to account for itself by circular reasoning.

This might be a dangerous statement to make, or at least a theological one (though theology isn’t as dangerous as it used to be): reasoning that things are the way they are “because God made it that way” is a traditional answer of the first type. True or not, I don’t think you can really call it an “explanation”, since it would work equally well if things were some other way. In fact, it’s an anti-explanation: if you accept an uncaused cause anywhere along the line, the whole motivation for asking after explanations unravels.  Maybe this sort of answer is a confession of humility and acceptance of limited understanding, where we draw the line and stop demanding further explanations. I don’t see that we all need to draw that line in the same place, though, so the problem hasn’t gone away.

What seems likely to me is that this problem can’t be made to go away.  That the situation we’ll actually be in is (2) on the list above.  That while there might not be any specific thing that scientific theories can’t explain, neither could there be a “fundamental theory” that will be satisfying to the curious forever.  Instead, we have an asymptotic approach to explanation, as each thing we want to explain gets picked up somewhere along the line: “We change from weaker to stronger lights, and each more powerful light pierces our hitherto opaque foundations and reveals fresh and different opacities below.”
