### philosophical

There is no abiding thing in what we know. We change from weaker to stronger lights, and each more powerful light pierces our hitherto opaque foundations and reveals fresh and different opacities below. We can never foretell which of our seemingly assured fundamentals the next change will not affect.

H.G. Wells, A Modern Utopia

So there’s a recent paper by some physicists, two of whom work just across the campus from me at IST, which purports to explain the Pioneer Anomaly, ultimately using a computer graphics technique, Phong shading. The point is that they use this to model, more accurately than has been done before, how much infrared radiation is emitted and reflected by the various parts of the Pioneer spacecraft. They claim that with the new, more accurate model, the net force from this radiation is just enough to explain the anomalous acceleration.

Well, plainly, any one paper needs to be rechecked before you can treat it as definitive, but this sort of result looks good for conventional General Relativity, when some people had suggested the anomaly was evidence that some other theory was needed. Other anomalies in the predictions of GR – the rotational profiles of galaxies, or redshift data – have also suggested alternative theories. In order to preserve GR exactly on large scales, you have to introduce things like Dark Matter and Dark Energy, and suppose that something like 97% of the mass-energy of the universe is otherwise invisible. Such Dark entities might exist, of course, but I worry it’s kind of circular to postulate them on the grounds that you need them to make GR explain observations, while also claiming this makes sense because GR is so well tested.

In any case, this refined calculation about Pioneer is a reminder that usually the more conservative extension of your model is better. It’s not so obvious to me whether a modified theory of gravity, or an unknown and invisible majority of the universe is more conservative.

And that’s the best segue I can think of into this next post, which is very different from recent ones.

Fundamentals

I was thinking recently about “fundamental” theories.  At the HGTQGR workshop we had several talks about the most popular physical ideas into which higher gauge theory and TQFT have been infiltrating themselves recently, namely string theory and (loop) quantum gravity.  These aren’t the only schools of thought about what a “quantum gravity” theory should look like – but they are two that have received a lot of attention and work.  Each has been described (occasionally) as a “fundamental” theory of physics, in the sense of one which explains everything else.  There has been a debate about this, since they are based on different principles.  The arguments against string theory are various, but a crucial one is that no existing form of string theory is “background independent” in the same way that General Relativity is. This might be because string theory came out of a community grounded in particle physics – it makes sense to perturb around some fixed background spacetime in that context, because no experiment with elementary particles is going to have a measurable effect on the universe at infinity. “M-theory” is supposed to correct this defect, but so far nobody can say just what it is.  String theorists criticize LQG on various grounds, but one of the more conceptually simple ones would be that it can’t be a unified theory of physics, since it doesn’t incorporate forces other than gravity.

There is, of course, some philosophical debate about whether either of these properties – background independence, or unification – is really crucial to a fundamental theory. I don’t propose to answer that here (though for the record my hunch at the moment is that both of them are important and will hold up over time). In fact, it’s “fundamental theory” itself that I’m thinking about here.

As I suggested in one of my first posts explaining the title of this blog, I expect that we’ll need lots of theories to get a grip on the world: a whole “atlas”, where each “map” is a theory, each dealing with a part of the whole picture, and overlapping somewhat with others. But theories are formal entities that involve symbols and our brain’s ability to manipulate symbols. Maybe such a construct could account for all the observable phenomena of the world – but a priori it seems odd to assume that. The fact that they can provide various limits and approximations has made them useful, from an evolutionary point of view, and the tendency to confuse symbols and reality in some ways is a testament to that (it hasn’t hurt so much as to be selected out).

One little heuristic argument – not at all conclusive – against this idea involves Kolmogorov complexity: wanting to explain all the observed data about the universe is in some sense to “compress” the data.  If we can account for the observations – say, with a short description of some physical laws and a bunch of initial conditions, which is what a “fundamental theory” suggests – then we’ve found an upper bound on its Kolmogorov complexity.  But if the universe actually contains such a description, the description is itself part of the data it has to describe, which suggests that any complete description of the universe would have to be about as big as the whole universe.
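To make the compression metaphor concrete, here is a toy sketch of my own (using `zlib` as a crude, computable stand-in for Kolmogorov complexity, which is itself uncomputable): data generated by a short “law” compresses dramatically, while patternless data barely compresses at all.

```python
import random
import zlib

# Data produced by a short deterministic "law": highly compressible.
law_data = bytes((i * i) % 256 for i in range(10_000))

# Patternless data: essentially incompressible.
random.seed(0)
noise_data = bytes(random.randrange(256) for _ in range(10_000))

# The compressed size is an upper bound on the (idealized) complexity.
print(len(zlib.compress(law_data)))    # a few hundred bytes at most
print(len(zlib.compress(noise_data)))  # close to the full 10,000
```

The compressed length of the “lawful” data is, in miniature, what a short set of physical laws plus initial conditions would be for the universe’s data.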

Well, as I said, this argument fails to be very convincing.  Partly because it assumes a certain form of the fundamental theory (in particular, a deterministic one), but mainly because it doesn’t rule out that there is indeed a very simple set of physical laws, but there are limits to the precision with which we could use them to simulate the whole world because we can’t encode the state of the universe perfectly.  We already knew that.  At most, that lack of precision puts some practical limits on our ability to confirm that a given set of physical laws we’ve written down is  empirically correct.  It doesn’t preclude there being one, or even our finding it (without necessarily being perfectly certain).  The way Einstein put it (in this address, by the way) was “As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality.”  But a lack of certainty doesn’t mean they aren’t there.

However, this got me thinking about fundamental theories from the point of view of epistemology, and how we handle knowledge.

Reduction

First, there’s a practical matter. The idea of a fundamental theory is the logical limit of one version of reductionism. This is the idea that the behaviour of things should be explained in terms of smaller, simpler things. I have no problem with this notion, unless you then conclude that once you’ve found a “more fundamental” theory, the old one should be discarded.

For example: we have a “theory of chemistry”, which says that the constituents of matter are those found on the periodic table of elements.  This theory comes in various degrees of sophistication: for instance, you can start to learn the periodic table without knowing that there are often different isotopes of a given element, and only knowing the 91 naturally occurring elements (everything up to Uranium, except Technetium). This gives something like Mendeleev’s early version of the table. You could come across these later refinements by finding a gap in the theory (Technetium, say), or a disagreement with experiment (discovering isotopes by measuring atomic weights). But even a fairly naive version of the periodic table, along with some concepts about atomic bonds, gives a good explanation of a huge range of chemical reactions under normal conditions. It can’t explain, for example, how the Sun shines – but it explains a lot within its proper scope.

Where this theory fits in a fuller picture of the world has at least two directions: more fundamental, and less fundamental, theories.  What I mean by less “fundamental” is that some things are supposed to be explained by this theory of chemistry: the great abundance of proteins and other organic chemicals, say. The behaviour of the huge variety of carbon compounds predicted by basic chemistry is supposed to explain all these substances and account for how they behave.  The millions of organic compounds that show up in nature, and their complicated behaviour, are supposed to be explained in terms of just a few elements that they’re made of – mostly carbon, hydrogen, oxygen, nitrogen, sulfur, phosphorus, plus the odd trace element.

By “more fundamental”, I mean that the periodic table itself can start to seem fairly complicated, especially once you start to get more sophisticated, including transuranic elements, isotopes, radioactive decay rates, and the like. So it was explained in terms of a theory of the atom. Again, there are refinements, but the Bohr model of the atom ought to do the job: a nucleus made of protons and neutrons, and surrounded by shells of electrons.  We can add that these are governed by the Dirac equation, and then the possible states for electrons bound to a nucleus ought to explain the rows and columns of the periodic table. Better yet, they’re supposed to explain exactly the spectral lines of each element – the frequencies of light atoms absorb and emit – by the differences of energy levels between the shells.
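For hydrogen, at least, this claim about spectral lines can be checked with a few lines of arithmetic. A minimal sketch (standard constants, my own arrangement): the Bohr energy levels are $E_n = -13.6\,\mathrm{eV}/n^2$, and the visible Balmer series comes from transitions down to $n = 2$.

```python
# Bohr-model energy levels for hydrogen and the Balmer-series
# wavelengths from transitions n -> 2.
RYDBERG_EV = 13.605693   # hydrogen ionization energy, in eV
HC_EV_NM = 1239.841984   # h*c in eV*nm, to convert photon energy to wavelength

def energy_level(n: int) -> float:
    """Energy of the n-th shell, in eV (negative: bound state)."""
    return -RYDBERG_EV / n**2

def transition_wavelength_nm(n_hi: int, n_lo: int) -> float:
    """Wavelength of the photon emitted dropping from n_hi to n_lo."""
    delta_e = energy_level(n_hi) - energy_level(n_lo)  # photon energy, eV
    return HC_EV_NM / delta_e

for n in range(3, 7):
    print(n, "->", 2, round(transition_wavelength_nm(n, 2), 1), "nm")
```

The first transition comes out near 656 nm – the familiar red H-α line – and the rest land on the other observed Balmer lines.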

Well, this is great, but in practice it has limits. Hardly anyone disputes that the Bohr model is approximately right, and should explain the periodic table etc. The problem is that it’s largely an intractable problem to actually solve the Schroedinger equation for the atom and use the results to predict the emission spectrum, chemical properties, melting point, etc. of, say, Vanadium…  On the other hand, it’s equally hard to use a theory of chemistry to adequately predict how proteins will fold. Protein conformation prediction is a hard problem, and while it’s chugging along and making progress, the point is that a theory of chemistry alone isn’t enough: any successful method must rely on a whole extra body of knowledge.  This suggests our best bet at understanding all these phenomena is to have a whole toolbox of different theories, each one of which has its own body of relevant mathematics, its own domain-specific ontology, and some sense of how its concepts relate to those in other theories in the toolbox. (This suggests a view of how mathematics relates to the sciences which seems to me to reflect actual practice: it pervades all of them, in a different way than the way a “more fundamental” theory underlies a less fundamental one.  Which tends to spoil the otherwise funny XKCD comic on the subject…)

If one “explains” one theory in terms of another (or several others), then we may be able to put them into at least a partial order.  The mental image I have in mind is the “theoretical atlas” – a bunch of “charts” (the theories) which cover different parts of a globe (our experience, or the data we want to account for), and which overlap in places.  Some are subsets of others (are completely explained by them, in principle). Then we’d like to find a minimal (or is it maximal?) element of this order: something which accounts for all the others, at least in principle.  In that mental image, it would be a map of the whole globe (or a dense subset of the surface, anyway).  Because, of course, the Bohr model, though in principle sufficient to account for chemistry, needs an explanation of its own: why are atoms made this way, instead of some other way? This ends up ramifying out into something like the Standard Model of particle physics.  Once we have that, we would still like to know why elementary particles work this way, instead of some other way…

An Explanatory Trilemma

There’s a problem here, which I think is unavoidable, and which rather ruins that nice mental image.  It has to do with a sort of explanatory version of Agrippa’s Trilemma, which is an observation in epistemology that goes back to Agrippa the Skeptic. It’s also sometimes called “Munchausen’s Trilemma”, and it was originally made about justifying beliefs.  I think a slightly different form of it can be applied to explanations, where instead of “how do I know X is true?”, the question you repeatedly ask is “why does it happen like X?”

So, the Agrippa Trilemma as classically expressed might lead to a sequence of questions about observation.  Q: How do we know chemical substances are made of elements? A: Because of some huge body of evidence. Q: How do we know this evidence is valid? A: Because it was confirmed by a bunch of experimental data. Q: How do we know that our experiments were done correctly? And so on. In mathematics, it might ask a series of questions about why a certain theorem is true, which we chase back through a series of lemmas, down to a bunch of basic axioms and rules of inference. We could be asked to justify these, but typically we just posit them. The Trilemma says that there are three ways this sequence of justifications can end up:

1. we arrive at an endpoint of premises that don’t require any justification
2. we continue indefinitely in a chain of justifications that never ends
3. we continue in a chain of justifications that eventually becomes circular

None of these seems to be satisfactory for an experimental science, which is partly why we say that there’s no certainty about empirical knowledge. In mathematics, the first option is regarded as OK: all statements in mathematics are “really” of the form if axioms A, B, C etc. are assumed, then conclusions X, Y, Z etc. eventually follow. We might eventually find that some axioms don’t apply to the things we’re interested in, and cease to care about those statements, but they’ll remain true. They won’t be explanations of anything very much, though.  If we’re looking at reality, it’s not enough to assume axioms A, B, C… We also want to check them, test them, see if they’re true – and we can’t be completely sure with only a finite amount of evidence.

The explanatory variation on Agrippa’s Trilemma, which I have in mind, deals with a slightly different problem.  Supposing the axioms seem to be true, and accepting provisionally that they are, we also have another question, which if anything is even more basic to science: we want to know WHY they’re true – we look for an explanation.

This is about looking for coherence, rather than confidence, in our knowledge (or at any rate, theories). But a similar problem appears. Suppose that elementary chemistry has explained organic chemistry; that atomic physics has explained why chemistry is how it is; and that the Standard Model explains why atomic physics is how it is.  We still want to know why the Standard Model is the way it is, and so on. Each new explanation gives an account of one phenomenon in terms of different, more basic phenomena. The Trilemma suggests the following options:

1. we arrive at an endpoint of premises that don’t require any explanation
2. we continue indefinitely in a chain of explanations that never ends
3. we continue in a chain of explanations that eventually becomes circular

Unless we accept option 1, we don’t have room for a “fundamental theory”.

Here’s the key point: this isn’t even a position about physics – it’s about epistemology, and what explanations are like, or maybe rather what our behaviour is like with regard to explanations. The standard version of Agrippa’s Trilemma is usually taken as an argument for something like fallibilism: that our knowledge is always uncertain. This variation isn’t talking about the justification of beliefs, but the sufficiency of explanation. It says that the way our mind works is such that there can’t be one final summation of the universe, one principle, which accounts for everything – because it would either be unaccounted for itself, or because it would have to account for itself by circular reasoning.

This might be a dangerous statement to make, or at least a theological one (theology isn’t as dangerous as it used to be): reasoning that things are the way they are “because God made it that way” is a traditional answer of the first type. True or not, I don’t think you can really call it an “explanation”, since it would work equally well if things were some other way. In fact, it’s an anti-explanation: if you accept an uncaused-cause anywhere along the line, the whole motivation for asking after explanations unravels.  Maybe this sort of answer is a confession of humility and acceptance of limited understanding, where we draw the line and stop demanding further explanations. I don’t see that we all need to draw that line in the same place, though, so the problem hasn’t gone away.

What seems likely to me is that this problem can’t be made to go away.  That the situation we’ll actually be in is (2) on the list above.  That while there might not be any specific thing that scientific theories can’t explain, neither could there be a “fundamental theory” that will be satisfying to the curious forever.  Instead, we have an asymptotic approach to explanation, as each thing we want to explain gets picked up somewhere along the line: “We change from weaker to stronger lights, and each more powerful light pierces our hitherto opaque foundations and reveals fresh and different opacities below.”

One talk at the workshop was nominally a school talk by Laurent Freidel, but it’s interesting and distinctive enough in its own right that I wanted to consider it by itself.  It was based on this paper on the “Principle of Relative Locality”. This isn’t so much a new theory, as an exposition of what ought to happen when one looks at a particular limit of any putative theory that has both quantum field theory and gravity as (different) limits of it. This leads through some ideas, such as curved momentum space, which have been kicking around for a while. The end result is a way of accounting for apparently non-local interactions of particles, by saying that while the particles themselves “see” the interactions as local, distant observers might not.

Einstein’s gravity describes a regime where Newton’s gravitational constant $G_N$ is important but Planck’s constant $\hbar$ is negligible, whereas (special-relativistic) quantum field theory assumes $\hbar$ significant but $G_N$ not.  Both of these assume there is a special velocity scale, given by the speed of light $c$, whereas classical mechanics assumes that all three can be neglected (i.e. $G_N$ and $\hbar$ are zero, and $c$ is infinite).   The guiding assumption is that these are all approximations to some more fundamental theory, called “quantum gravity” just because it accepts that both $G_N$ and $\hbar$ (as well as $c$) are significant in calculating physical effects.  So GR and QFT incorporate two of the three constants each, and classical mechanics incorporates none of them.  The “principle of relative locality” arises when we consider a slightly different approximation to this underlying theory.
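One way to keep the bookkeeping straight – my own tabulation of the paragraph above, not anything from the paper – is to record which constants each limit treats as significant:

```python
# Each limiting theory, tagged by which constants it keeps.
# "1/c" nonzero means a finite speed of light (a relativistic theory).
theories = {
    "classical mechanics":    {"G_N": False, "hbar": False, "1/c": False},
    "QFT (special rel.)":     {"G_N": False, "hbar": True,  "1/c": True},
    "general relativity":     {"G_N": True,  "hbar": False, "1/c": True},
    "quantum gravity":        {"G_N": True,  "hbar": True,  "1/c": True},
    # Relative locality: G_N and hbar each -> 0, but their ratio
    # (the square of the Planck mass, up to factors of c) is kept fixed.
    "relative locality":      {"G_N": False, "hbar": False, "1/c": True},
}
```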

This approximation works with a regime where $G_N$ and $\hbar$ are each negligible, but their ratio is not – this being related to the Planck mass $m_p \sim \sqrt{\frac{\hbar c}{G_N}}$.  The point is that this is an approximation with no special length scale (“Planck length”), but instead a special energy scale (“Planck mass”) which has to be preserved.   Since energy and momentum are different parts of a single 4-vector, this is also a momentum scale; we expect to see some kind of deformation of momentum space, at least for momenta that are comparable to or bigger than this scale.  The existence of this scale turns out to mean that momenta don’t add linearly – at least, not unless they’re very small compared to the Planck scale.

So what is “Relative Locality”?  In the paper linked above, it’s stated like so:

Physics takes place in phase space and there is no invariant global projection that gives a description of processes in spacetime.  From their measurements local observers can construct descriptions of particles moving and interacting in a spacetime, but different observers construct different spacetimes, which are observer-dependent slices of phase space.

Motivation

This arises from taking the basic insight of general relativity – the requirement that physical principles should be invariant under coordinate transformations (i.e. diffeomorphisms) – and extending it so that instead of applying just to spacetime, it applies to the whole of phase space.  Phase space (which, in this limit where $\hbar = 0$, replaces the Hilbert space of a truly quantum theory) is the space of position-momentum configurations (of things small enough to treat as point-like, in a given fixed approximation).  Having no $G_N$ means we don’t need to worry about any dynamical curvature of “spacetime” (which doesn’t exist), and having no Planck length means we can blithely treat phase space as a manifold with coordinates valued in the real line (which has no special scale).  Yet, having a special mass/momentum scale says we should see some purely combined “quantum gravity” effects show up.

The physical idea is that phase space is an accurate description of what we can see and measure locally.  Observers (whom we assume small enough to be considered point-like) can measure their own proper time (they “have a clock”) and can detect momenta (by letting things collide with them and measuring the energy transferred locally and its direction).  That is, we “see colors and angles” (i.e. photon energies and differences of direction).  Beyond this, one shouldn’t impose any particular theory of what momenta do: we can observe the momenta of separate objects and see what results when they interact and deduce rules from that.  As an extension of standard physics, this model is pretty conservative.  Now, conventionally, phase space would be the cotangent bundle of spacetime $T^*M$.  That picture is based on the assumption that objects can be at any point, and wherever they are, their space of possible momenta is a vector space.  Being a bundle, with a global projection onto $M$ (taking $(x,p)$ to $x$), is exactly what this principle says doesn’t necessarily obtain.  We still assume that phase space will be some symplectic manifold.   But we don’t assume a priori that momentum coordinates give a projection whose fibres happen to be vector spaces, as in a cotangent bundle.

Now, a symplectic manifold  still looks locally like a cotangent bundle (Darboux’s theorem). So even if there is no universal “spacetime”, each observer can still locally construct a version of “spacetime”  by slicing up phase space into position and momentum coordinates.  One can, by brute force, extend the spacetime coordinates quite far, to distant points in phase space.  This is roughly analogous to how, in special relativity, each observer can put their own coordinates on spacetime and arrive at different notions of simultaneity.  In general relativity, there are issues with trying to extend this concept globally, but it can be done under some conditions, giving the idea of “space-like slices” of spacetime.  In the same way, we can construct “spacetime-like slices” of phase space.

Geometrizing Algebra

Now, if phase space is a cotangent bundle, momenta can be added (the fibres of the bundle are vector spaces).  Some more recent ideas about “quasi-Hamiltonian spaces” (initially introduced by Alekseev, Malkin and Meinrenken) conceive of momenta as “group-valued” – rather than taking values in the dual of some Lie algebra (the way, classically, momenta are dual to velocities, which live in the Lie algebra of infinitesimal translations).  For small momenta, these are hard to distinguish, so even group-valued momenta might look linear, but the premise is that we ought to discover this by experiment, not assumption.  We certainly can detect “zero momentum” and for physical reasons can say that given two things with two momenta $(p,q)$, there’s a way of combining them into a combined momentum $p \oplus q$.  Think of doing this physically – transfer all momentum from one particle to another, as seen by a given observer.  Since the same momentum at the observer’s position can be either coming in or going out, this operation has a “negative” with $(\ominus p) \oplus p = 0$.

We do have a space of momenta at any given observer’s location – the total of all momenta that can be observed there, and this space now has some algebraic structure.  But we have no reason to assume up front that $\oplus$ is either commutative or associative (let alone that it makes momentum space at a given observer’s location into a vector space).  One can interpret this algebraic structure as giving some geometry.  The commutator for $\oplus$ gives a metric on momentum space.  This is a bilinear form which is implicitly defined by the “norm” that assigns a kinetic energy to a particle with a given momentum. The associator given by $p \oplus (q \oplus r) - (p \oplus q) \oplus r$, infinitesimally near $0$ where this makes sense, gives a connection.  This defines a “parallel transport” of a finite momentum $p$ in the direction of a momentum $q$ by saying infinitesimally what happens when adding $dq$ to $p$.
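To see numerically how a special scale can spoil both commutativity and associativity, here is a toy one-dimensional combination rule – purely my own illustration, not the rule from the paper – with a deformation term suppressed by the scale $m_p$:

```python
# Toy deformed "addition" of 1-d momenta: ordinary addition plus a
# correction of order 1/m_p.  Illustrative only.
M_P = 1.0  # the special momentum scale, in arbitrary units

def oplus(p: float, q: float, m: float = M_P) -> float:
    return p + q + (p * p * q) / m  # deformation term

def commutator(p: float, q: float) -> float:
    return oplus(p, q) - oplus(q, p)

def associator(p: float, q: float, r: float) -> float:
    return oplus(p, oplus(q, r)) - oplus(oplus(p, q), r)

# Momenta near the scale: visibly noncommutative and nonassociative.
print(commutator(0.5, 0.3), associator(0.5, 0.3, 0.2))
# Momenta far below the scale: effectively ordinary, linear addition.
print(commutator(5e-4, 3e-4), associator(5e-4, 3e-4, 2e-4))
```

In this toy the failure of commutativity and associativity is of order $1/m_p$, so it only shows up for momenta comparable to the scale – which is the qualitative behaviour described above.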

Various additional physical assumptions – like the momentum-space “duals” of the equivalence principle (that the combination of momenta works the same way for all kinds of matter regardless of charge), or the strong equivalence principle (that inertial mass and rest mass energy per the relation $E = mc^2$ are the same) and so forth can narrow down the geometry of this metric and connection.  Typically we’ll find that it needs to be Lorentzian.  With strong enough symmetry assumptions, it must be flat, so that momentum space is a vector space after all – but even with fairly strong assumptions, as with general relativity, there’s still room for this “empty space” to have some intrinsic curvature, in the form of a momentum-space “dual cosmological constant”, which can be positive (so momentum space is closed like a sphere), zero (the vector space case we usually assume) or negative (so momentum space is hyperbolic).

This geometrization of what had been algebraic is somewhat analogous to what happened with velocities (i.e. vectors in spacetime) when the theory of special relativity came along.  Insisting that the “invariant” scale $c$ be the same in every reference system meant that the addition of velocities ceased to be linear.  At least, it did if you assume that adding velocities has an interpretation along the lines of: “first, from rest, add velocity v to your motion; then, from that reference frame, add velocity w”.  While adding spacetime vectors still worked the same way, one had to rephrase this rule if we think of adding velocities as observed within a given reference frame – this became $v \oplus w = \frac{v + w}{1 + vw}$ (scaling so $c = 1$ and assuming the velocities are in the same direction).  When velocities are small relative to $c$, this looks roughly like linear addition.  Geometrizing the algebra of momentum space is thought of a little differently, but similar things can be said: we think operationally in terms of combining momenta by some process.  First transfer (group-valued) momentum $p$ to a particle, then momentum $q$ – the connection on momentum space tells us how to translate these momenta into the “reference frame” of a new observer with momentum shifted relative to the starting point.  Here again, the special momentum scale $m_p$ (which is also a mass scale since a momentum has a corresponding kinetic energy) is a “deformation” parameter – for momenta that are small compared to this scale, things seem to work linearly as usual.
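The special-relativistic prototype just mentioned is easy to check directly (units with $c = 1$; a minimal sketch):

```python
# Special-relativistic addition of parallel velocities, with c = 1.
def add_velocities(v: float, w: float) -> float:
    return (v + w) / (1.0 + v * w)

# Near c, the rule is strongly nonlinear and the result stays below c.
print(add_velocities(0.9, 0.9))    # well below the "linear" value 1.8
# Far below c, it is linear to high accuracy.
print(add_velocities(1e-4, 2e-4))  # almost exactly 3e-4
```

Exactly as with the deformed momentum addition, the invariant scale ($c$ here, $m_p$ there) only makes itself felt when the quantities being combined approach it.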

There’s some discussion in the paper relating this to DSR (either “doubly” or “deformed” special relativity), another postulated limit of quantum gravity: a variation of SR with both a special velocity and a special mass/momentum scale, meant to capture “what SR looks like near the Planck scale”.  DSR treats spacetime as a noncommutative space, and generalizes the Lorentz group to a Hopf algebra which is a deformation of it.  In DSR, the noncommutativity of “position space” is directly related to curvature of momentum space.  In the “relative locality” view, we accept a classical phase space, but not a classical spacetime within it.

Physical Implications

We should understand this scale as telling us where “quantum gravity effects” should start to become visible in particle interactions.  This is a fairly large scale for subatomic particles.  The Planck mass as usually given is about 21 micrograms: small for normal purposes, about the size of a small sand grain, but very large for subatomic particles.  Converting to momentum units with $c$, this is about 6 kg m/s: on the order of the momentum of a kicked soccer ball or so.  For a subatomic particle this is a lot.
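A quick back-of-envelope check of those numbers, using standard SI values of the constants (my own sketch, not a calculation from the paper):

```python
from math import sqrt

# Standard SI values of the fundamental constants.
G_N  = 6.674e-11   # gravitational constant, m^3 kg^-1 s^-2
HBAR = 1.055e-34   # reduced Planck constant, J s
C    = 2.998e8     # speed of light, m/s

planck_mass = sqrt(HBAR * C / G_N)   # ~2.2e-8 kg, i.e. roughly 20 micrograms
planck_momentum = planck_mass * C    # ~6.5 kg m/s

print(planck_mass * 1e9, "micrograms")  # 1 kg = 1e9 micrograms
print(planck_momentum, "kg m/s")
```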

This scale does raise a question, though, for many people who first hear this argument: if quantum gravity effects should become apparent around the Planck mass/momentum scale, why do macro-objects like the aforementioned soccer ball still seem to have linearly-additive momenta?  Laurent explained the problem with this intuition.  For interactions of big, extended, but composite objects like soccer balls, one has to calculate not just one interaction, but all the various interactions of their parts, so the “effective” mass scale where the deformation would be seen becomes $N m_p$, where $N$ is the number of particles in the soccer ball.  Roughly, the point is that a soccer ball is not a large “thing” for these purposes, but a large conglomeration of small “things”, whose interactions are “fundamental”.  The “effective” mass scale tells us how we would have to alter the physical constants to be able to treat it as a “thing”.  (This is somewhat related to the question of “effective actions” and renormalization, though those are a bit more complicated.)
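To get a feel for the size of that effective scale, take a regulation ball of about 0.43 kg and count its nucleons (both inputs are my own illustrative numbers):

```python
# Rough "effective" deformation scale N * m_p for a composite object.
NUCLEON_MASS = 1.67e-27   # kg, mass of a proton or neutron
PLANCK_MASS  = 2.18e-8    # kg, the scale for "fundamental" interactions
ball_mass = 0.43          # kg, roughly a regulation soccer ball

N = ball_mass / NUCLEON_MASS          # ~2.6e26 constituent particles
effective_scale = N * PLANCK_MASS     # ~5.6e18 kg

print(f"{N:.2e} particles, effective scale {effective_scale:.2e} kg")
```

The effective scale comes out at billions of tonnes, which is why no deformation of momentum addition is ever seen for the ball as a whole.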

There are a number of possible experiments suggested in the paper, which Laurent mentioned in the talk.  One involves a kind of “twin paradox” taking place in momentum space.  In “spacetime”, a spaceship travelling a large loop at high velocity will arrive where it started having experienced less time than an observer who remained there (because of the Lorentzian metric) – and a dual phenomenon in momentum space says that particles travelling through loops (also in momentum space) should arrive displaced in space because of the relativity of localization.  This could be observed in particle accelerators where particles make several transits of a loop, since the effect is cumulative.  Another effect could be seen in astronomical observations: if an observer is observing some distant object via photons of different wavelengths (hence momenta), she might “localize” the object differently – that is, the two photons travel at “the same speed” the whole way, but arrive at different times because the observer will interpret the object as being at two different distances for the two photons.

This last one is rather weird, and I had to ask how one would distinguish this effect from a variable speed of light (predicted by certain other ideas about quantum gravity).  How to distinguish such effects seems to be not quite worked out yet, but at least this is an indication that there are new, experimentally detectable, effects predicted by this “relative locality” principle.  As Laurent emphasized, once we’ve noticed that not accepting this principle means making an a priori assumption about the geometry of momentum space (even if only in some particular approximation, or limit, of a true theory of quantum gravity), we’re pretty much obliged to stop making that assumption and do the experiments.  Finding our assumptions were right would simply be revealing which momentum space geometry actually obtains in the approximation we’re studying.

A final note about the physical interpretation: this “relative locality” principle can be discovered by looking (in the relevant limit) at a Lagrangian for free particles, with interactions described in terms of momenta.  It so happens that one can describe this without referencing a “real” spacetime: the part of the action that allows particles to interact when “close” only needs coordinate functions, which can certainly exist here, but are an observer-dependent construct.  The conservation of (non-linear) momenta is specified via a Lagrange multiplier.  The whole Lagrangian formalism for the mechanics of colliding particles works without reference to spacetime.  Now, all the interactions (specified by the conservation of momentum terms) happen “at one location”, in that there will be an observer who sees them happening in the momentum space of her own location.  But an observer at a different point may disagree about whether the interaction was local – i.e. happened at a single point in spacetime.  Thus “relativity of localization”.

Again, this is no more bizarre (mathematically) than the fact that distant, relatively moving, observers in special relativity might disagree about simultaneity, whether two events happened at the same time.  They have their own coordinates on spacetime, and transferring between them mixes space coordinates and time coordinates, so they’ll disagree whether the time-coordinate values of two events are the same.  Similarly, in this phase-space picture, two different observers each have a coordinate system for splitting phase space into “spacetime” and “energy-momentum” coordinates, but switching between them may mix these two pieces.  Thus, the two observers will disagree about whether the spacetime-coordinate values for the different interacting particles are the same.  And so, one observer says the interaction is “local in spacetime”, and the other says it’s not.  The point is that it’s local for the particles themselves (thinking of them as observers).  All that’s going on here is the not-very-astonishing fact that in the conventional picture, we have no problem with interactions being nonlocal in momentum space (particles with very different momenta can interact as long as they collide with each other)… combined with the inability to globally and invariantly distinguish position and momentum coordinates.
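The simultaneity disagreement in this analogy is easy to check numerically with a Lorentz boost.  Here is a minimal sketch (in units where c = 1; the function and numbers are mine, purely for illustration):

```python
import math

def boost(t, x, v):
    """Coordinates of the event (t, x) as assigned by an observer
    moving at velocity v (units where c = 1)."""
    gamma = 1.0 / math.sqrt(1.0 - v * v)
    return (gamma * (t - v * x), gamma * (x - v * t))

# Two events that are simultaneous (t = 0) but spatially separated:
e1, e2 = (0.0, 0.0), (0.0, 1.0)

# A relatively moving observer assigns them different time coordinates:
t1, _ = boost(*e1, v=0.6)
t2, _ = boost(*e2, v=0.6)
print(t1, t2)  # t2 is about -0.75: no longer simultaneous for this observer
```

Exactly the same bookkeeping, applied to a transformation that mixes spacetime coordinates with energy-momentum coordinates rather than space with time, produces the disagreement about whether an interaction is local.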

What this means, philosophically, can be debated, but it does offer some plausibility to the claim that space and time are auxiliary, conceptual additions to what we actually experience, which just account for the relations between bits of matter.  These concepts can be dispensed with even where we have a classical-looking phase space rather than Hilbert space (where, presumably, this is even more true).

Edit: On a totally unrelated note, I just noticed this post by Alex Hoffnung over at the n-Category Cafe which gives a lot of detail on issues relating to spans in bicategories that I had begun to think more about recently in relation to developing a higher-gauge-theoretic version of the construction I described for ETQFT. In particular, I’d been thinking about how the 2-group analog of restriction and induction for representations realizes the various kinds of duality properties, where we have adjunctions, biadjunctions, and so forth, in which units and counits of the various adjunctions have further duality. This observation seems to be due to Jim Dolan, as far as I can see from a brief note in HDA II. In that case, it’s really talking about the star-structure of the span (tri)category, but looking at the discussion Alex gives suggests to me that this theme shows up throughout this subject. I’ll have to take a closer look at the draft paper he linked to and see if there’s more to say…

In the first week of November, I was in Montreal for the biennial meeting of the Philosophy of Science Association, at the invitation of Hans Halvorson and Steve Awodey.  This was for a special session called “Category Theoretical Reflections on the Foundations of Physics”, which also had talks by Bob Coecke (from Oxford), Klaas Landsman (from Radboud University in Nijmegen), and Gonzalo Reyes (from the University of Montreal).  Slides from the talks in this session have been collected here by Steve Awodey.  The meeting was pretty big, and there were a lot of talks on a lot of different topics, some more technical, and some less.  There were enough sessions relating to physics that I had a full schedule just attending those, although for example there were sessions on biology and cognition which I might otherwise have been interested in sitting in on, with titles like “Biology: Evolution, Genomes and Biochemistry”, “Exploring the Complementarity between Economics and Recent Evolutionary Theory”, “Cognitive Sciences and Neuroscience”, and “Methodological Issues in Cognitive Neuroscience”.  And, of course, more fundamental philosophy of science topics like “Fictions and Scientific Realism” and “Kinds: Chemical, Biological and Social”, as well as socially-oriented ones such as “Philosophy of Commercialized Science” and “Improving Peer Review in the Sciences”.  However, interesting as these are, one can’t do everything.

In some ways, this was a really great confluence of interests for me – physics and category theory, as seen through a philosophical lens.  I don’t know exactly how this session came about, but Hans Halvorson is a philosopher of science who started out in physics (and has now, for example, learned enough category theory to teach the course in it offered at Princeton), and Steve Awodey is a philosopher of mathematics who is interested in category theory in its own right.  They managed to get this session brought in to present some of the various ideas about the overlap between category theory and physics to an audience mostly consisting of philosophers, which seems like a good idea.  It was also interesting for me to get a view into how philosophers approach these subjects – what kind of questions they ask, how they argue, and so on.  As with any well-developed subject, there’s a certain amount of jargon and received ideas that people can refer to – for example, I learned the word and current usage (though not the basic concept) of supervenience, which came up, oh, maybe 5-10 times each day.

There are now a reasonable number of people bringing categorical tools to bear on physics – especially quantum physics.  What people who think about the philosophy of science can bring to this research is the usual: careful, clear thinking about the fundamental concepts involved, in a way that tries not to get distracted by the technicalities, and to keep the focus on what is important to the question at hand in a deep way.  In this case, the question at hand is physics.  Philosophy doesn’t always accomplish this, of course, and sometimes gets sidetracked by what some might call “pseudoquestions” – the kind of questions that tend to arise when you use some folk-theory or simple intuitive understanding of some subtler concept that is much better expressed in mathematics.  This is why anyone who’s really interested in the philosophy of science needs to learn a lot about science in its own terms.  On the whole, this is what they actually do.

And, of course, both mathematicians and physicists try to do this kind of thinking themselves, but in those fields it’s easy – and important! – to spend a lot of time thinking about some technical question, or doing extensive computations, or working out the fiddly details of a proof, and so forth.  This is the real substance of the work in those fields – but sometimes the bigger “why” questions, that address what it means or how to interpret the results, get glossed over, or answered on the basis of some superficial analogy.  Mind you – one often can’t really assess how a line of research is working out until you’ve been doing the technical stuff for a while.  Then the problem is that people who do such thinking professionally – philosophers – are at a loss to understand the material because it’s recent and technical.  This is maybe why technical proficiency in science has tended to run ahead of real understanding – people still debate what quantum mechanics “means”, even though we can use it competently enough to build computers, nuclear reactors, interferometers, and so forth.

Anyway – as for the substance of the talks…  In our session, since every speaker was a mathematician in some form, they tended to be more technical.  You can check out the slides linked to above for more details, but basically, four views of how to draw on category theory to talk about physics were represented.  I’ve actually discussed each of them in previous posts, but in summary:

• Bob Coecke, on “Quantum Picturalism”, was addressing the monoidal dagger-category point of view, which looks at describing quantum mechanical operations (generally understood to be happening in a category of Hilbert spaces) purely in terms of the structure of that category, which one can see as a language for handling a particular kind of logic.  Monoidal categories, as Peter Selinger has painstakingly documented, can be described using various graphical calculi (essentially, certain categories whose morphisms are variously-decorated “strands”, considered invariant under various kinds of topological moves, are the free monoidal categories with various structures – so anything you can prove using these diagrams is automatically true for any example of such categories).  Selinger has also shown that, for the physically interesting case of dagger-compact closed monoidal categories, a theorem is true in general if and only if it’s true for (finite dimensional) Hilbert spaces, which may account for why Hilbert spaces play such a big role in quantum mechanics.  This program is based on describing as much of quantum mechanics as possible in terms of this kind of diagrammatic language.  This stuff has, in some ways, been explored more through the lens of computer science than physics per se – certainly Selinger is coming from that background.  There’s also more on this connection in the “Rosetta Stone” paper by John Baez and Mike Stay.
• My talk (actually third, but I put it here for logical flow) fits this framework, more or less.  I was in some sense there representing a viewpoint whose current form is due to Baez and Dolan, namely “groupoidification”.  The point is to treat the category $Span(Gpd)$ as a “categorification” of (finite dimensional) Hilbert spaces in the sense that there is a representation map $D : Span(Gpd) \rightarrow Hilb$ so that phenomena living in $Hilb$ can be explained as the image of phenomena in $Span(Gpd)$.  Having done that, there is also a representation of $Span(Gpd)$ into 2-Hilbert spaces, which reveals more detail (much more, at the object level, since Tannaka-Krein reconstruction means that the monoidal 2-Hilbert space of representations of a groupoid is, at least in nice cases, enough to completely reconstruct it).  This gives structures in $2Hilb$ which “conceptually” categorify the structures in $Hilb$, and are also directly connected to specific Hilbert spaces and maps, even though taking equivalence classes in $2Hilb$ definitely doesn’t produce these.  A “state” in a 2-Hilbert space is an irreducible representation, though – so there’s a conceptual difference between what “state” means in categorified and standard settings.  (There’s a bit more discussion in my notes for the talk than in the slides above.)
• Klaas Landsman was talking about what he calls “Bohrification“, which, on the technical side, makes use of Topos theory.  The philosophical point comes from Niels Bohr’s “doctrine of classical concepts” – that one should understand quantum systems using concepts from the classical world.  In practice, this means taking a (noncommutative) von Neumann algebra $A$ which describes the observables of a quantum system and looking at it via its commutative subalgebras.  These are organized into a lattice – in fact, a site.  The idea is that the spectrum of $A$ lives in the topos associated to this site: it’s a presheaf that, over each commutative subalgebra $C \subset A$, just gives the spectrum of $C$.  This is philosophically nice in that the “Bohrified” propositions actually behave in a logically sensible way.  The topos approach comes from Chris Isham, developed further with Andreas Doring.  (Note the series of four papers by both from 2007.  Their approach is in some sense dual to that of Landsman, Heunen and Spitters, in the sense that they look at the same site, but look at dual toposes – one of sheaves, the other of cosheaves.  The key bit of jargon in Isham and Doring’s approach is “daseinization”, which is a reference to Heidegger’s “Being and Time”.  For some reason this makes me imagine Bohr and Heidegger in a room, one standing on the ceiling, one on the floor, disputing which is which.)
• Gonzalo Reyes talked about synthetic differential geometry (SDG) as a setting for building general relativity.  SDG is a way of doing differential geometry in a category where infinitesimals are actually available, that is, there is a nontrivial set $D = \{ x \in \mathbb{R} | x^2 = 0 \}$.  This simplifies discussions of vector fields (tangent vectors will just be infinitesimal vectors in spacetime).  A vector field is really a first order DE (and an integral curve tangent to it is a solution), so it’s useful to have, in SDG, the fact that any differentiable curve is, literally, infinitesimally a line.  Then the point is that while the gravitational “field” is a second-order DE, so not a field in this sense, the arguments for GR can be reproduced nicely in SDG by talking about infinitesimally-close families of curves following geodesics.  Gonzalo’s slides are brief by necessity, but happily, more details of this are in his paper on the subject.
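Incidentally, the first-order behaviour SDG exploits has a familiar computational shadow: the dual numbers, formal expressions $a + bd$ with $d^2 = 0$, on which evaluating a polynomial at $x + d$ literally reads off the derivative as the infinitesimal part.  A minimal sketch (the class and names are my own, not anything from Gonzalo’s paper):

```python
class Dual:
    """Numbers a + b*d with d**2 = 0, mirroring SDG's D = {x : x^2 = 0}."""
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b  # real part, infinitesimal part

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.a + other.a, self.b + other.b)

    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # (a1 + b1 d)(a2 + b2 d) = a1 a2 + (a1 b2 + b1 a2) d, since d^2 = 0
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

    __rmul__ = __mul__

def f(x):
    return 3 * x * x + 2 * x   # f'(x) = 6x + 2

# Evaluating at x + d: the infinitesimal part is exactly the derivative.
y = f(Dual(5.0, 1.0))
print(y.a, y.b)  # 85.0 32.0 -> f(5) = 85, f'(5) = 32
```

This is just forward-mode automatic differentiation, of course, but it illustrates the SDG slogan that every curve is infinitesimally a line.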

The other sessions I went to were mostly given by philosophers, rather than physicists or mathematicians, though with exceptions.  I’ll briefly present my own biased and personal highlights of what I attended.  They included sessions titled:

“Quantum Physics”: Edward Slowik talked about the “prehistory of quantum gravity”, basically revisiting the debate between Newton and Leibniz on absolute versus relational space, suggesting that Leibniz’ view of space as a classification of the relation of his “monads” is more in line with relational theories such as spin foams etc.  M. Silberstein and W. Stuckey gave a talk about their “relational blockworld” (described here), which talks about QFT as an approximation to a certain discrete theory, built on a graph, where the nodes of the graph are spacetime events, and using an action functional on the graph.

Meinard Kuhlmann gave an interesting talk about “trope bundles” and AQFT.  Trope ontology is an approach to “entities” that doesn’t assume there’s a split between “substrates” (which have no properties themselves), and “properties” which they carry around.  (A view of ontology that goes back at least to Aristotle’s “substance” and “accident” distinction, and maybe further for all I know).  Instead, this is a “one-category” ontology – the basic things in this ontology are “tropes”, which he defined as “individual property instances” (i.e. as opposed to abstract properties that happen to have instances).  “Things”, then, are just collections of tropes.  To talk about the “identity” of a thing means to pick out certain of the tropes as the core ones that define that thing, and others as peripheral.  This struck me initially as a sort of misleading distinction we impose (say, “a sphere” has a core trope of its radial symmetry, and incidental tropes like its colour – but surely the way of picking the object out of the world is human-imposed), until he gave the example from AQFT.  To make a long story short, in this setup, the key entities are something like elementary particles, and the core tropes are those properties that define an irreducible representation of a $C^{\star}$-algebra (things like mass, spin, charge, etc.), whereas the non-core tropes are those that identify a state vector within such a representation: the attributes of the particle that change over time.

I’m not totally convinced by the “trope” part of this (surely there are lots of choices of the properties which determine a representation, but I don’t see the need to give those properties the burden of being the only ontological primaries), but I also happen to like the conclusions because in the 2Hilbert picture, irreducible representations are states in a 2-Hilbert space, which are best thought of as morphisms, and the state vectors in their components are best thought of in terms of 2-morphisms.  An interpretation of that setup says that the 1-morphism states define which system one’s talking about, and the 2-morphism states describe what it’s doing.

“New Directions Concerning Quantum Indistinguishability”: I only caught a couple of the talks in this session, notably missing Nick Huggett’s “Expanding the Horizons of Quantum Statistical Mechanics”.  There were talks by John Earman (“The Concept of Indistinguishable Particles in Quantum Mechanics”), and by Adam Caulton (based on work with Jeremy Butterfield) on “On the Physical Content of the Indistinguishability Postulate”.  These are all about the idea of indistinguishable particles, and the statistics thereof.  Conventionally, in QM you only talk about bosons and fermions – one way to say what this means is that the permutation group $S_n$ naturally acts on a system of $n$ particles, and it acts either trivially (not altering the state vector at all), or by sign (each swap of two particles multiplies the state vector by a minus sign).  This amounts to saying that only one-dimensional representations of $S_n$ occur.  It is usually justified by the “spin-statistics theorem“, relating it to the fact that particles have either integer or half-integer spins (classifying representations of the rotation group).  But there are other representations of $S_n$, labelled by Young diagrams, though they are more than one-dimensional.  This gives rise to “paraparticle” statistics.  On the other hand, permuting particles in two dimensions is not homotopically trivial, so one ought to use the braid group $B_n$, rather than $S_n$, and this gives rise again to different statistics, called “anyonic” statistics.
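The dimensions of these Young-diagram representations can be worked out mechanically via the hook length formula, $\dim = n! / \prod (\text{hook lengths})$.  A quick sketch enumerating them for $S_4$ (the helper functions are mine):

```python
from math import factorial

def partitions(n, largest=None):
    """All partitions of n, in weakly decreasing order."""
    largest = largest or n
    if n == 0:
        yield []
        return
    for k in range(min(n, largest), 0, -1):
        for rest in partitions(n - k, k):
            yield [k] + rest

def dim_irrep(shape):
    """Dimension of the S_n irrep for a Young diagram (a partition of n),
    computed by the hook length formula."""
    n = sum(shape)
    hooks = 1
    for i, row in enumerate(shape):
        for j in range(row):
            below = sum(1 for r in shape[i + 1:] if r > j)  # cells below (i, j)
            hooks *= (row - j) + below  # arm + leg + 1
    return factorial(n) // hooks

for shape in partitions(4):
    print(shape, dim_irrep(shape))
# Only [4] and [1,1,1,1] (bosonic/fermionic) are one-dimensional;
# the others are the would-be "paraparticle" statistics.
```

A sanity check on the formula: the squares of the dimensions sum to $|S_4| = 24$.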

One recurring idea is that, to deal with paraparticle statistics, one needs to change the formalism of QM a bit, and expand the idea of a “state vector” (or rather, ray) to a “generalized ray” which has more dimensions – corresponding to the dimension of the representation of $S_n$ one wants the particles to have.  Anyons can be dealt with a little more conventionally, since a 2D system may already have them.  Adam Caulton’s talk described how this can be seen as a topological phenomenon or a dynamical one – making an analogy with the Aharonov-Bohm effect, where the holonomy of an EM field around a solenoid can be described either dynamically with an interacting Lagrangian on flat space, or topologically with a free Lagrangian in space where the solenoid has been removed.

“Quantum Mechanics”: A talk by Elias Okon and Craig Callender about QM and the Equivalence Principle, based on this.  There has been some discussion recently as to whether quantum mechanics is compatible with the principle that relates gravitational and inertial mass.  They point out that there are several versions of this principle, and that although QM is incompatible with some versions, these aren’t the versions that actually produce general relativity.  (For example, objects with large and small masses fall differently in quantum physics, because though the mean travel time is the same, the variance is different.  But this is not a problem for GR, which only demands that all matter responds dynamically to the same metric.)  Also, talks by Peter Lewis on problems with the so-called “transactional interpretation” of QM, and Bryan Roberts on time-reversal.

“Why I Care About What I Don’t Yet Know”:  A funny name for a session about time-asymmetry, which is the essentially philosophical problem of why, if the laws of physics are time-symmetric (which they approximately are for most purposes), what we actually experience isn’t.  Personally, the best philosophical account of this I’ve read is Huw Price’s “Time’s Arrow“, though Reichenbach’s “The Direction of Time” has good stuff in it also, and there’s also Zeh’s more technical “The Physical Basis of the Direction of Time“.  In the session, Chris Suhler and Craig Callender gave an account of how, given causal asymmetry, our subjective asymmetry of values for the future and the past can arise (the intuitively obvious point being that if we can influence the future and not the past, we tend to value it more).  Mathias Frisch talked about radiation asymmetry (the fact that it’s equally possible in EM to have waves converging on a source as spreading out from it, yet we don’t see this).  Owen Maroney argued that “There’s No Route from Thermodynamics to the Information Asymmetry” by describing in principle how to construct a time-reversed (probabilistic) computer.  David Wallace spoke on “The Logic of the Past Hypothesis”, the idea inspired by Boltzmann that we see time-asymmetry because there is a point in what we call the “past” where entropy was very low, and so we perceive the direction away from that state as “forward” in time because the world tends to move toward equilibrium (though he pointed out that for dynamical reasons, the world can easily stay far away from equilibrium for a long time).  He went on to discuss the logic of this argument, and the idea of a “simple” (i.e. easy-to-describe) distribution, and the conjecture that the evolution of these will generally be describable in terms of an evolution that uses “coarse graining” (i.e. that repeatedly throws away microscopic information).

“The Emergence of Spacetime in Quantum Theories of Gravity”:  This session addressed the idea that spacetime (or in some cases, just space) might not be fundamental, but could emerge from a more basic theory.  Christian Wüthrich spoke about “A-Priori versus A-Posteriori” versions of this idea, mostly focusing on ideas such as LQG and causal sets, which start with discrete structures, and get manifolds as approximations to them.  Nick Huggett gave an overview of noncommutative geometry for the philosophically minded audience, explaining how an algebra of observables can be treated like space by means of all the concepts from geometry which can be imported into the theory of $C^{\star}$-algebras, where space would be an approximate description of the algebra by letting the noncommutativity drop out of sight in some limit (which would be described as a “large scale” limit).  Sean Carroll discussed the possibility that “Space is Not Fundamental – But Time Might Be”, pointing out that even in classical mechanics, space is not a fundamental notion (since it’s possible to reformulate even Hamiltonian classical mechanics without making essential distinctions between position and momentum coordinates), and suggesting that space arises from the dynamics of an actual physical system – a Hamiltonian, in this example – by the principle “Position Is The Thing In Which Interactions Are Local”.  Finally, Tim Maudlin gave an argument for the fundamentality of time by showing how to reconstruct topology in space from a “linear structure” on points saying what a (directed!) path among the points is.

Looks like the Standard Model is having a bad day – Fermilab has detected CP-asymmetry about 50 times what it predicts in some meson decay. As they say – it looks like there might be some new physics for the LHC to look into.

That said, this post is mostly about a particular voting system which has come back into the limelight recently, but also runs off on a few tangents about social choice theory and the assumptions behind it. I’m by no means expert in the mathematical study of game theory and social choice theory, but I do take an observer’s interest in them.

A couple of years ago, during an election season, I wrote a post on Arrow’s theorem, which I believe received more comments than any other post I’ve made in this blog – which may only indicate that it’s more interesting than my subject matter, but I suppose is also a consequence of mentioning anything related to politics on the Internet. Arrow’s theorem is in some ways uncontroversial – nobody disputes that it’s true, and in fact the proof is pretty easy – but what significance, if any, it has for the real world can be controversial. I’ve known people who wouldn’t continue any conversation in which it was mentioned, probably for this reason.

On the other hand, voting systems are now in the news again, as they were when I made the last post (at least in Ontario, where there was a referendum on a proposal to switch to the Mixed Member Proportional system).  Today it’s in the United Kingdom, where the new coalition government includes the Liberal Democrats, who have been campaigning for a long time (longer than it’s had that name) for some form of proportional representation in the British Parliament.  One thing you’ll notice if you click that link and watch the video (featuring John Cleese) is that the condensed summary of how the proposed system would work doesn’t actually tell you… how the proposed system would work.  It explains how to fill out a ballot (with rank-ordering of candidates, instead of selecting a single one), and says that the rest is up to the returning officer.  But obviously, what the returning officer does with the ballot is the key to the whole affair.

In fact, collecting ordinal preferences (that is, a rank-ordering of the options on the table) is the starting point for any social choice algorithm in the sense that Arrow’s Theorem talks about.  The “social choice problem” is to find a map which takes the preference orders of all the individuals and produces a single “social” preference order, using some algorithm.  One can do a wide range of things with this information: even the “first-past-the-post” system can start with ordinal preferences: this method just counts the number of first-place rankings for each option, ranks the one with the largest count first, and declares indifference among all the rest.
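As a concrete toy sketch of that last point (the ballot counts are invented for illustration), the first-past-the-post reading of ordinal ballots only ever consults the top of each ballot:

```python
from collections import Counter

def first_past_the_post(ballots):
    """Each ballot lists options best-first; FPTP counts only the first
    entry on each ballot and ranks options by that count."""
    return Counter(ballot[0] for ballot in ballots).most_common()

ballots = [("X", "Z", "Y")] * 36 + [("Y", "Z", "X")] * 33 + [("Z", "Y", "X")] * 31
print(first_past_the_post(ballots))  # [('X', 36), ('Y', 33), ('Z', 31)]
```

All the information below the first line of each ballot is simply discarded.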

The Lib-Dems have been advocating for some sort of proportional representation, but there are many different systems that fall into that category and they don’t all work the same way. The Conservatives have promised some sort of referendum on a new electoral system involving the so-called “Alternative Vote”, also called Instant Runoff Voting (IRV), or the Australian Ballot, since it’s used to elect the Australian legislature.

Now, Arrow’s theorem says that every voting system will fail at least one of the conditions of the theorem. The version I quoted previously has three conditions: Unrestricted Range (no candidate is excluded by the system before votes are even counted); Monotonicity (votes for a candidate shouldn’t make them less likely to win); and Independence of Irrelevant Alternatives (if X beats Y one-on-one, and both beat Z, then Y shouldn’t win in a three-way race). Most voting systems used in practice fail IIA, and surprisingly many fail monotonicity. Both possibilities allow forms of strategic voting, in which voters can sometimes achieve a better result, according to their own true preferences, by stating those preferences falsely when they vote. This “strategic” aspect to voting is what ties this into game theory.

In this case, IRV fails both IIA and monotonicity. In fact, this is involved with the fact that IRV also fails the Condorcet condition which says that if there’s a candidate X who beats every other candidate one-on-one, X should win a multi-candidate race (which, obviously, can only happen if the voting system fails IIA).

So in the IRV algorithm, one effectively uses the preference ordering to “simulate” a runoff election, in which people vote for their first choice from $n$ candidates, then the one with the fewest votes is eliminated, and the election is held again with $(n-1)$ candidates, and so on until a single winner emerges.  In IRV, this is done by transferring the votes for the discarded candidate to their second-choice candidate, recounting, discarding again, and so on.  (The proposal in the UK would be to use this system in each constituency to elect individual MPs.)
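A toy version of this runoff simulation (ignoring the tie-breaking rules a real system has to specify) might look like:

```python
from collections import Counter

def irv_winner(ballots):
    """Instant Runoff: repeatedly eliminate the candidate with the fewest
    first-choice votes, transferring each ballot to its next surviving
    choice, until one candidate remains."""
    remaining = {c for ballot in ballots for c in ballot}
    while len(remaining) > 1:
        counts = Counter(next(c for c in ballot if c in remaining)
                         for ballot in ballots)
        loser = min(remaining, key=lambda c: counts.get(c, 0))
        remaining.discard(loser)
    return remaining.pop()

# A close three-way race (numbers invented for illustration):
ballots = [("X", "Z", "Y")] * 36 + [("Y", "Z", "X")] * 33 + [("Z", "Y", "X")] * 31
print(irv_winner(ballots))  # Z is eliminated first; Y then beats X 64-36
```

Note that a ballot never has to be re-marked: the stated preference order determines where each vote transfers at every round.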

Here’s an example of how IRV might fail these criteria, and permit strategic voting.  The example assumes a close three-way election, but this isn’t the only possibility.

Suppose there are three candidates: X, Y, and Z. There are six possible preference orders a voter could have, but to simplify, we’ll suppose that only three actually occur, as follows:

| Percentage | Choice 1 | Choice 2 | Choice 3 |
|------------|----------|----------|----------|
| 36         | X        | Z        | Y        |
| 33         | Y        | Z        | X        |
| 31         | Z        | Y        | X        |

One could imagine Z is a “centrist” candidate somewhere between X and Y.  It’s clear here that Z is the Condorcet winner: in a two-person race with either X or Y, Z would win by nearly a 2-to-1 margin.  Yet under IRV, Z has the fewest first-choice ballots, and so is eliminated, and Y wins the second round.  So IRV fails the Condorcet criterion.  It also fails the Independence of Irrelevant Alternatives, since X loses in a two-candidate vote against either Y or Z (by 64-36), hence should be “irrelevant”; yet the fact that X is on the ballot causes Z to lose to Y, whom Z would otherwise beat.
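The pairwise claims in this example are easy to verify by brute force (a quick sketch):

```python
def pairwise(ballots, a, b):
    """Number of ballots ranking a above b."""
    return sum(1 for ballot in ballots if ballot.index(a) < ballot.index(b))

# The ballot profile from the table above:
ballots = [("X", "Z", "Y")] * 36 + [("Y", "Z", "X")] * 33 + [("Z", "Y", "X")] * 31
print(pairwise(ballots, "Z", "X"))  # 64: Z beats X 64-36
print(pairwise(ballots, "Z", "Y"))  # 67: Z beats Y 67-33
print(pairwise(ballots, "Y", "X"))  # 64: Y also beats X one-on-one
```

So Z wins every head-to-head contest, yet is the first candidate IRV eliminates.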

This tends to undermine the argument for IRV that it eliminates the “spoiler effect” (another term for the failure of IIA): here, Y is the “spoiler”.

The failure of monotonicity is well illustrated by a slightly different example, where the second choices of Z-supporters are split between Y and X, say 18-13 in favour of Y.  Then, voting honestly, Z is eliminated and Y narrowly beats X in the runoff, 51-49.  But X-supporters can get a better result for themselves if 6 of their 36 percent lie, and rank Y first instead of X (even though they like Y the least), followed by X.  This would mean only 30% rank X first, so X is eliminated, and Y runs against Z.  Then Z wins 61-39 against Y, which X-supporters prefer.  Thus, although some X-supporters switched to Y – who would otherwise have won – Y now loses.  (Of course, switching to Z would also have worked – but this shows that an increase of support for the winning candidate can actually cause that candidate to LOSE, if it comes from the right place.)  This kind of strategic voting can happen with any algorithm that proceeds in multiple rounds.
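This reversal can be checked mechanically.  For simplicity, the sketch below uses the original ballot profile from the table (where all Z-supporters rank Y second), for which the honest winner is also Y and the same insincere switch by six points’ worth of X-supporters elects Z:

```python
from collections import Counter

def irv_winner(ballots):
    """Instant Runoff: eliminate the candidate with the fewest first
    choices, transferring ballots to the next surviving choice."""
    remaining = {c for ballot in ballots for c in ballot}
    while len(remaining) > 1:
        counts = Counter(next(c for c in ballot if c in remaining)
                         for ballot in ballots)
        remaining.discard(min(remaining, key=lambda c: counts.get(c, 0)))
    return remaining.pop()

honest = [("X", "Z", "Y")] * 36 + [("Y", "Z", "X")] * 33 + [("Z", "Y", "X")] * 31
print(irv_winner(honest))  # Y

# Six points' worth of X-supporters insincerely rank Y first:
strategic = ([("X", "Z", "Y")] * 30 + [("Y", "X", "Z")] * 6
             + [("Y", "Z", "X")] * 33 + [("Z", "Y", "X")] * 31)
print(irv_winner(strategic))  # Z: extra first-place support made Y lose
```

Y gains first-place votes between the two runs and goes from winning to losing, which is exactly the monotonicity failure.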

Clearly, though, this form of strategic voting is more difficult than the kind seen in FPTP – “vote for your second choice to vote against your third choice”, which is what usually depresses the vote for third parties, even those who do well in polls. Strategic voting always involves having some advance knowledge about what the outcome of the election is likely to be, and changing one’s vote on that basis: under FPTP, this means knowing, for instance, that your favourite candidate is a distant third in the polls, and your second and third choices are the front-runners. Under IRV, it involves knowing the actual percentages much more accurately, and coordinating more carefully with others (to make sure that not too many people switch, in the above example). This sort of thing is especially hard to do well if everyone else is also voting strategically, disguising their true preferences, which is where the theory of such games with imperfect information gets complicated.

So there’s an argument that in practice strategic voting matters less under IRV.

Another criticism of IRV – indeed, of any voting system that selects a single candidate per district – is that it tends toward a two-party system.  This is “Duverger’s Law“ (which, if it is a law in the sense of a theorem, must be one of those facts about asymptotic behaviour that depend on a lot of assumptions, since we have a FPTP system in Canada, and four main parties).  Whether this is bad or not is contentious – which illustrates the gap between analysis and conclusions about the real world.  Some say two-party systems are bad because they disenfranchise people who would otherwise vote for small parties; others say they’re good because they create stability by allowing governing majorities; still others (such as the UK’s LibDems) claim they create instability, by leading to dramatic shifts in ruling party, instead of quantitative shifts in ruling coalitions.  As far as I know, none of these claims can be backed up with the kind of solid analysis one has with strategic voting.

Getting back to strategic voting: perverse voting scenarios like the ones above will always arise when the social choice problem is framed as finding an algorithm taking $n$ voters’ preference orders, and producing a “social” preference order. Arrow’s theorem says any such algorithm will fail one of the conditions mentioned above, and the Gibbard-Satterthwaite theorem says that some form of strategic voting will always exist to take advantage of this, if the algorithm has unlimited range. Of course, a “limited range” algorithm – for example, one which always selects the dictator’s preferred option regardless of any votes cast – may be immune to strategic voting, but not in a good way. (In fact, the GS theorem says that if strategic voting is impossible, the system is either dictatorial or a priori excludes some option.)

One suggestion to deal with Arrow’s theorem is to frame the problem differently. Some people advocate Range Voting (that’s an advocacy site, in the US context – here is one advocating IRV which describes possible problems with range voting – though criticism runs both ways). I find range voting interesting because it escapes the Arrow and Gibbard-Satterthwaite theorems; this in turn is because it begins by collecting cardinal preferences, not ordinal preferences, from each voter, and produces cardinal preferences as output. That is, voters give each option a score in the range between 0% and 100% – or 0.0 and 10.0 as in the Olympics. The winner (as in the Olympics) is the candidate with the highest total score. (There are some easy variations in non-single-winner situations: take the candidates with the top $n$ scores, or assign seats in Parliament proportional to total score using a variation on the same scheme). Collecting more information evades the hypotheses of these theorems. The point is that Arrow’s theorem tells us there are fundamental obstacles to coherently defining the idea of the “social preference order” by amalgamating individual ones. There’s no such obstacle to defining a social cardinal preference: it’s just an average.  Then, too: it’s usually pretty clear what a preference order means – it’s less clear for cardinal preferences; so the extra information being collected might not be meaningful.  After all: many different cardinal preferences give the same order, and these all look the same when it comes to behaviour.
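A range-voting count really is just a sum (equivalently, an average) of scores. A minimal sketch, with invented ballots, showing how a broadly-liked candidate can win despite being nobody's first choice – information an ordinal ballot would discard:

```python
def range_winner(score_ballots):
    """Each ballot assigns every candidate a score in [0, 100];
    the winner is the candidate with the highest total score
    (equivalently, the highest average)."""
    totals = {}
    for ballot in score_ballots:
        for candidate, score in ballot.items():
            totals[candidate] = totals.get(candidate, 0) + score
    return max(totals, key=totals.get)

# Invented ballots: Y is nobody's favourite, but everyone rates Y highly.
ballots = [
    {'X': 100, 'Y': 60, 'Z': 0},
    {'X': 0,   'Y': 70, 'Z': 100},
    {'X': 90,  'Y': 80, 'Z': 10},
]
print(range_winner(ballots))   # Y wins on total score
```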

Now, as the above links suggest, there are still some ways to “vote tactically” with range voting, but many of the usual incentives to dishonesty (at least as to preference ORDER) disappear. The incentives to dishonesty are usually toward exaggeration of real preferences. That is, falsely assigning cardinal values to ordinal preferences: if your preference order is X > Y > Z, you may want to assign 100% to X, and 0% to Y and Z, to give your preferred candidate the strongest possible help. Another way to put this is: if there are $n$ candidates, a ballot essentially amounts to choosing a vector in $\mathbb{R}^n$, and vote-counting amounts to taking an average of all the vectors. Then assuming one knew in advance what the average was going to be, the incentive in voting is to pick a vector pointing from the actual average to the outcome you want.

But this raises the same problem as before: the more people can be expected to vote strategically, the harder it is to predict where the actual average is going to be in advance, and therefore the harder it is to vote strategically.
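The exaggeration incentive is easy to illustrate with a toy computation (all the numbers here are invented): a voter whose honest scores are X > Y > Z compares submitting them honestly against submitting the maximally exaggerated ballot.

```python
# Toy illustration of the exaggeration incentive in range voting.
# The running totals contributed by all the other voters are made up
# for the example.
others = {'X': 150, 'Y': 180, 'Z': 110}

ballots = {'honest':      {'X': 80,  'Y': 60, 'Z': 20},
           'exaggerated': {'X': 100, 'Y': 0,  'Z': 0}}

winners = {}
for label, ballot in ballots.items():
    totals = {c: others[c] + ballot[c] for c in others}
    winners[label] = max(totals, key=totals.get)

print(winners)   # the honest ballot elects Y; exaggerating flips it to X
```

Of course, this calculation presumes the voter knows the other totals in advance – which is exactly the predictability problem described above.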

There are a number of interesting books on political theory, social choice, and voting theory, from a mathematical point of view. Two that I have are Peter Ordeshook’s “Game Theory and Political Theory”, which covers a lot of different subjects, and William Riker’s “Liberalism Against Populism” which is a slightly misleading title for a book that is mostly about voting theory. I would recommend either of them – Ordeshook’s is the more technical, whereas Riker’s is illustrated with plenty of real-world examples.

I’m not particularly trying to advocate one way or another on any of these topics. If anything, I tend to agree with the observation in Ordeshook’s book – that a major effect of Arrow’s theorem, historically, has been to undermine the idea that one can use terms like “social interest” in any sort of uncomplicated way, and to turn the focus of social choice theory from an optimization question – how to pick the best social choice for everyone – into a question in the theory of strategy games – how to maximize one’s own interests under a given social system. I guess what I’d advocate is that more people should understand how to examine such questions (and I’d like to understand the methods better, too) – but not to expect that these sorts of mathematical models will solve the fundamental issues. Those issues live in the realm of interpretation and values, not analysis.

When I made my previous two posts about ideas of “state”, one thing I was aiming at was to say something about the relationships between states and dynamics. The point here is that, although the idea of “state” is that it is intrinsically something like a snapshot capturing how things are at one instant in “time” (whatever that is), extrinsically, there’s more to the story. The “kinematics” of a physical theory consists of its collection of possible states. The “dynamics” consists of the regularities in how states change with time. Part of the point here is that these aren’t totally separate.

Just for one thing, in classical mechanics, the “state” includes time-derivatives of the quantities you know, and the dynamical laws tell you something about the second derivatives. This is true in both the Hamiltonian and Lagrangian formalisms of dynamics. The Hamiltonian formalism is based on a function $H(q,p)$, which represents the concept of “energy” for the system, where $q$ is a vector representing the values of some collection of variables describing the system (generalized position variables, in some configuration space $X$), and the $p = m \dot{q}$ are corresponding “momentum” variables, which are the other coordinates in a phase space which in simple cases is just the cotangent bundle $T^*X$. Here, $m$ refers to mass, or some equivalent. The familiar case of a moving point particle has “energy = kinetic + potential”, or $H = p^2/2m + V(q)$ for some potential function $V$. The symplectic form on $T^*X$ can then be used to define a path through any point, which describes the evolution of the system in time – notably, it conserves the energy $H$. Then there’s the Lagrangian formalism, which defines the “action” associated to a path, which comes from integrating some function $L(q, \dot{q})$ living on the tangent bundle $TX$, over the path. The physically realized paths (classically) are critical points of the action, with respect to variations of the path.
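To make the Hamiltonian flow concrete, here's a small numerical sketch (my own toy example, not from any of the sources above): Hamilton's equations for a particle in a harmonic potential, with the standard kinetic term $p^2/2m$, integrated by the symplectic leapfrog scheme, which (approximately) conserves the energy $H$ along the flow.

```python
# Hamiltonian flow for a point particle on a line, H = p^2/(2m) + V(q),
# integrated with the symplectic leapfrog scheme.  The energy H stays
# (approximately) constant along the flow, as the text describes.
m, k, dt = 1.0, 1.0, 0.01
V  = lambda q: 0.5 * k * q * q      # harmonic potential (example choice)
dV = lambda q: k * q                # its derivative, -force
H  = lambda q, p: p * p / (2 * m) + V(q)

q, p = 1.0, 0.0                     # an initial state (point in phase space)
E0 = H(q, p)
for _ in range(10000):              # integrate up to t = 100
    p -= 0.5 * dt * dV(q)           # half kick
    q += dt * p / m                 # drift
    p -= 0.5 * dt * dV(q)           # half kick

print(abs(H(q, p) - E0))            # energy drift stays tiny
```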

This is all based on the view of a “state” as an element of a set (which happens to be a symplectic manifold like $T^*X$, or just a manifold if it’s $TX$), and both the “energy” and the “action” are some kind of function on this set. A little extra structure (symplectic form, or measure on path space) turns these functions into a notion of dynamics. Now a function on the space of states is what an observable is: energy certainly is easy to envision this way, and action (though harder to define intuitively) counts as well.

But another view of states which I mentioned in that first post is the one that pertains to statistical mechanics, in which a state is actually a statistical distribution on the set of “pure” states. This is rather like a function – it’s slightly more general, since a distribution can have point-masses, but any function gives a distribution if there’s a fixed measure $d\mu$ around to integrate against – then a function like $H$ becomes the measure $H d\mu$. And this is where the notion of a Gibbs state comes from, though it’s slightly trickier. The idea is that the Gibbs state (in some circumstances called the Boltzmann distribution) is the state a system will end up in if it’s allowed to “thermalize” – it’s the maximum-entropy distribution for a given amount of energy in the specified system, at a given temperature $T$. So, for instance, for a gas in a box, this describes how, at a given temperature, the kinetic energies of the particles are (probably) distributed. Up to a bunch of constants of proportionality, one expects that the weight given to a state (or region in state space) is just $\exp(-H/T)$, where $H$ is the Hamiltonian (energy) for that state. That is, the likelihood of being in a state falls off exponentially with its energy – and higher temperature makes higher-energy states more likely.

Now part of the point here is that, if you know the Gibbs state at temperature $T$, you can work out the Hamiltonian just by taking a logarithm – so specifying a Hamiltonian and specifying the corresponding Gibbs state are completely equivalent. But specifying a Hamiltonian (given some other structure) completely determines the dynamics of the system.
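A toy discrete version of this two-way correspondence, with made-up energy levels: starting from the Hamiltonian we build the Gibbs weights, and then recover the energies from the weights by a logarithm, up to an additive constant (the log of the normalization).

```python
import math

# Toy discrete system: Gibbs weights p_i proportional to exp(-H_i / T),
# and the inverse direction: recovering H from the Gibbs state by a
# logarithm, up to an additive constant.
T = 2.0
energies = [0.0, 1.0, 3.0]          # invented energy levels H_i
Z = sum(math.exp(-E / T) for E in energies)
probs = [math.exp(-E / T) / Z for E in energies]

recovered = [-T * math.log(p) for p in probs]   # = H_i + T*log(Z)
shift = recovered[0] - energies[0]              # the additive constant
print([round(r - shift, 10) for r in recovered])  # matches the energies
```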

This is the classical version of the idea Carlo Rovelli calls “Thermal Time”, which I first encountered in his book “Quantum Gravity”, but which is also summarized in Rovelli’s FQXi essay “Forget Time”, and described in more detail in this paper by Rovelli and Alain Connes. Mathematically, this involves the Tomita flow on von Neumann algebras (which Connes used to great effect in his work on the classification of same). It was reading “Forget Time” which originally got me thinking about making the series of posts about different notions of state.

Physically, remember, these are von Neumann algebras of operators on a quantum system, the self-adjoint ones being observables; states are linear functionals on such algebras. The equivalent of a Gibbs state – a thermal equilibrium state – is called a KMS (Kubo-Martin-Schwinger) state (for a particular Hamiltonian). It’s important that the KMS state depends on the Hamiltonian, which is to say the dynamics and the notion of time with respect to which the system will evolve. Given a notion of time flow, there is a notion of KMS state.

One interesting place where KMS states come up is in (general) relativistic thermodynamics. In particular, the effect called the Unruh Effect is an example (here I’m referencing Robert Wald’s book, “Quantum Field Theory in Curved Spacetime and Black Hole Thermodynamics”). Physically, the Unruh effect says the following. Suppose you’re in flat spacetime (described by Minkowski space), and an inertial (unaccelerated) observer sees it as a vacuum. Then an accelerated observer will see space as full of a bath of particles at some temperature related to the acceleration. Mathematically, a change of coordinates (acceleration) implies there’s a one-parameter family of automorphisms of the von Neumann algebra which describes the quantum field for particles. There’s also a (trivial) family for the unaccelerated observer, since the coordinate system is not changing. The Unruh effect in this language is the fact that a vacuum state relative to the time-flow for an unaccelerated observer is a KMS state relative to the time-flow for the accelerated observer (at some temperature related to the acceleration).

The KMS state for a von Neumann algebra with a given Hamiltonian operator has a density matrix $\omega$, which is again, up to some constant factors, just the exponential of the Hamiltonian operator. (For pure states, $\omega = |\Psi \rangle \langle \Psi |$, and in general a matrix becomes a state by $\omega(A) = Tr(A \omega)$, which for pure states is just the usual expectation value for $A$, $\langle \Psi | A | \Psi \rangle$).
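A finite-dimensional sanity check of those formulas, with a made-up Hamiltonian: the KMS density matrix is the normalized exponential of $-H/T$, and for a pure state $|\Psi \rangle \langle \Psi |$ the trace formula reproduces $\langle \Psi | A | \Psi \rangle$.

```python
import numpy as np

# Finite-dimensional check: a KMS density matrix is (up to
# normalization) exp(-H/T), and for a pure state |Psi><Psi| the trace
# formula Tr(A omega) reproduces <Psi|A|Psi>.
H = np.diag([0.0, 1.0, 2.0])             # a toy Hamiltonian
T = 1.5
omega = np.diag(np.exp(-np.diag(H) / T))
omega /= np.trace(omega)                 # the KMS / Gibbs density matrix

psi = np.array([1.0, 1.0, 0.0]) / np.sqrt(2)
pure = np.outer(psi, psi.conj())         # omega = |Psi><Psi| for a pure state

A = np.array([[1, 2, 0], [2, 0, 1], [0, 1, 3]], dtype=float)  # self-adjoint
lhs = np.trace(A @ pure)
rhs = psi.conj() @ A @ psi
print(lhs, rhs)                          # the two agree
```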

Now, things are a bit more complicated in the von Neumann algebra picture than the classical picture, but Tomita-Takesaki theory tells us that as in the classical world, the correspondence between dynamics and KMS states goes both ways: there is a flow – the Tomita flow – associated to any given state, with respect to which the state is a KMS state. By “flow” here, I mean a one-parameter family of automorphisms of the von Neumann algebra. In the Heisenberg formalism for quantum mechanics, this is just what time is (i.e. states remain the same, but the algebra of observables is deformed with time). The way you find it is as follows (and why this is right involves some operator algebra I find a bit mysterious):

First, get the algebra $\mathcal{A}$ acting on a Hilbert space $H$, with a cyclic vector $\Psi$ (i.e. such that $\mathcal{A} \Psi$ is dense in $H$ – one way to get this is by the GNS representation, so that the state $\omega$ just acts on an operator $A$ by the expectation value at $\Psi$, as above, so that the vector $\Psi$ is standing in, in the Hilbert space picture, for the state $\omega$). Then one can define an operator $S$ by the fact that, for any $A \in \mathcal{A}$, one has

$(SA)\Psi = A^{\star}\Psi$

That is, $S$ acts like the conjugation operation on operators at $\Psi$, which is enough to define $S$ since $\Psi$ is cyclic. This $S$ has a polar decomposition (analogous for operators to the polar form for complex numbers) of $S = J \Delta^{1/2}$, where $J$ is antiunitary (this is conjugation, after all) and $\Delta$ is positive and self-adjoint. We need this positive part $\Delta$, because the Tomita flow is a one-parameter family of automorphisms given by:

$\alpha_t(A) = \Delta^{-it} A \Delta^{it}$
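In the finite-dimensional case, where a faithful state is given by an invertible density matrix $\rho$, the modular flow reduces to conjugation by imaginary powers of $\rho$, and its defining KMS property can be checked directly. A sketch (my own toy example; note that sign conventions for the flow parameter $t$ vary between references, and here I use $\rho^{it} A \rho^{-it}$):

```python
import numpy as np

# Finite-dimensional sketch of the modular flow: for the faithful state
# omega(A) = Tr(rho A) on a matrix algebra, the flow is
# alpha_t(A) = rho^{it} A rho^{-it}.  The state is invariant under the
# flow, and analytic continuation to imaginary t gives the KMS identity
# omega(AB) = omega(B rho A rho^{-1}).
lam = np.array([0.5, 0.3, 0.2])          # eigenvalues of a faithful rho
rho = np.diag(lam)

def alpha(t, A):
    """Modular flow alpha_t(A) = rho^{it} A rho^{-it} (rho diagonal)."""
    return np.diag(lam ** (1j * t)) @ A @ np.diag(lam ** (-1j * t))

omega = lambda A: np.trace(rho @ A)

A = np.array([[0, 1, 0], [1, 0, 2], [0, 2, 1]], dtype=complex)
B = np.array([[1, 0, 1], [0, 2, 0], [1, 0, 0]], dtype=complex)

# The state is invariant under the flow (real t):
print(abs(omega(alpha(0.7, A)) - omega(A)))          # ~ 0
# KMS condition, via continuation to imaginary time t = -i:
print(abs(omega(A @ B) - omega(B @ alpha(-1j, A))))  # ~ 0
```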

An important fact for Connes’ classification of von Neumann algebras is that the Tomita flow is basically unique – that is, it’s unique up to an inner automorphism (i.e. a conjugation by some unitary operator – so in particular, if we’re talking about a relativistic physical theory, a change of coordinates giving a different $t$ parameter would be an example). So while there are different flows, they’re all “essentially” the same. There’s a unique notion of time flow if we consider automorphisms of $\mathcal{A}$ only up to inner automorphisms. Now, in some cases, the Tomita flow consists entirely of inner automorphisms, and this reduction makes it disappear entirely (this happens in the finite-dimensional case, for instance). But in the general case this doesn’t happen, and the Connes-Rovelli paper summarizes this by saying that von Neumann algebras are “intrinsically dynamic objects”. So this is one interesting thing about the quantum view of states: there is a somewhat canonical notion of dynamics present just by virtue of the way states are described. In the classical world, this isn’t the case.

Now, Rovelli’s “Thermal Time” hypothesis is, basically, that the notion of time is a state-dependent one: instead of an independent variable, with respect to which other variables change, quantum mechanics (per Rovelli) makes predictions about correlations between different observed variables. More precisely, the hypothesis is that, given that we observe the world in some state, the right notion of time should just be the Tomita flow for that state. They claim that, checking this for certain cosmological models, like the Friedmann model, they recover the usual notion of time flow. I have to admit, I have trouble grokking this idea as fundamental physics, because it seems like it’s implying that the universe (or any system in it we look at) is always, a priori, in thermal equilibrium, which seems wrong to me since it evidently isn’t. The Friedmann model does assume an expanding universe in thermal equilibrium, but clearly we’re not in exactly that world. On the other hand, the Tomita flow is definitely there in the von Neumann algebra view of quantum mechanics and states, so possibly I’m misinterpreting the nature of the claim. Also, as applied to quantum gravity, a “state” perhaps should be read as a state for the whole spacetime geometry of the universe – which is presumably static – and then the apparent “time change” would be a result of the Tomita flow on operators describing actual physical observables. But on this view, I’m not sure how to understand “thermal equilibrium”.  So in the end, I don’t really know how to take the “Thermal Time Hypothesis” as physics.

In any case, the idea that the right notion of time should be state-dependent does make some intuitive sense. The only physically, empirically accessible referent for time is “what a clock measures”: in other words, there is some chosen system which we refer to whenever we say we’re “measuring time”. Different choices of system (that is, different clocks) will give different readings even if they happen to be moving together in an inertial frame – atomic clocks sitting side by side will still gradually drift out of sync. Even if “the system” means the whole universe, or just the gravitational field, clearly the notion of time even in General Relativity depends on the state of this system. If there is a non-state-dependent “god’s-eye view” of which variable is time, we don’t have empirical access to it. So while I can’t really assess this idea confidently, it does seem to be getting at something important.

In my post about my short talk at CQC, I mentioned that the groupoidification program in physics is based on a few simple concepts (most research programs are, I suppose). The ones I singled out are: state, symmetry, and history. But since concepts tend to seem simpler if you leave them undefined, there are bound to be subtleties here. Recently I’ve been thinking about the first one, state. What is a state? What is this supposedly simple concept?

Etymology isn’t an especially reliable indicator of what a word means, or even the history of a concept (words change meanings, and concepts shift over time), but it’s sometimes interesting to trace. The English word “state” comes from the Latin verb stare, meaning “to stand”, whose past participle is status, which is also borrowed directly into English. The Proto-Indo-European root sta- also means “stand”; the English word “stand” itself comes from this root, but via Germanic (along with “standard”). However, most of the words with this root come via various Latin intermediaries: state, stable, status, statue, stationary, station, and also substance, understand and others. The state of affairs is sometimes referred to as being “how things stand”, how they are, the current condition. Most of the words based on the sta- root imply non-motion (i.e. “stasis”). If anything, “state” (like “status”) carries this connotation less strongly than most, since the state of affairs can change – but it emphasizes how things stand now and not how they’re changing. From this sense, we also get the political meaning of “a state”, a reified version of a term originally meaning the political condition of a country (by analogy with Latin expressions like status rei publicae, the “condition of public affairs”).

So, narrowing focus now, the “state” of a physical system is the condition it’s in. In different models of physics, this is described in different ways, but in each case, by the “condition” we mean something like a complete description of all the facts about the system we can get. But this means different things in different settings. So I just want to take a look at some of them.

Think of these different settings for physics as being literally “settings” (but please excuse the pun) of the switches on a machine. Three of the switches are labelled Thermal, Quantum, and Relativistic. The “Thermal” switch selects whether we’re talking about thermodynamics or ordinary mechanics. The “Quantum” switch selects whether we’re talking about a quantum or classical system.

The “Relativistic” switch, which I’ll ignore for this post, specifies what kind of invariance we have: Galileian for Newton’s physics; Lorentzian for Special Relativity; general covariance for General Relativity. But this gets into dynamics, and “state” implies things are, well, static – that is, it’s about kinematics. At the very least, in Relativity, it’s not canonical what you mean by “now”, and so the definition of a state must include choosing a reference frame (in SR), or a Cauchy hypersurface (in GR). So let’s gloss over that for now.

When all these switches are in the “off” position, we have classical mechanics. Here, we think of a state – at a first level of approximation – as an element of a set. Now, for serious classical mechanics, this set will be a symplectic manifold, like the cotangent bundle $T^*M$ of some manifold $M$. This is actually a bit subtle already, since a point in $T^*M$ represents a collection of positions and momenta (or some generalization thereof): that is, we can start with a space of “static” configurations, parametrized by the values of some observable quantities, but a state (contrary to what etymology suggests) also includes momenta describing how those quantities are changing with time (which, in classical mechanics, is a fairly unproblematic notion).

The Hamiltonian picture of the dynamics of the system then tells us, given its state, how that state will change, which we can then use to calculate states at future times. This requires a Hamiltonian, $H$, which we think of as the energy, and which can be calculated from the state. So, for example, kinetic plus potential energy: in the case of a particle moving in a potential on a line, $H = K + V = p^2/2m + V(q)$. The space of states can be described without much reference to the Hamiltonian, but once we have $H$, we get a flow on that space, transforming old states into new states with time.

Now if we turn on the “Thermal” switch, we have a different notion of state. The standard image for the classical mechanical system is that we may be talking about a particle, or a few particles, or perhaps a rigid object, moving in space, maybe subject to some constraints. In thermodynamics, we are thinking of a statistical ensemble of objects – in the simplest case, $N$ identical objects – and want to ask how energy is distributed among them. The standard image is of a box full of gas at some temperature: it’s full of molecules, each with its own trajectory, and they interact through collisions and exchange energy and momentum. Rather than tracking the exact positions of molecules, in thermodynamics a “state” is a distribution, or more precisely a probability measure, on the space of such states. We don’t assume we know the detailed microstate of the system – the positions and momenta of all the particles in the gas – but only something about how these are distributed among them. This reflects the real fact that we can only measure things like pressure, temperature, etc. The measure is telling us the proportion of particles with positions and momenta in a given range.

This is a big difference for something described by the same word “state”. Even assuming our underlying space of “microstates” is still the same $T^*M$, the state is no longer a point. One way to interpret the difference is that here the state is something epistemic. It describes what we know about the system, rather than everything about it. The measure answers the question: “given what we know, what is the likelihood the system is in microstate X?” for each X. Now, of course, we could take a space of all such measures: given our previous classical system, it’s a space of functionals on $C(T^*M)$. Then the state can again be seen as an element of a set. But it’s more natural to keep in view its nature as a measure, or, if it’s nice enough, as a positive function on the space of states. (It’s interesting that this is an object of the same type as the Hamiltonian – this is, intuitively, the basis of what Carlo Rovelli calls the “Thermal Time Hypothesis”, summarized here, which is secretly why I wanted to write on this topic. But more on that in a later post. For one thing, before I can talk about it, I have to talk about what comes next.)

Now turn off the “Thermal” switch, and think about the “Quantum” switch. Here there are a couple of points of view.

To begin with, we describe a system in terms of a Hilbert space, and a state is a vector in a Hilbert space. Again, this could be described as an element of a set, but the complex linear structure is important, so we keep thinking of it as fundamental to the type of a state. In geometric quantization, one often starts with a classical system with a state space like $T^*M = X$, and then takes the Hilbert space $\mathcal{H}=L^2(X)$, so that a state is (modulo analysis issues) basically a complex-valued function on $X$. This is something like the (positive real-valued) measure which gives a thermodynamic state, but the interpretation is trickier. Of course, if $\mathcal{H}$ is an $L^2$-space, we can recover a probability measure, since the square modulus of $\phi \in \mathcal{H}$ has finite total measure (so we can normalize it). But this isn’t enough to describe $\phi$, and the extra information of phases goes missing. In any case, the probability measure no longer has the obvious interpretation of describing the statistics of a whole ensemble of identical systems – only the likelihood of measuring particular values for one system in the state $\phi$. (In fact, there are various no-go theorems getting in the way of a probability interpretation of $\phi$, though this again involves dynamics – a recurring theme is that it’s hard to reason sensibly about states without dynamics). So despite some similarity, this concept of “state” is very different, and phase is a key part of how it’s different. I’ll be jiggered if I can say why, though: most of the “huh?” factor in quantum mechanics lives right about here.

Another way to describe the state of a quantum system is related to this probability, though. The inner product of $\mathcal{H}$ (whether we found it as an $L^2$-space or not) gives a way to talk about statistics of the system under repeated observations. Observables, which for the classical picture are described by functions on the state space $X$, are now self-adjoint operators on $\mathcal{H}$. The expectation value for an observable $A$ in the state $\phi$ is $\langle \phi | A | \phi \rangle$ (note that the Dirac notation implicitly uses self-adjointness of $A$). So the state has another, intuitively easier, interpretation: it’s a real-valued functional on observables, namely the one I just described.

The observables live in the algebra $\mathcal{A} = \mathcal{B}(\mathcal{H})$ of bounded operators on $\mathcal{H}$. Setting both Thermal and Quantum switches of our notion of “state” gives quantum statistical mechanics. Here, the “C*-algebra” (or von Neumann-algebra) picture of quantum mechanics says that really it’s the algebra $\mathcal{A}$ that’s fundamental – it corresponds to actual operations we can perform on the system. Some of them (the self-adjoint ones) represent really very intuitive things, namely observables, which are tangible, measurable quantities. In this picture, $\mathcal{H}$ isn’t assumed to start with at all – but when it is, the kind of object we’re dealing with is a density matrix, which is (roughly) a positive operator on $\mathcal{H}$ of unit trace. In general, a state on a von Neumann algebra is a positive linear functional taking the value 1 on the identity.

This is analogous to the view of a state as a probability measure (positive function with unit total integral) in the classical realm: if an observable is a function on states (giving the value of that observable in each state), then a measure is indeed a functional on the space of observables. A probability measure, in fact, is the functional giving the expectation value of the observable. (And, since variance and all the higher moments of the probability distribution for that observable are themselves defined as expectation values, it also tells us all of those.)

On the other hand, the Gelfand-Naimark-Segal theorem says that, given a state $\phi : \mathcal{A} \rightarrow \mathbb{C}$, there’s a representation of $\mathcal{A}$ as an algebra of operators on some Hilbert space, and a vector $v$ for which this $\phi$ is just $\phi(A) = \langle v | A | v \rangle$. This is the GNS representation (and in fact it’s built by taking the regular representation of $\mathcal{A}$ on itself by multiplication, with $\mathcal{A}$ made into a Hilbert space by defining the inner product to make this property work, and with $v = 1$). So the view here is that a state is some kind of operation on observables – a much more epistemic view of things. So although the GNS theorem relates this to the vector-in-Hilbert-space view of “state”, they are quite different conceptually. (For one thing, the GNS representation is giving a different Hilbert space for each state, which undermines the sense that the space of ALL states is fundamentally “there”, but in both pictures $\mathcal{A}$ is the same for all states.)
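The GNS construction is easy to carry out in finite dimensions. A sketch (my own toy example) for the algebra of $2 \times 2$ matrices with the state $\phi(A) = Tr(\rho A)$: the Hilbert space is the algebra itself with inner product $\langle A, B \rangle = \phi(A^* B)$, the cyclic vector is the unit $1$, and the representation is left multiplication – and then $\phi(A) = \langle v | A | v \rangle$ as the theorem says.

```python
import numpy as np

# Finite-dimensional sketch of the GNS construction for the algebra of
# 2x2 matrices with the state phi(A) = Tr(rho A).  The GNS Hilbert
# space is the algebra itself with inner product <A, B> = phi(A* B),
# the cyclic vector v is the identity matrix, and the representation
# is left multiplication, so pi(A) v = A @ v.
rho = np.diag([0.7, 0.3])                    # a faithful density matrix
phi = lambda A: np.trace(rho @ A)            # the state

inner = lambda A, B: phi(A.conj().T @ B)     # GNS inner product
v = np.eye(2, dtype=complex)                 # cyclic vector: the unit 1

A = np.array([[1, 2j], [0, -1]])
print(inner(v, A @ v))                       # equals phi(A)
print(phi(A))
```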

(This von Neumann-algebra point of view, by the way, gets along nicely with the 2-Hilbert space lens for looking at quantum mechanics, which may partly bridge the gap between it and the Hilbert-space view. The category of representations of a von Neumann algebra is a 2-Hilbert space. A “2-vector” (or “2-state”, if you like) in this category is a representation of the algebra. So the GNS representation itself is a “2-state”. This raises the question about 2-algebras of 2-operators, and John Baez’ question: “What is the categorified GNS theorem?” But let’s leave 2-states for later along with the rest.)

So where does this leave us regarding the meaning of “state”? The classical view is that a state is an element of some (structured) set. The usual quantum picture is that a state is, depending on how precise you want to be, either a vector in a Hilbert space, or a 1-d subspace of that Hilbert space – that is, a point in the projective Hilbert space. What these two views have in common is that there is some space of all “possible worlds”, i.e. of all ways things can be in the system being studied. A state is then a way of selecting one of these. The difference is in what this space of possible worlds is like – that is, which category it lives in – how exactly one “selects” a state, and what possibility there is of taking combinations of states. As for selecting states, $Sets$ is a Cartesian category, with a terminal object $1 = \{ * \}$: an element of a set is a map from $1$ into it. $Hilb$ is a monoidal category, but not Cartesian: selecting a single vector has no obvious categorical equivalent, though selecting a 1-d subspace amounts to a map from $\mathbb{C}$ (up to isomorphism). So the model of an “element” isn’t a singleton, it’s the complex line – and it relates to other possible spaces differently: not as a terminal object, but as a monoidal unit. This is a categorical way of saying how the idea of “state” is structurally different.

The thermal point of view is a little more epistemically subtle: for both classical and quantum pictures, it’s best thought of as, not a possible world, but a function acting on observables (that is, conditions of knowledge). In the classical picture, this is directly related to a space of possible worlds – it’s a measure on it, which we can think of as saying how a large ensemble of systems are distributed in that space. In the quantum picture, the (epistemically) most natural view, in terms of von Neumann algebras, breaks the connection to this notion of “possible worlds” altogether, since $\mathcal{A}$ has representations on many different Hilbert spaces.

So a philosophical question is: what do these different concepts have in common that lets us use them all to represent the “same” root idea? Without actually answering this, I’ll just mention that at some point I’d like to talk a bit about “2-states” as 2-vectors, and in general how to categorify everything above.

It’s taken me a while to write this up, since I’ve been in the process of moving house – packing and unpacking and all the rest. However, a bit over a week ago, I was in Montreal, attending MakkaiFest ’09 at the Centre de Recherches Mathématiques at the Université de Montréal (and a pre-conference workshop hosted at McGill, which I’m including in the talks I mention here). This was in honour of the 70th birthday of Mihaly (Michael) Makkai, of McGill University. Makkai has done a lot of important foundational work in logic, model theory, and category theory, and a great many of the talks were from former students who’d gone on and been inspired by him, so one got a sense of the range of things he’s worked on through his life.

The broad picture of Makkai’s work was explained to us by J.P. Marquis, from the Philosophy department at U of M. He is interested in philosophy of mathematics, and described Makkai’s project by contrast with the program of axiomatization of the early 20th century, along the lines suggested by Hilbert. This program provided a formal language for concrete structures – the problem, which category theory is part of a solution to, is to do the same for abstract structures. Contrast, for instance, the concrete description of a group $G$ as a (particular) set with some (particular) operation, with the abstract definition of a group object in a category. Makkai’s work in categorical logic, said Marquis, is about formalizing the process of abstraction that example illustrates.

Model Theory/Logic

This matter – of the relation between abstract theories and concrete models of the theories – is really what model theory is about, and this is one of the major areas Makkai has worked on. Roughly, a theory is most basically a schema with symbols for types, members of types, and some function symbols – and a collection of sentences built using these symbols (usually generated from some axioms by rules of logical inference). A model is (intuitively) an interpretation of the terms: a way of assigning concrete data to the symbols – say, a symbol for a type is assigned the set of all entities of that type, a function symbol is assigned an actual function between sets, and so on – making all propositions true. A morphism of models is a map that preserves all the properties of the model that can be stated using first order logic.
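To make the theory/model distinction concrete, here’s a toy sketch in Python (my own illustration, not anything from the talks): the group axioms play the role of the theory’s sentences, and a finite interpretation of the symbols either makes them all true (a model) or fails to.

```python
from itertools import product

# Toy "theory of groups": the axioms, stated as checks over a finite carrier.
# A "model" is an assignment of concrete data (a set, an operation, an
# identity element, an inverse map) to the abstract symbols of the theory.
def is_group_model(carrier, op, e, inv):
    """Return True if this interpretation satisfies all the group axioms."""
    closure = all(op(a, b) in carrier for a, b in product(carrier, repeat=2))
    assoc = all(op(op(a, b), c) == op(a, op(b, c))
                for a, b, c in product(carrier, repeat=3))
    identity = all(op(e, a) == a and op(a, e) == a for a in carrier)
    inverses = all(op(a, inv(a)) == e for a in carrier)
    return closure and assoc and identity and inverses

Z5 = set(range(5))

# Z/5 under addition interprets every symbol so that the axioms hold: a model.
print(is_group_model(Z5, lambda a, b: (a + b) % 5, 0, lambda a: (-a) % 5))   # True

# Z/5 under multiplication is not a model: 0 has no inverse.
print(is_group_model(Z5, lambda a, b: (a * b) % 5, 1, lambda a: pow(a, 3, 5)))  # False
```

Of course this flattens the logical machinery down to brute-force checking, but that is exactly the intuition: a theory is a bundle of sentences, and a model is concrete data making them true.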

This is an older way to say things – Victor Harnik gave an expository talk called “Model Theory vs. Categorical Logic” in which he compared two ways of adding an equivalence relation to a theory. The model theory way (invented by Shelah) involves taking the theory (list of sentences) $T$ and extending it to a new theory $T^{eq}$. This has, for instance, some new types – if we had a type for “element of group”, for example, we might then get a new type “equivalence class of elements of group”, and so on. Now, this extension is “tight” in the sense that the categories of all models of $T$ and of $T^{eq}$ are equivalent (by a forgetful functor $Mod(T^{eq}) \rightarrow Mod(T)$) – but one can prove new theorems in the extended theory. To make this clear, he described work (due to Makkai and Reyes) about pretopos completion. Here, one has the concept of a “Boolean logical category” – $Set$ is an example, as is, for any theory, a certain category whose objects are the formulas of the theory. This is related to Lawvere theories (see below). There are logical functors between such categories – functors into $Set$ are models, but there are also logical functors between theories. The point is that a theory $T$ embeds into $T^{eq}$ (abusing notation here – these are now the boolean logical categories), and that $T^{eq}$ arises as a kind of completion of $T$ – namely, it’s a boolean pretopos (not just a boolean logical category). Moreover, it has some nice universal properties, making this point of view a bit more natural than the model-theoretic construction.

Bradd Hart’s talk, “Conceptual Completeness for Continuous Logic”, was a bit over my head, but made some use of this kind of extension of a theory to $T^{eq}$. The basic point seems to be to add some kind of continuous structure to logic. One example comes from a metric structure – defining a metric space of terms, where the metric function $d(x,y)$ is some sum $\sum_n \phi_n (x,y)$, where the $\phi_n$ are formulas with two variables, either true or false – where true gives a $0$, and false gives a $1$ in this sum. This defines a distance from $x$ to $y$ associated to the given list of formulas $\phi_n$. A continuous logic is one with a structure like this. The business about equivalence relations arises if we say two things are equivalent when the distance between them is 0 – this leads to a concept of completion, and again there’s a notion that the categories of models are equivalent (though proving it here involves some notion of approximating terms to arbitrary epsilon, which doesn’t appear in standard logic).
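As a rough sketch of how such a distance behaves – my own reconstruction, not from the talk: I’ve added a $2^{-n}$ weighting so that an infinite list of formulas would still give a convergent sum, and the digit-comparison formulas are purely hypothetical examples:

```python
# Sketch of the metric from a list of two-variable formulas phi_n:
# each formula contributes 0 when true and 1 when false. The 2^{-n}
# weighting is my assumption (to keep an infinite sum convergent);
# with finitely many formulas a plain sum would also work.
def distance(x, y, formulas):
    return sum((0 if phi(x, y) else 1) * 2.0 ** -n
               for n, phi in enumerate(formulas))

# Hypothetical formulas: phi_k says "x and y agree in decimal digit k".
formulas = [lambda x, y, k=k: (x // 10**k) % 10 == (y // 10**k) % 10
            for k in range(8)]

print(distance(123, 123, formulas))  # 0.0 - all formulas true
print(distance(123, 124, formulas))  # 1.0 - only the last digit differs
```

Note this is only a pseudo-metric in general: two distinct terms with distance 0 are what the equivalence-relation business identifies, which is where the completion story starts.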

Anand Pillay gave a talk which used model theory to describe some properties of the free group on $n$ generators. This involved a “theory of the free group” which applies to any free group, regarding each such group as a model of the theory – in fact a submodel of some large model – and using model-theoretic methods to examine “stability” properties, in a sense that amounts to defining “generic” subsets of the group.

Logic and Higher Categories

A number of talks specifically addressed the ground where logic meets higher dimensional categories, since Makkai has worked with both.

In one talk, Robert Paré described a way of thinking about first-order theories as examples of “double Lawvere theories”. Lawvere’s way of formalizing “theories and models” was to say that the theory is a category itself (which has just the objects needed to describe the kind of structure it’s a theory of) – and a model is a functor into $Sets$ (or some other category – a model of the theory of groups in topological spaces, say, is a topological group). For example, the theory of groups includes an object $G$ and powers of it, multiplication and inverse maps, and expresses the axioms by the fact that certain diagrams commute. A model is a functor $M : Th(Grp) \rightarrow Sets$, assigning to the “group object” a set of elements, which then get the group structure from the maps. Paré’s variant uses a double category instead of a category. There are two kinds of morphisms – horizontal and vertical – and these are used to represent two kinds of symbols: function symbols, and relation symbols. (For example, one can talk about the theory of an ordered field – so one needs symbols for multiplication and addition and so forth, but also for the order relation $\leq$.) Then a model of such a theory is a double functor into the double category whose objects are sets, and whose horizontal and vertical morphisms are respectively functions and relations.

André Joyal gave a talk about the first order logic of higher structures. He started by commenting on some fields which began life close together, and are now gradually re-merging: logic and category theory; category theory and homotopy theory (via higher categories); homotopy theory and algebraic geometry. The higher categories Joyal was thinking of are quasicategories, or “$(\infty, 1)$-categories”, which are simplicial sets satisfying a weak version of a horn-filling condition (the “strict” version of this, a Kan complex, includes as an example $N(C)$, the nerve of a category $C$ – there’s an n-simplex for each sequence of n composable morphisms, whose other edges are the various composites, and whose faces are “compositors”, “associators”, and so on – which for $N(C)$ are identities). The point of this is that one can reproduce most of category theory for quasicategories – in particular, he mentioned limits and colimits, factorization systems, pretoposes, and model theory.

Moving to quasicategories on one side of the parallel between category theory and logic has a corresponding move on the other side – on the logic side, one aspect is that the usual notion of a language is replaced by what’s called Martin-Löf type theory. This, in fact, was the subject of Michael Warren’s talk, “Martin-Löf complexes” (I reported on a similar talk he gave at Octoberfest last year). The idea here is to start by defining a globular set, given a theory and type $A$ – a complex whose n-cells have two faces, of dimension (n-1). The 0-cells are just terms of some type $A$. The 1-cells are terms of types like $\underline{A}(a,b)$, where $a$ and $b$ are variables of type $A$ – the type has an interpretation as a proposition that $a=b$ “extensionally” (i.e. not via a proof – but as for instance when two programs with non-equivalent code happen to always produce the same output). This kind of operation can be repeated to give higher cells, like $\underline{A(a,b)}(f,g)$, and so on. Given a globular set $G$, one gets a theory by an adjoint construction. Putting the two together, one has a monad on the category of globular sets – algebras for the monad are Martin-Löf complexes. Throwing in syntactic rules to truncate higher cells (I suppose by declaring all cells to be identities) gives n-truncated versions of these complexes, $MLC_n$. Then there is some interesting homotopy theory, in that the category of n-truncated Martin-Löf complexes is expected to be a model for homotopy n-types. For example, $MLC_0$ is equivalent to $Sets$, and there is an adjunction (in fact, a Quillen equivalence – that is, a kind of “homotopy” equivalence) between $MLC_1$ and $Gpd$.

Category Theory/Higher Categories

There were a number of talks that just dealt with categories – including higher categories – in their own right. Makkai has worked, for example, on computads, which were touched on by Marek Zawadowski in one of his two talks (one in the pre-conference workshop, the other in the conference). The first was about categories of “many-to-one shapes”, which are important to computads – these give a notion of higher category where every cell takes many “input” faces to one “output” face. Zawadowski described a “shape” of an n-cell as an initial object in a certain category built from the category of computads with specified faces. Then there’s a category of shapes, and an abstract description of “shape” in terms of a graded tensor theory (graded for dimension, and tensor because there’s a notion of composition, I believe). Zawadowski’s second talk, “Opetopic Sets in Lax Monoidal Fibrations”, dealt with a similar topic from a different point of view. A lax monoidal fibration (LMF) is a kind of gadget for dealing with multi-level structures (categories, multicategories, quasicategories, etc). There’s a lot of stuff here I didn’t entirely follow, but just to illustrate: categories arise as LMF, by the fibration $cod : Set^{B} \rightarrow Set$, where $B$ is the category with two objects $M, O$, and two arrows from $M$ to $O$. An object in the functor category $Set^{B}$ consists of a “set of morphisms and set of objects” with maps – making this a category involves the monoidal structure, and how composition is defined, and the real point is that this is quite general machinery.

Joachim Lambek and Gonzalo Reyes, both longtime collaborators and friends of Makkai, also both gave talks that touched on physics and categories, though in very different ways. Lambek talked about the “Lorentz category” and its appearance in special relativity.  This involves a reformulation of SR in terms of biquaternions: like complex numbers, these are of the form $u + iv$, but $u$ and $v$ are quaternions.  They have various conjugation operations, and the geometry of SR can be described in terms of their algebra (just as, say, rotations in 3D can be described in terms of quaternions).  The Lorentz category is a way of organizing this – its two objects correspond to “unconjugated” and “conjugated” states.

Gonzalo Reyes gave a derivation of General Relativity in the context of synthetic differential geometry.  The substance of this derivation is not so different from the usual one, with one exception: Einstein’s field equations can be derived in terms of the motions of small regions full of freely falling test particles, and synthetic differential geometry makes it possible to do the same analysis using infinitesimals rigorously all the way through.  The basic point here is that in SDG one replaces the real line as usually conceived with a “real line with infinitesimals” (think of the ring $\mathbb{R}[\epsilon]/\langle \epsilon^2 \rangle$, which is like the reals, but has the infinitesimal $\epsilon$, whose square is zero).
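One can play with the ring $\mathbb{R}[\epsilon]/\langle \epsilon^2 \rangle$ quite directly – this is just the standard dual-number construction, sketched here as an illustration of the “line with infinitesimals” (and of why infinitesimal calculations come out rigorous), not of Reyes’ derivation itself:

```python
# A minimal model of R[eps]/<eps^2>: numbers a + b*eps where eps^2 = 0.
# This is the standard "dual number" construction.
class Dual:
    def __init__(self, a, b=0.0):
        self.a, self.b = a, b  # represents a + b*eps

    def __add__(self, other):
        return Dual(self.a + other.a, self.b + other.b)

    def __mul__(self, other):
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, since eps^2 = 0
        return Dual(self.a * other.a, self.a * other.b + self.b * other.a)

    def __repr__(self):
        return f"{self.a} + {self.b}*eps"

eps = Dual(0.0, 1.0)
print(eps * eps)      # 0.0 + 0.0*eps -- the infinitesimal squares to zero

# Evaluating f(x) = x^2 at 3 + eps reads off the derivative f'(3) = 6
# as the eps-coefficient: an infinitesimal calculation, done exactly.
def f(x):
    return x * x

print(f(Dual(3.0, 1.0)))  # 9.0 + 6.0*eps
```

The eps-coefficient tracking the derivative exactly (no limits, no error terms) is the cheap computational shadow of what SDG does for the whole geometric setup.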

Among other talks: John Power talked about the correspondence between Lawvere theories in universal algebra and finitary tree monads on sets – and asked about what happens to the left-hand side of this correspondence when we replace “sets” with other categories on the right-hand side. Jeff Egger talked about measure theory from a categorical point of view – namely, the correspondence from noncommutative geometry (NCG) between C*-algebras and “noncommutative” topological spaces, and between W*-algebras and “noncommutative” measure spaces, thought of in terms of locales. Hongde Hu talked about the “codensity theorem”, and a way to classify certain kinds of categories – he commented on how it was inspired by Makkai’s approach to mathematics: (1) find new proofs of old theorems, (2) standardize the concepts used in them, and (3) prove new theorems with those concepts. Fred Linton gave a talk describing Heath’s “V-space”, which is a half-plane with a funny topology whose open sets are “V” shapes, and described how the topos of locally finite sheaves over it has surprising properties having to do with nonexistence of global sections. Mehrnoosh Sadrzadeh, whom I met recently at CQC (see the bottom of the previous post), was again talking about linguistics using monoidal categories – she described some rules for “clitic movement” and changes in word order, and what these rules look like in categorical terms.

Other

A few other talks are a little harder for me to fit into the broad classification above.  There was Charles Steinhorn’s talk about ordered “o-minimal” structures, which touched on a bit of economics – essentially, a lot of economics is based on the assumption that preference orders can be represented by real-valued functions, but in fact in many cases one has (variants on) “lexicographic order”, involving ranked priorities.  He talked about how typically one has a space of possibilities which can be cut up into cells, with one sort of order in each cell.  There was Julia Knight, talking about computable structures of “high Scott rank” – in particular, this is about infinite structures that can still be dealt with computably – for example, infinitary logical formulas involving an infinite number of “OR” statements where all the terms being joined are of some common form.  This ends up with an analysis of certain infinite trees.  Hal Kierstead gave a talk about Ramsey theory which I found notable because it used a game-based construction: to prove that any colouring of a graph (or hypergraph) has some property, one devises a game where one player tries to build a graph and the other tries to colour it, and proves a winning strategy for one player.  Finally, Michael Barr gave a talk about a duality between certain categories of modules over commutative rings.

All in all, an interesting conference, with plenty of food for thought.