### musing

There is no abiding thing in what we know. We change from weaker to stronger lights, and each more powerful light pierces our hitherto opaque foundations and reveals fresh and different opacities below. We can never foretell which of our seemingly assured fundamentals the next change will not affect.

H.G. Wells, A Modern Utopia

So there’s a recent paper by some physicists, two of whom work just across the campus from me at IST, which purports to explain the Pioneer Anomaly, ultimately using a computer graphics technique, Phong shading. The point being that they use this to model more accurately than has been done before how much infrared radiation is radiating and reflecting off various parts of the Pioneer spacecraft. They claim that with the new, more accurate model, the net force from this radiation is just enough to explain the anomalous acceleration.

Well, plainly, any one paper needs to be rechecked before you can treat it as definitive, but this sort of result looks good for conventional General Relativity, when some people had suggested the anomaly was evidence some other theory was needed. Other anomalies in the predictions of GR – the rotational profiles of galaxies, or redshift data – have also suggested alternative theories. In order to preserve GR exactly on large scales, you have to introduce things like Dark Matter and Dark Energy, and suppose that something like 97% of the mass-energy of the universe is otherwise invisible. Such Dark entities might exist, of course, but I worry it’s kind of circular to postulate them on the grounds that you need them to make GR explain observations, while also claiming this makes sense because GR is so well tested.

In any case, this refined calculation about Pioneer is a reminder that usually the more conservative extension of your model is better. It’s not so obvious to me whether a modified theory of gravity, or an unknown and invisible majority of the universe is more conservative.

And that’s the best segue I can think of into this next post, which is very different from recent ones.

Fundamentals

I was thinking recently about “fundamental” theories.  At the HGTQGR workshop we had several talks about the most popular physical ideas into which higher gauge theory and TQFT have been infiltrating themselves recently, namely string theory and (loop) quantum gravity.  These aren’t the only schools of thought about what a “quantum gravity” theory should look like – but they are two that have received a lot of attention and work.  Each has been described (occasionally) as a “fundamental” theory of physics, in the sense of one which explains everything else.  There has been a debate about this, since they are based on different principles.  The arguments against string theory are various, but a crucial one is that no existing form of string theory is “background independent” in the same way that General Relativity is. This might be because string theory came out of a community grounded in particle physics – it makes sense to perturb around some fixed background spacetime in that context, because no experiment with elementary particles is going to have a measurable effect on the universe at infinity. “M-theory” is supposed to correct this defect, but so far nobody can say just what it is.  String theorists criticize LQG on various grounds, but one of the more conceptually simple ones would be that it can’t be a unified theory of physics, since it doesn’t incorporate forces other than gravity.

There is, of course, some philosophical debate about whether either of these properties – background independence, or unification – is really crucial to a fundamental theory. I don’t propose to answer that here (though for the record my hunch at the moment is that both of them are important and will hold up over time). In fact, it’s “fundamental theory” itself that I’m thinking about here.

As I suggested in one of my first posts explaining the title of this blog, I expect that we’ll need lots of theories to get a grip on the world: a whole “atlas”, where each “map” is a theory, each dealing with a part of the whole picture, and overlapping somewhat with others. But theories are formal entities that involve symbols and our brain’s ability to manipulate symbols. Maybe such a construct could account for all the observable phenomena of the world – but a-priori it seems odd to assume that. The fact that they can provide various limits and approximations has made them useful, from an evolutionary point of view, and the tendency to confuse symbols and reality in some ways is a testament to that (it hasn’t hurt so much as to be selected out).

One little heuristic argument – not at all conclusive – against this idea involves Kolmogorov complexity: wanting to explain all the observed data about the universe is in some sense to “compress” the data.  If we can account for the observations – say, with a short description of some physical laws and a bunch of initial conditions, which is what a “fundamental theory” suggests – then we’ve found an upper bound on its Kolmogorov complexity.  If the universe actually contains such a description, then that must also be a lower bound on its complexity.  Thus, any complete description of the universe would have to be as big as the whole universe.
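To make the “compression” picture a bit more concrete: Kolmogorov complexity itself is uncomputable, but the length of any particular compressed encoding gives an upper bound on it (up to a constant for the decompressor). Here is a minimal sketch, assuming Python’s standard `zlib` as a stand-in compressor; the two data sets are made up for illustration and have nothing to do with real observations.

```python
import os
import zlib

def compressed_size(data: bytes) -> int:
    """Length of one particular compressed description of `data`.

    This is only an upper bound (up to an additive constant for the
    decompressor) on the Kolmogorov complexity, which is uncomputable.
    """
    return len(zlib.compress(data, 9))

# Highly regular "observations": a short rule plus repetition describes them.
regular = b"0123456789" * 10_000
# Patternless "observations": random bytes have no short description
# that a general-purpose compressor can find.
random_like = os.urandom(100_000)

print(len(regular), compressed_size(regular))
print(len(random_like), compressed_size(random_like))
```

The regular data shrinks to a tiny fraction of its length, while the random data does not shrink at all. A “fundamental theory” in the sense above would be claiming the universe is like the first case.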

Well, as I said, this argument fails to be very convincing.  Partly because it assumes a certain form of the fundamental theory (in particular, a deterministic one), but mainly because it doesn’t rule out that there is indeed a very simple set of physical laws, but there are limits to the precision with which we could use them to simulate the whole world because we can’t encode the state of the universe perfectly.  We already knew that.  At most, that lack of precision puts some practical limits on our ability to confirm that a given set of physical laws we’ve written down is  empirically correct.  It doesn’t preclude there being one, or even our finding it (without necessarily being perfectly certain).  The way Einstein put it (in this address, by the way) was “As far as the laws of mathematics refer to reality, they are not certain; and as far as they are certain, they do not refer to reality.”  But a lack of certainty doesn’t mean they aren’t there.

However, this got me thinking about fundamental theories from the point of view of epistemology, and how we handle knowledge.

Reduction

First, there’s a practical matter. The idea of a fundamental theory is the logical limit of one version of reductionism. This is the idea that the behaviour of things should be explained in terms of smaller, simpler things. I have no problem with this notion, unless you then conclude that once you’ve found a “more fundamental” theory, the old one should be discarded.

For example: we have a “theory of chemistry”, which says that the constituents of matter are those found on the periodic table of elements. This theory comes in various degrees of sophistication: for instance, you can start to learn the periodic table without knowing that there are often different isotopes of a given element, and only knowing the 91 naturally occurring elements (everything up to Uranium, except Technetium). This gives something like Mendeleev’s early version of the table. You could come across these later refinements by finding a gap in the theory (Technetium, say), or a disagreement with experiment (discovering isotopes by measuring atomic weights). But even a fairly naive version of the periodic table, along with some concepts about atomic bonds, gives a good explanation of a huge range of chemical reactions under normal conditions. It can’t explain, for example, how the Sun shines – but it explains a lot within its proper scope.

Where this theory fits in a fuller picture of the world has at least two directions: more fundamental, and less fundamental, theories. What I mean by less “fundamental” is that some things are supposed to be explained by this theory of chemistry: the great abundance of proteins and other organic chemicals, say. The behaviour of the huge variety of carbon compounds predicted by basic chemistry is supposed to explain all these substances and account for how they behave. The millions of organic compounds that show up in nature, and their complicated behaviour, are supposed to be explained in terms of just a few elements that they’re made of – mostly carbon, hydrogen, oxygen, nitrogen, sulfur, phosphorus, plus the odd trace element.

By “more fundamental”, I mean that the periodic table itself can start to seem fairly complicated, especially once you start to get more sophisticated, including transuranic elements, isotopes, radioactive decay rates, and the like. So it was explained in terms of a theory of the atom. Again, there are refinements, but the Bohr model of the atom ought to do the job: a nucleus made of protons and neutrons, and surrounded by shells of electrons.  We can add that these are governed by the Dirac equation, and then the possible states for electrons bound to a nucleus ought to explain the rows and columns of the periodic table. Better yet, they’re supposed to explain exactly the spectral lines of each element – the frequencies of light atoms absorb and emit – by the differences of energy levels between the shells.

Well, this is great, but in practice it has limits. Hardly anyone disputes that the Bohr model is approximately right, and should explain the periodic table etc. The problem is that it’s largely an intractable problem to actually solve the Schrödinger equation for the atom and use the results to predict the emission spectrum, chemical properties, melting point, etc. of, say, Vanadium… On the other hand, it’s equally hard to use a theory of chemistry to adequately predict how proteins will fold. Protein conformation prediction is a hard problem, and while it’s chugging along and making progress, the point is a theory of chemistry alone isn’t enough: any successful method must rely on a whole extra body of knowledge. This suggests our best bet at understanding all these phenomena is to have a whole toolbox of different theories, each one of which has its own body of relevant mathematics, its own domain-specific ontology, and some sense of how its concepts relate to those in other theories in the toolbox. (This suggests a view of how mathematics relates to the sciences which seems to me to reflect actual practice: it pervades all of them, in a different way than the way a “more fundamental” theory underlies a less fundamental one. Which tends to spoil the otherwise funny XKCD comic on the subject…)

If one “explains” one theory in terms of another (or several others), then we may be able to put them into at least a partial order.  The mental image I have in mind is the “theoretical atlas” – a bunch of “charts” (the theories) which cover different parts of a globe (our experience, or the data we want to account for), and which overlap in places.  Some are subsets of others (are completely explained by them, in principle). Then we’d like to find a minimal (or is it maximal) element of this order: something which accounts for all the others, at least in principle.  In that mental image, it would be a map of the whole globe (or a dense subset of the surface, anyway).  Because, of course, the Bohr model, though in principle sufficient to account for chemistry, needs an explanation of its own: why are atoms made this way, instead of some other way? This ends up ramifying out into something like the Standard Model of particle physics.  Once we have that, we would still like to know why elementary particles work this way, instead of some other way…

An Explanatory Trilemma

There’s a problem here, which I think is unavoidable, and which rather ruins that nice mental image.  It has to do with a sort of explanatory version of Agrippa’s Trilemma, which is an observation in epistemology that goes back to Agrippa the Skeptic. It’s also sometimes called “Munchausen’s Trilemma”, and it was originally made about justifying beliefs.  I think a slightly different form of it can be applied to explanations, where instead of “how do I know X is true?”, the question you repeatedly ask is “why does it happen like X?”

So, the Agrippa Trilemma as classically expressed might lead to a sequence of questions about observation.  Q: How do we know chemical substances are made of elements? A: Because of some huge body of evidence. Q: How do we know this evidence is valid? A: Because it was confirmed by a bunch of experimental data. Q: How do we know that our experiments were done correctly? And so on. In mathematics, it might ask a series of questions about why a certain theorem is true, which we chase back through a series of lemmas, down to a bunch of basic axioms and rules of inference. We could be asked to justify these, but typically we just posit them. The Trilemma says that there are three ways this sequence of justifications can end up:

1. we arrive at an endpoint of premises that don’t require any justification
2. we continue indefinitely in a chain of justifications that never ends
3. we continue in a chain of justifications that eventually becomes circular

None of these seems to be satisfactory for an experimental science, which is partly why we say that there’s no certainty about empirical knowledge. In mathematics, the first option is regarded as OK: all statements in mathematics are “really” of the form if axioms A, B, C etc. are assumed, then conclusions X, Y, Z etc. eventually follow. We might eventually find that some axioms don’t apply to the things we’re interested in, and cease to care about those statements, but they’ll remain true. They won’t be explanations of anything very much, though.  If we’re looking at reality, it’s not enough to assume axioms A, B, C… We also want to check them, test them, see if they’re true – and we can’t be completely sure with only a finite amount of evidence.

The explanatory variation on Agrippa’s Trilemma, which I have in mind, deals with a slightly different problem.  Supposing the axioms seem to be true, and accepting provisionally that they are, we also have another question, which if anything is even more basic to science: we want to know WHY they’re true – we look for an explanation.

This is about looking for coherence, rather than confidence, in our knowledge (or at any rate, theories). But a similar problem appears. Suppose that elementary chemistry has explained organic chemistry; that atomic physics has explained why chemistry is how it is; and that the Standard Model explains why atomic physics is how it is. We still want to know why the Standard Model is the way it is, and so on. Each new explanation gives an account of one phenomenon in terms of a different, more basic phenomenon. The Trilemma suggests the following options:

1. we arrive at an endpoint of premises that don’t require any explanation
2. we continue indefinitely in a chain of explanations that never ends
3. we continue in a chain of explanations that eventually becomes circular

Unless we accept option 1, we don’t have room for a “fundamental theory”.

Here’s the key point: this isn’t even a position about physics – it’s about epistemology, and what explanations are like, or maybe rather what our behaviour is like with regard to explanations. The standard version of Agrippa’s Trilemma is usually taken as an argument for something like fallibilism: that our knowledge is always uncertain. This variation isn’t talking about the justification of beliefs, but the sufficiency of explanation. It says that the way our mind works is such that there can’t be one final summation of the universe, one principle, which accounts for everything – because it would either be unaccounted for itself, or because it would have to account for itself by circular reasoning.

This might be a dangerous statement to make, or at least a theological one (theology isn’t as dangerous as it used to be): reasoning that things are the way they are “because God made it that way” is a traditional answer of the first type. True or not, I don’t think you can really call it an “explanation”, since it would work equally well if things were some other way. In fact, it’s an anti-explanation: if you accept an uncaused-cause anywhere along the line, the whole motivation for asking after explanations unravels. Maybe this sort of answer is a confession of humility and acceptance of limited understanding, where we draw the line and stop demanding further explanations. I don’t see that we all need to draw that line in the same place, though, so the problem hasn’t gone away.

What seems likely to me is that this problem can’t be made to go away.  That the situation we’ll actually be in is (2) on the list above.  That while there might not be any specific thing that scientific theories can’t explain, neither could there be a “fundamental theory” that will be satisfying to the curious forever.  Instead, we have an asymptotic approach to explanation, as each thing we want to explain gets picked up somewhere along the line: “We change from weaker to stronger lights, and each more powerful light pierces our hitherto opaque foundations and reveals fresh and different opacities below.”

Looks like the Standard Model is having a bad day – Fermilab has detected CP-asymmetry about 50 times what it predicts in some meson decay. As they say – it looks like there might be some new physics for the LHC to look into.

That said, this post is mostly about a particular voting system which has come back into the limelight recently, but also runs off on a few tangents about social choice theory and the assumptions behind it. I’m by no means expert in the mathematical study of game theory and social choice theory, but I do take an observer’s interest in them.

A couple of years ago, during an election season, I wrote a post on Arrow’s theorem, which I believe received more comments than any other post I’ve made in this blog – which may only indicate that it’s more interesting than my subject matter, but I suppose is also a consequence of mentioning anything related to politics on the Internet. Arrow’s theorem is in some ways uncontroversial – nobody disputes that it’s true, and in fact the proof is pretty easy – but what significance, if any, it has for the real world can be controversial. I’ve known people who wouldn’t continue any conversation in which it was mentioned, probably for this reason.

On the other hand, voting systems are now in the news again, as they were when I made the last post (at least in Ontario, where there was a referendum on a proposal to switch to the Mixed Member Proportional system). Today it’s in the United Kingdom, where the new coalition government includes the Liberal Democrats, who have been campaigning for a long time (longer than the party has had that name) for some form of proportional representation in the British Parliament. One thing you’ll notice if you click that link and watch the video (featuring John Cleese), is that the condensed summary of how the proposed system would work doesn’t actually tell you… how the proposed system would work. It explains how to fill out a ballot (with rank-ordering of candidates, instead of selecting a single one), and says that the rest is up to the returning officer. But obviously, what the returning officer does with the ballot is the key to the whole affair.

In fact, collecting ordinal preferences (that is, a rank-ordering of the options on the table) is the starting point for any social choice algorithm in the sense that Arrow’s Theorem talks about. The “social choice problem” is to give a map which takes a preference order for each individual and produces a “social” preference order, using some algorithm. One can do a wide range of things with this information: even the “first-past-the-post” system can start with ordinal preferences: this method just counts the number of first-place rankings for each option, ranks the one with the largest count first, and declares indifference to all the rest.
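As a sketch of what such a map looks like, here is first-past-the-post written as a function from ranked ballots to a (very coarse) social ranking. The function name, the ballot format, and the sample ballots are all made up for illustration.

```python
from collections import Counter

def first_past_the_post(ballots):
    """Map individual rankings to a 'social' ranking the FPTP way.

    `ballots` is a list of rankings, each a list of candidates from most
    to least preferred. Only first places are counted. The result is the
    winner plus the set of all other candidates, among whom the method
    is indifferent. Ties for first are broken arbitrarily in this sketch.
    """
    first_places = Counter(ballot[0] for ballot in ballots)
    winner, _ = max(first_places.items(), key=lambda kv: kv[1])
    candidates = {c for ballot in ballots for c in ballot}
    return winner, candidates - {winner}

# Hypothetical ballots: 4 voters A > B > C, 3 voters B > C > A, 2 voters C > B > A.
ballots = [["A", "B", "C"]] * 4 + [["B", "C", "A"]] * 3 + [["C", "B", "A"]] * 2
winner, rest = first_past_the_post(ballots)
print(winner, sorted(rest))
```

Everything below the first entry of each ballot is collected but never used, which is exactly the information the other methods discussed here try to exploit.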

The Lib-Dems have been advocating for some sort of proportional representation, but there are many different systems that fall into that category and they don’t all work the same way. The Conservatives have promised some sort of referendum on a new electoral system involving the so-called “Alternative Vote”, also called Instant Runoff Voting (IRV), or the Australian Ballot, since it’s used to elect the Australian legislature.

Now, Arrow’s theorem says that every voting system will fail at least one of the conditions of the theorem. The version I quoted previously has three conditions: Unrestricted Range (no candidate is excluded by the system before votes are even counted); Monotonicity (votes for a candidate shouldn’t make them less likely to win); and Independence of Irrelevant Alternatives (if X beats Y one-on-one, and both beat Z, then Y shouldn’t win in a three-way race). Most voting systems used in practice fail IIA, and surprisingly many fail monotonicity. Both possibilities allow forms of strategic voting, in which voters can sometimes achieve a better result, according to their own true preferences, by stating those preferences falsely when they vote. This “strategic” aspect to voting is what ties this into game theory.

In this case, IRV fails both IIA and monotonicity. In fact, this is related to the fact that IRV also fails the Condorcet condition, which says that if there’s a candidate X who beats every other candidate one-on-one, X should win a multi-candidate race (a failure which, obviously, can only happen if the voting system fails IIA).

So in the IRV algorithm, one effectively uses the preference ordering to “simulate” a runoff election, in which people vote for their first choice from $n$ candidates, then the one with the fewest votes is eliminated, and the election is held again with $(n-1)$ candidates, and so on until a single winner emerges. In IRV, this is done by transferring each vote for the discarded candidate to that voter’s next-choice candidate, recounting, discarding again, and so on. (The proposal in the UK would be to use this system in each constituency to elect individual MPs.)
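The counting procedure just described can be sketched in a few lines. This is only an illustration of the idea, not any jurisdiction’s actual rule: the function name and ballot format are assumptions, ties for last place are broken arbitrarily, and the sample profile is hypothetical.

```python
from collections import Counter

def irv_winner(ballots):
    """Instant-runoff count.

    `ballots` is a list of (weight, ranking) pairs, where `ranking` lists
    candidates from most to least preferred. Returns the winner and the
    per-round tallies. Ties for last place are broken arbitrarily here,
    which a real electoral rule would have to specify.
    """
    remaining = {c for _, ranking in ballots for c in ranking}
    rounds = []
    while True:
        tally = Counter({c: 0 for c in remaining})
        for weight, ranking in ballots:
            # Each ballot counts for its highest-ranked surviving candidate.
            top = next((c for c in ranking if c in remaining), None)
            if top is not None:
                tally[top] += weight
        rounds.append(dict(tally))
        if len(remaining) == 1:
            return next(iter(remaining)), rounds
        # Discard the candidate with the fewest votes and count again.
        loser = min(remaining, key=lambda c: tally[c])
        remaining.remove(loser)

# Hypothetical profile: percentages of voters with each ranking.
profile = [(40, ["A", "B", "C"]), (35, ["B", "C", "A"]), (25, ["C", "B", "A"])]
winner, rounds = irv_winner(profile)
print(winner, rounds)
```

Here C is discarded first, C’s ballots transfer to B, and B then beats A 60-40 even though A led the first round.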

Here’s an example of how IRV might fail these criteria, and permit strategic voting. The example assumes a close three-way election, but this isn’t the only possibility.

Suppose there are three candidates: X, Y, and Z. There are six possible preference orders a voter could have, but to simplify, we’ll suppose that only three actually occur, as follows:

| Percentage | Choice 1 | Choice 2 | Choice 3 |
|------------|----------|----------|----------|
| 36         | X        | Z        | Y        |
| 33         | Y        | Z        | X        |
| 31         | Z        | Y        | X        |

One could imagine Z is a “centrist” candidate somewhere between X and Y. It’s clear here that Z is the Condorcet winner: in a two-person race with either X or Y, Z would win by nearly a 2-to-1 margin. Yet under IRV, Z has the fewest first-choice ballots, and so is eliminated, and Y wins the second round. So IRV fails the Condorcet criterion. It also fails the Independence of Irrelevant Alternatives, since X loses in a two-candidate vote against either Y or Z (by 64-36), hence should be “irrelevant”, yet the fact that X is on the ballot causes Z to lose to Y, whom Z would otherwise beat.

This tends to undermine the argument for IRV that it eliminates the “spoiler effect” (another term for the failure of IIA): here, X is the “spoiler”.
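The head-to-head claims in this example are easy to check mechanically. A small sketch, using the three blocs of voters from the example (36% X > Z > Y, 33% Y > Z > X, 31% Z > Y > X); the helper names are made up.

```python
from itertools import combinations

# The three blocs from the example: (percentage, ranking from best to worst).
profile = [
    (36, ["X", "Z", "Y"]),
    (33, ["Y", "Z", "X"]),
    (31, ["Z", "Y", "X"]),
]

def pairwise_support(profile, a, b):
    """Total weight of ballots ranking `a` above `b`."""
    return sum(w for w, r in profile if r.index(a) < r.index(b))

def condorcet_winner(profile):
    """Return the candidate who beats every other one head-to-head, or None."""
    candidates = profile[0][1]
    for c in candidates:
        if all(pairwise_support(profile, c, d) > pairwise_support(profile, d, c)
               for d in candidates if d != c):
            return c
    return None

for a, b in combinations("XYZ", 2):
    print(a, "vs", b, ":", pairwise_support(profile, a, b), "-",
          pairwise_support(profile, b, a))
print("Condorcet winner:", condorcet_winner(profile))
```

Z beats X 64-36 and beats Y 67-33, so Z is the Condorcet winner, even though Z is the first candidate discarded in the instant-runoff count.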

The failure of monotonicity is well illustrated by the same example. X-supporters can get a better result for themselves if 6 of their 36 percent lie, and rank Y first instead of X (even though they like Y the least), followed by X. This would mean only 30% rank X first, so X is eliminated, and Y runs against Z. Then Z wins 61-39 against Y, which X-supporters prefer. Thus, although those X supporters switched to Y – who would otherwise have won – Y now loses. (Of course, switching to Z would also have worked – but this shows that an increase of support for the winning candidate could actually cause that candidate to LOSE, if it comes from the right place). This kind of strategic voting can happen with any algorithm that proceeds in multiple rounds.
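The arithmetic of the two counts can be laid out side by side. This takes the three blocs from the first example (36% X > Z > Y, 33% Y > Z > X, 31% Z > Y > X) and assumes that exactly 6 points of the X bloc report Y > X > Z instead of their true ranking.

```latex
\begin{aligned}
\text{Honest count, round 1:}\quad & X = 36,\; Y = 33,\; Z = 31 \quad (Z \text{ eliminated}) \\
\text{Honest count, round 2:}\quad & X = 36,\; Y = 33 + 31 = 64 \quad (Y \text{ wins}) \\
\text{Strategic count, round 1:}\quad & X = 30,\; Y = 33 + 6 = 39,\; Z = 31 \quad (X \text{ eliminated}) \\
\text{Strategic count, round 2:}\quad & Y = 39,\; Z = 31 + 30 = 61 \quad (Z \text{ wins})
\end{aligned}
```

Y’s first-place support rises from 33 to 39, yet Y goes from winning to losing, because the extra support changes which candidate is discarded first.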

Clearly, though, this form of strategic voting is more difficult than the kind seen in FPTP – “vote for your second choice to vote against your third choice”, which is what usually depresses the vote for third parties, even those who do well in polls. Strategic voting always involves having some advance knowledge about what the outcome of the election is likely to be, and changing one’s vote on that basis: under FPTP, this means knowing, for instance, that your favourite candidate is a distant third in the polls, and your second and third choices are the front-runners. Under IRV, it involves knowing the actual percentages much more accurately, and coordinating more carefully with others (to make sure that not too many people switch, in the above example). This sort of thing is especially hard to do well if everyone else is also voting strategically, disguising their true preferences, which is where the theory of such games with imperfect information gets complicated.

So there’s an argument that in practice strategic voting matters less under IRV.

Another criticism of IRV – indeed, of any voting system that selects a single candidate per district – is that it tends toward a two-party system. This is “Duverger’s Law” (which, if it is a law in the sense of a theorem, must be one of those facts about asymptotic behaviour that depend on a lot of assumptions, since we have a FPTP system in Canada, and four main parties). Whether this is bad or not is contentious – which illustrates the gap between analysis and conclusions about the real world. Some say two-party systems are bad because they disenfranchise people who would otherwise vote for small parties; others say they’re good because they create stability by allowing governing majorities; still others (such as the UK’s LibDems) claim they create instability, by leading to dramatic shifts in ruling party, instead of quantitative shifts in ruling coalitions. As far as I know, none of these claims can be backed up with the kind of solid analysis one has with strategic voting.

Getting back to strategic voting: perverse voting scenarios like the ones above will always arise when the social choice problem is framed as finding an algorithm taking $n$ voters’ preference orders, and producing a “social” preference order. Arrow’s theorem says any such algorithm will fail one of the conditions mentioned above, and the Gibbard-Satterthwaite theorem says that some form of strategic voting will always exist to take advantage of this, if the algorithm has unlimited range. Of course, a “limited range” algorithm – for example, one which always selects the dictator’s preferred option regardless of any votes cast – may be immune to strategic voting, but not in a good way. (In fact, the GS theorem says that if strategic voting is impossible, the system is either dictatorial or a priori excludes some option.)

One suggestion to deal with Arrow’s theorem is to frame the problem differently. Some people advocate Range Voting (that’s an advocacy site, in the US context – here is one advocating IRV which describes possible problems with range voting – though criticism runs both ways). I find range voting interesting because it escapes the Arrow and Gibbard-Satterthwaite theorems; this in turn is because it begins by collecting cardinal preferences, not ordinal preferences, from each voter, and produces cardinal preferences as output. That is, voters give each option a score in the range between 0% and 100% – or 0.0 and 10.0 as in the Olympics. The winner (as in the Olympics) is the candidate with the highest total score. (There are some easy variations in non-single-winner situations: take the candidates with the top $n$ scores, or assign seats in Parliament proportional to total score using a variation on the same scheme). Collecting more information evades the hypotheses of these theorems. The point is that Arrow’s theorem tells us there are fundamental obstacles to coherently defining the idea of the “social preference order” by amalgamating individual ones. There’s no such obstacle to defining a social cardinal preference: it’s just an average.  Then, too: it’s usually pretty clear what a preference order means – it’s less clear for cardinal preferences; so the extra information being collected might not be meaningful.  After all: many different cardinal preferences give the same order, and these all look the same when it comes to behaviour.
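A minimal sketch of the tally, assuming scores in the interval from 0 to 1 and made-up ballots. It also shows the point about extra information: the first two ballots have the same preference ORDER but very different intensities.

```python
def range_vote(ballots):
    """Average the scores each candidate receives.

    `ballots` is a list of dicts mapping candidate -> score in [0, 1].
    Returns the average score per candidate (a 'social' cardinal
    preference) and the candidate with the highest average.
    """
    candidates = ballots[0].keys()
    averages = {c: sum(b[c] for b in ballots) / len(ballots) for c in candidates}
    return averages, max(averages, key=averages.get)

# Two ballots with the same order (X > Y > Z) but different intensities,
# plus one ballot with the opposite order. All values are hypothetical.
ballots = [
    {"X": 1.0, "Y": 0.9, "Z": 0.0},
    {"X": 1.0, "Y": 0.1, "Z": 0.0},
    {"X": 0.0, "Y": 0.5, "Z": 1.0},
]
averages, winner = range_vote(ballots)
print(averages, winner)
```

An ordinal method would treat the first two ballots as identical; here they pull Y’s average in quite different directions.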

Now, as the above links suggest, there are still some ways to “vote tactically” with range voting, but many of the usual incentives to dishonesty (at least as to preference ORDER) disappear. The incentives to dishonesty are usually toward exaggeration of real preferences. That is, falsely assigning cardinal values to ordinal preferences: if your preference order is X > Y > Z, you may want to assign 100% to X, and 0% to Y and Z, to give your preferred candidate the strongest possible help. Another way to put this is: if there are $n$ candidates, a ballot essentially amounts to choosing a vector in $\mathbb{R}^n$, and vote-counting amounts to taking an average of all the vectors. Then assuming one knew in advance what the average was going to be, the incentive in voting is to pick a vector pointing from the actual average to the outcome you want.
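Here is a small illustration of that exaggeration incentive, with entirely hypothetical numbers. One voter’s true feelings are X = 1.0, Y = 0.8, Z = 0.0, and the other ballots are assumed to be known in advance, which is of course the unrealistic part.

```python
def winner(ballots):
    """Range-voting winner: the candidate with the highest total score."""
    totals = {}
    for ballot in ballots:
        for candidate, score in ballot.items():
            totals[candidate] = totals.get(candidate, 0.0) + score
    return max(totals, key=totals.get), totals

# Hypothetical ballots cast by everyone else.
others = [
    {"X": 1.0, "Y": 0.5, "Z": 0.0},
    {"X": 0.5, "Y": 1.0, "Z": 0.0},
    {"X": 0.5, "Y": 1.0, "Z": 0.0},
    {"X": 0.0, "Y": 0.0, "Z": 1.0},
]

honest = {"X": 1.0, "Y": 0.8, "Z": 0.0}       # true intensities, order X > Y > Z
exaggerated = {"X": 1.0, "Y": 0.0, "Z": 0.0}  # same favourite, Y pushed to the floor

print(winner(others + [honest]))
print(winner(others + [exaggerated]))
```

With the honest ballot Y wins; with the exaggerated one X wins. The voter never misreports which candidate they like best, only how much they like the runner-up.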

But this raises the same problem as before: the more people can be expected to vote strategically, the harder it is to predict where the actual average is going to be in advance, and therefore the harder it is to vote strategically.

There are a number of interesting books on political theory, social choice, and voting theory, from a mathematical point of view. Two that I have are Peter Ordeshook’s “Game Theory and Political Theory”, which covers a lot of different subjects, and William Riker’s “Liberalism Against Populism” which is a slightly misleading title for a book that is mostly about voting theory. I would recommend either of them – Ordeshook’s is the more technical, whereas Riker’s is illustrated with plenty of real-world examples.

I’m not particularly trying to advocate one way or another on any of these topics. If anything, I tend to agree with the observation in Ordeshook’s book – that a major effect of Arrow’s theorem, historically, has been to undermine the idea that one can use terms like “social interest” in any sort of uncomplicated way, and to turn the focus of social choice theory from an optimization question – how to pick the best social choice for everyone – into a question in the theory of strategy games – how to maximize one’s own interests under a given social system. I guess what I’d advocate is that more people should understand how to examine such questions (and I’d like to understand the methods better, too) – but not to expect that these sorts of mathematical models will solve the fundamental issues. Those issues live in the realm of interpretation and values, not analysis.

When I made my previous two posts about ideas of “state”, one thing I was aiming at was to say something about the relationships between states and dynamics. The point here is that, although the idea of “state” is that it is intrinsically something like a snapshot capturing how things are at one instant in “time” (whatever that is), extrinsically, there’s more to the story. The “kinematics” of a physical theory consists of its collection of possible states. The “dynamics” consists of the regularities in how states change with time. Part of the point here is that these aren’t totally separate.

Just for one thing, in classical mechanics, the “state” includes time-derivatives of the quantities you know, and the dynamical laws tell you something about the second derivatives. This is true in both the Hamiltonian and Lagrangian formalisms of dynamics. The Hamiltonian, which represents the concept of “energy” in the context of a system, is a function $H(q,p)$, where $q$ is a vector representing the values of some collection of variables describing the system (generalized position variables, in some configuration space $X$), and the $p = m \dot{q}$ are corresponding “momentum” variables, which are the other coordinates in a phase space which in simple cases is just the cotangent bundle $T^*X$. Here, $m$ refers to mass, or some equivalent. The familiar case of a moving point particle has “energy = kinetic + potential”, or $H = \frac{p^2}{2m} + V(q)$ for some potential function $V$. The symplectic form on $T^*X$ can then be used to define a path through any point, which describes the evolution of the system in time – notably, it conserves the energy $H$. Then there’s the Lagrangian, which defines the “action” associated to a path, which comes from integrating some function $L(q, \dot{q})$ living on the tangent bundle $TX$, over the path. The physically realized paths (classically) are critical points of the action, with respect to variations of the path.
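For reference, in local coordinates and with the usual sign conventions (which are a choice, not something fixed by the discussion above), the two pictures read as follows. The first line is Hamilton’s equations together with the resulting conservation of $H$; the second is the action of a path $\gamma$ and the Euler-Lagrange condition for it to be a critical point with endpoints held fixed.

```latex
\dot{q} = \frac{\partial H}{\partial p}, \qquad
\dot{p} = -\frac{\partial H}{\partial q}
\qquad\Longrightarrow\qquad
\frac{dH}{dt}
  = \frac{\partial H}{\partial q}\,\dot{q} + \frac{\partial H}{\partial p}\,\dot{p}
  = 0

S[\gamma] = \int_{t_0}^{t_1} L\bigl(q(t), \dot{q}(t)\bigr)\,dt, \qquad
\delta S = 0 \;\Longleftrightarrow\;
\frac{d}{dt}\frac{\partial L}{\partial \dot{q}} - \frac{\partial L}{\partial q} = 0
```

For a kinetic term $p^2/(2m)$ the first pair gives $\dot{q} = p/m$ and $\dot{p} = -V'(q)$, which is just Newton’s second law.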

This is all based on the view of a “state” as an element of a set (which happens to be a symplectic manifold like $T^*X$ or just a manifold if it’s $TX$), and both the “energy” and the “action” are some kind of function on this set. A little extra structure (symplectic form, or measure on path space) turns these functions into a notion of dynamics. Now a function on the space of states is what an observable is: energy certainly is easy to envision this way, and action (though harder to define intuitively) counts as well.

But another view of states which I mentioned in that first post is the one that pertains to statistical mechanics, in which a state is actually a statistical distribution on the set of “pure” states. This is rather like a function – it’s slightly more general, since a distribution can have point-masses, but any function gives a distribution if there’s a fixed measure $d\mu$ around to integrate against – then a function like $H$ becomes the measure $H d\mu$. And this is where the notion of a Gibbs state comes from, though it’s slightly trickier. The idea is that the Gibbs state (in some circumstances called the Boltzmann distribution) is the state a system will end up in if it’s allowed to “thermalize” – it’s the maximum-entropy distribution for a given amount of energy in the specified system, at a given temperature $T$. So, for instance, for a gas in a box, this describes how, at a given temperature, the kinetic energies of the particles are (probably) distributed. Up to a bunch of constants of proportionality, one expects that the weight given to a state (or region in state space) is just $\exp(-H/T)$, where $H$ is the Hamiltonian (energy) for that state. That is, the likelihood of being in a state is inversely proportional to the exponential of its energy (measured in units of $T$) – and higher temperature makes higher energy states more likely.

Now part of the point here is that, if you know the Gibbs state at temperature $T$, you can work out the Hamiltonian just by taking a logarithm (up to an additive constant, which doesn’t affect the dynamics) – so specifying a Hamiltonian and specifying the corresponding Gibbs state are completely equivalent. But specifying a Hamiltonian (given some other structure) completely determines the dynamics of the system.
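
Here is a minimal sketch of that round trip for a system with finitely many pure states. The three energy levels and the temperature are made-up numbers, and the function names are mine; the point is only that the logarithm of the Gibbs weights gives back the energies, up to an overall additive constant.

```python
import math

def gibbs_state(energies, T):
    """Boltzmann weights exp(-E/T), normalized to a probability distribution."""
    weights = [math.exp(-E / T) for E in energies]
    Z = sum(weights)  # normalizing constant (partition function)
    return [w / Z for w in weights]

def hamiltonian_from_gibbs(probs, T):
    """Recover the energies from a Gibbs state, up to an additive constant.

    Since -T * log(p) = E + T * log(Z), the constant is fixed here by
    declaring the lowest energy to be zero (a convention, not physics).
    """
    raw = [-T * math.log(p) for p in probs]
    ground = min(raw)
    return [E - ground for E in raw]

energies = [0.0, 1.0, 2.5]  # hypothetical three-level system
T = 0.7                     # hypothetical temperature, in the same units

probs = gibbs_state(energies, T)
recovered = hamiltonian_from_gibbs(probs, T)
hotter = gibbs_state(energies, 2.0)

print(probs)      # lower-energy states get more weight
print(recovered)  # matches `energies` up to rounding
print(hotter)     # the top level is more likely at higher temperature
```

Because the lowest level in `energies` is already zero, `recovered` agrees with it directly; with a different zero point the two would differ by a constant shift.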

This is the classical version of the idea Carlo Rovelli calls “Thermal Time”, which I first encountered in his book “Quantum Gravity”, but also is summarized in Rovelli’s FQXi essay “Forget Time“, and described in more detail in this paper by Rovelli and Alain Connes. Mathematically, this involves the Tomita flow on von Neumann algebras (which Connes used to great effect in his work on the classification of same). It was reading “Forget Time” which originally got me thinking about making the series of posts about different notions of state.

Physically, remember, these are von Neumann algebras of operators on a quantum system, the self-adjoint ones being observables; states are linear functionals on such algebras. The equivalent of a Gibbs state – a thermal equilibrium state – is called a KMS (Kubo-Martin-Schwinger) state (for a particular Hamiltonian). It’s important that the KMS state depends on the Hamiltonian, which is to say the dynamics and the notion of time with respect to which the system will evolve. Given a notion of time flow, there is a notion of KMS state.

One interesting place where KMS states come up is in (general) relativistic thermodynamics. In particular, the effect called the Unruh Effect is an example (here I’m referencing Robert Wald’s book, “Quantum Field Theory in Curved Spacetime and Black Hole Thermodynamics”). Physically, the Unruh effect says the following. Suppose you’re in flat spacetime (described by Minkowski space), and an inertial (unaccelerated) observer sees it in a vacuum. Then an accelerated observer will see space as full of a bath of particles at some temperature related to the acceleration. Mathematically, a change of coordinates (acceleration) implies there’s a one-parameter family of automorphisms of the von Neumann algebra which describes the quantum field for particles. There’s also a (trivial) family for the unaccelerated observer, since the coordinate system is not changing. The Unruh effect in this language is the fact that a vacuum state relative to the time-flow for an unaccelerated observer is a KMS state relative to the time-flow for the accelerated observer (at some temperature related to the acceleration).

The KMS state for a von Neumann algebra with a given Hamiltonian operator has a density matrix $\omega$, which is again, up to some constant factors, just the exponential of minus the Hamiltonian operator (divided by the temperature). (For pure states, $\omega = |\Psi \rangle \langle \Psi |$, and in general a matrix becomes a state by $\omega(A) = Tr(A \omega)$ which for pure states is just the usual expectation value for $A$, $\langle \Psi | A | \Psi \rangle$).

Now, things are a bit more complicated in the von Neumann algebra picture than the classical picture, but Tomita-Takesaki theory tells us that as in the classical world, the correspondence between dynamics and KMS states goes both ways: there is a flow – the Tomita flow – associated to any given state, with respect to which the state is a KMS state. By “flow” here, I mean a one-parameter family of automorphisms of the von Neumann algebra. In the Heisenberg formalism for quantum mechanics, this is just what time is (i.e. states remain the same, but the algebra of observables is deformed with time). The way you find it is as follows (and why this is right involves some operator algebra I find a bit mysterious):

First, get the algebra $\mathcal{A}$ acting on a Hilbert space $H$, with a cyclic vector $\Psi$ (i.e. such that $\mathcal{A} \Psi$ is dense in $H$ – one way to get this is by the GNS representation, so that the state $\omega$ just acts on an operator $A$ by the expectation value at $\Psi$, as above, so that the vector $\Psi$ is standing in, in the Hilbert space picture, for the state $\omega$). Then one can define an operator $S$ by the fact that, for any $A \in \mathcal{A}$, one has

$S(A\Psi) = A^{\star}\Psi$

That is, $S$ acts like the conjugation operation on operators at $\Psi$, which is enough to define $S$ (on a dense subspace) since $\Psi$ is cyclic. This $S$ has a polar decomposition (analogous for operators to the polar form for complex numbers) of $S = J \Delta^{1/2}$, where $J$ is antiunitary (this is conjugation, after all) and $\Delta = S^{\star}S$ is positive and self-adjoint. We need the self-adjoint part, because the Tomita flow is a one-parameter family of automorphisms given by:

$\alpha_t(A) = \Delta^{-it} A \Delta^{it}$

An important fact for Connes’ classification of von Neumann algebras is that the Tomita flow is basically unique – that is, it’s unique up to an inner automorphism (i.e. a conjugation by some unitary operator – so in particular, if we’re talking about a relativistic physical theory, a change of coordinates giving a different $t$ parameter would be an example). So while there are different flows, they’re all “essentially” the same. There’s a unique notion of time flow if we reduce the algebra $\mathcal{A}$ to its cosets modulo inner automorphism. Now, in some cases, the Tomita flow consists entirely of inner automorphisms, and this reduction makes it disappear entirely (this happens in the finite-dimensional case, for instance). But in the general case this doesn’t happen, and the Connes-Rovelli paper summarizes this by saying that von Neumann algebras are “intrinsically dynamic objects”. So this is one interesting thing about the quantum view of states: there is a somewhat canonical notion of dynamics present just by virtue of the way states are described. In the classical world, this isn’t the case.
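
To see why the flow is inner in the finite-dimensional case, here is a sketch of the standard computation, assuming that $\mathcal{A}$ is the algebra of $n \times n$ matrices and that the state is given by an invertible density matrix $\rho$ (this setup and notation are my choice for illustration, not taken from the Connes-Rovelli paper). Represent $\mathcal{A}$ by left multiplication on the space of matrices $X$ with inner product $\langle X, Y \rangle = Tr(X^{\star} Y)$, and take $\Psi = \rho^{1/2}$, so that $\langle \Psi, A \Psi \rangle = Tr(A \rho)$. Then the modular operator works out to

$\Delta(X) = \rho X \rho^{-1}$

and, with the sign convention used above, the flow is

$\alpha_t(A) = \Delta^{-it} A \Delta^{it} = \rho^{-it} A \rho^{it}$

which is conjugation by the unitary $\rho^{-it}$, hence an inner automorphism. If $\rho$ is a Gibbs state, $\rho = e^{-H/T}/Z$, the normalization $Z$ cancels and this becomes

$\alpha_t(A) = e^{iHt/T} A e^{-iHt/T}$

which is just Heisenberg-picture evolution under $H$, with the time parameter rescaled by the temperature.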

Now, Rovelli’s “Thermal Time” hypothesis is, basically, that the notion of time is a state-dependent one: instead of an independent variable, with respect to which other variables change, quantum mechanics (per Rovelli) makes predictions about correlations between different observed variables. More precisely, the hypothesis is that, given that we observe the world in some state, the right notion of time should just be the Tomita flow for that state. They claim that checking this for certain cosmological models, like the Friedman model, they get the usual notion of time flow. I have to admit, I have trouble grokking this idea as fundamental physics, because it seems like it’s implying that the universe (or any system in it we look at) is always, a priori, in thermal equilibrium, which seems wrong to me since it evidently isn’t. The Friedman model does assume an expanding universe in thermal equilibrium, but clearly we’re not in exactly that world. On the other hand, the Tomita flow is definitely there in the von Neumann algebra view of quantum mechanics and states, so possibly I’m misinterpreting the nature of the claim. Also, as applied to quantum gravity, a “state” perhaps should be read as a state for the whole spacetime geometry of the universe – which is presumably static – and then the apparent “time change” would then be a result of the Tomita flow on operators describing actual physical observables. But on this view, I’m not sure how to understand “thermal equilibrium”.  So in the end, I don’t really know how to take the “Thermal Time Hypothesis” as physics.

In any case, the idea that the right notion of time should be state-dependent does make some intuitive sense. The only physically, empirically accessible referent for time is “what a clock measures”: in other words, there is some chosen system which we refer to whenever we say we’re “measuring time”. Different choices of system (that is, different clocks) will give different readings even if they happen to be moving together in an inertial frame – atomic clocks sitting side by side will still gradually drift out of sync. Even if “the system” means the whole universe, or just the gravitational field, clearly the notion of time even in General Relativity depends on the state of this system. If there is a non-state-dependent “god’s-eye view” of which variable is time, we don’t have empirical access to it. So while I can’t really assess this idea confidently, it does seem to be getting at something important.

Last Friday, UWO hosted a Distinguished Colloquium talk by Gregory Chaitin, who was talking about a proposal for a new field he calls “metabiology”, which he defined in the talk (and on the website above) as “a field parallel to biology, dealing with the random evolution of artificial software (computer programs) rather than natural software (DNA), and simple enough that it is possible to prove rigorous theorems or formulate heuristic arguments at the same high level of precision that is common in theoretical physics.” This field doesn’t really exist to date, but his talk was intended to argue that it should, and to suggest some ideas as to what it might look like. It was a well-attended talk with an interdisciplinary audience including (at least) people from the departments of mathematics, computer science, and biology. As you might expect for such a talk, it was also fairly nontechnical.

A lot of the ideas presented in the talk overlapped with those in this outline, but to summarize… One of the motivating ideas that he put forth was that there is currently no rigorous proof that Darwin-style biological evolution can work – i.e. that operations of mutation and natural selection can produce systems of very high complexity. This is a fundamental notion in biology, summarized by the slogan, “Nothing in biology makes sense except in light of evolution”. This phrase, funnily, was coined as the title of a defense of a “theistic evolution” – not obviously a majority position among scientists, but also not to be confused with “intelligent design” which claims that evolution can’t account for observed features of organisms. This is a touchy political issue in some countries, and it’s not obvious that a formal proof that mutation and selection CAN produce highly complex forms would resolve it. Even so, as Chaitin said, it seems likely that such a proof could exist – but if there’s a rigorous proof of the contrary, that would be good to know also!

Of course, such a formal proof doesn’t exist because formal proof doesn’t play much role in biology, or any other empirical science – since living things are very complex, and incompletely understood. Thus the proposal of a different field, “metabiology”, which would study simpler formal objects: “artificial software” in the form of Turing machines or program code, as opposed to “natural software” like DNA. This abstracts away everything about an organism except its genes (which is a lot!), with the aim of simplifying enough to prove that mutation and selection in this toy world can generate arbitrarily high levels of complexity.

Actually stating this precisely enough to prove ties in to the work that Chaitin is better known for, namely the study of algorithmic complexity and theoretical computer science. The two theorems Chaitin stated (but didn’t prove in the talk) did not – he admitted – really meet that goal, but perhaps did point in that direction. One measure of complexity is algorithmic – that is, the size of the smallest Turing machine (for example, though a similar definition applies to other universal ways of describing algorithms) which is needed to generate a particular pattern. A standard example is the “Busy Beaver function”, and one way to define it is to say that $B(n)$ is the largest number printed out by an $n$-state Turing machine which then halts. Since the halting problem is undecidable (i.e. there’s no Turing machine which, given a description of another machine, can always decide whether or not it halts), for reasons analogous to Cantor’s diagonal argument or Gödel’s incompleteness theorem, $B(n)$ is uncomputable and eventually outgrows any computable function, so generating $B(n)$, or a sequence of the same order, is a good task to measure complexity.

So the first toy model involved a single organism, being replaced in each generation by a mutant form. The “organism” is a Turing machine (or a program in some language, etc. – one key result from complexity theory is that all these different ways to specify an algorithm can simulate each other, with the addition of at most a fixed-size prefix, which is the part of the algorithm describing how to do the simulation). In each generation, it is mutated. The mutant replaces the original organism if: (a) the new code halts, and (b) outputs a number which (c) is larger than the number produced by the original. Now, this decision procedure is uncomputable since it requires solving the halting problem – so in particular, there’s no way to simulate this process. But the theorem says that, in exponential time (i.e. $t(n) \sim O(e^n)$), this process will produce a machine which produces a number of order $B(n)$. That is, as long as the “environment” (the thing doing the selection) can recognize and reward complexity, mutation is sufficient to produce it. But these are pretty big assumptions, which is one reason this theorem isn’t quite what’s wanted.
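
The selection rule above needs a halting oracle, so it can’t be run as stated. What follows is a minimal sketch of a computable stand-in, and not Chaitin’s actual model: “halts” is replaced by “halts within a fixed step budget”, the organism is a 2-symbol Turing machine given as a transition table, and “the number printed” is read as the number of 1s left on the tape. All names, the mutation rule, and the budget of 200 steps are my assumptions. The `champion` table is the well-known 2-state busy beaver, included as a sanity check on the simulator.

```python
import random

HALT = "H"

def run(machine, max_steps):
    """Run a 2-symbol Turing machine on a blank tape.

    machine maps (state, symbol) -> (write, move, next_state).
    Returns the number of 1s on the tape if the machine halts within
    max_steps, or None otherwise (a stand-in for "does not halt").
    """
    tape, head, state = {}, 0, 0
    for _ in range(max_steps):
        write, move, nxt = machine[(state, tape.get(head, 0))]
        tape[head] = write
        head += move
        if nxt == HALT:
            return sum(tape.values())
        state = nxt
    return None

def mutate(machine, n_states, rng):
    """Copy the machine and rewrite one randomly chosen transition."""
    child = dict(machine)
    key = rng.choice(sorted(child))
    targets = list(range(n_states)) + [HALT]
    child[key] = (rng.randint(0, 1), rng.choice((-1, 1)), rng.choice(targets))
    return child

def evolve(n_states, generations, max_steps=200, seed=0):
    """Single-organism hill climb: a mutant replaces its parent only if it
    halts within the step budget and prints a strictly larger number."""
    rng = random.Random(seed)
    # Start from a machine that halts at once, leaving a blank tape.
    organism = {(s, b): (0, 1, HALT) for s in range(n_states) for b in (0, 1)}
    fitness = run(organism, max_steps)
    history = [fitness]
    for _ in range(generations):
        mutant = mutate(organism, n_states, rng)
        score = run(mutant, max_steps)
        if score is not None and score > fitness:
            organism, fitness = mutant, score
        history.append(fitness)
    return organism, history

# The 2-state busy beaver champion: halts after 6 steps with four 1s.
champion = {(0, 0): (1, 1, 1), (0, 1): (1, -1, 1),
            (1, 0): (1, -1, 0), (1, 1): (1, 1, HALT)}
print(run(champion, 6))

organism, history = evolve(n_states=2, generations=500)
print(history[-1])  # best fitness found; cannot exceed 4 for 2 states
```

The step budget is exactly where this departs from the theorem: a machine that would halt after more than `max_steps` steps is treated as if it never halts, so this “environment” can only reward complexity up to a fixed ceiling.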

Still, within its limited domain, he also stated a theorem to the effect that, for any given level of complexity (in the above sense), there is a path through the space of possible programs which reaches it, such that the “mutation distance” (roughly, the negative logarithm of the probability of a mutation occurring) at each step is bounded, and the complexity (therefore fitness, in this toy model) increases at each step. He indicated that one could prove this using the bits of the halting probability Omega – he didn’t specify how, and this isn’t something I’m very familiar with, but apparently (as described in the linked article), there are somewhat standard ways to do this kind of thing.

So anyway, this little toy model doesn’t really do the job Chaitin is saying ought to be done, but it illustrates what the kind of theorems he’s asking for might look like. My reaction is that it would be great to have theorems like this that could tell us something meaningful about real biology (so the toy model certainly is too simple), though I’m not totally convinced there needs to be a “new field” for such study. But certainly theoretical biology seems to be much less developed than, say, theoretical physics, and even if rigorous proofs aren’t going to be as prominent there, if some can be found, it probably couldn’t hurt.

After the talk, there was some interesting discussion about other things going on in theoretical biology and “systems biology“.  Chaitin commented that a lot of the work in this field involves detailed simulations of models of real systems, made as accurate as possible – which, while important, is different from the kind of pursuit of basic theoretical principles he was talking about.  So this would include things like: modeling protein folding; studying patterns in big databases of gene frequencies in populations and how they change in time; biophysical modeling of organs and the biochemical reactions in them; simulating the dynamics of individual cells, their membranes and the molecular machinery that makes them work; and so on.  All of which has been moving rapidly in recent years,  but is only tangentially related to fundamental principles about how life works.

On the other hand, as audience members pointed out, there is another thread, exemplified by the Santa Fe Institute, which is more focused on understanding the dynamics of complex systems.  Some well-known names in this area would be Stuart Kauffman, John Holland and Per Bak, among others.  I’ve only looked into this stuff at the popular level, but there are some interesting books about their work – Holland’s “Hidden Order”, Kauffman’s “The Origins of Order” (more technical) and “At Home in the Universe” (more popular), and Solé and Goodwin’s “Signs of Life” (a popular survey, but with equations, of various aspects of mathematical approaches to biological complexity).  Chaitin’s main comment on this stuff is that it has produced plenty of convincing heuristic arguments, simulations and models with suggestive behaviour, and so on – but not many rigorous theorems.  So: it’s good, but not exactly what he meant by “metabiology”.

Summarizing this stuff would be a big task in itself, but it does connect to Chaitin’s point that it might be nice to know (rigorously) if Darwinian evolution by itself were NOT enough to explain the complexity of living things.  Stuart Kauffman, for example, has suggested that certain kinds of complex order tend to arise through “self-organization”.  Philosopher Daniel Dennett commented on this in “Darwin’s Dangerous Idea”, saying that although this might be true, at most it tells us more detail about what kinds of things Darwinian selection has available to act on.

This all seems to tie into the question of which appeared first as life was coming into being: self-replicating molecules like RNA (and later DNA), or cells with metabolic reactions occurring inside.  Organisms obviously both reproduce and metabolize, but these are two quite different kinds of process, and there seems to be a “chicken-and-egg” problem with which came first.  Kauffman, among others, has looked at the emergence of “autocatalytic networks” of chemical reactions: these are collections of chemical reactions, some or all of which need a catalyst, such that all the catalysts needed to make them run are products of some reaction in the network.  They’ve shown in simulation that such networks can arise spontaneously under certain conditions – suggesting that metabolism might have come into existence without DNA or similar molecules around (one also thinks of larger phenomena, like the nitrogen cycle).  In any case, this is the kind of thing which people sometimes point to when suggesting that Darwinian selection isn’t enough to completely explain the structure of organisms actually existing today.  Which is a different claim (mind you) than the claim that Darwinian evolution could not possibly produce complex organisms.  Chaitin’s whole motivation was to suggest that it should be provable one way or the other (and, he presumes, in the affirmative) whether mutation and selection CAN do this job.  If it could be proved that it can’t – at least there are some other ingredients to consider.

All in all, I found the talk thought-provoking, in spite (or because) of being partial and inconclusive.  Biology may be less rigorous than physics, but this could just be a sign that there’s a lot to learn and do in the field – and a lot of it is being done!

First off, a nice recent XKCD comic about height.

I’ve been busy of late starting up classes, working on a paper which should appear on the archive in a week or so on the groupoid/2-vector space stuff I wrote about last year.  I resolved the issue I mentioned in a previous post on the subject, which isn’t fundamentally that complicated, but I had to disentangle some notation and learn some representation theory to get it figured out.  I’ll maybe say something about that later, but right now I felt like making a little update.  In the last few days I’ve also put together a little talk to give at Octoberfest in Montreal, where I’ll be this weekend.  Montreal is a lovely city to visit, so that should be enjoyable.

A little while ago I had a talk with Dan’s new grad student – something for a class, I think – about classical and modern differential geometry, and the different ideas of curvature in the two settings.  So the Gaussian curvature of a surface embedded in $\mathbb{R}^3$ has a very multivariable-calculus feel to it: you think of curves passing through a point, parametrized by arclength.  They have a moving orthonormal frame attached: unit tangent vector, its (normalized) derivative, and their cross-product.  The derivative of the unit tangent is always orthogonal to it (it’s not changing length), so you can imagine it pointing along the radius of a circle whose radius $r$ is the radius of curvature.  Then you have $\kappa = \frac{1}{r}$ curvature along that path.  At any given point on a surface, you get two degrees of freedom – locally, the surface looks like a hyperbolic or elliptic paraboloid, or whatever, so there’s actually a curvature form.  The determinant gives the Gaussian curvature $K$.  So it’s a “second derivative” of the surface itself (if you think of it as ).  The Gaussian curvature, unlike the curvature in particular directions, is intrinsic – preserved by isometry of the surface, so it’s not really dependent on the embedding.  But this fact takes a little thinking to get to.  Then there’s the trace, which is (twice) the mean curvature and does depend on the embedding; for a surface, the scalar curvature is just $2K$.

In a Riemannian manifold, you need to have a connection to see what the curvature is about.  Given a metric, there’s the associated Levi-Civita connection, and of course you’d get a metric on a surface embedded in $\mathbb{R}^3$, inherited from the ambient space.  But the modern point of view is that the connection is the important object: the ambient space goes away entirely.  Then you have to think of what the curvature represents differently, since there’s no normal vector to the surface any more.  So now we’re assuming we want an intrinsic version of the “second derivative of the surface” (or n-manifold) from the get-go.  Here you look at the first derivative of the connection coefficients (so, the second derivative of the metric) in any given coordinate system.  You’re finding the infinitesimal noncommutativity of parallel transport w.r.t. two coordinate directions: take a given vector, and transport it two ways around an infinitesimal square, and take the difference, get a new vector.  This all is written as a (1,3)-tensor, the Riemann tensor.  Then you can contract it down and get a matrix again (the Ricci tensor), and then contract on the last two indices (a trace!) and you get back the scalar curvature again – but this is all in terms of the connection (the coordinate dependence all disappears once you take the trace).
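
In symbols, here is a sketch of the standard definitions (the index placement is one common convention among several, and is my addition). For vector fields $X, Y, Z$,

$R(X,Y)Z = \nabla_X \nabla_Y Z - \nabla_Y \nabla_X Z - \nabla_{[X,Y]} Z$

which for coordinate directions, where $[\partial_\mu, \partial_\nu] = 0$, is exactly the failure of the two covariant derivatives to commute. The two contractions are then the Ricci tensor and the scalar curvature, the second one using the metric:

$R_{\mu\nu} = R^{\lambda}{}_{\mu\lambda\nu}, \qquad R = g^{\mu\nu} R_{\mu\nu}$

For a surface this gives $R = 2K$, twice the Gaussian curvature.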

I hadn’t thought about this stuff in coordinates for a while, so it was interesting to go back and work through it again.

In the noncommutative geometry seminar, we’ve been talking about classical mechanics – the Lagrangian and Hamiltonian formulation.  So it reminded me of the intuition that curvature – a kind of second derivative – often shows up in Lagrangians for field theories using connections because it’s analogous to kinetic energy.  A typical mechanics Lagrangian is something like (kinetic energy) – (potential energy), but this doesn’t appear much in the topological field theories I’ve been thinking about because their curvature is, by definition, zero.  Topological field theory is kind of like statics, as opposed to mechanics, that way.  But that’s a handy simplification for the program of trying to categorify everything.  Since the whole space of connections is infinite dimensional, worrying about categorified action principles opens up a can of worms anyway.

So it’s also been interesting to remember some of that stuff and discuss it in the seminar – and it was initially surprising that it’s the introduction to “noncommutative geometry”.  It does make sense, though, since that’s related to the formalism of quantum mechanics: operator algebras on Hilbert spaces.

Finally, I was looking for something on 2-monads for various reasons, and found a paper by Steve Lack which I wanted to link to here so I don’t forget it.

The reason I was looking was that Enxin Wu, after talking about deformation theory of algebras, was asking after monads and the bar construction, which we talked about at the UCR “quantum gravity” seminar, so at some point we’ll take a look at that stuff.  But it reminded me that I was interested in the higher-categorical version of monads for a different reason. Namely, I’d been talking to Jamie Vicary about his categorical description of the harmonic oscillator, which is based on having a monad in a nice kind of monoidal category.  Since my own category-theoretic look at the harmonic oscillator fits better with this groupoid/2-vector space program I’ll be talking about at Octoberfest (and posting about a little later), it seemed reasonable to look at a categorified version of the same picture.

But first things first: figuring out what the heck a 2-monad is supposed to be.  So I’ll eventually read up on that, and maybe post a little blurb here, at some point.

Anyway, that update turned out to be longer than I thought it would be.

I mentioned before that I wanted to try expanding the range of things I blog about here. One example is that I have a long-standing interest in game theory, which I think began when I was an undergrad at U of Waterloo. I don’t (currently) do research in game theory, and have nothing particularly novel to say about it (though naturally a default question for me would be: can you categorify it?), but it is regularly used to analyze situations in evolutionary theory, and social sciences like economics, and politics. So it seems to be a fairly fundamental discipline, and worth looking at, I think.

For a little over a week now, Canada has been in a campaign leading up to a federal election in October. Together with the constant coverage of the US election campaign, this means the news is full of politicking, projections of results, etc. Also as usual, there are less-well-covered people agitating for electoral reform. So it seemed as good a time as any to blog something about social choice theory, which is an application of game theory to the special problem of making choices in groups which reflect the preferences of their members.

This is the defining problem for democracy, which is (in recent times) the world’s most popular political system, so it has been pretty extensively studied. One aspect of democracy is voting. One aspect of social choice theory is the study of voting systems – algorithms which collect information about the population’s preferences among some alternatives, and produce a “social choice” (the algorithm itself being a “social choice function”). For example, the US Presidential election process (in its ideal form, and excluding Primaries) collects the name of one candidate from each voter, and returns the name of a single winner. The Australian process collects more information from each voter: an ordered list of candidates. It uses a variant of the voting scheme called STV, which can return more than one winner.

Okay, so there are many different voting methods. Why should there be so many? Why not just use the best? For one thing, there are different criteria for what is meant by “best”, and every voting system has some sort of limitation. Depending on your precise criteria, you may prefer one system or another. A more precise statement of this comes in the form of Arrow’s theorem.

Suppose we have a set $C$ of “choices” (a general term for alternatives, not necessarily candidates for office), and for each voter $v$ in the population of voters $V$, there is $P_v$, a total order on $C$ (a “preference order”). Call this information a “profile”. Then we’d like to collect some information about the profile (perhaps the whole thing, perhaps not), and produce a “social preference order” $P$. This process is a function $c : \mathcal{O}(C)^V \rightarrow \mathcal{O}(C)$ (where $\mathcal{O}(C)$ is the set of orders on $C$). The function $c$ should satisfy some conditions, of course, and many possible conditions have been defined. Arrow’s theorem is usually stated with five conditions, but an equivalent form uses these:

1. Non-imposition: $c$ is surjective, as well as being defined on every profile (“unrestricted domain”). (I.e. any preference order could potentially occur as output of the algorithm – e.g. in a single-winner election, the system should allow any candidate to win, given enough votes)
2. Independence of Irrelevant Alternatives: for any $S \subset C$, the restriction $c|_{S} : \mathcal{O}(S)^V \rightarrow \mathcal{O}(S)$ agrees with $c$ (i.e. when applied to the restrictions of the orders $P_v$ to $S$, it produces the restriction of $P$ to $S$. In particular, removing a non-winning candidate from the ballot should not affect the outcome.)
3. Monotonicity: if $P$ prefers alternative $a$ to $b$, then changing any $P_v$ so that it prefers $a$ to $b$ (assuming it did not) should not cause $P$ to prefer $b$ to $a$ (i.e. no voter’s ballot has a harmful effect on their preferred candidate).

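Arrow’s theorem says that, when there are at least three choices, no algorithm satisfies all three conditions except a dictatorship, in which $P$ simply copies the $P_v$ of one fixed voter (and of course some algorithms don’t satisfy any). (A different formulation and its proof can be found here.)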
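
To see a failure of condition (2) in a concrete system, here is a small sketch using the Borda count, where each voter gives a candidate one point for every candidate ranked below it. The Borda count isn’t discussed above; I picked it only because it is short to implement, and the five-voter profile is invented for illustration.

```python
def borda_order(ballots, candidates):
    """Social preference order under the Borda count.

    ballots: list of (number_of_voters, ranking) pairs, where each ranking
    is a tuple of candidates from most to least preferred. Rankings are
    restricted to `candidates` before scoring, which models taking an
    option off the ballot.
    """
    scores = {c: 0 for c in candidates}
    for count, ranking in ballots:
        kept = [c for c in ranking if c in candidates]
        for place, c in enumerate(kept):
            scores[c] += count * (len(kept) - 1 - place)
    # Highest score first; ties broken alphabetically to stay deterministic.
    return sorted(candidates, key=lambda c: (-scores[c], c))

# Hypothetical profile: 3 voters rank A > B > C, 2 voters rank B > C > A.
profile = [(3, ("A", "B", "C")), (2, ("B", "C", "A"))]

full = borda_order(profile, ["A", "B", "C"])
without_c = borda_order(profile, ["A", "B"])

print(full)       # ['B', 'A', 'C']
print(without_c)  # ['A', 'B']
```

With all three on the ballot the scores are A = 6, B = 7, C = 2, so B comes first. Removing C, who was last anyway, leaves A = 3, B = 2, and the social order between A and B flips.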

Most popular voting systems satisfy (1) and (3) since these are fairly obvious criteria of fairness: every candidate has a chance, and nobody’s votes have negative weight (though note that a popular reform suggestion in the US, Instant Runoff Voting, fails monotonicity!). Condition (2) is the one that most people seem to find non-obvious. Failures of condition (2) can take various forms. One is “path dependence”, which may occur if the decision is made through a number of votes, and the winner can (for some profiles) depend on the order in which the votes are taken (for instance, the runoff voting used in French presidential elections is path dependent). When a voting system has this property, the outcome can sometimes be manipulated by the “agenda-setter” (if there is one).
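
The failure of monotonicity under Instant Runoff Voting can be checked directly. This is a minimal sketch with an invented 100-voter profile and a simplified count (every ballot ranks every candidate, and ties for last place are broken alphabetically); real IRV rules have more details than this.

```python
def instant_runoff(ballots):
    """Return the Instant Runoff winner.

    ballots: list of (number_of_voters, ranking) pairs, each ranking a
    tuple listing every candidate from most to least preferred.
    """
    remaining = sorted({c for _, ranking in ballots for c in ranking})
    while True:
        tally = {c: 0 for c in remaining}
        for count, ranking in ballots:
            # Each ballot counts for its highest-ranked surviving candidate.
            top = next(c for c in ranking if c in remaining)
            tally[top] += count
        leader = max(remaining, key=lambda c: tally[c])
        if 2 * tally[leader] > sum(tally.values()):
            return leader  # strict majority reached
        # Otherwise eliminate the candidate with the fewest votes.
        remaining.remove(min(remaining, key=lambda c: tally[c]))

# Hypothetical profile in which A wins.
before = [(39, ("A", "B", "C")), (35, ("B", "C", "A")), (26, ("C", "A", "B"))]
# Ten of the B > C > A voters now move A to the top of their ballots.
after = [(49, ("A", "B", "C")), (25, ("B", "C", "A")), (26, ("C", "A", "B"))]

print(instant_runoff(before))  # A
print(instant_runoff(after))   # C
```

In the first count C is eliminated and C’s ballots transfer to A, who wins 65 to 35. After ten voters raise A, it is B who is eliminated first, B’s ballots transfer to C, and C wins 51 to 49: extra support for A caused A to lose.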

Another way (2) can fail is by creating the possibility of strategic voting: creating a situation in which voters have a rational incentive to give false information about their preferences. For instance, voters may want to avoid “splitting” the vote. In the US election in 2000, the presence of the “irrelevant” (i.e. non-winning) alternative Ralph Nader (allegedly) split the vote which otherwise would have gone to Gore, allowing Bush to win certain states. Somewhat similarly, in the current Canadian election, there is a single “right wing” party (Conservative), two or three “left wing” parties, depending on the riding (Liberal, NDP, Bloc Québécois), and one which is neither (Green). In some competitive districts, for example, voters who prefer NDP to Liberal, but Liberal to Conservative, have a rational reason to vote for their second choice in order to avoid their third choice – assuming they know the “real” race is between Conservative and Liberal in their riding. (In my riding, London North Centre, this is not an issue since the outcome – Liberal – is not in doubt at all.)
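
The incentive in that last example is easy to reproduce under plurality voting. This is a minimal sketch for an imaginary riding; the bloc sizes are invented and are not polling data for any real district.

```python
def plurality_winner(votes):
    """votes: dict mapping party -> number of ballots cast for it."""
    return max(sorted(votes), key=lambda party: votes[party])

# Hypothetical riding: blocs of voters with their sincere preference orders.
blocs = [
    (40, ("Conservative", "Liberal", "NDP")),
    (35, ("Liberal", "NDP", "Conservative")),
    (25, ("NDP", "Liberal", "Conservative")),
]

def tally(blocs, ndp_votes_strategically=False):
    """Count one vote per voter: the first choice, unless NDP supporters
    decide to cast their ballot for their second choice instead."""
    votes = {"Conservative": 0, "Liberal": 0, "NDP": 0}
    for count, ranking in blocs:
        choice = ranking[0]
        if ndp_votes_strategically and choice == "NDP":
            choice = ranking[1]
        votes[choice] += count
    return votes

sincere = plurality_winner(tally(blocs))
tactical = plurality_winner(tally(blocs, ndp_votes_strategically=True))

print(sincere)   # Conservative (40 to 35 to 25)
print(tactical)  # Liberal (60 to 40)
```

By misreporting their first choice, the NDP bloc replaces their least-preferred outcome with their second-preferred one, which is exactly the rational incentive described above.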

These, like all voting system flaws, only become apparent when there are at least three options: with only two options, condition (2) doesn’t apply, since removing one option leaves nothing to vote on. (This is one reason why many voting systems are said to “favour the two-party system”, where their flaws are not apparent: when the vote is split, voters have an incentive to encourage parties to merge. This is why Canada now has only one “right-wing” party.)

These two flaws also allow manipulation of the vote only when the manipulator knows enough about the profile of preferences. Apart from allowing parties to find key competitive ridings (or states, etc.), this is probably one of the most important uses of polling data. (Strategic voting is hard in Canada, since a good statistical sample of, say, 1000 in each of 308 ridings would require polling about 1% of the total population of the country, so one usually has only very imperfect information about the profile. Surveying 1000 people in each of 50 US states is relatively much easier. Even at that, projections are hard: try reading any of the methodology details on, say, fivethirtyeight.com for illustration.)

Now, speaking of strategic voting, the Gibbard-Satterthwaite theorem, closely related to Arrow’s theorem, applies to social choice systems which aim to select a single winner based on a number of votes. (Arrow’s theorem, as originally stated, doesn’t apply specifically to voting: it applies to any multi-criterion decision-making process.) The G-S theorem says that any (deterministic) voting system satisfies at least one of three conditions:

1. It is dictatorial (i.e. there is a voter $v$ such that the winner depends only on $P_v$)
2. It is restricted (i.e. there is some candidate who cannot win no matter the profile)
3. It allows strategic voting for some profiles

So it would seem that the possibility of strategic voting can’t reasonably be done away with. This suggests the point of view that voting strategically is no more a problem than, say, playing chess strategically. The fact that an analysis of voting in terms of the theory of games of strategy suggests this point of view is probably not a coincidence…

As I remarked, here in London North Centre, the outcome of the vote is in no doubt, so, strategically speaking, any vote is as good as any other, or none. This curious statement is, paradoxically, only true if not believed by most voters – voting strategically in a context where other voters are doing the same, or worse yet, answering pollsters strategically, is a much more complicated game with incomplete (and unreliable) information. This sort of thing is probably why electoral reform is a persistent issue in Canada.

I recently got back to London, Ontario from a trip to Ottawa, the first purpose of which was to attend the Ottawa Mathematics Conference. The other purpose was to visit family and friends, many of whom happen to be located there, which is one reason it’s taken me a week or so to get around to writing about the trip. Now, the OMC was a general-purpose conference, mainly for grad students, and some postdocs, to give short talks (plus a couple of invited faculty from Ottawa’s two universities – the University of Ottawa, and Carleton University – who gave lengthier talks in the mornings). This is not a type of conference I’ve been to before, so I wasn’t sure what to expect.

From one, fairly goal-oriented, point of view, the style of the conference seemed a little scattered. There was no particular topic of focus, for instance. On the other hand, for someone just starting out in mathematical research, this type of thing has some up sides. It gives a chance to talk about new work, see what’s being done across a range of subjects, and meet people in the region (in this case, mainly Ottawa, but also elsewhere across Eastern and Southern Ontario). The only other general-purpose mathematics conference I’ve been to so far was the joint meeting of the AMS in New Orleans in 2007, which had 5000 people and anyone attending talks would pick special sessions suiting their interests. I do think it’s worthwhile to find ways of circumventing the various pressures toward specialization in research – it may be useful in some ways, but balance is also good. Particularly for Ph.D. students, for whom specialization is the name of the game.

One useful thing – again, particularly for students – is the reminder that the world of mathematics is broader than just one’s own department, which almost certainly has its own specialties and peculiarities. For example, whereas here at UWO “Applied” mathematics (mostly involving computer modelling) is done in a separate department, this isn’t so everywhere. Or, again, while my interactions in the UWO department focus a lot on geometry and topology (there are active groups in homotopy theory and noncommutative geometry, for example), it’s been a while since I saw anyone talk about combinatorics, or differential equations. Since I actually did a major in combinatorics at U of Waterloo, it was kind of refreshing to see some of that material again.

There were a couple of invited talks by faculty. Monica Nevins from U of Ottawa gave a broad and enthusiastic survey of representation theory for graduate students. Brett Stevens from Carleton talked about “software testing”, which surprised me by actually being about combinatorial designs. Basically, it’s about the problem of how, if you have many variables with many possible values each, to design a minimal collection of “settings” for those variables which tests all possible combinations of, say, two variables (or three, etc.). One imagines the variables representing circumstances software might have to cope with – combinations of inputs, peripherals, and so on – so the combinatorial problem is if there are 10 variables with 10 possible values each, you can’t possibly test all 10 billion combinations – but you might be able to test all possible settings of any given PAIR of variables, and much more efficiently than just an exhaustive search, by combining some tests together.
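A scaled-down version of the pairwise-testing idea can be checked by machine. The sketch below assumes 4 variables with 3 values each (not the 10-by-10 case above) and uses a classical orthogonal-array style construction, rows of the form $(i, j, i+j, i+2j) \bmod 3$. This is my choice of illustration, not necessarily the method from the talk, and the function names are hypothetical.

```python
from itertools import combinations, product

def pairwise_tests(q=3):
    """Build q*q tests over 4 variables, each taking values 0..q-1.

    Rows are (i, j, i+j, i+2j) mod q. The construction is only claimed to
    cover all pairs for q = 3 here; it fails for q = 2, for instance.
    """
    return [(i, j, (i + j) % q, (i + 2 * j) % q)
            for i, j in product(range(q), repeat=2)]

def covers_all_pairs(tests, q):
    """Check that every pair of variables takes every one of its q*q value pairs."""
    n_vars = len(tests[0])
    for a, b in combinations(range(n_vars), 2):
        seen = {(t[a], t[b]) for t in tests}
        if len(seen) != q * q:
            return False
    return True

tests = pairwise_tests(3)
print(len(tests), "tests instead of", 3 ** 4)   # 9 tests instead of 81
print(covers_all_pairs(tests, 3))               # True
```

So 9 well-chosen settings exercise every pair of values for every pair of variables, where exhaustive testing would need 81. The saving grows quickly with the number of variables, which is what makes the design problem interesting.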

Among the other talks were several combinatorial ones – error correcting codes using groups, path ideals in simplicial trees (which I understand to be a sort of generalization to simplicial sets of what trees are for graphs), heuristic algorithms for finding minimal cost collections of edges in weighted graphs that leave the graph with at least a given connectivity, and so on. Charles Starling from U of O gave an interesting talk about how to associate a topological space to an aperiodic tiling (roughly, any finite-size region in an aperiodic tiling is repeated infinitely many times – so the points of the space are translations, and two translations are within $\epsilon$ of one another if they produce matching regions about the origin of size $\frac{1}{\epsilon}$ – then the thing is to study cohomology of such spaces, and so forth).

The talk immediately following mine was by Mehmetcik Pamuk about homotopy self-equivalences of 4-manifolds, which used a certain braid of exact sequences of groups of automorphisms (among other things). I expected this to be very interesting, and it was certainly intriguing, but I can’t adequately summarize it – whatever he was saying, it proved to be hard to pick up from just a 25-minute talk. I did like something he said in his introduction, though: nowadays, if a topologist says they’re doing “low-dimensional” topology, they mean dimension 3, and “high-dimensional” means dimension 4. This is a glib but indicative way to point out that the topology of manifolds in dimensions 1 and 2 is well understood (the compact connected components are, respectively, circles and, in the orientable case, n-holed tori), and that dimensions 5 and above have been straightened out more recently thanks to Smale.

There were some quite applied talks which I missed, though I did catch one on “gravity waves”, which turn out not to be gravitational waves, but the kind of waves produced in fluids of varying density acted on by gravity. (In particular, due to layers of temperature and pressure in the atmosphere, sometimes denser air sits above less dense air, and gravity is trying to reverse this, producing waves. This produces those long rippling patterns you sometimes see in high-altitude clouds. Lidia Nikitina told us about some work modelling these in situations where the ground topography matters, such as near mountains – and had some really nice pictures to illustrate both the theory and the practice.)

On the second day there were quite a few talks of an algebraic or algebro-geometric flavour – about rings of algebraic invariants, about enumerating lines in special “blow-up” varieties, function fields associated to hyperelliptic curves, and so on – but although this is interesting, I had a harder time extracting informative things to say about these, so I’ll gloss over them glibly. However, I did appreciate the chance to gradually absorb a little more of this area of math by osmosis.

The flip side of seeing what many other people are doing was getting a chance to see what other people had to say about my own talk – about groupoids, spans, and 2-vector spaces. One of the things I find is that, while here at UWO the language of category theory is widely used (at least by the homotopy theorists and noncommutative geometry people I’ve been talking to), it’s not as familiar in other places. This seems to have been going on for some time – since the 1970s if I understand the stories correctly. After Mac Lane and Eilenberg introduced categories in the 1940s, the concept had significant effects in algebraic geometry/topology and homological algebra, and spread out from there. There was some deep enthusiasm – possibly well-founded, though I won’t claim so – that category theory was a viable replacement for set theory as a “foundation” for mathematics. True or not, that idea seemed to be one of those which was picked up by mathematicians who didn’t otherwise know much about category theory, and it seems to be one that’s still remembered. So maybe it had something to do with the apparent fall from fashion of category theory. I’ve heard that theory suggested before: roughly, that many mathematicians thought category theory was supposed to be a new foundation for mathematics, couldn’t see the point, and lost interest.

Now, my view of foundations is roughly suggested in my explanation of the title of this blog. I tend to think that our understanding of the world comes in bits and pieces, which we refine, then try to stick together into larger and more inclusive bits and pieces – the “Atlas” of charts of the title. This isn’t really just about the physical world, but the mathematical world as well (in fact I’m not really a Platonist who believes in a separate “world” of mathematical objects – though that’s a different conversation). This is really just a view of epistemology – namely, empirical methods work best because we don’t know things for sure, not being infinitely smart. So the “idealist”-style program of coming up with some foundational axioms (say, for set theory), and deriving all of mathematics from them without further reference to the outside doesn’t seem like the end of the story. It’s useful as a way of generating predictions in physics, but not of testing them. In mathematics, it generates many correct theorems, but doesn’t help identify interesting, or useful, ones.

So could category theory be used in foundations of mathematics? Maybe – but you could also say that mathematics consists of manipulating strings in a formal language, and strings are just words in a free monoid, so actually all of mathematics is the theory of monoids with some extra structure (giving rules of inference in the formal language). Yet monoid theory – indeed, algebra generally – is not mainly interesting as foundations, and probably neither is category theory.

On the whole, it was an interesting step out of the usual routine.

Writing sizeable chunks of math blog takes longer than I expected. Here are a few non-intensive things that occurred to me.

While I was walking home from the UWO campus, I was reminded of the nature of Canada in late November: everything, from sky to plantlife to earth, is in shades of grey, brown, ochre, and the occasional desaturated greenish-whatever. Autumn leaves have pretty much stopped falling, and are on the ground turning greyish versions of whatever colours they were before. There are whole vistas of bare branches, dead underbrush, and so on.

Which seems dreary for a while, until you’re immersed in it, as I am on the particular route I walk home, along London, Ontario’s Thames River (not to be confused with the River Thames in London, England), which is lined with parks. Then, after a while, all the subtle differences in shading and texture start to jump out at you more and more, until brownish moss on a tree under overcast late-afternoon light is vibrant green, a patch of snow is glowing bluish white, the occasional flicker of sunset through the cloud cover is warm pumpkin-orange, that one particular bush’s leaves look startlingly red… and then you see something artificial, like someone’s nylon jacket, or a kid’s plastic play-structure, and their colours look implausibly oversaturated, like a badly photoshopped picture.

Which got me to thinking about fine distinctions that seem drab outside their context – the way these colours look at first. Or nitpicky, like having 30 different words for “cold” and the different qualities it can have, or recognizing 15 different types of snowflake from a distance. Coming back to Canada after several years in California, I noticed all this specialized knowledge I’d forgotten about, which seems terribly arcane outside its native habitat. It occurred to me that this is how mathematics probably seems to outsiders – like physicists, or statisticians… (I jest)

For instance: I often have the experience of using the term “categorification” in describing something I’m doing – often in scare-quotes, followed by some kind of explanation – only to have it echoed back as “categorization”, and wonder whether to risk pedantry and explain that they’re not the same thing at all. “Categorification, not to be confused with…”

On another note, I went looking for this paper by Carter, Kauffman and Saito, on a kind of invariant of 4-manifolds which generalizes 3D Dijkgraaf-Witten invariants, on the supposition that it would be closely related to some things I’ve been thinking about, from a diagrammatic point of view I’ve not paid much attention to in the last year or so. As I was looking through search results, I noticed a paper from about 10 years ago by Kauffman and Smolin with an interesting-sounding title, A Possible Solution to the Problem of Time in Quantum Cosmology. Since Lee Smolin has written on linking topological field theory and quantum gravity, I guessed it would also be interesting to look at. Only after reading the first few pages did I notice that the first listed author was not Louis Kauffman, who studies knot theory (and things tangent thereto), but Stuart Kauffman, who studies biocomplexity and complex systems.

I happen to be interested in the work of both Kauffmans – more immediately and professionally that of Louis, but I also read a couple of Stuart’s more accessible books, “At Home in the Universe”, and “Investigations” – and since the paper was short, I finished reading it. The basic premise is that the configuration space for 4D quantum gravity may not be constructible by any finite procedure (classifying spin networks, they say, might present a problem; doing path integrals over all 4-manifold topologies certainly does). So the “problem of time”, that there’s no role for time in describing dynamics in terms of paths through a configuration space, wouldn’t make sense – at least for a constructivist. (Or indeed a constructivist, though of course they shouldn’t be confused.) One thing that threw me off in noticing which Kauffman was involved was that part of this portion of the argument was about classifying knots.

That cleared itself up when they got to the part proposing a solution – that the total space of possible states isn’t given a priori, but time re-enters the situation as the universe evolves, at each time step having some amplitude to move into each configuration in a (newly defined!) space called the adjacent possible. Having read Stuart K.’s books, this was when I realized my mistake – he describes this concept in “Investigations” in the context of a biosphere, or an economy, where a theorist also doesn’t have an explicit description of all possible future states given in advance.

It seems like this idea has a lot in common with type theory as a solution to Russell’s paradox: the collection of all sets isn’t a set, and so to get at it, sets are generated starting with nothing in successive stages. Whether this also doubles as a solution to the problem of time, I don’t know. In any case, it’s an interesting idea. It definitely would be a problem to have to do path integrals over a space of all topologies for 4-manifolds, when these can’t be classified, so some sort of suggestions are definitely a good thing here.
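For what it's worth, the "successive stages" picture can be made concrete for the first few finite stages. The sketch below assumes the standard set-theoretic construction in which stage 0 is empty and each stage is the power set of the previous one. That is my choice of illustration (it is not taken from the Kauffman-Smolin paper), and the names `powerset` and `stages` are hypothetical.

```python
from itertools import chain, combinations

def powerset(s):
    """Return the set of all subsets of the frozenset s, each as a frozenset."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return frozenset(frozenset(sub) for sub in subsets)

def stages(n):
    """Return [V_0, ..., V_n], with V_0 empty and V_{k+1} the power set of V_k."""
    result = [frozenset()]
    for _ in range(n):
        result.append(powerset(result[-1]))
    return result

print([len(v) for v in stages(4)])   # [0, 1, 2, 4, 16]
```

Each stage contains the previous one, and no stage ever contains "the set of all sets": at every step there are only the sets built so far, plus whatever can be formed from them next. The sizes blow up fast (the next stage has 65536 elements), so only the first few stages are worth printing.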