The Road to Maxwell's Demon: Conceptual Foundations of Statistical Mechanics


Meir Hemmo and Orly R. Shenker, The Road to Maxwell's Demon: Conceptual Foundations of Statistical Mechanics, Cambridge University Press, 2012, 337pp., $95.00 (hbk), ISBN 9781107019683.

Reviewed by Charlotte Werndl, London School of Economics


This is a stimulating and welcome book. It is intended for novices as well as experienced researchers in the field. By working with diagrams and by largely avoiding mathematical equations, Hemmo and Shenker provide a very accessible introduction to basic ideas in statistical mechanics such as dynamical blobs and macrostates, although in a few places the presentation is sloppy. Many stimulating and novel ideas (e.g., on probability and typicality) are put forward, which deserve to be widely known.

The book is organized into thirteen chapters. After an introduction in Chapter 1, Chapter 2 outlines thermodynamics. Chapter 3 introduces classical mechanics and criticizes the ergodic approach, and Chapter 4 is about time and time-reversal in classical mechanics. Chapter 5 discusses macrostates, and Chapter 6 is about probability in statistical mechanics. Chapter 7 deals with entropy, Loschmidt's objection and Zermelo's objection, and Chapter 8 argues that the typicality approach is untenable. Chapter 9 conceptualizes measurement, and Chapter 10 outlines the role of the past in statistical mechanics. Chapter 11 provides a link to Gibbsian statistical mechanics, and Chapter 12 is about erasure of information about the state of a system. Chapter 13 argues that an anti-statistical mechanical demon (as well as an anti-thermodynamic demon) is possible. In the remainder of the review, I will focus on two major themes of the book: probability and typicality. In order to advance the discussion, I will concentrate on my criticisms. However, this should not detract from the originality and importance of their work.


Hemmo and Shenker argue that the only probabilities of empirical significance in statistical mechanics are transition probabilities, such as the probability that a system starting in macrostate M0 at time t0 will end up in macrostate M1 at time t1. Formally, these transition probabilities are given by the relative size μ of the overlap of the dynamical blob B(t1) (i.e., the macrostate M0 evolved forward over t1-t0 time steps) with the macrostate M1 (p. 131):

P(M1, t1 | M0, t0) = μ(B(t1) ∩ M1). (Probability Rule) (1)
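To make the Probability Rule concrete, here is a minimal sketch on a toy model. Everything in it is an assumption introduced for illustration and does not come from the book: the finite phase space of twelve microstates, the permutation dynamics `step`, the particular macrostates, and the choice of a uniform μ normalised on the blob.

```python
from fractions import Fraction

N = 12  # size of a hypothetical finite phase space {0, ..., 11}


def step(x):
    """One time step of a toy invertible dynamics (a permutation of the states)."""
    return (x + 5) % N


def evolve(states, steps):
    """Evolve a set of microstates forward; the result plays the role of the blob B(t1)."""
    current = set(states)
    for _ in range(steps):
        current = {step(x) for x in current}
    return current


def transition_probability(m0, m1, steps, weight=lambda x: 1):
    """Relative size of the overlap of the blob with m1, i.e. mu(B(t1) ∩ M1).

    `weight` is an assumed, adjustable density for mu; the default is uniform.
    """
    blob = evolve(m0, steps)
    total = sum(Fraction(weight(x)) for x in blob)
    overlap = sum((Fraction(weight(x)) for x in blob & set(m1)), Fraction(0))
    return overlap / total


# Hypothetical macrostates partitioning the toy phase space.
M0 = {0, 1, 2, 3}
M1 = {4, 5, 6, 7}
M2 = {8, 9, 10, 11}

p_m1 = transition_probability(M0, M1, steps=1)  # blob {5, 6, 7, 8} meets M1 in 3 states
p_m2 = transition_probability(M0, M2, steps=1)  # blob {5, 6, 7, 8} meets M2 in 1 state
```

On the authors' account the weight would not be fixed in advance as it is here; it would be adjusted until the computed values match the observed relative frequencies.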

The authors claim that the relative frequencies of arriving in macrostate M1 when starting in macrostate M0 are determined from experiments, and the measure μ in equation (1) should be chosen such that it (approximately) equals these frequencies. There might be several possible measures which fit this desideratum, and then any of these will work. Recall that a measure μ is absolutely continuous with respect to the Lebesgue measure λ exactly when

If λ(A)=0 for a set A, then μ(A)=0. (2)

Hemmo and Shenker stress that the measure μ might well be different from measures absolutely continuous with respect to the Lebesgue measure and hence criticise the common practice of working with measures absolutely continuous with respect to the Lebesgue measure (pp. 134-135).

This is a novel and worthwhile contribution to understanding probabilities in statistical mechanics. It differs significantly from common approaches where talk about probabilities refers to initial probability distributions, giving the probability that the system starts in a certain microstate (Frigg and Werndl 2011, 2012a, 2012b; Malament and Zabell 1980; Leeds 1989). For Hemmo and Shenker the transition probabilities are primitive and the initial probabilities are only derivative. This has the consequence that the initial probability distributions are underdetermined. To give a simple example: suppose one is interested in the transition probabilities when starting in macrostate M0 at time t0. Also, suppose one ends up in either macrostate M1 or M2 at t1 and that the transition probabilities are P(M1, t1 | M0, t0)=1/2 and P(M2, t1 | M0, t0)=1/2. Further, suppose that the dynamics is such that B(t1) ∩ M1 and B(t1) ∩ M2 are open regions of phase space. Then there is great flexibility in the choice of the measure μ in the Probability Rule (equation (1)). For instance, one possibility would be to choose a measure μ1 which assigns 1/2 to a single point in B(t1) ∩ M1 and 1/2 to a single point in B(t1) ∩ M2. Another possibility would be to choose a measure μ2 which is uniform and sums up to 1/2 over B(t1) ∩ M1 and to 1/2 over B(t1) ∩ M2. These measures correspond to very different initial probability distributions: μ1 corresponds to an initial probability distribution where the system always starts in one of two specific initial microstates (each with probability 1/2), whereas μ2 corresponds to an initial probability distribution which is absolutely continuous with respect to the Lebesgue measure[1], and where the system can start in infinitely many different initial states. Defenders of transition probabilities might not be bothered by this because for them only transition probabilities are important. Still, I suspect that others will regard this underdetermination as an undesirable consequence because it seems to be a matter of fact (and not a matter of choice) in which initial states a system starts.
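The underdetermination can be illustrated with a small sketch. The setting is hypothetical and chosen only for simplicity: a one-dimensional "phase space" [0, 1) stands in for the blob B(t1), the two overlaps are the half-open intervals [0, 1/2) and [1/2, 1), and the atoms of the first measure sit at 1/4 and 3/4. The functions `mu1` and `mu2` are names introduced here, mirroring the point-mass and uniform measures of the example.

```python
from fractions import Fraction

# Hypothetical overlaps of the blob with the two macrostates, as intervals [lo, hi).
OVERLAP_M1 = (Fraction(0), Fraction(1, 2))
OVERLAP_M2 = (Fraction(1, 2), Fraction(1))

# Assumed locations and weights of the two point masses.
ATOMS = {Fraction(1, 4): Fraction(1, 2), Fraction(3, 4): Fraction(1, 2)}


def mu1(interval):
    """Measure made of two point masses of weight 1/2, one in each overlap."""
    lo, hi = interval
    return sum((w for x, w in ATOMS.items() if lo <= x < hi), Fraction(0))


def mu2(interval):
    """Uniform measure on [0, 1): the length of the interval within [0, 1)."""
    lo, hi = interval
    return max(Fraction(0), min(hi, Fraction(1)) - max(lo, Fraction(0)))


# Both measures yield the same transition probabilities (1/2, 1/2) ...
transition_probs = {
    "mu1": (mu1(OVERLAP_M1), mu1(OVERLAP_M2)),
    "mu2": (mu2(OVERLAP_M1), mu2(OVERLAP_M2)),
}

# ... but they disagree sharply on small sets: a tiny interval around the atom
# at 1/4 keeps mu1-weight 1/2 however short it is, while its mu2-weight shrinks
# with its length.
tiny = (Fraction(1, 4), Fraction(1, 4) + Fraction(1, 10**6))
weights_on_tiny = (mu1(tiny), mu2(tiny))
```

The agreement on the two overlaps is all that the Probability Rule constrains; the disagreement on the tiny interval is the sense in which only the second measure is absolutely continuous with respect to the Lebesgue measure.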

According to Hemmo and Shenker, the transition probabilities are estimated from experiments (such as for the paradigmatic experiment of a gas expanding in a container, p. 138). Yet I wonder whether physicists really do this. Rather than calculating precise frequencies, they only seem to make much coarser observations such as that it is highly likely that the system eventually ends up in the equilibrium region. Some more thoughts on this issue would have been welcome. Furthermore, it seems possible that the transition probabilities differ for different experimental setups or scientists (even if the observed history is the same), and it would have been nice if the book had addressed this.

Hemmo and Shenker make the very important points that one has to carefully distinguish between physical probability and the formal notion of a probability measure, and that the same probability measure might be interpreted differently in different contexts (p. 136, p. 158). Furthermore, they are right to stress that one cannot know for sure whether the measures μ (and also the initial probability distributions in statistical mechanics) are absolutely continuous with respect to the Lebesgue measure because our knowledge is limited. As Leeds (2002) and Davey (2008) have also complained, all too often the Lebesgue measure or the microcanonical measure is uncritically assumed to refer to physical probabilities.

However, Hemmo and Shenker go much further than this: they argue that the only compelling consideration concerning the choice of the probability measure comes from experience of relative frequencies. They reject any plausibility argument that the measure μ in the Probability Rule (equation (1)) is of a certain kind, for instance absolutely continuous with respect to the Lebesgue measure (personal communication with Hemmo and Shenker). Yet for realistic systems it has not been possible to determine the measures μ of the Probability Rule from the relative frequencies alone. Mathematical calculations in statistical mechanics which appeal to measures absolutely continuous with respect to the Lebesgue measure are very successful and lead to many correct predictions. On the account given in the book, because one does not know anything about the measures μ which figure in the Probability Rule, it remains unexplained why these calculations are so successful. To give an example, Hemmo and Shenker stress that the Gibbsian method has to use the measure μ of the Probability Rule (p. 239). But then it remains unexplained why Gibbsian phase space averaging based on the microcanonical measure is so successful.

I differ from the authors in thinking that it can be worthwhile to consider plausibility arguments that measures are of a certain kind (for instance, absolutely continuous with respect to the Lebesgue measure). If, based on reasonable plausibility arguments, one can explain why certain calculations in statistical mechanics are successful and lead to correct predictions, this represents some progress. Here are two examples of plausibility arguments which are worthy of consideration. First, whenever in the Probability Rule (equation (1)) the overlap of the dynamical blob B(t1) with the macrostate M1 contains an open set, then μ can be chosen to be absolutely continuous with respect to the Lebesgue measure.[2] That the overlap contains an open set has some plausibility, and in all the diagrams of the book these overlaps are depicted as containing open sets. Second, if an initial probability distribution is absolutely continuous with respect to the Lebesgue measure, then the measure μ in the Probability Rule is absolutely continuous with respect to the Lebesgue measure.[3] Physicists often endorse absolute continuity of the initial probability distributions (cf. Leeds 1989; Maudlin 2007). Furthermore, in their classic paper Malament and Zabell (1980) have argued that there are good reasons to believe that initial probability distributions in statistical mechanics are absolutely continuous with respect to the Lebesgue measure (because one does not have sufficient accuracy to create any other probability measures).
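The second argument can be written out as a short chain of implications. The sketch below rests on assumptions that are made explicit here rather than taken from the book: the measure of the Probability Rule (written μ in equation (1)) is taken to be the image of an initial distribution ρ under the dynamics f over τ = t1 - t0 time steps, and the Lebesgue measure λ is taken to be invariant under f. The symbols ρ and τ are labels introduced for this illustration only.

```latex
% Assumption: \mu(D) = \rho\bigl(f^{-\tau}(D)\bigr) for every measurable set D,
% with \tau = t_1 - t_0 and \rho the initial probability distribution.
\begin{align*}
\lambda(D) = 0
  &\;\Longrightarrow\; \lambda\bigl(f^{-\tau}(D)\bigr) = 0
  && \text{(invariance of } \lambda \text{ under the dynamics)} \\
  &\;\Longrightarrow\; \rho\bigl(f^{-\tau}(D)\bigr) = 0
  && \text{(} \rho \text{ absolutely continuous with respect to } \lambda \text{)} \\
  &\;\Longrightarrow\; \mu(D) = 0
  && \text{(} \mu \text{ is the image of } \rho \text{ under } f^{\tau} \text{)}
\end{align*}
```

Read from top to bottom, this is condition (2) for μ, so under the stated assumptions absolute continuity of the initial distribution carries over to the measure in the Probability Rule.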


The typicality approach is a recent popular approach in contemporary Boltzmannian statistical mechanics. Here measures are interpreted as typicality measures (i.e. as counting states), and the notion of typicality is invoked in statistical-mechanical explanations. For instance, it is claimed that typical states show thermodynamic-like behaviour. Hemmo and Shenker claim that typicality measures are usually motivated by principles of mechanics (such as that typicality measures have to be invariant under the dynamics) together with a priori considerations. Because of this, they argue, it remains unclear how the typicality measure relates to experiments and to empirical matters, and thus the typicality approach fails (pp. 188-189).

Their arguments are welcome and important. It is indeed often the case that typicality measures are motivated solely by principles of mechanics and a priori considerations. Then it could well be that the empirical world is such that a system always starts in an atypical state, implying that the typicality approach is not fruitful. Yet there seems to be a way out of this by relating typicality measures to physical probabilities. For instance, one possibility is to require that typicality measures should serve as a shortcut to make claims about a class of initial probability distributions of interest (this idea is developed in detail in Werndl 2013). Hemmo and Shenker seem to have thought about this possibility when they write:

Prima facie, one could try to address this difficulty by postulating a probability distribution (say, a uniform distribution relative to the L-measure) over the initial conditions. Here, the probability distribution is meant to have a physical content, namely to describe relative frequencies . . . The physical significance of this idea is that there is some random state generator external to the subsystem, which prepares the subsystem in its initial microstate. But if this random state generator is itself a mechanical system, then its randomness can come only from some other external random state generator, and so on until the beginning of the universe. But then it turns out that the first random state generator of the microstate of the universe is external to the universe. Therefore, it cannot be physical. (p. 189)

This is a contentious issue, but it can be questioned whether an external random state generator is needed. Deterministic systems can be random (cf. Eagle 2005). Even if the universe were a mechanical system and deterministic, it could well be that internally the system produces initial microstates of, say, a gas which are (to a good approximation) well described by an initial probability distribution. So the above argument does not seem to force one to abandon initial probability distributions, which are widely employed in both the physics and the foundations literature (e.g., Malament and Zabell 1980; Maudlin 2007).

Finally, let me give two examples of technical sloppiness in the book. The first example is the definition of absolute continuity. Measures absolutely continuous with respect to the Lebesgue measure are introduced by stating that "these measures agree with the Lebesgue measure on the sets of points that have measures 0 and 1" (p. 67, footnote 29). However, as is clear from equation (2), this is not true: absolutely continuous measures are allowed to assign measure zero to a set for which the Lebesgue measure is positive and measure one to a set for which the Lebesgue measure is smaller than one. This is of some importance for the arguments in the book because the correct definition of absolute continuity is significantly weaker than their notion. Hence it is more plausible that the measure μ in the Probability Rule (equation (1)) is absolutely continuous with respect to the Lebesgue measure according to the correct definition. Second, consider the claim that "such a unique property is obtained for ergodic systems: if a system is ergodic, then the only measure conserved under the dynamics is the Lebesgue measure (and measures absolutely continuous with it)" (p. 70). This is not correct: for instance, the Dirac measure over any fixed point (or, more generally, the uniform measure over any periodic orbit) is conserved under the dynamics but not absolutely continuous with respect to the Lebesgue measure. It seems that this claim is confused with the following uniqueness theorem in ergodic theory: given a system which is ergodic relative to the Lebesgue measure, if an arbitrary measure is invariant under the dynamics and absolutely continuous with respect to the Lebesgue measure, then the measure equals the Lebesgue measure (cf. Cornfeld et al. 1982).
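The counterexample can be checked on a standard textbook system. The sketch below uses Arnold's cat map on the unit torus, which is area-preserving and ergodic with respect to the Lebesgue measure; it is chosen here purely for illustration and is not discussed in the book. Sets are represented by indicator functions, and invariance is checked only on a handful of hypothetical test sets, so this is a sanity check and not a proof.

```python
from fractions import Fraction


def cat_map(point):
    """Arnold's cat map on the unit torus: (x, y) -> (2x + y, x + y) mod 1."""
    x, y = point
    return ((2 * x + y) % 1, (x + y) % 1)


def dirac(p):
    """Dirac measure at p; a set is represented by its indicator function."""
    return lambda indicator: 1 if indicator(p) else 0


def pushforward(measure, T):
    """Image measure A -> measure(T^{-1} A), using 1_{T^{-1} A} = 1_A composed with T."""
    return lambda indicator: measure(lambda q: indicator(T(q)))


def singleton(p):
    """Indicator of the one-point set {p}, which has Lebesgue (area) measure zero."""
    return lambda q: q == p


def is_invariant_on(measure, T, sets):
    """Check measure(T^{-1} A) == measure(A) on the given test sets only."""
    pushed = pushforward(measure, T)
    return all(pushed(A) == measure(A) for A in sets)


origin = (Fraction(0), Fraction(0))        # a fixed point of the cat map
other = (Fraction(1, 2), Fraction(1, 2))   # lies on a period-3 orbit, not fixed

test_sets = [singleton(origin), singleton(other), lambda q: q[0] < Fraction(1, 4)]

origin_invariant = is_invariant_on(dirac(origin), cat_map, test_sets)
other_invariant = is_invariant_on(dirac(other), cat_map, test_sets)

# The Dirac measure at the origin gives full weight to a Lebesgue-null set,
# so it is not absolutely continuous with respect to the Lebesgue measure.
mass_on_null_set = dirac(origin)(singleton(origin))
```

The Dirac measure at the fixed point passes the invariance check while assigning weight 1 to a set of Lebesgue measure zero; the Dirac measure at the non-fixed point fails it, which is why for orbits of period greater than one the invariant measure is the uniform measure over the whole orbit.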

Critical comments aside, this is a very welcome book on the foundations of statistical mechanics. It is full of interesting and original ideas, which deserve to be discussed further.


I am grateful to Roman Frigg, Meir Hemmo and Orly Shenker for helpful discussions and comments.


Cornfeld, I. P., Fomin, S. V. and Sinai, Ya. G. (1982), Ergodic Theory, Berlin et al.: Springer.

Davey, K. (2008), "The justification of probability measures in statistical mechanics", Philosophy of Science 75, 28-44.

Eagle, A. (2005), "Randomness is unpredictability", The British Journal for the Philosophy of Science 56, 749-790.

Frigg, R. and Werndl, C. (2011), "Explaining thermodynamic-like behaviour in terms of epsilon-ergodicity", Philosophy of Science 78, 628-652.

Frigg, R. and Werndl, C. (2012a), "Demystifying typicality", Philosophy of Science 79, 917-929.

Frigg, R. and Werndl, C. (2012b), "A new approach to the approach to equilibrium". In: Y. Ben-Menahem and M. Hemmo (eds), Probability in Physics. The Frontiers Collection. Springer, 99-114.

Leeds, S. (1989), "Malament and Zabell on Gibbs phase averaging", Philosophy of Science 56, 325-340.

Malament, D. and Zabell, S. (1980), "Why Gibbs phase averages work -- the role of ergodic theory", Philosophy of Science 47, 339-349.

Maudlin, T. (2007), "What could be objective about probabilities?", Studies in History and Philosophy of Modern Physics 38, 275-291.

Werndl, C. (2013), "Justifying Typicality Measures of Boltzmannian Statistical Mechanics and Dynamical Systems", Unpublished Draft.

[1] This can be seen as follows. Suppose that λ(A)=0 but the initial probability of A is greater than 0 for a subset A of macrostate M0. Then, because the Lebesgue measure or the microcanonical measure (which is equivalent to the Lebesgue measure) is invariant under the dynamics, λ(f^(t1-t0)(A))=0, where f^(t1-t0)(A) is the set A evolved t1-t0 time steps forward. Because μ2 is the uniform measure over open sets, if λ(B)=0, then μ2(B)=0 for any set B. Hence μ2(f^(t1-t0)(A))=0, which contradicts the assumption that the initial probability of A is greater than 0.

[2] For example, let μ simply be the measure which assigns uniform probability to any such overlap. This measure is absolutely continuous with respect to the Lebesgue measure.

[3] Suppose this were not the case and for D = B(t1) ∩ M1 one finds that μ(D)>0 but λ(D)=0. Because the Lebesgue measure or the microcanonical measure (which is equivalent to the Lebesgue measure) is invariant under the dynamics, it follows that λ(f^-(t1-t0)(D))=0, where f^-(t1-t0)(D) is the set D evolved t1-t0 time steps backwards. Because the initial probability distribution is absolutely continuous with respect to the Lebesgue measure, the initial probability of f^-(t1-t0)(D) is also zero, which contradicts the claim that μ(D)>0.