
This paper discusses some procedures developed in recent work in machine learning for inferring causal direction from observational data. The role of independence and invariance assumptions is emphasized. Several familiar examples, including Hempel’s flagpole problem, are explored in the light of these ideas. The framework is then applied to problems having to do with explanatory direction in non-causal explanation.

James Woodward*

Department of History and Philosophy of Science, University of Pittsburgh

* Correspondence to: James Woodward. Department of History and Philosophy of Science, University of Pittsburgh, 1101 Cathedral of Learning, 4200 Fifth Avenue, Pittsburgh, PA USA 15260 – jfw@pitt.edu

How to cite: Woodward, James (2022). «Flagpoles anyone? Causal and explanatory asymmetries»; Theoria. An International Journal for Theory, History and Foundations of Science, 37(1), 7-52. (https://doi.org/10.1387/theoria.21921).

Received: 2020-07-20; Final version: 2020-10-01.

ISSN 0495-4548 - eISSN 2171-679X / © 2022 UPV/EHU

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

A long-standing puzzle in philosophy of science concerns the direction of explanation (and causation). As a familiar illustration, discussed by Hempel (1965), suppose that we are given information about the height H = h of a flagpole, the length S = s of the shadow it casts (on ground assumed to be level and at right angles to the pole) in the light provided by the sun, and the angle A = a between the shadow and the sun’s rays. Then from the values of any two of these variables and laws concerning the rectilinear propagation of light we can derive or deduce the value of the third. Nonetheless, only one of these derivations (from H and A to S) is thought to be explanatory (or to track the direction of explanation): a derivation of H from S and A is no explanation. What is the source of this asymmetry or directionality? Why do we regard one of these derivations as explanatory and the other as not? How can we tell whether we have got the direction of explanation right? Or is there even such a thing as an objectively correct direction in such cases?
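The geometry behind the example can be made concrete in a short sketch (the numerical values are invented; the point is only that the law relating the three quantities, tan(A) = H/S, licenses deduction in either direction):

```python
import math

# Hypothetical values: a 10 m pole and a 40-degree solar elevation angle.
H = 10.0                 # height of the flagpole (metres)
A = math.radians(40.0)   # angle between the shadow and the sun's rays

# The rectilinear propagation of light gives tan(A) = H / S.
S = H / math.tan(A)      # the "explanatory" derivation: S from H and A

# The same law equally licenses the reverse deduction of H from S and A,
# even though that derivation is not counted as explanatory.
H_back = S * math.tan(A)
```

Both computations are equally valid deductions from the same law, which is exactly why the asymmetry in explanatory status calls for some further account.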

A very similar issue (arguably the same issue, at least insofar as our focus is on causal explanation) arises in connection with causal inference. Suppose that X and Y are correlated

This essay explores some of these issues. For most of this paper by “explanation” I will mean causal explanation. The penultimate section (12) will consider the extent to which the framework I provide might be extended to asymmetries present in non-causal explanations. The background theory of causation I will assume is the interventionist theory described in Woodward (2003). For our purposes, we will need only the following simple version:

An intervention on X is an unconfounded manipulation of X that changes a second variable Y, if at all, only through the change in X. For present purposes we can think of this as broadly the same notion as is captured by Pearl’s “do” operator.

My emphasis in what follows, however, will be not so much on the role of interventions per se but rather on certain other ideas intimately associated with interventionism—particularly on various notions of independence and invariance which are characterized below. I will attempt to show how these notions connect both to the asymmetric features of causal relations and to interventionist treatments of causal claims. In doing so I hope to cast light both on the asymmetries and on the significance of invariance/independence notions for understanding causation. I stress that what follows is not intended as an argument for an interventionist account of causation. Rather, I am going to assume that something in the neighborhood of this account is correct and then use it to try to illuminate some features of explanatory and causal asymmetries.

To motivate and explain this project, I begin with the observation that in one sense the asymmetries under discussion can be captured or represented perfectly well just by linking claims about causal direction to claims about what happens under interventions. Suppose that X and Y are statistically dependent and assume that if X(Y) causes Y(X), Y(X) does not cause X(Y).

Alternatively (and to anticipate discussion below) we might reason in terms of “soft interventions” as follows: Suppose that we confine ourselves to the example as originally discussed by Hempel and others, and thus assume that H, A and S are the only relevant variables. Thus there are no omitted common causes of H, S and A and the goal is to capture the difference between the following two alternatives: (i) A and H cause S or (ii) A and S cause H. We may then reason that if, in accord with (i), the causal direction is from H and A to S, A will be a soft intervention variable on S in circumstances in which H and A are statistically independent (since H is constant for any given pole, this condition will be satisfied as long as A varies, which will happen over the course of the day). Of course under such interventions on S via A, we observe no changes in H. Assuming that (i) and (ii) are the only alternatives and given the other assumptions above there is no other candidate for a variable that might be used to intervene on S, so we infer that S does not cause H and hence that (i) is correct.
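This soft-intervention reasoning can be simulated in miniature (the figures are invented): as A varies over the course of the day it acts as a soft intervention with respect to S, and S responds, while H of course stays fixed.

```python
import math

H = 10.0  # height of a given pole: constant over the day

# The sun's elevation angle A varies over the course of the day.
angles = [math.radians(deg) for deg in (20, 30, 40, 50, 60, 70)]

# Under these intervention-like changes in A, the shadow length S changes...
shadows = [H / math.tan(a) for a in angles]

# ...but no manipulation of S via A produces any change in H, which is
# what rules out alternative (ii), that A and S cause H.
heights = [H for _ in angles]
```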

Treatments of this sort seem correct as far as they go,

A second consideration which reinforces the first is this: the notion of an intervention is of course itself a causal notion and as such has a notion of causal direction built into it—the causal direction goes from the intervention I to the variable intervened on. For this reason, if someone is puzzled about the notion of causal direction itself, appeals to what would happen under interventions as a way of understanding causal direction will seem less than fully satisfying.

Before doing this, however, a methodological digression is in order. Some writers who have discussed causal direction frame their discussion around a contrast between, on the one hand, an underlying “metaphysics” having to do with what causal direction “is” or what it “consists in” and, on the other hand, mere “heuristics” that may be epistemically or methodologically useful for inferring causal direction but which have no bearing on what causal direction is, metaphysically speaking.

To the extent that metaphysics/ontology is concerned with what exists or is “out there”, one might, I suppose, think of my exploration of the features G as “metaphysics”. But if so, it is a very different kind of metaphysics than much of what currently falls under that description. It is different in the following respects: in addition to making no claims about what causal direction “consists in”, metaphysically speaking, I also advance no proposals about characteristically metaphysical claims concerning special “truth makers” for causal relationships such as powers, relations of necessitation between universals, systemizations of the Humean mosaic that best balance simplicity and strength and the like. The features G I discuss are ordinary empirical features of the natural world that require no special metaphysics for their characterization.

I thus see what follows as involving a third possible project besides the metaphysical project of specifying what causal direction is and the project of providing mere heuristics which are at best relevant to the epistemology of causal direction. I see this third project as connecting epistemological concerns having to do with how we find out about causal direction with the “what is out there” concerns of metaphysicians, although (again) my answer to the what is out there question does not involve any kind of elaborate metaphysics. My general picture is that causal thinking “works” to the extent that it does because it picks up on or is supported by certain generic features of our world, including in the case of the directional aspects of causal thinking, the features G alluded to above.

I said above that the supporting features G are ordinary empirical features of our world. I believe, as an empirical matter, that they are present in many systems in our world but nothing guarantees that they are always present. Still less will these features be present in all logically possible worlds: their presence is not a matter of conceptual truth. One consequence is that my discussion of causal direction is not intended to apply to worlds that are wildly different from our own: for example, I will not attempt to capture “intuitions” some may have about what causal direction amounts to in universes that contain just two particles. Again, to the extent that a metaphysics of causal direction attempts to address questions about what causal direction consists of in all possible worlds, this is not my project.

Having said this, I also want to insist that, independently of what one thinks about the infrastructure project, the epistemological/methodological problem of how one finds out about causal direction in contexts in which experimental manipulation is not possible is an interesting and important one in its own right—both from a philosophy of science perspective and because of its connection with many other disciplines interested in causal inference.

Several further points. First, I suggested above that when one infers causal direction on the basis of non-experimental information what one is in effect doing is inferring what would happen if various interventions were to be performed without actually doing the interventions, relying instead on other features present in such situations—the independence/invariance features G. We should thus think of the features G not as an alternative to the interventionist account of causal direction but rather as part of the same package. My basic test for causal direction is the interventionist one described above. As I explain below, I see the features G as relevant to causal direction because they can furnish information relevant to questions about what would happen under interventions. More subtly (as I will try to elucidate) these features help to underwrite the very possibility of interventions.

Second, let me emphasize again that the relationship between causal and explanatory direction and the invariance/independence features G I will be exploring is not proposed as a way of “reducing” the directional features of causal and explanatory claims to invariance/independence claims. For one thing we require a notion of causal direction to properly state the invariance/independence claims. Rather my goal is to “make sense” of the directional features of causal or explanatory claims (or at least some of them) by relating them to worldly structures associated with such claims. Given this conception of my project, I see no reason to suppose—and so will not argue—that there is some single source of the directional features of causation. The treatment that follows accordingly discusses several distinct, albeit related considerations that are relevant to causal direction. Moreover, I do not claim that these are the only features that are relevant to causal direction—there are others that I do not discuss.

Third, although the independence/invariance features on which I focus are features that are present or not in systems in the world, it will often be convenient to speak of these features as also being present or assumed or not in particular scientific theories or causal analyses, meaning by this that if some system is as portrayed by the theory or analysis in question, it will possess those independence/invariance features. For example, a theory might accurately describe some system in terms of a Cauchy surface along which there is free or independent assignability of initial conditions. As we shall see, such independence among initial conditions is one source of causal directionality. Whether a system or a collection of them exhibits such independence is a fact about the system itself but we can also ask whether the theory assumes the possibility of such free assignability (and whether doing so leads to correct results). This will facilitate the brief discussion below of certain ideas of Wigner’s which are framed in terms of independence assumptions made by various physical theories but where it is also assumed that nature cooperates by (at least often) exhibiting the independence feature in question.

Finally, the examples I discuss in this paper are mainly macroscopic—flagpoles, gases in boxes and so on. Some writers suggest that the directional features of causation are present only in macroscopic systems and are not to be found in microscopic systems. For the most part little will turn in this paper on whether this claim is correct. I’d count it as a success if what I say about causal direction works for macroscopic examples (which I insist are interesting and important in their own right). But that said, I see no reason to suppose that the independence/invariance assumptions to which I appeal, and the treatment of causal direction which follows from them, hold only for macroscopic systems. For example, independence constraints on initial conditions can certainly hold for systems involving atoms and molecules. In general, the idea that we can only make sense of causal direction at a macroscopic scale seems very implausible. When beams of protons collide with one another (C) in the LHC and various scattering events occur (E), does anyone doubt that the causal direction runs from C to E?

The rest of this essay is organized as follows. In Sections 3-4 I briefly discuss and put aside two alternative suggestions about causal asymmetries. The first is that these have their source in “pragmatic” considerations. The second is that the asymmetries can be fully understood in terms of time order. Sections 5 and 6 introduce two independence/invariance conditions that are closely bound up with causal direction: value/relationship independence (VRI) and statistical independence of causally independent initial conditions (CSI). Sections 7 and 8 apply CSI to several familiar examples including the flagpole case. Section 9 explores some relationships between CSI and strategies from the machine learning literature for inferring causal direction in additive error models. Section 10 discusses some examples illustrating the relationship between value/relationship independence and causal direction. Section 11 draws some general morals from the previous discussion about how the directional features of causation sometimes arise, locating this in the relationship between initial and boundary conditions and governing laws, rather than in the latter taken alone. Section 12 extends the framework developed in previous sections to asymmetries in non-causal explanations.

A number of authors,

My view is that the best response to this challenge is to identify features that are “objective” and that distinguish causes and effects and explanations of effects in terms of their causes from those that work in the opposite direction. In other words, I see Hempel and a number of other philosophers who have advocated “pragmatic” treatments of causal and explanatory directionality as arguing by default; they think that there are no objective grounds for such judgments of directionality (or at least none that elucidate how directional features contribute to some objectively characterized notion of explanatory goodness) and hence opt for a pragmatic treatment in the absence of any other alternative. One can thus show that the pragmatic treatments are unnecessary or unmotivated by providing the kind of objective account that Hempel and others think does not exist—this is what I aim to do. Of course one of the best ways of arguing for the “objectivity” of causal directionality is to show that there are procedures that reliably identify causal direction and that make use of information about how matters stand in the world, rather than information about our interests or about human psychology.

Another common suggestion about the direction of causation/ explanation takes this to be fully grounded in time order considerations. According to this position, if the only two alternatives are that (i) X causes Y or that (ii) Y causes X, (i) will be true if X or instances

It is certainly true that in many cases we make (and are justified in making) judgments about causal order based on time order considerations.

An even more fundamental problem is that such accounts provide no insight into (or justification for) why time order should matter in the way that it does in explanation and causal judgment. Put differently: even if you are tempted to say that it is true by some definition of causation that effects cannot precede their causes, there is still the question of why we operate with a notion of causation that has this feature. Why shouldn’t we replace our current notion with some notion that permits backward causation or that is undirected? In other words, what work (if any) does the idea that causal relations have a distinctive direction do for us? Saying that we call the event that comes first the cause does not explain the significance of causal direction.

To enlarge on this point, consider Hempel’s view of the flagpole problem. He is perfectly aware that some DN derivations are such that the explanans variables take their values before the explanandum variable takes its value, while others have the opposite profile. He asks, in effect, why this should make any difference to the explanatory status of the derivations. In fact, it clearly shouldn’t if, as Hempel thinks, explanation is just a matter of deriving an explanandum from laws and other conditions. A satisfactory response to Hempel needs to show what getting the directional features right contributes to correct explanation and causal judgment. Appeal to time order as a primitive basis for sorting out causal or explanatory direction does not do this. Put differently, what we are looking for is (i) an account of causal explanation and causal claims—an account of what such explanations do when they are good—and (ii) an associated account of causal direction that enables us to understand what (ii) contributes to (i). Skeptics about “objective” treatments of explanatory direction such as Hempel haven’t been answered until we have done this.

This is also the appropriate place to correct a misunderstanding about the relationship between time order considerations and interventionist interpretations of causation and directed graphs. The notion of an intervention I on a variable X presupposes, as I have said, a notion of causal direction: the causal direction is from I to X. However, the notion of an intervention of I on X does not build in (at least in any obvious way) assumptions about time order.

Since the focus of this essay is on considerations relevant to causal direction that are not based on time order considerations, there are many interesting and important questions relating time and causation that I do not address, either at all or in the kind of detail they deserve. For example, there is the issue, noted immediately above, of why our world apparently does not contain instances of “backward” causation in which effects temporally precede their causes. There is also the general issue of the relation between causal directionality and thermodynamic asymmetries, including the connection of these with various cosmological hypotheses, such as the past hypothesis. I touch on this only very briefly in Section 13. My failure to discuss these issues in any depth does not mean that I regard them as unimportant. It is, however, also interesting that there is much that can be said about causal direction without directly discussing time and entropy.

I turn now to a discussion of several different varieties of independence which I claim can be connected to causal and explanatory direction in illuminating ways. I distinguish three of these—(i) independence in the sense of statistical independence of variables that are causally independent (causal to statistical independence or CSI), (ii) independence between the values of cause variables and the causal relations/laws in which they figure (variable relationship independence/invariance or VRI) and, closely related to (ii), (iii) independence of different causal relationships from one another. My main focus will be (i) and (ii).

I begin with (ii) since this is the most natural point of entry. A basic feature of many physical theories and also of structural equation models that purport to represent causal relationships is a distinction or “cut” between what are often called “initial conditions” (hereafter ics)—“accidental” facts about the values certain variables happen to take—and the laws or causal generalizations (hereafter c-generalizations) connecting variables, including those having to do with initial conditions, to one another.

Before proceeding, two caveats are in order. First, I use “initial conditions” because this is common parlance; this usage is not meant to imply that the initial conditions occur temporally before other behaviors of the system in which we are interested. Second, talk of “initial conditions” is not meant to deny that there are other conditions, including boundary conditions and constraints, that are also important in constructing causal analyses and explanations, particularly when these involve differential equations.

In many cases it has proved possible to separate such initial conditions from the c-generalizations in such a way that they satisfy the following condition: the c-generalizations continue to hold—they are stable or robust—under various changes in the ics. In such cases I will say that the c-generalizations are invariant under changes in the ics. For example, initial conditions for application of the Newtonian gravitational law include the values of the masses m1 and m2, and the distance d between them. The law itself continues to hold—that is, it continues to accurately describe what will happen—under changes in the values of these initial conditions, both those that occur in a single system and across different systems. Similarly for other sorts of changes—spatial translations and Galilean transformations of gravitating systems. Plausibly these invariance features are at least part of the reason why we regard the gravitational generalization as a law.

In the case of structural equation modeling it is standardly assumed that if an equation—e.g., Z = aX + bY—describes a causal (or genuinely “structural”) relationship (with X and Y causing Z in the way the equation indicates), then this equation will continue to hold under changes in the values of X and Y (think of these as corresponding to initial conditions) for at least some range of changes in these values. Of course equations meeting this condition in the contexts in which causal modeling techniques are used will typically hold under a much smaller range of changes in initial conditions than the generalizations we regard as physical laws but some degree of invariance of the sort described is plausibly regarded as a necessary condition for those equations to represent causal relationships.
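What this invariance amounts to can be shown in a small simulation (the coefficients and noise levels are invented for illustration): the same structural equation, with the same coefficients, continues to describe Z accurately when the values of X and Y are shifted to a very different regime, as if by interventions.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, -1.0  # hypothetical structural coefficients

def z_of(x, y):
    """Structural equation Z = aX + bY plus a small disturbance."""
    return a * x + b * y + 0.05 * rng.normal(size=x.shape)

# Regime 1: one distribution of "initial conditions" X, Y.
x1 = rng.normal(size=4000)
y1 = rng.normal(size=4000)
coef1 = np.linalg.lstsq(np.c_[x1, y1], z_of(x1, y1), rcond=None)[0]

# Regime 2: X and Y shifted and rescaled, as if by interventions.
x2 = 5.0 + 3.0 * rng.normal(size=4000)
y2 = -2.0 + 0.5 * rng.normal(size=4000)
coef2 = np.linalg.lstsq(np.c_[x2, y2], z_of(x2, y2), rcond=None)[0]

# Both regimes recover approximately the same coefficients (a, b):
# the equation is invariant under these changes in initial conditions.
```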

Figuring out how to make the cut between initial conditions and c-generalizations such that the latter are at least to some extent invariant over the former is an extremely important step in constructing an explanatory theory in many cases. That we are sometimes able to separate c-generalizations and ics in this way and that the result allows for accurate predictions of the behavior of many systems is, as emphasized by Wigner (1970) and others, a highly non-trivial fact and one that should not be taken for granted.

To make a connection with what will come later, another way of thinking about the invariance property just described is that involves a kind of independence of c-generalizations from initial conditions: the cut between c-generalizations and initial conditions is made in such a way that (ideally) they are independent of each other. “Independence” in this context obviously cannot mean statistical or probabilistic independence—c-generalizations are not random variables characterized by joint probability distributions involving initial conditions. Nor does it seem right to think of this sort of independence as a kind of causal independence, at least in any straightforward sense. As noted earlier, one way of expressing the basic idea is in terms of counterfactuals: the initial conditions should be such that they can change “independently” of the c-generalizations in the sense that the latter would remain the same (would continue to hold) were the former to change in various ways.

To make this more precise consider the contrast between the following two structures:

Directed arrows represent causal relations in both structures. In both structures there is a correlation between C and E, represented by the undirected edge. Suppose that in structure (i), X is the only cause of C and there is no direct causal relation between X and E (i.e., no causal relation between X and E that does not go through C). Then if E changes under changes in the value of C (where these are caused by changes in the value of X), this provides good reason to conclude that the correlation between C and E is causal. One basis for this reasoning is that in (i) the change in C due to X is intervention-like and the conclusion that C causes E follows from M. By contrast, if E changes under observed changes in C under (ii) this does not provide good reason to conclude that C causes E, since the correlation between C and E may be entirely due to the common cause X.
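The contrast can be simulated with invented numbers. In structure (i), X-driven changes in C are intervention-like and E tracks them; in structure (ii), C and E are correlated through the common cause X, but values of C set by an actual intervention carry no information about E.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Structure (i): X -> C -> E, with X the only cause of C and no
# direct X -> E path. The X-driven variation in C is intervention-like.
x = rng.normal(size=n)
c = x + 0.3 * rng.normal(size=n)
e = 2.0 * c + 0.3 * rng.normal(size=n)
slope_i = np.polyfit(c, e, 1)[0]          # ~2: E changes with C

# Structure (ii): C <- X -> E, a pure common-cause structure.
x2 = rng.normal(size=n)
c2 = x2 + 0.3 * rng.normal(size=n)
e2 = x2 + 0.3 * rng.normal(size=n)
slope_obs = np.polyfit(c2, e2, 1)[0]      # sizable: confounded correlation

# An intervention do(C) severs C from X and leaves E untouched:
c2_do = rng.normal(size=n)                # values of C set exogenously
slope_do = np.polyfit(c2_do, e2, 1)[0]    # ~0: C does not cause E
```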

When we talk about the relation between C and E being invariant/independent under changes in the value of C, we should require that this invariance holds under changes in C that are caused in the way represented by (i) and not just in the way represented by (ii). This suggests:

Suppose that we are able to determine that changes in initial conditions have occurred due to some appropriately intervention-like process like (i) and that we observe that some c-generalization continues to hold across these changes. This would establish that the kind of independence/invariance under discussion is present. Suppose, by contrast, we observe a change in initial conditions and that some candidate generalization continues to hold across those changes but we are not able to observe or directly determine whether those changes in initial conditions are the result of some intervention-like processes. That is, we observe a correlation between C and E that continues to hold under changes in the value of C but not whether the changes in the value of C are caused by some X that has the properties in structure (i) or alternatively by some X in structure (ii). For example, we observe the joint probability distribution of two variables C (for different values of C) and E (and that they are correlated) but don’t observe the factors that determine P(C). Given some candidate function f (c-generalization) linking C to E, is there some way of determining whether f is (in the sense under discussion) “independent” of P(C)? And if so, can we use this information to infer causal direction? Indeed, what might “independence” mean in this sort of case?

One way of approaching this problem, employed in portions of the machine learning and (in a sense) in the econometrics literature, is in terms of the requirement that there be a kind of informational independence between the c-generalization and the associated initial conditions: information about the values of the initial conditions should not tell us anything specific about the c-generalization linking C to E and conversely. On my interpretation,

We can connect this idea about absence of constraints between initial conditions and c-generalizations to an explicitly interventionist treatment of causation in the following way. Suppose we are given a candidate c-generalization C → E and that it turns out that interventions that change the value of C are accompanied by associated changes in E. What this implies is that there is a way of generating values of C (a relationship R1 that allows values of C to be produced by some cause of C, such as X in (i) above) that is distinct or separate from the relationship R2 linking C to E. If there were no such relationship R1 that might be used to produce values of C, where R1 is distinct from the C → E relationship, it would not be possible to intervene (in the technical sense) on C with the observed result.

So far we have been talking about “independence” of c-generalizations from initial conditions or causes. There are, however, additional independence conditions that sometimes seem very natural and that can be imposed on the initial conditions/causes themselves (once we have separated them out from the c-generalizations as described above). One such condition connects causal independence and statistical independence (CSI, as referred to earlier): suppose there are distinct random variables

This is one version of what is sometimes called the principle of the common cause. Something like this is sometimes described in the physics literature as the assumption that “incoming” influences should be uncorrelated (if we understand incoming influences to be causally independent

Let me repeat that my claim is that CSI describes a generic pattern that, as a contingent empirical matter, holds widely, if not universally, in our world. I do not claim that CSI reflects a conceptual or metaphysical truth of some kind that holds in “all possible worlds”. My assumption is that CSI and similar principles, although contingent, help to underpin the ways in which we think about causation and causal direction. (They are part of the infrastructure associated with causal direction mentioned earlier.) I will not speculate about how, if at all, one thinks about causal direction in worlds in which CSI is systematically violated (or which we might find it tempting to describe in that way).

Note that CSI does not, as formulated, embody a temporal asymmetry. It connects causal and statistical independence but says nothing about causes occurring temporally before their effects or about independence being present before causes interact to produce an effect but not after.

Two further points. First, I will understand CSI as having, so to speak, an architectural or strategic component. Given a set of variables and associated causal relations for which CSI appears to fail, it will often be a good strategy to look for new variables, and causal relations formulated in terms of them, for which CSI holds. (I take this to be one of the themes of Wigner’s discussion: we should try to discover initial conditions which are such that CSI or some similar initial condition holds.) Second, as already suggested, I assume that whether it is possible to do this in a way that results in an empirically adequate theory is an empirical matter, which depends on what the world is like. It is not a conceptual truth or metaphysical necessity that it will always be possible to formulate successful theories or analyses satisfying CSI.

I will not try to defend CSI here—there is a big literature about this

As noted above, the architectural aspect of CSI suggests that we should look for models or explanations in which the assumed initial conditions or the variables that are represented as exogenous are statistically independent of each other.

Although neither of the two independence conditions VRI and CSI makes reference to time, both require, for their correct statement, a notion of causal direction. In the case of VRI, the requirement is that the c-generalizations C → E linking cause to effect should be invariant under changes in the values of the cause variable C. This is very different from (indeed, as we shall see, in many cases inconsistent with) the requirement that the C → E generalization be invariant under changes in the value of the effect E. In many cases this latter invariance claim is false.

A similar point holds for CSI. This requires statistical independence among cause variables (in the absence of causal relations connecting those variables) but of course it does not require statistical independence among effect variables. Given a structure that looks like this we expect, in accord with CSI, X and Y to be statistically independent in the absence of further information. On the other hand, if we were to reverse the arrows to yield the following structure, we would expect X and Y to be dependent.

It may seem tempting to infer from these observations that in order to use VRI and CSI we must have already identified the correct causal direction. In fact exactly the opposite is true—the features just described often make it possible to infer causal direction. Suppose, for example, we find that a candidate generalization relating C to E is invariant under changes in C (C → E is “independent” of the value of C)—something that, as noted above, can sometimes be determined empirically—but (E → C) is not invariant under changes in E. Then, at least in many cases, we can conclude that the causal direction is from C to E (see Sections 7-9). Similarly, given a case in which there are three variables, two of which are pairwise correlated and one pair of which is independent (as in Figure 4 above), we can, given additional assumptions (see P immediately below), use CSI to infer that the direction of causation is from the two independent variables to the third.

I turn now to more explicit application of these ideas connecting independence to causal asymmetries beginning with the flagpole problem and CSI. Here I will make use of the following principle (which I take to be motivated by CSI):

To apply this principle to the flagpole example, I will follow standard presentations of the problem in assuming that the only two alternatives are that H and A cause (or causally explain) S or that S and A cause H, so that principle (P) applies. (This conforms to the standard formulation of the problem which asks why we should distinguish (and prefer qua explanation) a derivation in which H is in the explanans from a derivation in which S is in the explanans.) Suppose that we observe several flagpoles of different fixed heights h1...hn, at different times of day for each pole, so that A varies. In this case for any given A, there will be a correlation between the heights of the poles and the corresponding shadows of lengths s1...sn but no correlation between H and A. As A varies over the course of the day, we also find, for each pole, a correlation between A and the length of the shadow cast by that pole. Thus we have the following pattern of independence and dependence relations: H_|_A, H_/|_S, A_/|_S. Applying P, we infer that H and A cause S.
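The pattern of (in)dependence relations just described can be checked in a small simulation. The particular distributions below are illustrative assumptions of mine; the relation S = H/tan(A) encodes the geometry of rectilinear light propagation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Independent "causes": pole heights and sun elevation angles (radians).
H = rng.uniform(2.0, 10.0, n)   # heights of different poles
A = rng.uniform(0.3, 1.2, n)    # angle of the sun at the time of observation
S = H / np.tan(A)               # shadow length: the putative effect

corr = lambda u, v: np.corrcoef(u, v)[0, 1]
print(corr(H, A))   # ~0: the two candidate causes are independent (H _|_ A)
print(corr(H, S))   # clearly positive (H _/|_ S)
print(corr(A, S))   # clearly negative (A _/|_ S)
```

Given this pattern, and the restriction to the two candidate structures, principle (P) selects H and A as the causes of S.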

There are a number of different ways of thinking about the justification for (P) and its applicability to this case. First and most obviously, the above pattern of dependencies is what we should expect if

is the correct structure but not if

is correct. According to (i) (and assuming that the alternative possibilities are restricted in the way described above) H and A are causally independent and hence by CSI, we expect H_|_ A. By contrast if (ii) is the correct structure then again by CSI we should expect S _|_A, which we do not observe.

Note that although this reasoning relies on CSI, it does not rely on anything stronger such as the Causal Markov condition or on the assumption of faithfulness F which is sometimes assumed in causal modeling.

We can also connect principle P with standard interventionist thinking and thus get further insight into why P “works” as follows. As noted above, within the interventionist framework the claim that H causes S and S does not cause H corresponds to the claim that there are interventions on H that will change S but no interventions on S that will change H. The claim that S causes H has the opposite profile concerning the results of interventions. Again assuming that these are the only two possibilities (and making the assumptions about the absence of common causes etc. described above), the pattern of (in)dependencies A_|_H, H_/|_S, A_/|_S suggests that A functions as a soft intervention variable on S, since it is exogenous and independent of the only other possible cause of S, namely H. Observation shows that changes in this intervention variable A for S are not associated with changes in H. This suggests that S does not cause H. Moreover, if we assume that S causes H, then, under this assumption, there will not be, among the variables in the system, any intervention variable for H that is independent of S, since the only remaining variable, A, is correlated with S.

On this view of the matter, a pattern of (in)dependence relations involving H, A and S conveys information (given the background assumption that one is choosing among a very limited range of possibilities) about what would happen if various interventions were to be performed, even though no interventions are in fact performed. This is an example of what I meant earlier in saying that (in)dependence information can be connected to interventionist ideas concerning causal direction in a way that illuminates how the former can be a source of information about the latter. It also illustrates how observational information, not involving interventions, can be used in conjunction with background assumptions to answer questions about what would happen if certain interventions were performed.

Another related way of thinking about the flagpole example appeals to the desirability of avoiding unexplained coincidences or dependencies when there are equally adequate alternative models that do not require such coincidences. As noted above, when one observes a single flagpole, the naturally occurring changes in A over the course of the day due to changes in position of the sun will be correlated with S. Moreover, S and A will change in concert in just such a way that the value of H remains constant. Thus in a model in which S and A cause H (with no causal connection between A and S) S and A will appear to be precisely “tuned” to each other, varying so as to maintain a constant value for H, despite the absence of a causal connection between these variables. The model in which S and A cause H will thus look like Figure 5 with the undirected arc between S and A representing the fact that they co-vary together, despite the fact that neither is represented as causing the other and they are not represented as having a common cause.

By contrast in a model in which H and A cause S, there is no such unexplained dependency: all of the observed dependencies follow just from the causal structure of the model and what are assumed to be exogenous changes in A (or in H if we are considering populations of poles.) In one obvious sense the model in which S and A cause H is less simple than a model in which H and A cause S—less simple in the sense that the former model requires additional information (in the form of a statistical dependency between A and S) besides the two causal arrows it postulates to account for the observed dependencies while the latter model requires only two causal arrows. There is thus a kind of redundancy in the S → H model since the observed dependencies could be accounted for without postulating the A—S correlation.
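The tuned co-variation described above can be exhibited numerically. In this sketch (the numbers are my own illustrative choices), a single pole of fixed height is observed as the sun angle varies; in the backward model H = S · tan(A), S and tan(A) must co-vary in exactly the way needed to hold H constant:

```python
import numpy as np

h = 5.0                                  # one pole of fixed height
A = np.linspace(0.3, 1.2, 200)           # sun angle varying over the day
S = h / np.tan(A)                        # the shadow shortens as the sun climbs

# S and tan(A) are strongly dependent, despite there being (in the backward
# model) no causal connection or common cause linking them ...
print(np.corrcoef(S, np.tan(A))[0, 1])   # strongly negative

# ... and they are "tuned": their product, the reconstructed H, never varies.
print(np.ptp(S * np.tan(A)))             # ~0
```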

There is another, related way of thinking about the flagpole example which will be useful later in our discussion. So far we have been considering causal and correlational relations just involving H, A and S. But (as noted in footnote 41) there is another source of information about causal direction. This has to do with variables that are exogenous causes of H. Often we have at least some information about these. (In realistic cases these often will be hard intervention variables for H.) An obvious candidate for such a variable is the actions/intentions X of the person or machine who fashioned the pole as having one height rather than another (see Figure 6).

As we see from this example, causal information about some variables, including information about causal direction, can, when combined with correlational information, constrain causal direction among other variables. As we shall see in Section 12, a similar sort of strategy can work in some cases involving non-causal explanatory dependencies.

Now consider a more subtle example.

The two examples thus illustrate an important point that will receive more attention later. The causal direction in the examples is not just “in” (or fixed or determined by) the law PV = nRT considered by itself but rather (also) has to do with the role played by the initial and boundary conditions and constraints governing the system. This includes information about what is or is not correlated with what among these conditions, but this in turn reflects what is physically fixed and not allowed to vary (as is the case with the container of fixed volume) in contrast to what is allowed to vary (as with the movable piston and fixed weight). That this information is relevant to causal direction is an implication of principle P since what quantities are correlated or not with others may depend (as the two gas examples illustrate) on what is fixed and what can vary in the specific systems we are considering.

One way of thinking about the upshot of my discussion so far is that there is more content or structure present in many explanations and causal claims than what is captured by a simple focus on deductive relationships (or facts about “instantiation” of regularities) of the sort that characterize the DN model (and a number of other models of explanation). Information about which variables are independent of others (including, crucially, information about independence relations among candidate cause variables and which variables are to be regarded as fixed in value) contributes importantly to directionality and to explanatory import—this information is a “working part” of the explanation. Relationships that may look completely symmetrical (such as the relationship between the height of a flagpole and the length of its shadow) can be shown to embody asymmetries when one attends to independence relationships. These asymmetries matter for successful explanation—they are tied to the ability of explanations to answer questions about what would happen if initial conditions were different (called w-questions in Woodward, 2003) and to the explanatory virtue of avoiding unexplained coincidences.

In the examples discussed so far, the causal relations are assumed to be deterministic and the values of all three variables figuring in those relations are observed. A body of recent work in machine learning (e.g., Janzing et al., 2012, Peters et al., 2017, Shimizu et al., 2006, Hoyer et al., 2009) explores a set of different but related problems. Suppose that X and Y are statistically dependent but their relationship is stochastic or noisy, where this can be represented by the presence of a noise or error term—i.e., X and Y are related by some function in which a noise term figures. We wish to determine whether X causes Y or conversely. We assume further that no unmeasured common causes are present and that the noise term enters additively into the relationship between X and Y, so that there are just two hypotheses about causal direction—either (i) Y = f(X) + U or (ii) X = g(Y) + U’ where U and U’ are error terms. We can observe X and Y but not U or U’. In one kind of case, the functions f and g are assumed to be linear but the processes that generate the candidate cause variables and the noise term are assumed to be non-Gaussian (more precisely, at most one of these is Gaussian). A technique known as independent components analysis (ICA), which separates non-Gaussian distributions into statistically independent components, is used to examine whether it is possible to fit an equation of form (i) to the X, Y distribution with X_|_U and similarly to determine whether it is possible to fit an equation of form (ii) with Y_|_U’. If the error term can be made independent of the candidate independent or cause variable in one direction, but not the other, one infers that the former is the correct causal direction.
The assumption of non-Gaussianity is crucial to the success of this procedure since ICA requires this assumption and more generally because the linear Gaussian case is symmetric—in this case it is always possible to fit independent errors in both directions so that the procedure gives no recommendations about causal direction.
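A minimal sketch of the idea, though not of the actual ICA-based implementation: fit least squares in both directions and probe the residual for higher-order dependence on the regressor (OLS residuals are uncorrelated with the regressor by construction, so plain correlation cannot distinguish the directions). The uniform distributions and the particular dependence statistic are my illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Linear model with non-Gaussian (uniform) cause and noise: X -> Y.
X = rng.uniform(-1, 1, n)
U = rng.uniform(-1, 1, n)
Y = 2 * X + U

def residual(effect, cause):
    """OLS residual from regressing `effect` on `cause`."""
    b = np.cov(cause, effect)[0, 1] / np.var(cause)
    return effect - b * cause

# Probe higher-order dependence between residual and regressor.
dep = lambda r, z: abs(np.corrcoef(r, z**3)[0, 1])

fwd = dep(residual(Y, X), X)   # X -> Y: residual recovers U, independent of X
bwd = dep(residual(X, Y), Y)   # Y -> X: residual remains dependent on Y
print(fwd, bwd)                # fwd near 0, bwd clearly larger
```

An error independent of the regressor can be fitted in the X → Y direction but not in the Y → X direction, so one infers that X → Y is the correct causal direction.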

In a second kind of case it is again assumed that no unmeasured common causes are present and that the relationship between X and Y involves an additive noise term, so that as before the alternative hypotheses are (i) Y = f(X) + U or (ii) X = g(Y) + U’. However now the functions f and g are assumed to be non-linear. In this case if one can fit a model of form (i) such that X_|_U, then “usually” (with certain exceptions again including the case in which the joint distribution of X and Y is bivariate Gaussian) there is no such additive noise model in the opposite direction from Y to X—that is, no U’ such that (ii) X = g(Y) + U’ with Y_|_U’. (“Usually” means that if (i) holds, the space of functions in which (ii) also holds is of much lower dimension.) Again if there is a model of form (i) with X_|_U and no model of form (ii) with Y_|_U’ one infers that (i) is the correct model.
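The nonlinear additive-noise case can be sketched in the same spirit, with a simple quantile-binned regression standing in for the flexible regressions used in this literature. The function, noise scale, and dependence statistic are my illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Nonlinear additive-noise model: Y = f(X) + U with f(x) = x + 0.5*x**3.
X = rng.uniform(-2, 2, n)
U = rng.normal(0, 0.3, n)
Y = X + 0.5 * X**3 + U

def binned_residual(effect, cause, bins=200):
    """Residual from a crude nonparametric regression: subtract the mean of
    `effect` within quantile bins of `cause`."""
    order = np.argsort(cause)
    resid = np.empty_like(effect)
    for chunk in np.array_split(order, bins):
        resid[chunk] = effect[chunk] - effect[chunk].mean()
    return resid

def spearman(u, v):
    """Rank (Spearman) correlation, computed with numpy only."""
    rank = lambda w: np.argsort(np.argsort(w))
    return np.corrcoef(rank(u), rank(v))[0, 1]

# Forward, the residual recovers U, whose spread does not depend on X.
# Backward, the residual spread shrinks where the inverse function is flat,
# a detectable dependence of the error on the regressor Y.
fwd = spearman(np.abs(binned_residual(Y, X)), np.abs(X))
bwd = spearman(np.abs(binned_residual(X, Y)), np.abs(Y))
print(fwd, bwd)   # fwd near 0, bwd clearly negative
```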

Both of the methods just described have been tested on real world data for which causal direction is independently known (or at least there are generally accepted beliefs about this). Without going into a lot of detail, as an empirical matter, the methods perform reasonably well on many data sets, with accuracies in the neighborhood of 70 to 80 percent (as opposed to the 50 percent that would be expected from random guessing.) For example, given information about the joint distribution of altitude and rainfall in various areas in Germany, the first method correctly infers that it is more plausible that altitude causes rainfall than conversely. Given data on the duration of an eruption and the time interval between subsequent eruptions of the Old Faithful geyser in Yellowstone National Park, the method involving non-linear functions infers that the correct model is that “current duration causes next interval length” rather than conversely. (Note there is no reliance here on time-order information).

At an abstract level these methods closely resemble the methods described above in connection with the flagpole and gas cases. Both methods make use of statistical (in)dependence information with the guiding idea being that if there is independence among putative causes in one direction and no such independence in the other direction, then the correct direction is one in which the causes are independent. For example, when we find an error U which is independent of X but no error U’ which is independent of Y, we infer that U and X are causes of Y. (Of course the additive error models also make use of additional assumptions, concerning the form of the function linking cause and effect as well as the distribution of the noise term, but in other respects they start with less information than in the previous examples—the error term is unobserved and must be inferred while all three variables are observed in the flagpole and gas cases. In effect the unobservability of the error term is offset by the additional assumptions made in the additive error model case.)

We can provide the same general diagnoses of why the machine learning techniques involving additive error models work that we appealed to in the previous examples. CSI suggests that causes should be independent in the absence of causal relations among them or omitted common causes. So if, e.g., X and U are independent and Y and U’ are dependent, we take X and U to be causes of Y. In addition, the same considerations having to do with unexplained correlations apply. In a model in which Y and U’ are claimed to cause X with U’ and Y dependent there is an unexplained correlation between U’ and Y. By contrast in a model in which X and U cause Y with X_|_U there is no such unexplained correlation. Other things being equal, this favors the latter model.

Similarly, looking at the matter from an interventionist perspective, if, as we are assuming, the only two possibilities are that X and some U cause Y or that Y and some U’ cause X, the existence of a U which is independent of X but not independent of Y strongly suggests that one can intervene on Y (by using U) without changing X, which is diagnostic of the absence of a causal relationship from Y to X. At the same time, assuming that there is some causal relationship R that determines the value of X, the independence of U from X in a relationship of form Y = f(X) + U also suggests that R does not affect U. This in turn suggests that these generating conditions R for X operate so as to change the value of X in a way that is independent of the other causes of Y, represented by U. Since if such changes occur, X and Y remain correlated, we have evidence that X causes Y. In other words, finding an independent error in one direction but not in the other amounts to finding relevant (soft) intervention variables, even if these are not initially observed.

So far we have been considering cases in which the effect variable is the result of two

I will first try to provide some intuition regarding the basic idea and then describe some details. First recall the independence relation VRI discussed above, concerning the “independence” of initial conditions and the c-relationships in which they figure. As noted above, “independence” in this context cannot mean statistical independence; instead, in parts of the machine learning literature (e.g. Janzing et al., 2012), independence is understood as a kind of informational independence or, more formally, as “algorithmic independence” defined in terms of Kolmogorov complexity. I will relegate details about the latter to a footnote.

To further illustrate the underlying idea, let me switch to a different example:

In such a context it is natural to take the independence or invariance of the conditional probability Pr(Y/X) under changes in Pr(X) (where by changes in Pr(X) I mean a change from one probability distribution Pr1(X) to a different distribution Pr2(X)—i.e., Pr(X) is not stationary) as encoding information about the causal relationship, if any, from X to Y. That is, if the causal direction is X → Y, then Pr(X) should be independent of Pr(Y/X) and Pr(Y/X) should be invariant under changes in Pr(X). If instead the causal direction is Y → X, then Pr(X/Y) should be independent of Pr(Y) and invariant under changes in this probability distribution.

It is relatively easy to see that invariance/independence in one of these directions under some specified set of changes in the cause variable is inconsistent with invariance in the other direction under the same set of changes given some very natural additional assumptions. Suppose that the conditional probability Pr(Y/X) is invariant under changes in Pr(X) and focus on the case in which X and Y have just two values, 0 and 1. Assume that Pr(Y = 1/X) ≠ 0 or 1 (for either value of X) and that Pr(Y/X = 1) ≠ Pr(Y/X = 0) which is plausibly a necessary condition for X to be causally relevant to Y. We have from Bayes’ theorem:
Pr(X = 1/Y = 1) = Pr(Y = 1/X = 1) Pr(X = 1) / Pr(Y = 1) (10.1)

Suppose Pr(Y = 1) changes in value. We want to know whether the conditional probability on the l.h.s. of (10.1) will remain invariant under this change, given the assumptions that the probabilities Pr(Y/X) are invariant. Since Pr(Y = 1) = Pr(Y = 1/X = 1) Pr(X = 1) + Pr(Y = 1/X = 0) Pr(X = 0) and (we are assuming) the conditional probabilities Pr(Y/X) are invariant under changes in Pr(X), this change in Pr(Y = 1) must involve a change in Pr(X).

Given the relationship between finding invariant relations and correctly identifying causal structure this helps to motivate the assumption that in this sort of case the correct causal direction is given by the direction in which the conditional probabilities are invariant. That is, in a two variable case meeting the conditions just described if Pr(Y/X) is invariant under changes in Pr(X), we should infer that the direction of causation is from X to Y. We thus see that, just as in the flagpole case, the fact that certain quantities are invariant or independent of other quantities can be used to establish asymmetries in what might otherwise look like symmetric situations.
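The asymmetry can be verified by direct calculation in a binary example of the sort just described. The particular numbers are my illustrative choices: the mechanism Pr(Y/X) is held fixed while Pr(X) is shifted between two regimes, and Pr(X/Y) fails to survive the shift:

```python
# Fixed "mechanism": the conditional distribution Pr(Y = 1 | X).
p_y1_given_x = {1: 0.8, 0: 0.2}

def posteriors(p_x1):
    """Return (Pr(Y = 1), Pr(X = 1 | Y = 1)) for a given Pr(X = 1)."""
    p_y1 = p_y1_given_x[1] * p_x1 + p_y1_given_x[0] * (1 - p_x1)
    p_x1_given_y1 = p_y1_given_x[1] * p_x1 / p_y1    # Bayes' theorem
    return p_y1, p_x1_given_y1

# Shift Pr(X = 1) from 0.3 to 0.7: Pr(Y | X) is invariant by construction,
# but Pr(X = 1 | Y = 1) moves from ~0.63 to ~0.90.
for p_x1 in (0.3, 0.7):
    print(p_x1, posteriors(p_x1))
```

Since the X → Y conditionals survive the distributional shift and the Y → X conditionals do not, the invariance criterion selects X → Y.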

This example also provides an illustration of what would be involved in initial conditions and a c-generalization being “tuned” to one another in such a way that VRI fails. If the causal direction in the example is X → Y, then (assuming the connection between causation and invariance under discussion), as Pr(Y) changes, Pr(X/Y) will also change or adjust systematically in such a way that the invariance of Pr(Y/X) under changes in Pr(X) is preserved—thus changes in Pr(Y) will be tuned to changes in Pr(X/Y).

In the case as just described we assumed that there was an actual change (“shift”) in the probability distributions Pr(X), Pr(Y) and considered which of the conditional probabilities were invariant under these changes. If we could observe such changes and the relevant conditional probabilities, we could use this to infer causal direction. This strategy is employed by Hoover (2001) in a series of papers investigating the causal direction between economic variables.

I remarked above that in the machine learning literature, these ideas about informational independence can be represented in terms of algorithmic information theory. This allows for the formulation of a notion of informational independence in terms of Kolmogorov complexity that is analogous to statistical independence and that applies to objects that are not random variables (such as functions and probability distributions). Within this framework, with a candidate cause X and a function f that generates Y from X, the independence notion can be stated as the requirement that the description of X should be algorithmically independent of f, or perhaps algorithmically independent of f conditional on some specified body of background knowledge. Although this yields a way of formalizing informational independence and the proof of theorems about it, it is not helpful in the analysis of particular examples, since Kolmogorov complexity is not computable. Practical implementation requires a more operational notion of informational independence.

Here the literature (e.g. Janzing et al., 2012) appeals to more specific mathematical facts relating various functional forms, including the following. Suppose that X and Y are real variables where Y = f(X) is a differentiable bijective function on the [0, 1] interval with a differentiable inverse f⁻¹. If log f' and p(x) (the probability density of X) are “independent” in the sense that

∫ log f'(x) p(x) dx = ∫ log f'(x) dx,

then log (f⁻¹)' and p(y) are positively “correlated”, i.e.,

∫ log (f⁻¹)'(y) p(y) dy > ∫ log (f⁻¹)'(y) dy,

unless f is the identity.

This suggests a test for directionality that consists in looking for “dependencies” between the derivative f' of f and the density of the candidate cause variable—in other words one looks at the relation between f' and Pr(X) and between (f⁻¹)' and Pr(Y). If, say, the former pair are informationally independent and the latter informationally dependent, one takes this as a reason to conclude that the correct causal direction is from X to Y. As an illustration (Janzing et al., 2012), suppose that X and Y are related as in Figure 7, with Pr(X) uniform and Pr(Y) highly non-uniform:

Consider the regions of large slope for f⁻¹ (small slope for f). These are “correlated” with large peaks for Y, as shown in the diagram. Given the uniform distribution of X, the regions in which f has small slope will transform values of X in those regions to very similar values of Y, so that the density of Y piles up around those values. In this sense there will be an informational dependence between f⁻¹ and Pr(Y)—the slope of f⁻¹ tracks the lumpiness of Pr(Y). By contrast, given the uniform distribution of X, there is no such “correlation” between Pr(X) and f. Thus one concludes that X causes Y rather than Y causing X. Note that in this case just two variables are involved, rather than three as previously. Moreover, the functional relation between them is deterministic and invertible.
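A crude version of the slope-based estimator used in this literature can illustrate the test. The setup (Y = X³ on [0, 1], with X uniform) is my illustrative choice; the score approximates ∫ log f'(x) p(x) dx, and the direction with the lower score is inferred to be the causal one:

```python
import numpy as np

rng = np.random.default_rng(2)

# Uniformly distributed cause, deterministic bijective mechanism on [0, 1].
X = rng.uniform(0, 1, 2000)
Y = X**3                     # the slope of f varies, so Pr(Y) piles up near 0

def slope_score(a, b):
    """Mean log-slope of b against a over points sorted by a (an IGCI-style
    estimate of the integral of log f' under the input density)."""
    i = np.argsort(a)
    a, b = a[i], b[i]
    return np.mean(np.log(np.abs(np.diff(b) / np.diff(a))))

s_xy = slope_score(X, Y)     # ~ E[log f'(X)]: negative here
s_yx = slope_score(Y, X)     # = -s_xy for an invertible relation
print(s_xy < s_yx)           # True: infer X -> Y (the lower score wins)
```

The slope of f is informationally independent of the uniform Pr(X) but tracks the peaks of Pr(Y), which is what the comparison of the two scores registers.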

This method, like those considered previously, can be tested experimentally on real world data in which the causal direction is known on independent grounds. The method again correctly identifies causal direction at a rate well above chance: for example, accuracy rates are in the neighborhood of 75% depending on details of implementation for sets of observations of water levels at various locations along the Rhine (where it is agreed that upstream levels cause downstream levels rather than conversely.)

This particular operationalization of informational independence obviously requires that the functional relations between cause and effect meet various conditions—the functions must be bijective, differentiable with differentiable inverses etc. In other cases, we may have reason to believe that the functions relating cause and effect will not satisfy these particular conditions, but it may be possible to find some alternative operationalization that draws on the same underlying idea: that the independence of the process that generates the cause from the process that generates the effect is a clue to causal direction.

As I have interpreted this method, it attempts to infer what would happen to the function relating cause and effect—in particular, whether this would remain stable under changes in the distribution of the putative cause—from relations of informational independence or their absence that are observed within a single joint distribution, as illustrated in Figure 7 above. Clearly even if it is right that whether or not the relationship X → Y is stable under changes in the distribution of X is a reliable clue regarding causal direction, there is additional inductive risk in trying to infer such stability from informational independence relations in the way described, where we don’t actually observe what happens under distributional changes in X but merely try to infer what would happen were such changes to occur from a single observed distribution of X.

In particular, one worry one might have about the example in Figure 7 is that there are, after all, functions and mechanisms that take relatively non-uniform distributions and produce uniform distributions as outputs—think of gambling devices such as roulette wheels. In such cases, the correct causal direction will be from non-uniform Y to uniform X rather than from uniform X to non-uniform Y, as the method under discussion recommends for the example in Figure 7. In fact, however, a closer look arguably supports the analysis provided above. In non-uniform to uniform cases involving gambling devices the operative dynamics or mechanisms will take any one of a very large range of distributions of initial conditions (e.g., in some treatments any probability density over the initial conditions that is absolutely continuous) into a uniform distribution. Thus what is going on in such cases is that the dynamics is (largely) independent of the initial conditions after all, so that the initial conditions are causes and the distribution of outcomes the effect. In other words, we have information about a non-uniform input → uniform output relation that is stable under changes in input, which makes it clear what the correct causal direction is. This contrasts with the information that is available in Figure 7 where we see only a single non-uniform distribution which is associated with a uniform distribution, so that the choice is between a cause-effect function that takes a uniform distribution as input and produces a non-uniform output (as any function with a non-constant derivative will do) and an alternative function that takes a non-uniform distribution as input and exactly undoes the non-uniformity in such a way as to produce a uniform output. When this is the only available information, it is not so obvious that the former choice is unreasonable. It might be argued that functions that undo non-uniformity to produce uniformity are “unusual”.

In any case, my concern here is not to argue for this particular implementation of informational independence but rather to stress the general idea that independence/invariance, understood in terms of VRI, between the distribution of a variable (or its generating mechanism) and the relationship linking that variable to others can contain important information about causal direction. Moreover, if my argument so far is correct, this is not merely a superficial symptom that happens to be associated with causal direction. It instead involves a deep structural feature present in causal relationships (or at least many of them): it is exactly when the X → Y relationship is invariant under changes in X and/or independent of whatever is responsible for the generation of the distribution of X values that we can use manipulation of X and the X → Y relationship as a way of changing Y. In the remainder of this essay I want to examine some additional implications of this idea and of CSI.

One general moral that can be drawn from the discussion so far is that the directional features of causation are closely bound up with facts about the initial and boundary conditions of the systems we are analyzing and the way in which these are related to or interact with the c-generalizations governing those systems. Thus in many cases, the directional features are not to be found in the governing c-generalizations alone. We saw this in connection with the gas cylinder example, in which systems with different initial and boundary conditions had causal relations with different directions, despite being governed by the same law. Similarly VRI is obviously a condition concerning the relationship between initial conditions and candidate c-generalizations.

This general picture contrasts with a common alternative picture that is explicitly or tacitly assumed by many philosophers. I call this the “cause in laws” picture. According to this picture, laws of nature (or more generally, governing c-generalizations, whether or not they are laws), taken by themselves, have rich causal content and directly describe causal relationships. Thus the “logical form” of such generalizations or laws is something like: “All Fs cause Gs”, where “cause” has all its usual connotations, including directionality.

It is well known that this picture generates a number of puzzles. First, the word “cause” (or equivalent expressions) does not explicitly occur in most fundamental physical laws—perhaps in none, depending on what one counts as a law. “Cause” also fails to occur in many c-generalizations employed in sciences outside of physics.

Another more fundamental problem concerns the apparent tension between the directionality or asymmetry of causal relationships and various “symmetries” of most basic laws. “Symmetry” in this context is used in several different ways. Some writers use it to refer to the fact that fundamental laws are “deterministic” in both temporal directions: from past to future and from future to past. More commonly “symmetry” concerns the time reversal invariance of fundamental laws (which is of course different from bi-directional determinism). Very briefly, the characterization of time reversal requires the specification of an operation on the variables within an equation that replaces these with their temporal “inverses”: the time variable t is replaced by -t, the velocity variable v by -v and (according to most) in classical electromagnetism the magnetic field B should be replaced with -B. An equation or law L is then time reversal invariant if, when some physical process P is consistent with L, so is the time reverse of P. For example, according to the laws of classical electromagnetism, an accelerating charge will be associated with electromagnetic radiation radiating outward symmetrically from the charge. These laws also permit the time-reversed process according to which a spherically symmetric wave of electromagnetic radiation converges on a single charge which then accelerates—a process which appears to be rare, absent some special contrivances.

A number of philosophers have thought that time reversal invariance and other sorts of symmetries present in fundamental laws raise problems for the directional or asymmetric features of causal claims; the concern is that there appears to be nothing in fundamental physics that “grounds” or serves as a basis for these directional features.

This in turn has led to several different responses. One is that this shows that the assumption that causation has directional features is a mistake, since there is nothing in reality that might serve as a basis for these features. Another possible response (perhaps not sharply distinct from the first) is that since the directional features (allegedly) have no basis in fundamental physics, they must have some other source—one suggestion is that they derive in some way from facts about us, such as a particular perspective we adopt as deliberators. Views of this sort are defended by Price (2007, 2014) and are discussed by Ismael (2016), among others.

A very different view of the status of the directional and perhaps other features characteristic of causation is that their apparent absence from fundamental physics shows that the equations of physics, in their usual formulation, require additional supplementation in the form of various free-standing “causality principles” that provide those equations with causal content. Such principles might be thought to be at work when, for example, certain solutions to an equation expressing a physical law are discarded on the grounds that they violate the condition that effects cannot temporally precede their causes. Yet another possibility is to reinterpret the equations themselves so that they make straightforward causal claims—e.g., Coulomb’s law may be interpreted as the claim that charges cause electromagnetic forces or fields that in turn cause changes in other charges. Views of this sort are perhaps suggested in Cartwright (1983).

I think that all of these views rest on the mistaken adoption of the cause in laws idea. That is, advocates of these views assume that if a basis for causal notions (and in particular the directional features of causation) is to be found anywhere in science or in physics, it is to be found in physical laws (or perhaps other governing c-generalizations from sciences besides physics) alone. Not finding such a basis in laws, these writers look for the basis in more anthropocentric sources, or in causal supplements added to physical laws as ordinarily formulated, or conclude instead that there is no basis. As explained above, my contrary suggestion is that the basis for the directional features of causation is to be found in facts about the initial and boundary conditions characterizing the systems we are analyzing and how these relate to (or interact with) laws and c-generalizations. At least some of these facts are captured by conditions like VRI and CSI. Arguably these conditions involve straightforwardly “objective” facts that describe how matters stand in the world—they are not somehow due to our human perspective or projective activities. At the same time, the idea that making sense of causation requires that free-standing causal principles or additional causal interpretations be added to basic scientific laws is also unnecessary. Again, laws and governing generalizations, along with initial and boundary conditions, as ordinarily understood and without any need for supplementation, are all that is required.

There is of course another strategy for attempting to make sense of various asymmetries we find in the world (entropic and otherwise, including causal asymmetries). This strategy agrees that we need initial and boundary conditions (or at least what looks like these) as well as more familiar laws to generate the asymmetries. However, it appeals to a single boundary-like condition which is imposed just once on the early universe. This is the Past Hypothesis (e.g., Albert, 2000), according to which the very early universe was in a state of very low entropy. For reasons having to do both with space and my own competence, I will not discuss this strategy here. However, I do wish to note that it differs from the considerations to which VRI and CSI appeal. The latter appeal to facts about the “local” initial and boundary conditions characterizing specific, typically small systems—flagpoles, gases in cylinders with pistons that may or may not be movable, and so on—rather than to some global cosmological condition. This is not intended as a criticism of the Past Hypothesis, but it does underscore that appealing to it is different from the considerations explored in this essay.

Another way of putting this general idea about where causation is “located” (or at least often located) is as follows: to the extent that laws and other governing generalizations are expressed in differential equations, causation is not “in” these equations taken alone but rather in the solutions to those equations which arise when we combine them with specific assumptions about initial and boundary conditions.

As an additional illustration consider again the contrast between the case in which diverging electromagnetic waves are emitted by an accelerating charge and a case in which a coherent spherically symmetric wave comes in from infinity and converges exactly on the charge. The difference between these two scenarios does not fall out of Maxwell’s equations themselves but instead also has to do with the different initial and boundary conditions characterizing the two scenarios. In the diverging wave scenario, if the charge begins accelerating at t0, it is common to assume that the relevant boundary conditions at infinity (or at some considerable distance from the charge) are that there is no electromagnetic radiation at t0 or at earlier times. In the converging wave scenario, by contrast, the boundary conditions involve a coherent wave converging on the charge at some time prior to t0. This asymmetry, combined with Maxwell’s equations themselves, gives rise to the different causal judgments we make about the two scenarios—in the first, the accelerating charge causes the diverging wave, in the second the arrival of the converging wave causes the charge to accelerate.

Of course it is true that the scenario with the converging wave rarely occurs while the diverging wave produced by the accelerating charge is more common. As I see it, this reflects the sorts of considerations that underlie CSI—the idea that causal independence leads to statistical independence. Absent some special contrivance, production of a coherent incoming wave would require a very precise pattern of statistical dependence or coordination among causally independent sources and hence is very unlikely, although not impossible. By contrast, when additional fields are absent, it is not surprising that distinct segments of the wave front of an outgoing wave are correlated, because this can be traced to a common cause (the accelerating charge). It is for this reason that if we are given a snapshot of the charge as it begins to accelerate and another snapshot of the coherent wave at some distance from the charge, with no information about which occurred first, and asked to infer which of these is the cause and which the effect, we can confidently infer that the acceleration of the charge caused the wave rather than vice versa. This reasoning is very similar to the other examples of reasoning about causal direction described earlier in this paper.

According to this interpretation, the diverging, outgoing wave scenario and the converging, incoming wave scenario describe distinct physical processes. The physical basis for the difference between the scenarios is not to be found in the law governing the scenarios, which is the same for both, but rather in the facts involving the different initial and boundary conditions that characterize the scenarios. Some writers (e.g., perhaps Price and, if I have understood him correctly, Farr, 2020) claim, on the contrary, that the two scenarios do not really correspond to different possibilities—the account in terms of the accelerating charge causing the outgoing wave and the account in terms of the converging wave causing the acceleration are just different, equivalent descriptions of the same situation.

I suspect that one of the main reasons why the contribution of initial and boundary conditions to causal direction has been missed is that such conditions are widely thought by philosophers to be modally inert and lacking any causally relevant content. Since causal claims, including claims about causal direction, presumably have modal content, it is natural to think that this content must be supplied entirely by laws or c-generalizations. The mistake in this reasoning is the assumption that facts about initial and boundary conditions and relations among these are modally inert. This is perhaps most obvious in connection with examples like the gas in a cylinder in which it is specified that the volume of the container can or cannot change. But it is also true that independence assumptions like CSI and VRI carry modal commitments. When it is assumed that different variables, used to specify the values of initial conditions, can change independently of one another, these claims have modal content. Similarly for claims about the independence of various generalizations across changes in initial conditions. Thus claims about initial and boundary conditions and how these relate to laws, as well as the laws themselves, carry modal commitments.

My discussion so far has focused on causal directionality and directionality in causal explanations. Recently there has been an upsurge of interest in non-causal explanations of various sorts. Let us assume, for the sake of argument, that there are such explanations, or at least that this is a possibility worth taking seriously. Against this background, the question of whether such explanations have directional or asymmetric features and, if so, how we should understand these, becomes important. One way of motivating this question is to note that, however this is understood in detail, causation clearly has directional features. But if an explanation is non-causal, then if it has directional features, these can’t be causal in character. They must instead be understood in some other way. This in turn suggests an argument against the very possibility of non-causal explanation. Suppose that explanation of any kind must be asymmetric—if X explains (causally or non-causally) Y, then Y cannot also explain X. If the only available source of such asymmetry is causal, then putative non-causal explanations, lacking any such source, cannot really be explanations at all.

One way of responding to this argument is to deny that explanation must (always) be asymmetric. However, a number of the most plausible examples of non-causal explanation in the literature do appear to have a distinctive direction (see below). Thus the issue of how, if at all, these directional features might be understood arises in a natural way—indeed an account of this seems to be required if we are to make sense of many of the supposed examples of non-causal explanation.

In this section I want to briefly explore the possibility of providing such an account by extending the claims developed in previous sections. My basic idea is that in a number of cases the directional features of non-causal explanations can be understood in terms of generalizations or extensions of the ideas about independence and its relation to directionality described previously. I will consider two examples—my treatment of them will be somewhat different but will share a common core.

One plausible candidate for a non-causal explanation is Euler’s graph-theoretical explanation of why it is impossible to traverse the bridges of Königsberg via a continuous path in which each bridge is crossed exactly once (an Eulerian path). I will call this explanandum the traversability of the bridges, represented by a variable T that can take two values depending on whether or not the bridges are traversable. Since the Königsberg example has been extensively discussed, I will assume that it is unnecessary to provide details. Suffice it to say that Euler identified a graph-theoretical feature F which he proved to be necessary and sufficient for an Eulerian path to exist—the absence of this feature F implies that no Eulerian path exists and hence that T has the value non-traversable. The arrangement of bridges in Königsberg does not possess the feature F. If we let E be a two-valued variable representing whether feature F is present and assume for the sake of argument that we are dealing with an explanation of some kind, one has the strong intuition that it is the graph-theoretical feature E that explains T rather than vice versa. In my (2018a) I argued that this directionality could be understood in terms of the following consideration: although the explanation of T in terms of E is non-causal, there is a straightforward causal explanation for whether one value or another of E holds—this has to do with the intentions and behavior X of those who constructed the bridges.
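Euler’s feature F is, concretely, the requirement that the bridge multigraph be connected and have exactly zero or two land masses touched by an odd number of bridges. A minimal sketch of this check (my illustration, not part of the text above; the labels A–D for the four land masses and the connectivity shortcut are assumptions for this example):

```python
from collections import Counter

def has_eulerian_path(edges):
    """Euler's condition: a connected multigraph has an Eulerian path
    iff exactly zero or two of its vertices have odd degree."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    odd = sum(1 for d in degree.values() if d % 2 == 1)
    # Connectivity is assumed here for brevity; Königsberg is connected.
    return odd in (0, 2)

# The seven bridges of Königsberg, joining land masses A, B, C, D.
koenigsberg = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
               ("A", "D"), ("B", "D"), ("C", "D")]
print(has_eulerian_path(koenigsberg))  # False: all four vertices have odd degree
```

Since all four land masses have odd degree, feature F is absent and T takes the value non-traversable.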

In a recent paper Lange (forthcoming) criticizes this suggestion, claiming that there is nothing in the interventionist account that rules out the possibility that X causally explains E while T non-causally explains E. I agree with Lange that my argument rests on additional assumptions about how non-causal explanations work and how these interact with causal explanations. Let me try to make these explicit. I argued above that in a structure in which X is an intervention-like cause of Y (so that X and Y are statistically dependent), Y and Z are statistically dependent, and X and Z are statistically dependent (where the intervention-like character of X is understood to rule out the possibility of confounding by additional common causes: no W that is a common cause of Y and Z, etc.), it is reasonable to conclude that the causal direction runs from Y to Z rather than from Z to Y. The contrary conclusion—that Z causes Y—does not explain why X and Z are dependent and instead postulates two independent causes of Y that happen to be correlated with each other, but where no explanation is provided for this correlation. My suggestion is that, in the absence of some specific reason to think otherwise, it is reasonable to assume that structures that involve both causal and non-causal explanations will obey a similar principle. That is, if, in the Königsberg bridge example, X (the intentions of the builders) causes E, and E and T and X and T are statistically dependent, as they clearly are, then, at least in the absence of some further explanation of these dependencies, we should infer that the direction of non-causal explanation runs from E to T rather than conversely. (I will say more shortly about the qualification introduced by the italicized phrase.)
The contrary assumption—that E has two explanations, one in terms of X that is causal and the other in terms of T that is non-causal but where X and T just happen to be correlated even though no explanation is provided for this fact—is less plausible.
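The statistical contrast appealed to here can be checked in a toy simulation (my sketch, using hypothetical linear-Gaussian variables standing in for X, Y, and Z): in a chain X → Y → Z all three pairs are dependent, whereas in the rival structure in which X and Z are independent causes of Y, X and Z are uncorrelated unless their coordination is separately explained.

```python
import random

random.seed(0)
n = 100_000

def corr(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b)) / n
    va = sum((x - ma) ** 2 for x in a) / n
    vb = sum((y - mb) ** 2 for y in b) / n
    return cov / (va * vb) ** 0.5

# Chain X -> Y -> Z: X and Z are dependent via Y.
X = [random.gauss(0, 1) for _ in range(n)]
Y = [x + random.gauss(0, 1) for x in X]
Z = [y + random.gauss(0, 1) for y in Y]
print(f"chain corr(X, Z): {corr(X, Z):.2f}")        # substantially positive

# Rival structure X -> Y <- Z: X and Z are independent causes of Y.
X2 = [random.gauss(0, 1) for _ in range(n)]
Z2 = [random.gauss(0, 1) for _ in range(n)]
Y2 = [x + z + random.gauss(0, 1) for x, z in zip(X2, Z2)]
print(f"collider corr(X, Z): {corr(X2, Z2):.2f}")   # near zero
```

The rival hypothesis thus leaves the observed X–Z dependence as an unexplained coincidence, which is the asymmetry the principle exploits.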

What about the italicized phrase above? This qualification is necessary because it does seem possible that an explanandum M might have two explanations, one, E1, that is causal and the other, E2, that is non-causal.

This reasoning rests on the assumption that reasoning about directionality in non-causal explanation obeys, in the respect described, a similar principle to that employed in reasoning about causal directionality. Of course this assumption may be wrong but (i) it yields what most suppose to be the “right” answer in this case (as well as in a range of other cases of alleged non-causal explanation) and (ii) there is a rationale for the assumption when it is understood as an extension of a principle that applies to causal explanation. Someone who wishes to deny the assumption owes an account of non-causal explanation that shows why the assumption fails.

A second putative example of non-causal explanation, discussed far more tentatively in Woodward (2018a), concerns the explanation of the stability (or perhaps the possible stability) of the planetary orbits in terms of the three-dimensionality of space, in conjunction with assumptions about the form of the gravitational potential in spaces of different dimensions (that this involves a generalization of Poisson’s equation) and Newton’s laws of motion. Given the latter assumptions it can be shown that the orbits will be unstable in spaces of dimensionality greater than three, so that there is a sense in which the stability of the orbits appears to depend on the dimensionality of space. Woodward (2018a) suggested that if one finds it plausible that this is an explanation (and thus that the correct direction doesn’t run instead from the stability of the orbits to the dimensionality of space), this is likely because one is willing to make certain independence assumptions that parallel those that we make in the case of causal explanation. In particular one assumes that (i) Newton’s laws of motion and the form for a generalized gravitational potential in an n-dimensional space are independent of (ii) the dimensionality of the space, in the sense that (i) and (ii) can vary independently of each other. (This is the non-causal analog of the idea that the causes of an effect should be capable of varying independently of each other.) We appeal to this independence assumption when we argue, as envisioned in the explanation above, that if the dimensionality of space had been different from three, Newton’s laws of motion and the form of the gravitational potential would have been the same. It is this assumption about independence, I claim, which allows us to give content to the contention that the correct direction of explanation runs from spatial dimensionality to stability.

As noted earlier, many philosophers have attempted to connect asymmetries associated with causal direction with issues having to do with thermodynamic asymmetries, entropy increase, the supposed need for a “past hypothesis” and the direction of time. The assumption seems to be that getting clear about these (broadly) “entropic” issues is required for an understanding of the directional features of causation. I certainly don’t want to question the interest and value of developing accounts of these entropic issues. Nor do I claim that they have nothing to do with the independence features on which I have focused. On the contrary, I think the independence features are closely bound up with entropic behavior. I want to suggest, however, that it is worth considering the possibility that the connection between causal and thermodynamic asymmetries may take a different form than is commonly supposed by philosophers. Rather than (or perhaps in addition to) thermodynamic/entropic asymmetries providing a sort of ground or basis for causal asymmetries (with the former being more fundamental) it may be instead that both asymmetries (thermodynamic and causal) at least in part derive from (or have a common source in) facts about independence and the absence of special kinds of tuning but where the most natural way of expressing these facts employs causal language.

The reader may well wonder why, with all of this illustrious help, this paper is not a lot better. This is a causal inference problem and the answer is the obvious one.

I will add that to my ear, talk of what causation or causal direction “consists in” or what “constitutes” them sets up the expectation that there is some “material” or “stuff” out of which these are “composed”. Such questions about constitution make sense in many cases (e.g., one can sensibly ask what gold consists of) but my view is that causation and causal direction are not like this. Instead we need to understand them functionally: what causal relations have in common is that they support various kinds of manipulation and control, rather than that they are all composed of the same kind of stuff. Let me also emphasize, in response to an issue implicitly raised by a referee for this paper, that in saying that “cause” is not like “gold” I do not mean to espouse anti-realism about causal claims or about claims concerning causal direction. I endorse realism about causal claims if one understands by such realism the position that causal claims including claims about causal direction, as well as claims about what would happen if various interventions were to occur, are true or false and which they are depends on what the world is like. I deny that this kind of realism requires that we identify some stuff out of which causal relations are composed. Woodward (forthcoming a) calls this minimal realism.

In this connection it is also worth noting an obvious trade-off. An advantage of using assumptions like Causal Markov and Faithfulness is that one does not need to restrict the hypothesis space in the way I have above. On the other hand, if we do restrict the hypothesis space we can get by with assumptions weaker than CMC and Faithfulness. I don’t think that either strategy is necessarily better than the other—it depends on what you think you know. In general, the machine learning strategies I discuss proceed in part by restricting the hypothesis space (e.g., by restricting the functional forms considered or assuming the absence of confounding). This allows for results that would not be possible without such restrictions.

Let me also add that what the argument in the text above shows is that if Pr(Y|X) is invariant under some change in Pr(X), then for the associated change in Pr(Y) implied by this change in Pr(X), Pr(X|Y) will not be invariant under this change in Pr(Y). In other words, one and the same change to the joint distribution Pr(X, Y) cannot be a case in which Pr(Y|X) is invariant across the change in Pr(X) and also be a case in which Pr(X|Y) is invariant across the change in Pr(Y). However, it remains possible that Pr(X|Y) is invariant under some changes in Pr(Y) and Pr(Y|X) is invariant under some other changes in Pr(X), involving a different change in the joint distribution. If we are willing to also assume that there are just two possible alternatives—either (i) Pr(Y|X) is invariant under all changes within some range of values of Pr(X) or (ii) Pr(X|Y) is invariant under the associated range of changes in Pr(Y)—then the argument above establishes that only one of these alternatives holds. Many thanks to Jiji Zhang for helpful correspondence regarding this point and for correcting a misinterpretation of mine.
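The point can be illustrated numerically (my sketch, for binary X and Y with made-up probabilities): hold the conditional Pr(Y|X) fixed while shifting Pr(X); Bayes’ rule then forces Pr(X|Y) to change across that same change in the joint distribution.

```python
def joint(px, py_given_x):
    """Joint distribution over binary X, Y from Pr(X=1) and Pr(Y=1|X=x)."""
    p = {}
    for x in (0, 1):
        pr_x = px if x == 1 else 1 - px
        for y in (0, 1):
            pr_y = py_given_x[x] if y == 1 else 1 - py_given_x[x]
            p[(x, y)] = pr_x * pr_y
    return p

def pr_x_given_y(p, y):
    """Pr(X=1 | Y=y) computed from the joint distribution via Bayes' rule."""
    return p[(1, y)] / (p[(0, y)] + p[(1, y)])

py_given_x = {0: 0.2, 1: 0.9}                   # Pr(Y=1|X), held invariant
before = joint(px=0.5, py_given_x=py_given_x)
after = joint(px=0.8, py_given_x=py_given_x)    # only Pr(X) is changed

print(round(pr_x_given_y(before, 1), 3))  # 0.818
print(round(pr_x_given_y(after, 1), 3))   # 0.947
```

Pr(X=1|Y=1) shifts from 0.45/0.55 to 0.72/0.76 under this one change in the joint distribution, so Pr(X|Y) is not invariant across it, as the argument requires.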

It will perhaps help to add another related way of making these points, which I owe to David Wallace. We can see that the time-reversed reassembling vase system involves different causal relationships from the original breaking vase by noting that the relations to other systems are different in the two cases. In particular, in a real situation in which the shattered vase reassembles, the relations to other systems that must be present will include many forces and causes that are different from those that are present in the original, non-time-reversed system and that act on the shards in a coordinated way. Of course it might be replied that these new forces can themselves be generated by the appropriate additional time reversals of additional variables and so on. If followed through consistently, this seems to lead to the application of the time reversal operation to the whole universe. Here I am inclined to think (following what I take to be Wallace’s view) that such a time reversal of the entire universe will not differ in any clear way from the actual universe. However, this is consistent with its being the case that time reversals of subsystems of the universe will differ in their causal structure from non-time-reversed systems. It is causal direction in such subsystems which is the focus of this paper.