# The Logic of Science: 2.2

Problem 2.2 of E.T. Jaynes’ *Probability Theory: The Logic of Science*:

Now suppose we have a set of propositions \( \{A_1, \dots, A_n\} \) which on information \( X \) are mutually exclusive: \( p(A_iA_j|X) = p(A_i|X)\,{\delta}_{ij} \). Show that \( p(C|(A_1 + \dots + A_n)X) \) is a weighted average of the separate plausibilities \( p(C|A_iX) \).

$$ \begin{equation} \begin{split} p(C|(A_1 + \dots + A_n)X) & = p(C|A_1X + A_2X + \dots + A_nX) \newline & = \frac{\sum_i{p(A_i|X)p(C|A_iX)}}{\sum_i{p(A_i|X)}} \end{split} \end{equation} \tag{1}\label{1} $$

## Part I: The Proof

If we treat the disjunction \((A_1 + \dots + A_n)\) as a single proposition, we can expand the left-hand side of \(\eqref{1}\) using the product rule.

$$ \textrm{Let } B = (A_1 + \dots + A_n) $$

$$ p(C|(A_1 + \dots + A_n)X) = p(C|BX) = \frac{p(CB|X)}{p(B|X)} \tag{2} $$

This ratio is starting to look like the ratio of sums that we need. Let’s take the numerator and denominator and expand them separately, plugging the initial disjunction back in for \(B\).

### Numerator

$$ \begin{equation} \begin{split} p(CB|X) & = p(C(A_1 + \dots + A_n)|X) \newline & = p(CA_1 + CA_2 + \dots + CA_n|X) \newline & = p(CA_1 + C(A_2 + \dots + A_n)|X) \newline & = p(CA_1|X) + p(C(A_2 + \dots + A_n)|X) - p(CA_1(A_2 + \dots + A_n)|X) \newline \end{split} \end{equation} \tag{3}\label{3} $$

We can use the product rule to expand the first term in the last line above.

$$ p(CA_1|X) = p(A_1|X)p(C|A_1X) \tag{4} $$

This is the form of the terms in the numerator of \(\eqref{1}\). Now, the second term in the last line of \(\eqref{3}\) looks a lot like the right-hand expression on the first line, except that the disjunction starts with \(A_2\) instead of \(A_1\). Expanding that expression in the same way as \(\eqref{3}\), we get

$$ p(CA_2|X) + p(C(A_3 + \dots + A_n)|X) - p(CA_2(A_3 + \dots + A_n)|X) $$

This looks a lot like the last line of \(\eqref{3}\), and it shouldn’t be hard to see how this can be expanded all the way down to

$$ p(CA_{n-1}|X) + p(CA_n|X) - p(CA_{n-1}A_n|X) $$

If every such expansion were written out, with the minus terms left out, we would have our sum.

$$ p(CA_1|X) + p(CA_2|X) + \dots + p(CA_n|X) = \sum_i{p(A_i|X)p(C|A_iX)} $$

And it turns out we *can* leave out the minus terms, because in every
expansion where \(A_i\) is the lowest-indexed term, the minus term is zero due
to the constraint that the propositions in \(\{A_1, \dots, A_n\}\) are mutually
exclusive given \(X\).

$$ \begin{equation} \begin{split} p(CA_iA_{i+1} + \dots + CA_iA_n|X) & = p(CA_iA_{i+1}|X) + p(CA_iA_{i+2} + \dots + CA_iA_n|X) - p(CA_iA_{i+1}(CA_iA_{i+2} + \dots + CA_iA_n)|X) \newline & = p(A_iA_{i+1}|X)p(C|A_iA_{i+1}X) + \dots \newline & = 0 + \dots \end{split} \end{equation} $$

Therefore, \(p(CB|X) = p(C(A_1 + \dots + A_n)|X) = \sum_i{p(A_i|X)p(C|A_iX)}\).
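As a sanity check on this result, we can verify it numerically in a toy finite model where propositions are sets of outcomes and the \(A_i\) are pairwise disjoint. The particular sets and probabilities below are arbitrary choices, used only for illustration.

```python
from fractions import Fraction

# Toy check of p(C(A_1 + ... + A_n)|X) = sum_i p(A_i|X) p(C|A_iX).
# Propositions are sets of outcomes; mutual exclusivity = disjoint sets.
prob = {w: Fraction(1, 8) for w in range(8)}  # uniform over 8 outcomes
X = set(prob)                                 # take X to be the whole space

def p(e, given):
    """Conditional probability p(e|given) on the finite space."""
    return sum(prob[w] for w in e & given) / sum(prob[w] for w in given)

A = [{0, 1}, {2, 3, 4}, {5}]  # pairwise disjoint, so mutually exclusive
C = {1, 3, 5, 6}
B = set().union(*A)           # the disjunction A_1 + A_2 + A_3

assert p(C & B, X) == sum(p(a, X) * p(C, a & X) for a in A)
```

The exact `Fraction` arithmetic makes the two sides compare equal with no floating-point slack.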

### Denominator

$$ \begin{equation} \begin{split} p(B|X) & = p(A_1 + \dots + A_n|X) \newline & = p(A_1 + (A_2 + \dots + A_n)|X) \newline & = p(A_1|X) + p(A_2 + \dots + A_n|X) - p(A_1(A_2 + \dots + A_n)|X) \newline & = p(A_1|X) + p(A_2 + \dots + A_n|X) - p(A_1A_2 + \dots + A_1A_n|X) \end{split} \end{equation} \tag{5}\label{5} $$

To get the desired sum, we can expand the second term in the last line of \(\eqref{5}\) using the same steps as \(\eqref{5}\) itself. The same expansion will work all the way down to

$$ p(A_{n-1}|X) + p(A_n|X) - p(A_{n-1}A_n|X). $$

And the minus terms will all equal zero by the same argument as used to derive the numerator.

And so, \(p(B|X) = p(A_1 + \dots + A_n|X) = \sum_i{p(A_i|X)}\).
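This is just finite additivity for mutually exclusive propositions, which is easy to confirm in the same kind of toy set-based model (the sets and weights below are arbitrary, for illustration only):

```python
from fractions import Fraction

# Toy check of p(A_1 + ... + A_n|X) = sum_i p(A_i|X) for mutually
# exclusive A_i. Here X is taken to be the whole sample space, so
# p(.|X) is just P(.).
prob = {w: Fraction(1, 8) for w in range(8)}  # uniform over 8 outcomes

def P(event):
    return sum(prob[w] for w in event)

A = [{0, 1}, {2, 3, 4}, {5}]         # pairwise disjoint
B = set().union(*A)                  # the disjunction
assert P(B) == sum(P(a) for a in A)  # 6/8 == 2/8 + 3/8 + 1/8
```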

### Conclusion

Putting together the results from the expansions of the numerator and denominator, we get the weighted average asked for in the problem.

$$ p(C|(A_1 + \dots + A_n)X) = p(C|BX) = \frac{p(CB|X)}{p(B|X)} $$

$$ \frac{p(CB|X)}{p(B|X)} = \frac{\sum_i{p(A_i|X)p(C|A_iX)}}{\sum_i{p(A_i|X)}} $$
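The full weighted-average identity can also be checked end to end in a toy finite model; the outcome sets and probabilities below are arbitrary choices of mine, not from the book.

```python
from fractions import Fraction

# End-to-end check: with disjoint A_i, p(C|(A_1+...+A_n)X) equals the
# weighted average of the p(C|A_iX) with weights p(A_i|X).
prob = {w: Fraction(1, 8) for w in range(8)}  # uniform over 8 outcomes
X = set(prob)                                 # X taken as the whole space

def p(e, given):
    return sum(prob[w] for w in e & given) / sum(prob[w] for w in given)

A = [{0, 1}, {2, 3, 4}, {5}]  # mutually exclusive propositions
C = {1, 3, 5, 6}
B = set().union(*A)           # the disjunction

lhs = p(C, B & X)
weights = [p(a, X) for a in A]
rhs = sum(w * p(C, a & X) for w, a in zip(weights, A)) / sum(weights)
assert lhs == rhs  # both equal 1/2 in this example
```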

## Part II: What Does It Mean?

I’ve only been able to map this into one “real-world” situation. The way I understand it is pretty general, but there could be other ways of looking at it that I’m not seeing.

It seems to me the expression \(P(C|(A_1 + \dots + A_n)X)\) is about the probability that an object has a certain characteristic, given it belongs to a certain group. I’ll try to make this clear through example.

The exercise doesn’t specify this, but let’s assume the set of propositions is not only mutually exclusive but exhaustive too, so that exactly one of the \(A_i\) must be true. Let’s further suppose the information contained in \(X\) is enough to choose the true proposition from among the \(A_i\)’s: that is, \(P(A_i|X) = 1\) for exactly one \(1 \leq i \leq n\). But the expression given in the exercise is not, in the end, about the truth of any of the \(A_i\)’s. It’s about some other proposition, \(C\), that we evaluate after we have information \(X\) and therefore know which \(A_i\) is true.

For example, say we are talking about a star, designated J2-2. We want to know
if the mass of J2-2 falls between 0.5 and 1.5 solar masses. This is the “certain
characteristic.” Following Jaynes’ advice,^{1} we set up a probability
calculation to determine the probability that the proposition

$$ C \equiv \textrm{“J2-2 is between 0.5 and 1.5 solar masses”} $$

is true. Namely,

$$ P(C|Z), $$

where \(Z\) encodes all of our prior knowledge, not all of which is necessarily relevant to our question about J2-2.

In this contrived scenario, our circumstances are such that we can accurately measure the effective temperature of J2-2. By the theory of stellar classification, this means we can place J2-2 into one of the spectral classes, which are mutually exclusive given effective temperature.

Before we take this temperature measurement, our knowledge about J2-2 is just
that it will fit *some* spectral type. This belief state can be represented by
the disjunction of the propositions

$$ \begin{equation} \begin{split} A_1 & \equiv \textrm{“J2-2 is an M-type star”}, \newline A_2 & \equiv \textrm{“J2-2 is a K-type star”}, \newline A_3 & \equiv \textrm{“J2-2 is a G-type star”}, \newline A_4 & \equiv \textrm{“J2-2 is an F-type star”}, \newline A_5 & \equiv \textrm{“J2-2 is an A-type star”}, \newline A_6 & \equiv \textrm{“J2-2 is a B-type star”}, \newline A_7 & \equiv \textrm{“J2-2 is an O-type star”}. \newline \end{split} \end{equation} $$

So our probability calculation becomes

$$ P(C|(A_1 + A_2 + A_3 + A_4 + A_5 + A_6 + A_7)Z), $$

which is really the same thing as \(P(C|Z)\). But the expansion will matter once we take our temperature measurement, which is encoded in the proposition

$$ X \equiv \textrm{“J2-2 has an effective temperature of 5,500K”}. $$

With information \(X\) in hand, our calculation becomes

$$ P(C|(A_1 + A_2 + A_3 + A_4 + A_5 + A_6 + A_7)XZ) = P(C|A_3XZ), $$

because the effective temperature 5,500K falls within the range held only by G-type stars.
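To make the weighted average concrete, here is a sketch with invented numbers: both the priors over spectral classes and the conditional probabilities of \(C\) are placeholders I made up, not real astrophysical values.

```python
# Hypothetical numbers only: neither the priors p(A_i|Z) nor the
# conditionals p(C|A_i Z) below come from real stellar statistics.
priors = {"M": 0.76, "K": 0.12, "G": 0.08, "F": 0.03,
          "A": 0.006, "B": 0.003, "O": 0.001}
p_C_given = {"M": 0.0, "K": 0.15, "G": 0.95, "F": 0.9,
             "A": 0.1, "B": 0.0, "O": 0.0}

# Before the temperature measurement, p(C|Z) is the weighted average
# of p(C|A_i Z) over all spectral classes, weighted by the priors.
p_C = (sum(priors[k] * p_C_given[k] for k in priors)
       / sum(priors.values()))

# After X pins the class down to G, the weight on A_3 goes to 1 and
# the average collapses to the single term p(C|A_3 X Z).
p_C_after = p_C_given["G"]
```

Once \(X\) rules out every class but G, the weighted average reduces to the one surviving plausibility, which is exactly what the identity from Part I predicts.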

Now, given the information that J2-2 is a G-type star, we can make a reasonable estimate of the probability that its mass lies within the range in question. From a paper I found in the Encyclopedia of Astronomy and Astrophysics:

> It is inappropriate to speak of a spectral type to mass relationship for higher-mass stars: stellar evolution results in a progression from higher effective temperatures to cooler during the core-H-burning lifetime, and during this evolution stars of different masses will pass through a particular spectral type ‘stage’. For lower-mass main-sequence stars this is not true, and there is only a slight change of spectral type with evolution (i.e. little change of the effective temperature). For example, a star which is spectroscopically classified as ‘O4 V’ star may be a zero-age 60M⊙ star, or a slightly older (0.5 Myr) 85M⊙ star, but all stars of spectral type G2 V will have a mass roughly that of the Sun.

And so, as a function of how much we trust official-looking PDFs on the internet, we can assign a value to \(P(C|A_3XZ)\).

1: *Probability Theory: The Logic of Science*, p. 86:

> To form a judgement about the likely truth or falsity of any proposition \(A\), the correct procedure is to calculate the probability that \(A\) is true:
>
> $$ P(A|E_1E_2\dots) $$
>
> conditional on all the evidence at hand.