The Pulsar Cafe

The Logic of Science: 4.2

Problem 4.2 of E.T. Jaynes’ Probability Theory: The Logic of Science:

Calculate the exact threshold of skepticism \(f_t(x, y)\), supposing that proposition \(C\) has instead of \(10^{-6}\) an arbitrary prior probability \(P(C|X) = x\), and specifies instead of \(\frac{99}{100}\) an arbitrary fraction \(y\) of bad widgets. Then discuss how the dependence on \(x\) and \(y\) corresponds – or fails to correspond – to human common sense. Hint: In problems like this, always try first to get an analytic solution in closed form. If you are unable to do this, then you must write a short computer program which will display the correct numerical values in tables or graphs.

There is some context behind this one, so I’ll wait while you go read sections 4.2 and 4.4.

$$ \dots $$

If the fraction of tested widgets that turn out defective stays at one particular value as the number of tested widgets goes to infinity, no evidence will accumulate for or against proposition \(C\). That value is what Jaynes means by “threshold of skepticism.” If the observed fraction of defective widgets goes above this threshold, a Bayesian agent will come to believe \(C\). And if the fraction goes below the threshold, a Bayesian agent will come to believe \(\overline{C}\).

Using the equations explained in this chapter, how do we model the scenario where no evidence accumulates, in either direction, for \(C\)? [1]

Here is the evidence equation for \(C\), conditional on a proposition \(D\) \(\equiv\) “\(n\) widgets were tested and \(n_d\) were found to be defective.”

$$ e(C|DX) = e(C|X) + 10\log_{10}\bigg[ \frac{P(D|CX)}{P(D|\overline{C}X)} \bigg]. $$

The first term is the prior evidence, and is independent of what we find out by testing widgets. We can consider it constant. The ratio inside the logarithm is what we care about, as that value is what gets updated every time we pick a widget out of the box and test it. As more defective widgets are found, the numerator of that ratio increases, and therefore positive values get added to our model of the evidence for \(C\). As more good widgets are found, the denominator will dominate that fraction, and negative values will be added to our model of the evidence for \(C\).
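To make the bookkeeping concrete, here is a minimal sketch of the evidence update in decibels. It assumes Jaynes' definition of evidence as \(10\log_{10}\) of the odds; the function names `evidence_db` and `update_db` are made up for this illustration and do not come from the book.

```python
import math


def evidence_db(p):
    """Evidence in decibels for a probability p: 10 * log10 of the odds."""
    return 10 * math.log10(p / (1 - p))


def update_db(prior_db, likelihood_ratio):
    """Add the evidence carried by one likelihood ratio to a prior evidence."""
    return prior_db + 10 * math.log10(likelihood_ratio)


prior = evidence_db(1e-6)        # Jaynes' original prior for C
print(prior)                     # very close to -60 db
print(update_db(prior, 1.0))     # a ratio of 1 leaves the evidence unchanged
print(update_db(prior, 10.0))    # a ratio of 10 adds 10 db
```

A likelihood ratio above 1 pushes the evidence for \(C\) up, a ratio below 1 pushes it down, and a ratio of exactly 1 leaves it where it was.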

There is, however, one value the ratio can take such that our models of the evidence for both \(C\) and \(\overline{C}\) won’t change. If \(\frac{P(D|CX)}{P(D|\overline{C}X)} = 1 \), the logarithm will evaluate to zero, and no evidence will be added or subtracted.

Let’s expand this ratio and see what we’re working with.

For the numerator, Jaynes explains why we can use the binomial sampling distribution. [2] If we let \(y\) equal the fraction of bad widgets specified by \(C\), \(f_d\) equal the fraction of tested widgets that are defective (so that \(n_d = f_dn\)), and \(n\) equal the number of widgets tested so far, then

$$ P(D|CX) = \binom{n}{n_d} y^{f_dn}(1-y)^{n - f_dn}. $$
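As a quick sanity check on this formula, here is a small sketch that evaluates the binomial probability directly. The helper name `binomial_pmf` and the sample values of `n` and `y` are chosen only for illustration.

```python
from math import comb


def binomial_pmf(n, n_d, y):
    """Probability of exactly n_d bad widgets in n draws when a fraction y is bad."""
    return comb(n, n_d) * y**n_d * (1 - y)**(n - n_d)


n, y = 10, 0.99
probabilities = [binomial_pmf(n, k, y) for k in range(n + 1)]
total = sum(probabilities)

print(probabilities[-1])  # chance that all 10 tested widgets are bad
print(total)              # should be 1 up to rounding error
```

The probabilities over all possible values of \(n_d\) add up to one, which is what we expect from a sampling distribution.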

For the denominator, Jaynes gives the expansion in \((4.35)-(4.39)\). It’s just two applications of the product rule. I go one step further here and expand the probability notation, putting the whole thing in terms of \(n\), \(f_d\), and \(x\), the prior probability of \(C\).

$$ \begin{equation} \begin{split} P(D|\overline{C}X) & = P(D|X)\frac{P(A + B|DX)}{P(A + B|X)} \newline & = \frac{P(D|X)P(A|DX) + P(D|X)P(B|DX)}{P(A + B|X)} \newline & = \frac{P(A|X)P(D|AX) + P(B|X)P(D|BX)}{P(A + B|X)} \newline & = \frac{(1-x)\big(\frac{1}{11}\big)\binom{n}{n_d}\big(\frac{1}{3}\big)^{f_dn}\big(\frac{2}{3}\big)^{n - f_dn} + (1-x) \big(\frac{10}{11}\big)\binom{n}{n_d}\big(\frac{1}{6}\big)^{f_dn}\big(\frac{5}{6}\big)^{n-f_dn}}{(1-x)\big(\frac{1}{11}\big) + (1-x)\big(\frac{10}{11}\big)}. \end{split} \end{equation} $$

Putting the two expansions together, and simplifying by crossing out the terms common across the fraction (and using \(\frac{1}{11} + \frac{10}{11} = 1\)),

$$ \begin{equation} \begin{split} \frac{P(D|CX)}{P(D|\overline{C}X)} & = \binom{n}{n_d}y^{f_dn}(1-y)^{n - f_dn} \cdot \bigg[ \frac{(1-x)\big(\frac{1}{11}\big) + (1-x)\big(\frac{10}{11}\big)}{(1-x)\big(\frac{1}{11}\big)\binom{n}{n_d}\big(\frac{1}{3}\big)^{f_dn}\big(\frac{2}{3}\big)^{n - f_dn} + (1-x)\big(\frac{10}{11}\big)\binom{n}{n_d}\big(\frac{1}{6}\big)^{f_dn}\big(\frac{5}{6}\big)^{n-f_dn}} \bigg] \newline & = y^{f_dn}(1-y)^{n - f_dn} \cdot \frac{\binom{n}{n_d}}{\binom{n}{n_d}} \frac{(1-x)}{(1-x)}\bigg[ \frac{\big(\frac{1}{11}\big) + \big(\frac{10}{11}\big)}{\big(\frac{1}{11}\big)\big(\frac{1}{3}\big)^{f_dn}\big(\frac{2}{3}\big)^{n - f_dn} + \big(\frac{10}{11}\big)\big(\frac{1}{6}\big)^{f_dn}\big(\frac{5}{6}\big)^{n-f_dn}} \bigg] \newline & = \frac{y^{f_dn}(1-y)^{n - f_dn}}{\big(\frac{1}{11}\big)\big(\frac{1}{3}\big)^{f_dn}\big(\frac{2}{3}\big)^{n - f_dn} + \big(\frac{10}{11}\big)\big(\frac{1}{6}\big)^{f_dn}\big(\frac{5}{6}\big)^{n-f_dn}}. \end{split} \end{equation} $$

Now we set this expression equal to one,

$$ \frac{y^{f_dn}(1-y)^{n - f_dn}}{\big(\frac{1}{11}\big)\big(\frac{1}{3}\big)^{f_dn}\big(\frac{2}{3}\big)^{n - f_dn} + \big(\frac{10}{11}\big)\big(\frac{1}{6}\big)^{f_dn}\big(\frac{5}{6}\big)^{n-f_dn}} = 1, $$

and get an equation in terms of the variable we want to solve for, \(f_d\).
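Before trying to solve anything, it can help to evaluate the ratio numerically. The sketch below works with the natural log of the ratio, so that large \(n\) does not underflow; the condition "ratio equals one" becomes "log ratio equals zero". The names `ALTERNATIVES` and `log_likelihood_ratio` are invented for this example, and the weights \(\frac{1}{11}\) and \(\frac{10}{11}\) are the relative priors of \(A\) and \(B\) from Jaynes' setup.

```python
import math

# (weight within not-C, fraction of bad widgets) for propositions A and B
ALTERNATIVES = [(1 / 11, 1 / 3), (10 / 11, 1 / 6)]


def log_likelihood_ratio(y, n, f_d):
    """Natural log of P(D|CX) / P(D|not-C X); assumes 0 < y < 1."""
    n_bad = f_d * n
    n_good = n - n_bad
    log_num = n_bad * math.log(y) + n_good * math.log(1 - y)
    log_terms = [
        math.log(w) + n_bad * math.log(p) + n_good * math.log(1 - p)
        for w, p in ALTERNATIVES
    ]
    # log-sum-exp: factor out the largest term so exp() cannot overflow
    biggest = max(log_terms)
    log_den = biggest + math.log(sum(math.exp(t - biggest) for t in log_terms))
    return log_num - log_den


for f_d in (0.70, 0.79, 0.80, 0.90):
    print(f_d, log_likelihood_ratio(0.99, 500, f_d))
```

For \(y = \frac{99}{100}\) and \(n = 500\), the log ratio is negative at \(f_d = 0.79\) and positive at \(f_d = 0.80\), so the value we are after sits somewhere in between.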

This is the part where it gets awkward. I have no idea how to solve this equation for \(f_d\). Maybe there is a way, but I don’t know it, nor am I inclined to spend that much time on it. I feel like I have spent enough time thinking about this problem to understand the qualitative parts well, and I’m ready to move on.

Luckily, Jaynes gave me a way out via the hint:

In problems like this, always try first to get an analytic solution in closed form. If you are unable to do this, then you must write a short computer program which will display the correct numerical values in tables or graphs.

Since I am unable to find a closed-form solution, the next best thing is to have a computer try values until the equation is satisfied to within some error threshold. Here is my script that solves for \(f_d\). I’m relatively confident in its correctness, since it outputs 0.7932299999990248 for \(y = \frac{99}{100}\), which differs from the value Jaynes gives by 7 parts in 10,000. (That’s reasonable, right?)

I found the values for threshold and n experimentally.

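#!/usr/bin/env python3

import matplotlib.pyplot as plt

threshold = 0.05  # accept f once the ratio is within this distance of 1
n = 500           # number of widgets tested
step = 0.00001    # scan resolution for f


def likelihood_ratio(y, n, nb):
    """Return P(D|CX) / P(D|not-C X) for nb bad widgets out of n."""
    numerator = y**nb * (1-y)**(n-nb)
    denominator = ((1/11) * (1/3)**nb * (2/3)**(n-nb)
                   + (10/11) * (1/6)**nb * (5/6)**(n-nb))

    # Guard against floating-point underflow to exactly zero.
    if denominator == 0:
        denominator = 1e-300

    return numerator / denominator


def find_threshold(y):
    """Scan f upward from 0; return the first f where the ratio is about 1."""
    f = 0.0
    while f <= 1:
        ratio = likelihood_ratio(y, n, f * n)

        if abs(ratio - 1) < threshold:
            return f

        # Repeated addition accumulates a little rounding error in f.
        f += step

    # No f on the grid satisfied the condition; plot this point as a gap.
    return float("nan")


xs = range(100)  # y expressed in percent
ys = [find_threshold(i/100) for i in xs]

plt.plot(xs, ys)
plt.xlabel("y (percent of widgets C says are bad)")
plt.ylabel("threshold f_t")
plt.show()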
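As an alternative to scanning, here is a sketch that finds the same crossing by bisection on the log of the ratio. This is not the method used above, just a different way to reach a similar number without hand-tuning a tolerance. The names `log_ratio` and `bisect_threshold` are invented for this example, and it assumes \(0 < y < 1\).

```python
import math


def log_ratio(y, n, f_d):
    """Natural log of the likelihood ratio for C against not-C."""
    n_bad, n_good = f_d * n, n - f_d * n
    log_num = n_bad * math.log(y) + n_good * math.log(1 - y)
    terms = [
        math.log(w) + n_bad * math.log(p) + n_good * math.log(1 - p)
        for w, p in [(1 / 11, 1 / 3), (10 / 11, 1 / 6)]
    ]
    biggest = max(terms)
    return log_num - biggest - math.log(sum(math.exp(t - biggest) for t in terms))


def bisect_threshold(y, n, lo=0.0, hi=1.0, tol=1e-10):
    """Return an f in [lo, hi] where log_ratio crosses zero, or None."""
    g_lo, g_hi = log_ratio(y, n, lo), log_ratio(y, n, hi)
    if g_lo * g_hi > 0:
        return None  # no sign change on this bracket
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = log_ratio(y, n, mid)
        if g_lo * g_mid <= 0:
            hi = mid                 # crossing is in the lower half
        else:
            lo, g_lo = mid, g_mid    # crossing is in the upper half
    return (lo + hi) / 2


root = bisect_threshold(0.99, 500)
print(root)
```

For \(y = \frac{99}{100}\) and \(n = 500\) this lands near 0.7932, in line with the scan. Bisection only finds one crossing inside the bracket it is given, so for values of \(y\) where the ratio crosses one twice, or not at all, the bracket would need to be chosen with more care.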

I also plotted \(f_t\) as a function of the defective rate expected by proposition \(C\). I’m not so sure what to make of it, however. There is a clear change in behavior at \(y = \frac{34}{100}\). This could be because proposition \(A\) specifies that \(\frac{1}{3}\) of widgets will be defective. Until \(C\) specifies a greater fraction than this, \(f_t\) will be a regular old linear function, because \(C\) is not the most extreme proposition of the three. Immediately after that, more evidence is required to turn the Bayesian agent in \(C\)’s direction.

But I don’t have a technical explanation for that.
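One possible starting point, offered as a rough sketch rather than an answer: if \(n\) is large and only one of \(A\) or \(B\) matters in the denominator, the weights \(\frac{1}{11}\) and \(\frac{10}{11}\) stop mattering and the break-even fraction against that single alternative can be written down directly. Setting \(y^{f}(1-y)^{1-f} = p^{f}(1-p)^{1-f}\) and solving for \(f\) gives the formula in the code below. The function name `pairwise_threshold` and the one-alternative-at-a-time simplification are assumptions made for this example, not something taken from Jaynes.

```python
import math


def pairwise_threshold(y, p):
    """Fraction f at which y**f * (1-y)**(1-f) equals p**f * (1-p)**(1-f).

    Assumes 0 < y < 1, 0 < p < 1, and y != p.
    """
    return math.log((1 - p) / (1 - y)) / math.log(y * (1 - p) / (p * (1 - y)))


y = 0.99
against_a = pairwise_threshold(y, 1 / 3)  # C against A alone
against_b = pairwise_threshold(y, 1 / 6)  # C against B alone
print(against_a, against_b)
```

For \(y = \frac{99}{100}\) the break-even fraction against \(A\) alone comes out near 0.794 and against \(B\) alone near 0.713. Which alternative is the binding one presumably depends on where \(y\) sits relative to \(\frac{1}{3}\) and \(\frac{1}{6}\), which may be related to the change in behavior in the plot, but I have not checked that carefully.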



[1]: I was stuck on this problem until I found this Math Stack Exchange post. Many thanks to thenoviceoof for answering a three-and-a-half-year-old question, and putting me on the right track.

[2]: It’s because we are assuming the number of widgets tested is far smaller than the number of widgets in the box.
