Chance and Proof: The Probabilistic Method

Preface

Inspired from the courses Math 159, Phil 162, and Phil 184P.

A Hard Problem

Let’s introduce some definitions so understand what problem we’re trying to answer.

A complete graph $K_n$ on $n$ vertices is a graph where every pair of distinct vertices is connected by an edge. Here is an example of a $K_5$ graph.

K_5 graph
$K_5$ Graph

A tournament $T_n$ is a complete graph $K_n$ together with an orientation of each edge. Here is a $T_5$ tournament.

K_5 graph
$T_5$ Graph

A Hamilton Path on an oriented graph is an oriented path which uses every vertex of the graph. Here is one example for the $T_5$ above.

Hamilton Path on $T_5$
Hamilton Path on $T_5$

Now, here’s the problem: What is the largest number of Hamilton paths in a tournament?

If you think about it, this gets very messy very quickly. There are ${n\choose 2}$ total edges in a $K_n$ graph, and since each edge can be oriented in two ways, there are $2^{n\choose 2}$ total possible tournaments $T_n$ on a $K_n$. Even for a $T_5$, that’s $2^{10} =1024$ tournaments you would have to check, and then count the number of Hamilton Paths you find in each one. Aside from rejecting all isomorphic tournaments, there doesn’t seem to be any nice trick we can use to make our lives any easier. The number of Hamilton Paths we find is extremely sensitive to the orientation of even a single edge, because whether the Hamilton path exists or not is contingent on a long chain of edges being oriented correctly.

A lower bound solution to this problem comes from Szele, who used the probabilistic method to show there is a tournament $T_n$ with $\frac{n!}{2^{n-1}}$ Hamilton Paths. The proof is really quite nice.

Proof: Take a $K_n$ and orient each edge independently and uniformly at random; call the resulting tournament $T_n$. Let $S_n$ denote the set of all permutations of the $n$ vertices. $|S(n)|=n!$ since there are at first $n$ vertices to choose from, then $n-1$ vertices, and so on. Notice that each permutation of vertices is either a Hamilton path or it isn’t. For each permutation $\pi=(v_1,\dots,v_n)\in S_n$, define an indicator variable $X_\pi$ by

$$ X_\pi = \begin{cases} 1 & \text{if } v_1 \to v_2 \to \cdots \to v_n \text{ is a Hamilton path}, \\ 0 & \text{otherwise.} \end{cases} $$

Let

$$ X = \sum_{\pi \in S_n} X_\pi, $$

so $X$ is the number of Hamilton paths in $T_n$. For a fixed permutation $\pi=(v_1,\dots,v_n)$, the event $X_\pi=1$ occurs when the $n-1$ edges are all oriented correctly, i.e. we have

$$v_1 \to v_2 \to \dots \to v_n$$

Since the edges are oriented independently and each direction occurs with probability $1/2$,

$$ \mathbb E[X_\pi] = \Pr(X_\pi=1) = \left(\frac12\right)^{n-1} $$

By the linearity of expectation,

$$ \mathbb E[X] = \sum_{\pi\in S_n} \mathbb E[X_\pi] = n!\left(\frac12\right)^{n-1} = \frac{n!}{2^{n-1}}. $$

Since $\mathbb E[X]$ is the average value of $X$ over all tournaments, it cannot be that every tournament has fewer than $\frac{n!}{2^{n-1}}$ Hamilton Paths. Therefore, there exists a tournament with at least $\frac{n!}{2^{n-1}}$ Hamilton paths. $\square$

If you’re like me, you might be suspicious about this proof. The probabilistic method is compressing two different claims into a single move. First, there is a probabilistic claim about what happens under a random procedure. Second, there is an existential claim that some mathematical object must exist if the probability is strictly greater than zero or through an argument about the expected value.

Consider the $T_5$ case, which tells us there are at least $\frac{5!}{2^{4}} = \lceil 7.5 \rceil=8$ Hamilton paths. Great—but what are they? Show me them! But the probabilistic method cannot tell us this. At least for the $T_5$ example we’ve been working with, we can find the Hamilton paths by brute force—in fact, there are a maximum of 15 Hamilton Paths in a $T_5$.

Which looks like this:

Hamilton Path on $T_5$
All Hamilton Paths on $T_5$

This weirdness gets even worse because we can use the probabilistic method to claim existence with probability $1-o(1)$, that is, with probability tending to 1 (asymptotically almost surely) as $n$ goes to infinity. Take the following problem:

Let $G\sim\mathcal{G}(n,1/2)$ be an Erdos-Renyi random graph and let $M$ be its number of edges. Prove that there exists a constant $C>0$ such that, with probability $1-o(1)$, $G$ has no bipartite subgraph with more than $M/2+CM^{3/4}$ edges.

which can be shown using Chebyshev’s inequality and Chernoff bounds. It seems strange that the proof claims almost every graph has this property and yet cannot show us one example of a graph that has this property.

Questions

The logical argument establishing why we can use the probabilistic method works like this:

If every object in a collection of objects fails to have a certain property, then the probability that a random object chosen from the collection has that property is zero. Thus, by contraposition, if the probability that a random object chosen from the collection has that property is nonzero, then some object in the collection must possess the property.1

Consider two wrinkles.

First, probability isn’t actually doing much work. There is nothing contingent about the “random” tournament on $n$ vertices; the space is fully specified, and the probabilities are analytic consequences of how it is constructed. What Frank Ramsey called the “logic of inconclusive argument” is being used here to produce conclusive proofs, which is a strange role reversal worth thinking about. The probabilities aren’t really representing uncertainty about anything.

Second, the proof tells us the object exists but does not show us what it is. This feels more like theology rather than mathematics.2 The constructivist/intuitionist tradition has been objecting to proofs like this for a century. The contrapositive at the heart of the method relies on the law of excluded middle applied to a quantified statement.

To see this, let $C$ be a collection of objects and $P(x)$ the property in question. The probabilistic method’s crucial step is

$$\lnot(\forall x \in C\ \lnot P(x)) \to (\exists x \in C\ P(x)),$$

which roughly says: if it is not the case that every object lacks property $P$, then some object has property $P$. In classical logic, the two sides are interchangeable. Moving between the two statements requires the law of excluded middle applied to $\exists x \in C\ P(x)$ itself: either such an $x$ exists, or every $x$ fails to satisfy $P$; the latter is ruled out by assumption, so the former must hold. Critics point to the Liar’s Paradox for why the law of the excluded middle leads to self-contradiction.


  1. Wikipedia, Probabilistic Method ↩︎

  2. As Paul Gordan putatively said. ↩︎