Entropy Estimation via Two Chains: Streamlining the Proof of the Sunflower Lemma

The sunflower lemma describes a striking combinatorial property of set families: any sufficiently large family of small sets must contain a large sunflower—a sub-family consisting of sets A_1,\ldots,A_r with a shared core S:=A_1\cap\cdots\cap A_r and disjoint petals A_1\backslash S,\ldots,A_r\backslash S. (See the figure below for an example and a non-example of sunflowers.) Applications of the lemma in theoretical computer science date back to Razborov's influential 1985 paper, which established the first super-polynomial monotone circuit lower bound for a function in NP; since then the lemma has been applied broadly to other problems in theoretical computer science (see the STOC version of Alweiss-Lovett-Wu-Zhang for a discussion of these applications).

Left: 3 sets forming a sunflower. Right: 3 sets NOT forming a sunflower.

The lemma was first proved by Erdős-Rado in 1960, who gave the quantitative bound that among (r-1)^kk!+1 distinct sets each of size at most k one can find r sets forming a sunflower. Erdős-Rado conjectured that the factor k!=k^{(1 - o(1))k} = k^{\Omega(k)} in the bound could be significantly improved to O(1)^k. For nearly 60 years after the Erdős-Rado upper bound, all known upper bounds retained the k^{\Omega(k)} dependence on k even for r = 3, despite much research effort. In 2019, a breakthrough work by Alweiss-Lovett-Wu-Zhang improved the bound significantly to (Cr^3\log k\log\log k)^k for all k \ge 3, where C is an absolute constant. The breakthrough has led to many new results, including improved monotone circuit lower bounds by Cavalar-Kumar-Rossman.

Since the breakthrough of Alweiss-Lovett-Wu-Zhang, researchers have been refining the bound and simplifying the proof. The current best bound is (Cr\log k)^k for k \ge 2 proved by Bell-Chueluecha-Warnke via a minor but powerful twist in the proof of an earlier (Cr\log (rk))^k bound by Anup Rao:

Theorem 1 (Sunflower Lemma [BCW’20]). There exists a constant C>0 such that the following holds for all positive integers k\ge 2 and r. Let \mathcal F be a family of at least (Cr\log k)^k distinct sets each of size at most k. Then \mathcal F contains r sets that form a sunflower.

The aim of this blog post is to present a streamlined proof of Theorem 1. The proof is largely based on a blog post by Terence Tao where he presented an elegant proof of Rao’s result using Shannon entropy. However, Tao’s proof included a trick of passing to a conditional copy twice, which Tao described as “somewhat magical”. We show here that the trick is not necessary for the proof, and avoiding the trick gives a simpler proof with a slightly better constant in the bound.

We start by defining R-spread families, a notion key to all recent proofs of the sunflower lemma. We use [N] as a shorthand for \{1,\ldots,N\}. We use boldface symbols to denote random variables, and non-boldface ones to denote deterministic quantities.

Definition 1 (Spread family). Let \mathcal F be a family of finite sets A_1,\ldots,A_N that are not necessarily distinct. For R > 1, we say \mathcal F is R-spread if for all S\subseteq \cup_{n=1}^NA_n,

\begin{aligned} \Pr[S\subseteq A_{\mathbf n}] \le R^{-|S|},\end{aligned}

where \mathbf n is chosen uniformly at random from [N].
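To make Definition 1 concrete, here is a brute-force check on a small illustrative family (the family of all 2-element subsets of \{1,2,3,4\}; the family and the values of R tested below are assumptions made purely for illustration):

```python
from itertools import combinations

# illustrative family: all 2-element subsets of {1, 2, 3, 4}
family = [set(p) for p in combinations(range(1, 5), 2)]
N = len(family)  # N = 6
universe = set().union(*family)

def is_R_spread(family, R):
    """Check Pr[S <= A_n] <= R^(-|S|) for every nonempty S in the universe,
    where n is uniform over [N]."""
    for size in range(1, len(universe) + 1):
        for S in combinations(sorted(universe), size):
            prob = sum(set(S) <= A for A in family) / N
            if prob > R ** (-size):
                return False
    return True
```

Here every element lies in 3 of the 6 sets and every pair lies in exactly one set, so the binding constraint comes from singletons: the family is 2-spread but not 2.5-spread.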

The main technical part of recent improvements in the sunflower lemma happens in the proof of the following refinement lemma. We use the base-2 logarithm throughout.

Lemma 2 (Refinement). Let X be a finite set. For R > 1, let \mathcal F be an R-spread family of sets A_1,\ldots,A_N\subseteq X. For \delta\in(1/R,1] such that \delta|X| is an integer, let \mathbf W be a size-\delta|X| subset of X chosen uniformly at random. Let \mathbf n be a uniform random number in [N] independent of \mathbf W. Then there is a random number \mathbf n' in [N] (in general not independent of (\mathbf n,\mathbf W)) such that A_{\mathbf n'}\subseteq A_{\mathbf n}\cup\mathbf W almost surely and

\begin{aligned}\mathbb E|A_{\mathbf n'}\backslash \mathbf W|\le\frac{4}{\log(R\delta)}\mathbb E|A_{\mathbf n}|.\end{aligned}

The proof of Theorem 1 using Lemma 2 can be found in Rao’s paper and Tao’s blog post, so we omit it here and focus on proving Lemma 2. Rao and Tao both proved a slightly weaker version of Theorem 1 but this weakness can be overcome using a minor twist observed by Bell-Chueluecha-Warnke. They also used slightly different forms of Lemma 2 but the differences are non-essential.

It is easy to see that in Lemma 2 one can make \mathbf n' a deterministic function of \mathbf n and \mathbf W without violating any property guaranteed by the lemma: after conditioning on \mathbf n and \mathbf W, one can fix any additional randomness in \mathbf n' so that |A_{\mathbf n'}\backslash \mathbf W| is minimized. This deterministic version of \mathbf n' can be somewhat more convenient for proving Theorem 1, and it was the version used in Rao's proof, but allowing additional randomness in \mathbf n' enabled Tao to obtain a simpler proof of Lemma 2 via a construction of \mathbf n' that is easier to analyze.

We follow Tao’s idea of proving Lemma 2 using Shannon entropy, but we present the proof in a more streamlined fashion with a slightly sharper constant in the bound (Tao proved a version of Lemma 2 where the constant 4 was replaced by 5). Specifically, we present the proof in a way resembling a basic technique in combinatorics called counting in two ways: one can show that two quantities are equal by showing that they both count the number of elements in the same set. Here, we estimate the entropy of the same collection of random variables in two different ways, and prove Lemma 2 by comparing the two estimates. The way we obtain the two estimates relies crucially on the chain rule of conditional entropy:

\begin{aligned} \mathbb H(\mathbf a_1,\ldots,\mathbf a_m) = \sum_{i=1}^m\mathbb H(\mathbf a_i|\mathbf a_1,\ldots,\mathbf a_{i-1}). &&  (1) \end{aligned}

Equation (1) holds for arbitrary random variables \mathbf a_1,\ldots,\mathbf a_m taking values in a discrete set. We say that equation (1) computes the entropy of (\mathbf a_1,\ldots,\mathbf a_m) via the chain

\begin{aligned}\mathbf a_1\rightarrow \cdots \rightarrow \mathbf a_m.\end{aligned}

To prove Lemma 2, we obtain two entropy estimates for the same collection of random variables by applying (1) to two different chains.
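As a quick sanity check of (1), the snippet below verifies the chain rule numerically in the case m = 2 (the joint distribution is an arbitrary illustrative choice):

```python
from math import log2

# an arbitrary joint distribution of (a1, a2) for illustration
joint = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.40, (1, 1): 0.10}

def H(probs):
    """Shannon entropy (base 2) of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)

# marginal distribution of a1
p1 = {}
for (x, _), p in joint.items():
    p1[x] = p1.get(x, 0.0) + p

# conditional entropy H(a2 | a1) = sum_x Pr[a1 = x] * H(a2 | a1 = x)
H_cond = sum(
    px * H(joint[(xx, y)] / px for (xx, y) in joint if xx == x)
    for x, px in p1.items()
)

# chain rule (1): H(a1, a2) = H(a1) + H(a2 | a1)
lhs = H(joint.values())
rhs = H(p1.values()) + H_cond
```

The two sides agree up to floating-point error, as the chain rule predicts.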

We need the following useful lemmas about Shannon entropy. We omit their proofs here as they can be found in Tao’s blog post, where many basic properties of Shannon entropy are also discussed. (We also highly recommend Tao’s other blog posts about Shannon entropy and the entropy compression argument.)

Lemma 3 (Subsets of small sets have small conditional entropy). Let \mathbf A,\mathbf B be finite random sets. Assume \mathbf A\subseteq \mathbf B almost surely. Then \mathbb H(\mathbf A|\mathbf B)\le \mathbb E|\mathbf B|.
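Lemma 3 is tight: if, given \mathbf B, the set \mathbf A is a uniformly random subset of \mathbf B, then \mathbb H(\mathbf A|\mathbf B) equals \mathbb E|\mathbf B| exactly. A quick numerical check of this extreme case (the distribution of \mathbf B below is an illustrative assumption):

```python
from math import log2
from itertools import combinations

def subsets(b):
    """All subsets of a finite set b."""
    b = sorted(b)
    return [frozenset(c) for r in range(len(b) + 1) for c in combinations(b, r)]

# B uniform over two illustrative sets
Bs = [frozenset({1, 2}), frozenset({1, 2, 3})]

# given B = b, let A be uniform over the 2^|b| subsets of b, so that
# H(A | B = b) = log2(2^|b|) = |b|; averaging over b gives E|B|
H_A_given_B = sum((1 / len(Bs)) * log2(len(subsets(b))) for b in Bs)
E_B = sum(len(b) for b in Bs) / len(Bs)
```
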

Lemma 4 (Information-theoretic interpretation of spread). Let A_1,\ldots,A_N be an R-spread family of finite sets. Let \mathbf n be chosen uniformly at random from [N]. Let \mathbf S be a random set satisfying \mathbf S\subseteq A_{\mathbf n} almost surely. Then

\begin{aligned} \mathbb H(\mathbf n|\mathbf S) \le \mathbb H(\mathbf n) - (\log R) \mathbb E|\mathbf S|.\end{aligned}

Lemma 5 (Information-theoretic properties of uniformly random subsets of fixed size). Let X be a finite set. Let \mathbf W be a size-\delta |X| subset of X chosen uniformly at random for \delta\in (0,1] assuming \delta |X| is an integer. Let \mathbf A be a random subset of X. The following inequalities hold:

  1. (absorption) \mathbb H(\mathbf A\cup \mathbf W) \le \mathbb H(\mathbf W) + 1 + (1 + \log(1/\delta))\mathbb E|\mathbf A|;
  2. (spread) if \mathbf A\subseteq \mathbf W almost surely, then \mathbb H(\mathbf W|\mathbf A)\le \mathbb H(\mathbf W) - \log(1/\delta)\mathbb E|\mathbf A|.
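Both inequalities of Lemma 5 can be verified by brute force on tiny instances. In the sketch below, the choices of X, \delta, and the two random sets playing the role of \mathbf A are illustrative assumptions:

```python
from itertools import combinations
from math import log2
from collections import Counter

X = (0, 1, 2, 3)
delta = 0.5
# W uniform over the 6 size-2 subsets of X
Ws = [frozenset(c) for c in combinations(X, int(delta * len(X)))]

def H(counts):
    """Shannon entropy (base 2) of a distribution given by a Counter."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())

H_W = log2(len(Ws))  # W is uniform

# Item 1 (absorption), with the deterministic set A = {0, 1}:
A = frozenset({0, 1})
H_union = H(Counter(A | W for W in Ws))
absorption_rhs = H_W + 1 + (1 + log2(1 / delta)) * len(A)

# Item 2 (spread), with A = {min(W)}, so that A is a subset of W almost surely:
pairs = [(frozenset({min(W)}), W) for W in Ws]
p_A = Counter(a for a, _ in pairs)
# H(W | A) = sum_a Pr[A = a] * H(W | A = a)
H_W_given_A = sum(
    (p_A[a] / len(Ws)) * H(Counter(W for aa, W in pairs if aa == a))
    for a in p_A
)
spread_rhs = H_W - log2(1 / delta) * 1.0  # E|A| = 1 here
```
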

Proof of Lemma 2. If there exists n\in[N] such that A_n is empty, we can simply choose \mathbf n'=n deterministically. We assume henceforth that A_n\neq \emptyset for all n\in[N].

Following Tao’s proof, we construct {\mathbf n}' by creating a conditionally independent copy ({\mathbf n}',{\mathbf W}') of ({\mathbf n},{\mathbf W}) given {A_{{\mathbf n}}}\cup{\mathbf W}, so that {A_{{\mathbf n}'}}\cup {\mathbf W}' = {A_{{\mathbf n}}}\cup{\mathbf W} almost surely. In other words, ({\mathbf n}',{\mathbf W}') and ({\mathbf n},{\mathbf W}) have the same conditional distribution given {A_{{\mathbf n}}}\cup{\mathbf W}, and they are conditionally independent given {A_{{\mathbf n}}}\cup{\mathbf W}. This construction guarantees that {A_{{\mathbf n}'}}\subseteq {A_{{\mathbf n}}}\cup{\mathbf W} almost surely, which in turn implies that A_{\mathbf n'}\backslash \mathbf W\subseteq A_{\mathbf n}\cap A_{\mathbf n'} almost surely.
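The coupling above can be simulated directly on a toy instance: draw (\mathbf n,\mathbf W), record the union U = A_{\mathbf n}\cup\mathbf W, and resample (\mathbf n',\mathbf W') uniformly among all pairs producing that same union (uniformity is the correct conditional law here because (\mathbf n,\mathbf W) is uniform over its range). The family and \delta below are illustrative choices:

```python
import random
from itertools import combinations

random.seed(0)
X = list(range(6))
family = [frozenset(s) for s in ({0, 1}, {2, 3}, {4, 5}, {0, 2}, {1, 4}, {3, 5})]
delta = 0.5
Ws = [frozenset(c) for c in combinations(X, int(delta * len(X)))]

def conditional_copy():
    """Draw (n, W) uniformly, then draw (n', W') from the conditional
    distribution of (n, W) given A_n union W, independently of (n, W)."""
    n = random.randrange(len(family))
    W = random.choice(Ws)
    U = family[n] | W
    # (n, W) is uniform, so its conditional law given the union U is
    # uniform over all pairs producing that union
    candidates = [(m, V) for m in range(len(family)) for V in Ws
                  if family[m] | V == U]
    n2, W2 = random.choice(candidates)
    return n, W, n2, W2

# check the two almost-sure inclusions used in the proof
violations = 0
for _ in range(500):
    n, W, n2, W2 = conditional_copy()
    if not family[n2] <= family[n] | W:               # A_{n'} in A_n union W
        violations += 1
    if not family[n2] - W <= family[n] & family[n2]:  # A_{n'}\W in A_n cap A_{n'}
        violations += 1
```

Both inclusions hold on every draw, matching the almost-sure guarantees of the construction.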

It remains to prove that the {\mathbf n}' constructed as above satisfies {\mathbb E}|A_{\mathbf n}\cap A_{\mathbf n'}|\le \frac 4{\log(R\delta)}{\mathbb E}|{A_{{\mathbf n}}}|. We achieve this by estimating {\mathbb H}({\mathbf n},{\mathbf W},{\mathbf n}',{\mathbf W}') using the chain rule (1) via two different chains, one for a lower bound and the other for an upper bound. The lower bound is obtained via the following chain:

\begin{aligned}{\mathbf n},{\mathbf W}\rightarrow {\mathbf n}',{\mathbf W}'. \end{aligned}

Namely, we apply (1) in the following way:

\begin{aligned}{\mathbb H}({\mathbf n},{\mathbf W},{\mathbf n}',{\mathbf W}') = {\mathbb H}({\mathbf n},{\mathbf W}) + {\mathbb H}({\mathbf n}',{\mathbf W}'|{\mathbf n},{\mathbf W}). && (2)\end{aligned}

By the independence of {\mathbf n} and {\mathbf W},

\begin{aligned}{\mathbb H}({\mathbf n},{\mathbf W}) = {\mathbb H}({\mathbf n}) + {\mathbb H}({\mathbf W}). && (3)\end{aligned}

By the conditional independence of ({\mathbf n}',{\mathbf W}') and ({\mathbf n},{\mathbf W}) given {A_{{\mathbf n}}}\cup{\mathbf W} and their identical conditional distribution,

\begin{aligned} & \mathbb H({\mathbf n}',{\mathbf W}'|{\mathbf n},{\mathbf W}) \\ = {} & {\mathbb H}({\mathbf n}',{\mathbf W}'|{\mathbf n},{\mathbf W},{A_{{\mathbf n}}}\cup {\mathbf W})\\ = {} & {\mathbb H}({\mathbf n}',{\mathbf W}'|{A_{{\mathbf n}}}\cup {\mathbf W})\\ = {} & {\mathbb H}({\mathbf n},{\mathbf W}|{A_{{\mathbf n}}}\cup {\mathbf W})\\ = {} & {\mathbb H}({\mathbf n},{\mathbf W}) - {\mathbb H}({A_{{\mathbf n}}}\cup{\mathbf W})\\ = {} & {\mathbb H}({\mathbf n}) + {\mathbb H}({\mathbf W}) - {\mathbb H}({A_{{\mathbf n}}}\cup{\mathbf W})\\ \ge {} & {\mathbb H}({\mathbf n}) - (2 + \log(1/\delta)){\mathbb E}|{A_{{\mathbf n}}}|,& (4) \end{aligned}

where the last inequality is by Lemma 5 Item 1. Plugging (3), (4) into (2), we get the following lower bound:

\begin{aligned}&{\mathbb H}({\mathbf n},{\mathbf W},{\mathbf n}',{\mathbf W}') \\ \ge {} & 2{\mathbb H}({\mathbf n}) + {\mathbb H}({\mathbf W}) - (2 + \log(1/\delta)){\mathbb E}|{A_{{\mathbf n}}}|. & (5)\end{aligned}

Now we establish an upper bound for {\mathbb H}({\mathbf n},{\mathbf W},{\mathbf n}',{\mathbf W}') via a different chain:

\begin{aligned}{\mathbf n}\rightarrow {A_{{\mathbf n}}}\cap{A_{{\mathbf n}'}}\rightarrow {\mathbf n}'\rightarrow {\mathbf W}\rightarrow {\mathbf W}'.\end{aligned}

Namely, we apply (1) in the following manner:

\begin{aligned} & {\mathbb H}({\mathbf n},{\mathbf W},{\mathbf n}',{\mathbf W}')\\ = {} & {\mathbb H}({\mathbf n},{A_{{\mathbf n}}}\cap{A_{{\mathbf n}'}},{\mathbf n}',{\mathbf W},{\mathbf W}')\\= {} & {\mathbb H}({\mathbf n})\\& + {\mathbb H}({A_{{\mathbf n}}}\cap{A_{{\mathbf n}'}}|{\mathbf n})\\& + {\mathbb H}({\mathbf n}'|{\mathbf n},{A_{{\mathbf n}}}\cap{A_{{\mathbf n}'}})\\& + {\mathbb H}({\mathbf W}|{\mathbf n},{A_{{\mathbf n}}}\cap{A_{{\mathbf n}'}},{\mathbf n}')\\& + {\mathbb H}({\mathbf W}'|{\mathbf n},{A_{{\mathbf n}}}\cap{A_{{\mathbf n}'}},{\mathbf n}',{\mathbf W}).&(6)\end{aligned}

By Lemma 3 and {A_{{\mathbf n}}}\cap{A_{{\mathbf n}'}}\subseteq {A_{{\mathbf n}}},

\begin{aligned}{\mathbb H}({A_{{\mathbf n}}}\cap{A_{{\mathbf n}'}}|{\mathbf n})\le{\mathbb E}|{A_{{\mathbf n}}}|. && (7) \end{aligned}

By Lemma 4 and {A_{{\mathbf n}}}\cap{A_{{\mathbf n}'}}\subseteq {A_{{\mathbf n}'}},

\begin{aligned} & {\mathbb H}({\mathbf n}'|{\mathbf n},{A_{{\mathbf n}}}\cap{A_{{\mathbf n}'}})\\ \le {} & {\mathbb H}({\mathbf n}'|{A_{{\mathbf n}}}\cap{A_{{\mathbf n}'}})\\ \le {} & {\mathbb H}({\mathbf n}') - (\log R){\mathbb E}|{A_{{\mathbf n}}}\cap{A_{{\mathbf n}'}}|. & (8) \end{aligned}

By Lemma 5 Item 2 and {A_{{\mathbf n}'}}\backslash{A_{{\mathbf n}}}\subseteq {\mathbf W},

\begin{aligned}&{\mathbb H}({\mathbf W}|{\mathbf n},{A_{{\mathbf n}}}\cap{A_{{\mathbf n}'}},{\mathbf n}') \\  \le {} & {\mathbb H}({\mathbf W}|{A_{{\mathbf n}'}}\backslash{A_{{\mathbf n}}}) \\ \le {} & {\mathbb H}({\mathbf W}) - (\log (1/\delta)){\mathbb E}|{A_{{\mathbf n}'}}\backslash{A_{{\mathbf n}}}|. & (9)\end{aligned}

Since {\mathbf W}' = ({A_{{\mathbf n}}}\cup{\mathbf W})\backslash({A_{{\mathbf n}'}}\backslash{\mathbf W}') and {A_{{\mathbf n}'}}\backslash{\mathbf W}'\subseteq {A_{{\mathbf n}'}}, by Lemma 3,

\begin{aligned}& {\mathbb H}({\mathbf W}'|{\mathbf n},{A_{{\mathbf n}}}\cap{A_{{\mathbf n}'}},{\mathbf n}',{\mathbf W}) \\ \le {} & {\mathbb H}({A_{{\mathbf n}'}}\backslash{\mathbf W}'|{\mathbf n},{A_{{\mathbf n}}}\cap{A_{{\mathbf n}'}},{\mathbf n}',{\mathbf W})\\ \le {} & {\mathbb E}|{A_{{\mathbf n}'}}|. &(10)\end{aligned}

Plugging (7), (8), (9), (10) into (6) and simplifying using

\begin{aligned}&{\mathbb H}({\mathbf n}') = {\mathbb H}({\mathbf n}),\quad {\mathbb E}|{A_{{\mathbf n}'}}| = {\mathbb E}|{A_{{\mathbf n}}}|,\\&{\mathbb E}|{A_{{\mathbf n}'}}\backslash{A_{{\mathbf n}}}| = {\mathbb E}|{A_{{\mathbf n}'}}| - {\mathbb E}|{A_{{\mathbf n}}}\cap{A_{{\mathbf n}'}}|,\end{aligned}

we get the following upper bound:

\begin{aligned}&{\mathbb H}({\mathbf n},{\mathbf W},{\mathbf n}',{\mathbf W}') \\ \le {} & 2{\mathbb H}({\mathbf n}) + {\mathbb H}({\mathbf W}) + (2 - \log(1/\delta)){\mathbb E}|{A_{{\mathbf n}}}| \\ & - (\log (R\delta)){\mathbb E}|{A_{{\mathbf n}}}\cap{A_{{\mathbf n}'}}|. & (11)\end{aligned}

Comparing (5) and (11) and cancelling the common terms 2{\mathbb H}({\mathbf n}) + {\mathbb H}({\mathbf W}) yields \log(R\delta)\,{\mathbb E}|A_{\mathbf n}\cap A_{\mathbf n'}| \le (2 + \log(1/\delta)){\mathbb E}|A_{\mathbf n}| + (2 - \log(1/\delta)){\mathbb E}|A_{\mathbf n}| = 4{\mathbb E}|A_{\mathbf n}|, which proves the desired inequality {\mathbb E}|A_{\mathbf n}\cap A_{\mathbf n'}|\le \frac 4{\log(R\delta)}{\mathbb E}|{A_{{\mathbf n}}}|.\square

Acknowledgments. I would like to thank my quals committee, Moses Charikar, Omer Reingold, and Li-Yang Tan for valuable feedback and inspiring discussions.
