Entropy Estimation via Two Chains: Streamlining the Proof of the Sunflower Lemma
The sunflower lemma describes an interesting combinatorial property of set families: any large family of small sets must contain a large sunflower—a sub-family consisting of sets with a shared core and disjoint petals (See the figure below for an example and a non-example of sunflowers.) The application of the lemma in theoretical computer science dates back to the influential paper by Razborov in 1985 that established the first super-polynomial monotone circuit lower bound for a function in NP, and since then the lemma has been applied broadly to other problems in theoretical computer science (see STOC version of Alweiss-Lovett-Wu-Zhang for a discussion of applications of the lemma in computer science).
The lemma was first proved by Erdős-Rado in 1960, who gave the quantitative bound that among distinct sets each of size at most one can find sets forming a sunflower. Erdos-Rado conjectured that the in the bound could be significantly improved to For nearly 60 years since the Erdős-Rado upper bound, all known upper bounds had had the dependence on even for despite much research effort. In 2019, a breakthrough work by Alweiss-Lovett-Wu-Zhang improved the bound significantly to for being at least and being an absolute constant. The breakthrough led to many new results including improved monotone circuit lower bounds by Cavalar-Kumar-Rossman.
Since the breakthrough of Alweiss-Lovett-Wu-Zhang, researchers have been refining the bound and simplifying the proof. The current best bound is for proved by Bell-Chueluecha-Warnke via a minor but powerful twist in the proof of an earlier bound by Anup Rao:
Theorem 1 (Sunflower Lemma [BCW’20]). There exists a constant such that the following holds for all positive integers and Let be a family of at least distinct sets each of size at most Then contains sets that form a sunflower.
The aim of this blog post is to present a streamlined proof of Theorem 1. The proof is largely based on a blog post by Terence Tao where he presented an elegant proof of Rao’s result using Shannon entropy. However, Tao’s proof included a trick of passing to a conditional copy twice, which Tao described as “somewhat magical”. We show here that the trick is not necessary for the proof, and avoiding the trick gives a simpler proof with a slightly better constant in the bound.
We start by defining -spread families, a notion key to all recent proofs of the sunflower lemma. We use as a shorthand for We use boldface symbols to denote random variables, and non-boldface ones to denote deterministic quantities.
Definition 1 (Spread family). Let be a family of finite sets that are not necessarily distinct. For , we say is -spread if for all
where is chosen uniformly at random from
The main technical part of recent improvements in the sunflower lemma happens in the proof of the following refinement lemma. We use the base- logarithm throughout.
Lemma 2 (Refinement). Let be a finite set. For , let be an -spread family of sets Let be a size- subset of chosen uniformly at random for assuming is an integer. Let be a uniform random number in independent of There is a random number in (being not independent from in general) such that almost surely and
The proof of Theorem 1 using Lemma 2 can be found in Rao’s paper and Tao’s blog post, so we omit it here and focus on proving Lemma 2. Rao and Tao both proved a slightly weaker version of Theorem 1 but this weakness can be overcome using a minor twist observed by Bell-Chueluecha-Warnke. They also used slightly different forms of Lemma 2 but the differences are non-essential.
It is easy to see that in Lemma 2 one can convert to a deterministic function of and without violating any property of the original guaranteed by the lemma, because after conditioning on and one can fix any additional randomness in so that is minimized. This more deterministic version of can be somewhat more convenient for proving Theorem 1 and it was used in Rao’s proof, but allowing additional randomness in enabled Tao to obtain a simpler proof of Lemma 2 using a construction of that is easier to analyze.
We follow Tao’s idea of proving Lemma 2 using Shannon entropy, but we present the proof in a more streamlined fashion with a slightly sharper constant in the bound (Tao proved a version of Lemma 2 where the constant 4 was replaced by 5). Specifically, we present the proof in a way resembling a basic technique in combinatorics called counting in two ways: one can show that two quantities are equal by showing that they both count the number of elements in the same set. Here, we estimate the entropy of the same collection of random variables in two different ways, and prove Lemma 2 by comparing the two estimates. The way we obtain the two estimates relies crucially on the chain rule of conditional entropy:
Equation (1) holds for arbitrary random variables taking values in a discret set. We say that equation (1) computes the entropy of via the chain
To prove Lemma 2, we obtain two entropy estimates for the same collection of random variables by applying (1) to two different chains.
We need the following useful lemmas about Shannon entropy. We omit their proofs here as they can be found in Tao’s blog post, where many basic properties of Shannon entropy are also discussed. (We also highly recommend Tao’s other blog posts about Shannon entropy and the entropy compression argument.)
Lemma 3 (Subsets of small sets have small conditional entropy). Let be finite random sets. Assume almost surely. Then
Lemma 4 (Information-theoretic interpretation of spread). Let be an -spread family of finite sets. Let be chosen uniformly at random from Let be a random set satisfying almost surely. Then
Lemma 5 (Information-theoretic properties of uniformly random subsets of fixed size). Let be a finite set. Let be a size- subset of chosen uniformly at random for assuming is an integer. Let be a random subset of The following inequalities hold:
- (absorption)
- (spread) if almost surely, then
Proof of Lemma 2. If there exists such that is empty, we can simply choose deterministically. We assume henceforth that for all
Following Tao’s proof, we construct by creating a conditionally independent copy of given In other words, and have the same conditional distribution given and they are also conditionally independent given This construction guarantees that almost surely, which implies that almost surely.
It remains to prove that the constructed as above satisfies We achieve this by estimating using the chain rule (1) via two different chains, one for a lower bound and the other for an upper bound. The lower bound is obtained via the following chain:
Namely, we apply (1) in the following way:
By the independence of and ,
By the conditional independence of and given and their identical conditional distribution,
where the last inequality is by Lemma 5 Item 1. Plugging (3), (4) into (2), we get the following lower bound:
Now we establish an upper bound for via a different chain:
Namely, we apply (1) in the following manner:
By Lemma 3 and
By Lemma 4 and
By Lemma 5 Item 2 and
Since and by Lemma 3,
Plugging (7), (8), (9), (10) into (6) and simplifying using
we get the following upper bound:
Comparing (5) and (11) proves the desired inequality
Acknowledgments. I would like to thank my quals committee, Moses Charikar, Omer Reingold, and Li-Yang Tan for valuable feedback and inspiring discussions.
Leave a comment