Entropy Estimation via Two Chains: Streamlining the Proof of the Sunflower Lemma
The sunflower lemma describes an interesting combinatorial property of set families: any large family of small sets must contain a large sunflower—a sub-family consisting of sets with a shared core
and disjoint petals
(See the figure below for an example and a non-example of sunflowers.) The application of the lemma in theoretical computer science dates back to the influential paper by Razborov in 1985 that established the first super-polynomial monotone circuit lower bound for a function in NP, and since then the lemma has been applied broadly to other problems in theoretical computer science (see STOC version of Alweiss-Lovett-Wu-Zhang for a discussion of applications of the lemma in computer science).

The lemma was first proved by Erdős-Rado in 1960, who gave the quantitative bound that among distinct sets each of size at most
one can find
sets forming a sunflower. Erdos-Rado conjectured that the
in the bound could be significantly improved to
For nearly 60 years since the Erdős-Rado upper bound, all known upper bounds had had the
dependence on
even for
despite much research effort. In 2019, a breakthrough work by Alweiss-Lovett-Wu-Zhang improved the bound significantly to
for
being at least
and
being an absolute constant. The breakthrough led to many new results including improved monotone circuit lower bounds by Cavalar-Kumar-Rossman.
Since the breakthrough of Alweiss-Lovett-Wu-Zhang, researchers have been refining the bound and simplifying the proof. The current best bound is for
proved by Bell-Chueluecha-Warnke via a minor but powerful twist in the proof of an earlier
bound by Anup Rao:
Theorem 1 (Sunflower Lemma [BCW’20]). There exists a constant such that the following holds for all positive integers
and
Let
be a family of at least
distinct sets each of size at most
Then
contains
sets that form a sunflower.
The aim of this blog post is to present a streamlined proof of Theorem 1. The proof is largely based on a blog post by Terence Tao where he presented an elegant proof of Rao’s result using Shannon entropy. However, Tao’s proof included a trick of passing to a conditional copy twice, which Tao described as “somewhat magical”. We show here that the trick is not necessary for the proof, and avoiding the trick gives a simpler proof with a slightly better constant in the bound.
We start by defining -spread families, a notion key to all recent proofs of the sunflower lemma. We use
as a shorthand for
We use boldface symbols to denote random variables, and non-boldface ones to denote deterministic quantities.
Definition 1 (Spread family). Let be a family of finite sets
that are not necessarily distinct. For
, we say
is
-spread if for all
where is chosen uniformly at random from
The main technical part of recent improvements in the sunflower lemma happens in the proof of the following refinement lemma. We use the base- logarithm throughout.
Lemma 2 (Refinement). Let be a finite set. For
, let
be an
-spread family of sets
Let
be a size-
subset of
chosen uniformly at random for
assuming
is an integer. Let
be a uniform random number in
independent of
There is a random number
in
(being not independent from
in general) such that
almost surely and
The proof of Theorem 1 using Lemma 2 can be found in Rao’s paper and Tao’s blog post, so we omit it here and focus on proving Lemma 2. Rao and Tao both proved a slightly weaker version of Theorem 1 but this weakness can be overcome using a minor twist observed by Bell-Chueluecha-Warnke. They also used slightly different forms of Lemma 2 but the differences are non-essential.
It is easy to see that in Lemma 2 one can convert to a deterministic function of
and
without violating any property of the original
guaranteed by the lemma, because after conditioning on
and
one can fix any additional randomness in
so that
is minimized. This more deterministic version of
can be somewhat more convenient for proving Theorem 1 and it was used in Rao’s proof, but allowing additional randomness in
enabled Tao to obtain a simpler proof of Lemma 2 using a construction of
that is easier to analyze.
We follow Tao’s idea of proving Lemma 2 using Shannon entropy, but we present the proof in a more streamlined fashion with a slightly sharper constant in the bound (Tao proved a version of Lemma 2 where the constant 4 was replaced by 5). Specifically, we present the proof in a way resembling a basic technique in combinatorics called counting in two ways: one can show that two quantities are equal by showing that they both count the number of elements in the same set. Here, we estimate the entropy of the same collection of random variables in two different ways, and prove Lemma 2 by comparing the two estimates. The way we obtain the two estimates relies crucially on the chain rule of conditional entropy:
Equation (1) holds for arbitrary random variables taking values in a discret set. We say that equation (1) computes the entropy of
via the chain
To prove Lemma 2, we obtain two entropy estimates for the same collection of random variables by applying (1) to two different chains.
We need the following useful lemmas about Shannon entropy. We omit their proofs here as they can be found in Tao’s blog post, where many basic properties of Shannon entropy are also discussed. (We also highly recommend Tao’s other blog posts about Shannon entropy and the entropy compression argument.)
Lemma 3 (Subsets of small sets have small conditional entropy). Let be finite random sets. Assume
almost surely. Then
Lemma 4 (Information-theoretic interpretation of spread). Let be an
-spread family of finite sets. Let
be chosen uniformly at random from
Let
be a random set satisfying
almost surely. Then
Lemma 5 (Information-theoretic properties of uniformly random subsets of fixed size). Let be a finite set. Let
be a size-
subset of
chosen uniformly at random for
assuming
is an integer. Let
be a random subset of
The following inequalities hold:
- (absorption)
- (spread) if
almost surely, then
Proof of Lemma 2. If there exists such that
is empty, we can simply choose
deterministically. We assume henceforth that
for all
Following Tao’s proof, we construct by creating a conditionally independent copy
of
given
In other words,
and
have the same conditional distribution given
and they are also conditionally independent given
This construction guarantees that
almost surely, which implies that
almost surely.
It remains to prove that the constructed as above satisfies
We achieve this by estimating
using the chain rule (1) via two different chains, one for a lower bound and the other for an upper bound. The lower bound is obtained via the following chain:
Namely, we apply (1) in the following way:
By the independence of and
,
By the conditional independence of and
given
and their identical conditional distribution,
where the last inequality is by Lemma 5 Item 1. Plugging (3), (4) into (2), we get the following lower bound:
Now we establish an upper bound for via a different chain:
Namely, we apply (1) in the following manner:
By Lemma 3 and
By Lemma 4 and
By Lemma 5 Item 2 and
Since and
by Lemma 3,
Plugging (7), (8), (9), (10) into (6) and simplifying using
we get the following upper bound:
Comparing (5) and (11) proves the desired inequality
Acknowledgments. I would like to thank my quals committee, Moses Charikar, Omer Reingold, and Li-Yang Tan for valuable feedback and inspiring discussions.
Leave a Reply