DNF Minimization, Part II

In the last post, I discussed the complexity of DNF minimization given a dataset. Specifically, given a dataset D\subseteq \{0,1\}^n\times \{0,1\} of input/output pairs, how hard is it to compute the smallest DNF consistent with D? In this post, we’ll look at a variant of this problem where one requires that the dataset D specify a label for every point in \{0,1\}^n.

DNF truth table minimization. DNF truth table minimization is the variant of DNF minimization in which the input dataset T is the full truth table of a function f:\{0,1\}^n\to\{0,1\}. This extra constraint on the dataset can only make the problem easier, and indeed, using the greedy approximation algorithm for Set-Cover, truth table minimization can be approximated to within a factor of O(\log |T|) in polynomial time (a sketch of this greedy reduction appears below). The first lower bound is due to Masek, who showed in 1979 that the exact variant is NP-hard. His result was never published, though the proof was later reproduced and refined by Umans, Villa, and Sangiovanni-Vincentelli. Hardness of approximation remained open until 2006, when two independent papers obtained hardness of approximation within a factor of \left(\log|T|\right)^{\gamma}.

It is NP-hard to approximate truth table DNF minimization within a factor of \left(\log|T|\right)^{\gamma} for some \gamma>0.

Feldman 2009

Allender, Hellerstein, McCabe, Pitassi, and Saks independently obtained the same hardness of approximation under the stronger assumption that \mathrm{NP}\not\subseteq \mathrm{DTIME}(n^{\mathrm{polylog}~ n}). This was subsequently improved by Khot and Saket (2008) to an approximation factor of \left(\log|T|\right)^{1-\delta} for all constants \delta>0, also under this stronger assumption.

The proofs of these results use PCP techniques and are significantly more involved than the proof for DNF minimization in the dataset setting. A key intermediate step is proving hardness for partial truth table minimization. In this variant, the input is still a truth table of size 2^n, but some entries are starred, indicating “don’t care” inputs on which the DNF is free to output either value. Hardness of approximating this variant follows from hardness of approximating the Label Cover problem. Once hardness of the partial variant is established, it can be reduced to (full) truth table minimization using ideas from Gimpel (1965).
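To make the Set-Cover connection concrete, here is a minimal Python sketch of the greedy O(\log|T|)-approximation, written so that it also handles the partial truth tables just described (starred entries are represented as None). The encoding of terms as lists of literals and all the names are my own illustrative choices; this is a sketch of the reduction, not the exact algorithm analyzed in the papers above.

```python
from itertools import product

def greedy_dnf(f, n):
    """Greedy Set-Cover sketch for (partial) truth table DNF minimization.

    f maps each point of {0,1}^n (a tuple) to 1, 0, or None ("don't care").
    A term is a tuple of literals (index, value); it is an admissible
    candidate if it covers no 0-point.  The universe to cover is the set of
    1-points, so starred points neither need to be covered nor rule a term out.
    Enumerating all 3^n candidate terms is polynomial in |T| = 2^n.
    """
    points = list(product((0, 1), repeat=n))
    onset = {x for x in points if f[x] == 1}

    def satisfies(x, term):
        return all(x[i] == v for i, v in term)

    # Candidate sets for Set-Cover: terms that never evaluate to 1 on a 0-point.
    candidates = []
    for pattern in product((0, 1, None), repeat=n):  # None = variable left out of the term
        term = tuple((i, v) for i, v in enumerate(pattern) if v is not None)
        if any(f[x] == 0 and satisfies(x, term) for x in points):
            continue
        covered = frozenset(x for x in onset if satisfies(x, term))
        if covered:
            candidates.append((term, covered))

    # Greedy Set-Cover: repeatedly take the term covering the most uncovered
    # 1-points, which yields an O(log |T|)-approximation to the smallest DNF.
    uncovered, dnf = set(onset), []
    while uncovered:
        term, covered = max(candidates, key=lambda tc: len(tc[1] & uncovered))
        dnf.append(term)
        uncovered -= covered
    return dnf
```

On a full truth table (no starred entries) this is exactly the greedy Set-Cover approximation mentioned earlier; the only change needed for the partial variant is how starred points are treated.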

Learning hardness. Given access to random labeled samples (x,\varphi(x)) from some unknown DNF \varphi:\{0,1\}^n\to\{0,1\}, where x is drawn from an unknown distribution D, how hard is it to find a hypothesis that approximates \varphi over D? This problem is known as PAC learning DNFs and is a fundamental problem in learning theory. In his influential paper introducing PAC learning, Valiant showed that DNFs with k terms can be learned by width-k CNFs in polynomial time when k is a constant. On the flip side, Pitt and Valiant showed that learning size-s DNFs by size-s DNFs is NP-hard (this is a consequence of the NP-hardness result discussed in the first blog post).
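To illustrate Valiant’s observation, here is a minimal sketch of the standard elimination-style algorithm for learning a k-term DNF with a width-k CNF, using my own encoding of clauses; it is meant to convey the idea rather than reproduce Valiant’s presentation. Every k-term DNF is equivalent to some CNF whose clauses have width at most k, so one can start from all such clauses and delete those falsified by positive examples.

```python
from itertools import combinations, product

def learn_k_term_dnf_via_k_cnf(samples, n, k):
    """Sketch of learning a k-term DNF with a width-k CNF by elimination.

    samples: iterable of (x, y) pairs with x a tuple in {0,1}^n and y in {0,1}.
    A clause is a tuple of literals (index, value); x satisfies it if some
    literal agrees with x.  Every clause of a width-k CNF representation of
    the target survives the elimination, so the hypothesis is consistent
    with the samples.  Runtime n^{O(k)}: polynomial when k is a constant.
    """
    def satisfied(x, clause):
        return any(x[i] == v for i, v in clause)

    # Start with every clause of width at most k.
    clauses = {
        tuple(zip(idxs, vals))
        for width in range(1, k + 1)
        for idxs in combinations(range(n), width)
        for vals in product((0, 1), repeat=width)
    }

    # Delete every clause falsified by a positive example.
    for x, y in samples:
        if y == 1:
            clauses = {c for c in clauses if satisfied(x, c)}

    # Hypothesis: the conjunction (AND) of all surviving clauses.
    return lambda x: all(satisfied(x, c) for c in clauses)
```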

The landscape changes if the learner is allowed to assume that the distribution in the learning instance is uniform. The Pitt-Valiant NP-hardness result crucially relies on constructing a “pathological” distribution which makes the problem hard. In fact, with this new distributional assumption, it is possible to learn size-s DNFs by size-O(s\log s) DNFs:

There is an algorithm that learns size-s DNFs with size-O(s\log s) DNFs over the uniform distribution in time n^{O(\log s)}.

Verbeurgt 1990

Feldman observed that any algorithm which learns DNFs over the uniform distribution can also be used to solve DNF minimization given a truth table: the basic idea is to provide the learner with uniform random samples labeled according to the truth table (a small sketch of this reduction appears below). Combining this observation with the hardness result of Khot and Saket, one can obtain the following learning lower bound, which complements the above upper bound.

The following holds for all constants \delta>0. If \mathrm{NP}\not\subseteq \mathrm{DTIME}(n^{\mathrm{polylog}~ n}), then size-s DNFs cannot be learned by size-s(\log s)^{1-\delta} DNFs in polynomial time. This holds even if the distribution in the learning instance is uniform and the target is a \log n-junta over n variables.

In fact, the above theorem holds even if the learner has access to membership queries, meaning that in addition to random samples, the learner is allowed to query the target function on arbitrary inputs (note that Verbeurgt’s algorithm uses only random samples). We also remark that the above theorem only applies when the learner is required to output a hypothesis with error at most 1/n over the uniform distribution. This is a weakness of the hardness result: for example, it does not rule out PAC learning size-s DNFs by size-s(\log s)^{1-\delta} DNFs to error 1\% over the uniform distribution. Regardless, it gives some evidence that Verbeurgt’s algorithm is unlikely to be significantly improved.
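To make Feldman’s observation concrete, here is a minimal sketch of the sampling reduction mentioned above. The helper name and the learn_dnf learner in the usage comment are hypothetical placeholders, not an API from any of the papers; the point is only that a truth table perfectly simulates the uniform-distribution example oracle.

```python
import random

def uniform_oracle_from_truth_table(f, n):
    """Simulate a uniform-distribution example oracle using a truth table.

    f: dict mapping each point of {0,1}^n (a tuple) to its label in {0,1}.
    Returns a zero-argument function that draws x uniformly at random from
    {0,1}^n and returns the labeled sample (x, f(x)).
    """
    def draw():
        x = tuple(random.randint(0, 1) for _ in range(n))
        return x, f[x]
    return draw

# Hypothetical usage with some uniform-distribution DNF learner learn_dnf:
#   oracle = uniform_oracle_from_truth_table(f, n)
#   hypothesis = learn_dnf(oracle, n)
# A small DNF hypothesis that is accurate under the uniform distribution then
# witnesses a small DNF approximately matching the truth table, which is how
# hardness of truth table minimization transfers to hardness of learning.
```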

Conclusion and open problems. DNF minimization is a fundamental problem in computer science and has many connections to learning. We’ve seen that assumptions on the structure of the dataset can greatly affect the complexity of the problem, just as distributional assumptions in the learning setting can affect the complexity of the learning task. While a lot of progress has been made on understanding the nuances of these assumptions, there are still many directions to explore:

1. Gaps remain between the approximation lower bounds and the approximation upper bounds for these problems. For example, can one show hardness of approximating DNF minimization in the dataset setting to within a factor of 0.99m? Or perhaps it is possible to improve the trivial m-approximation upper bound (sketched after this list)?

2. Can Verbeurgt’s algorithm be improved to run in time \mathrm{poly}(n,s)? The O(\log |T|)-approximation algorithm for truth table minimization runs in polynomial time, and a priori nothing rules out a \mathrm{poly}(n,s)-time algorithm for learning size-s DNFs by size-O(s\log s) DNFs over the uniform distribution.

3. Is it hard to learn DNFs over the uniform distribution to constant error?
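For reference, here is what the trivial m-approximation mentioned in item 1 looks like; a minimal sketch with my own names, assuming the dataset is a consistent list of labeled points: output, for each positive example, the width-n term satisfied by that point alone.

```python
def trivial_dnf(dataset, n):
    """Trivial m-approximation for DNF minimization in the dataset setting.

    dataset: list of (x, y) pairs with x a tuple in {0,1}^n and y in {0,1},
    assumed consistent (no point labeled both 0 and 1).
    Each positive example contributes the full-width term satisfied only by
    that point, so the output is consistent with the dataset and has at most
    m terms, while any consistent DNF needs at least one term.
    """
    positives = {x for x, y in dataset if y == 1}  # dedupe repeated examples
    return [tuple((i, x[i]) for i in range(n)) for x in positives]
```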

That’s it for this blog post! Thanks for reading!
