Testing Juntas (or “Avoiding the Curse of Irrelevant Dimensionality”)

Posted on February 12, 2018 by ccanonne in Uncategorized // 2 Comments

Juntas (or, with more words, “functions of few relevant attributes”) are a central concept in computational learning theory, analysis of Boolean functions, and machine learning. Without trying to motivate this here (see e.g. this blog post of Dick Lipton’s), let us first recap some notions.

A Boolean function $f\colon \{-1,1\}^n \to \{-1,1\}$ is said to be a $k$ -junta (where one should think of $n$ as big, and $k$ as smallish, constant or say at most $\log n$ ) if there is an unknown set of $k$ indices $i_1,\dots,x_k$ such that $f$ depends only on the variables $x_{i_1},\dots, x_{i_k}$ . Put differently, if there exists a Boolean function $g\colon \{-1,1\}^k \to \{-1,1\}$ such that, for all $x$ ,

$\Large f(x) = g(x_{i_1},\dots,x_{i_k})$

( $f$ is technically on $n$ variables, but “morally” on $k$ ). In particular, any Boolean function is trivially an $n$ -junta. Also, because names are fun: a $1$ -junta is called a dictator.

Now, if $f$ turns out to be a $k$ -junta for some small value of $k$ , then one can hope to do many things more efficiently: e.g., learn $f$ ; test some interesting property of $f$ . With a little bit of luck, now the complexity of all these tasks could be much milder, with $k$ replacing (ish) all the $n$ ‘s, with maybe some $\log n$ ‘s thrown in for good measure.

So testing whether an unknown function $f\colon \{-1,1\}^n \to \{-1,1\}$ is a $k$ -junta (given, say, query access to it, or even random samples) could be a quite useful thing to have in our toolbox. Here enters property testing of juntas:

Given parameters $k\geq 1$ , $\varepsilon\in(0,1)$ , and query access to an unknown Boolean function $f\colon \{-1,1\}^n \to \{-1,1\}$ , distinguish with high probability between the case where $f$ is a $k$ -junta and the case where $f$ is at least at distance $\varepsilon$ (in Hamming distance) for any $k$ -junta.

(In the above, the main parameter is $n$ , then $k$ , and $\varepsilon$ comes last — and is often considered as a small constant for the sake of asymptotics. Further, the focus is on the worst-case number of queries made to the unknown function, the query complexity, and not the running time.)

Property testing algorithms come in many flavors: they can be non-adaptive (decide all the queries they will make to $f$ beforehand, based only on their private randomness), adaptive, one-sided (accept juntas with probability $1$ and only err, with small probability, on functions far from being juntas), two-sided; standard (as defined above) or tolerant (accept functions $\varepsilon'$ -close to some $k$ -junta, and reject those that are $\varepsilon$ -far).

So what do we know thus far?

1 Surprise: there is no $n$

Nobody likes to have three parameters, and nobody likes $n$ anyway. The first main insight (and surprising result), due to Fischer, Kindler, Ron, Safra, and Samorodnitsky [FKRSS04] who initiated the study of $k$ -junta testing, is that, actually, one can get rid of $n$ altogether!

Namely, they give a tester (actually, a couple) with query complexity $O(k^2/\varepsilon^2)$ , non-adaptive, and one-sided. The main idea underlying this result is the following: if one partitions randomly the $n$ variables in $\mathrm{poly}(k)$ bins, then with high probability the following happens. If $f$ is indeed a $k$ -junta, then clearly at most $k$ of these bins will ever contain a relevant variable. Moreover, if $f$ is far from being a $k$ -junta, then either $k+1$ bins will contain significantly “influent” variables, or many bins will contain many “relatively influent” variables.

So then, it suffices to estimate the “influence” of each of these bins, and reject if more than $k$ of them have noticeable influence.

Note that I used the word “influence” quite a lot in the previous paragraph, and this for a good reason: the quantity at play here is the influence of a set of variables, a generalization of the standard notion of influence of a variable in Boolean analysis. Defined as

$\textrm{Inf}_f(S) \stackrel{\rm def}{=} 2\Pr_{x\sim \{-1,1\}^{[n]\setminus S}, y\sim \{-1,1\}^{S}}[ f(x\sqcup y) \neq f(x) ]$

this quantity captures the probability that “re-randomizing” the variables in $S$ will change the outcome of the function. Also, it is very simple to estimate given query access to $f$ .

2 Let’s bring that $k$ down

After this quite amazing result, Eric Blais [Blais08, Blais09] improved this bound twice: first, establishing a non-adaptive, two-sided upper bound of $\tilde{O}(k^{3/2})/\varepsilon$ queries, which hinges on a clever balancing between two sub-algorithms: one to detect few high-influence variables, and the other to detect many low-influence variables. As before, this algorithm relies on random partitioning, and estimating the influence.

The second algorithm yields an adaptive, one-sided upper bound of ${O}(k\log k + k/\varepsilon)$ queries—and proceeds by binary search, trying to find and eliminate influential bins one at a time, repeatedly starting with two inputs $x,y$ with $f(x)\neq f(y)$ and “moving” from $x$ to $y$ , changing all the variables inside a bin at once until the value of $f$ flips.

These last two upper bounds have remained unchallenged. For a rather good reason: they’re more or less tight, each of them.

Indeed, Chockler and Gutfreund [CG04], and Blais, Brody, and Matulef [BBM12] proved that $\Omega(k)$ queries were necessary, even for adaptive, two-sided algorithms. (This leaves a $\log k$ and an $\varepsilon$ hanging…) Then, after Servedio, Tan, and Wright [STW15] proved a slighter stronger lower bound (for some regime of $\varepsilon$ ) against non-adaptive tester, a result of Chen, Servedio, Tan, Waingarten, and Xie [CSTWX17] showed that, actually, $\tilde{\Omega}(k^{3/2}/\varepsilon)$ query were necessary for non-adaptive, two-sided testing of $k$ -juntas.

So, well, both of Blais’ testers are optimal. Are we done?

3 The Virtue of Tolerance

Not quite, actually. Up to some slack (see open problems below), we know the landscape of standard testing of $k$ -juntas. But what about trying to distinguish between a very slightly noisy version of a $k$ -junta, and something very far from being one? I.e., what about tolerant testing?

While all the above algorithms have some (very) small amount of tolerant built in (namely, something like $\mathrm{poly}(\varepsilon/k)$ ), the best algorithm known for “general” tolerant testing for a while was an $\exp(k/\varepsilon)$ -query tolerant tester implied by… the algorithm of [Blais09]. At least, there is no $n$ in there either. But since it is totally conceivable that a tolerant testing algorithm with polynomial (in $k$ and $1/\varepsilon$ ) query complexity exists—we have no separation, this is slightly unsatisfying.

Recently, with Eric Blais, Talya Eden, Amit Levi, and Dana Ron [BCELR18], we provided two different algorithms for tolerant testing of juntas.

The first provides a smooth trade-off between the amount of tolerance guaranteed and the query complexity: for $\varepsilon \rho$ vs. $\varepsilon$ , the query complexity is $O((k\log k)/(\varepsilon \rho(1-\rho)^k)$ . In particular, for $\rho = 1/k$ this (slightly) improves on the weak tolerance provided by the known (standard) testers, while having low polynomial query complexity. At the other end of the spectrum, setting $\rho = O(1)$ ones gets a tolerant testing algorithm with query complexity $\exp(k)/\varepsilon$ (so, at least, the $\varepsilon$ moved out of the exponent).

The second builds on the properties of the influence function — which is not only directly related to the distance to $k$ -junta$, but also happens to be monotone, submodular, and generally very nice — to phrase the question as a submodular minimization problem under cardinality constraints. Which turns out to be a drag, since this problem is NP-hard, even to approximate. However, relaxing the cardinality constraint (replacing it with a suitable linear penalization term in the objective) makes the problem tractable, at the price of losing a bit in the solution. Specifically, now one can distinguish, with $\mathrm{poly}(k,1/\varepsilon)$ queries, functions which are $\varepsilon$ -close to some $k$ -junta from those which are $O(\varepsilon)$ -far from every $2k$-juntas.

Which almost solves the question, but not quite.

4 Intolerance is Easier?

Mentioned earlier was the fact that we do not have any separation being tolerant and intolerance testers for $k$ -juntas. That was a bit of a lie: in yet unpublished work, Levi and Waingarten [LW18] establish the following:

Any (possibly two-sided) non-adaptive tolerant tester for $k$ -juntas must have query complexity $\tilde{\Omega}(k^2)$ (for $\varepsilon',\varepsilon=\Theta(1)$ ).

Since the standard testing version can be done with $\tilde{O}(k^{3/2})$ queries, this is a polynomial separation between standard and tolerant testing, and the only one I am aware of for a natural class of Boolean functions.

Now, the proof goes through a reduction to a graph testing question, in a model the authors introduce (that of rejection sampling oracle, which on query $T\subseteq [n]$ returns $\{i,j\}\cap T$ for an edge $(i,j)$ sampled uniformly at random from the unknown graph $G$ ; and the cost of a query $T$ is its size $\lvert T\rvert$ ). In more detail, they show a lower bound of $\Omega(n^2)$ rejection sampling query cost to distinguish between (i) $G$ being a complete balanced bipartite graph with side size $n/2$ , and (ii) $G$ being the union of two copies of $K_{n/2}$ .

5 Open questions

The above may give the impression that all is done, and that the landscape is fully known (up to some technical details). Sadly (or fortunately), this is far from being true; below, I highlight a few question i personnally find very interesting and appealing. (I may have very bad taste, mind you)

The role of adaptivity in standard testing. We have pretty tight bounds (up to some $\log k$ ‘s and $\varepsilon$ ‘s) for both adaptive and non-adaptive testing of juntas. What about testing with bounded adaptivity (as introduced in [CG17])? Is there a smooth tradeoff between query complexity and number of rounds of adaptivity, or do things “jump” immediately?
Tolerant testing. Can we get a $\textrm{poly}(k,1/\varepsilon)$ -query tolerant tester without the relaxation “close to $k$ -junta vs. far from $2k$ -junta$?
Tolerant testing. Can we get a tester (even with query complexity exponential in $k$ , but independent of $n$ ) which distinguishes “ $\varepsilon$ -close vs. $1.1\varepsilon$ -far”? (It looks like that all techniques relying on using the influence function, due to its slightly-loose relation to distance to $k$ -junta, have to lose a factor $2$ here). If not, can we prove a lower bound against “influence-based testers”?
Standard vs. tolerant testing. Can the separation of Levi and Waingarten be extended to adaptive algorithms?

And finally, one I am really fond of: changing the setting itself, say a function $f\colon\mathbb{R}^n \to [-1,1]$ is a $k$ -junta if there exists an unknown rotation of the space such that, in this unknown basis, $f$ depends on at most $k$ coordinates. (That is, $f$ only depends on a low-dimensional subspace). Can we test efficiently whether an unknown $f$ is a $k$ -junta?

References.

[BBM12] Blais, Brody, and Matulef. Property testing lower bounds via communication complexity. CCC, 2012.
[BCELR18] Blais, Canonne, Eden, Levi, and Ron. Tolerant Junta Testing and the Connection to Submodular Optimization and Function Isomorphism. SODA, 2018.
[Blais08] Blais. Improved bounds for testing juntas. RANDOM, 2008.
[Blais09] Blais. Testing juntas nearly optimally. STOC, 2009.
[CG04] Chockler and Gutfreund. A lower bound for testing juntas. Information Processing Letters, 2004.
[CG17] Canonne and Gur. An adaptivity hierarchy theorem for property testing. CCC, 2017.
[CSTWX17] Chen, Servedio, Tan, Waingarten, and Xie. Settling the query complexity of non-adaptive junta testing. CCC, 2017.
[FKRSS04] Fischer, Kindler, Ron, Safra, and Samorodnitsky. Testing juntas. J. Computer
& System Sciences, 2004.
[STW15] Servedio, Tan, and Wright. Adaptivity helps for testing juntas. CCC, 2015.

	Roy Andrews on Prediction with a short m…
	Multiparty Computati… on A few lessons from the history…
	Arick Grootveld on Trace Reconstruction from Comp…
	DNF Minimization, Pa… on DNF Minimization, Part I
	Hao Sun on RANDOM & APPROX 2023
	Yet another backprop… on Backpropagation ≠ Chain R…
	Boaz Barak on Backpropagation ≠ Chain R…
	Lunjia Hu on Backpropagation ≠ Chain R…
	Boaz Barak on Backpropagation ≠ Chain R…
	kodlu on Average-Case Fine-Grained Hard…

Theory Dish

Stanford's CS Theory Research Blog

Testing Juntas (or “Avoiding the Curse of Irrelevant Dimensionality”)

1 Comment on Testing Juntas (or “Avoiding the Curse of Irrelevant Dimensionality”)

1 Trackback / Pingback

Leave a reply to Clement Canonne Cancel reply

Share this:

Related

1 Comment on Testing Juntas (or “Avoiding the Curse of Irrelevant Dimensionality”)

1 Trackback / Pingback

Leave a reply to Clement Canonne Cancel reply