Continuing our previous discussion, I will show another application of the new recipe described in the previous post (i.e., constructing a “good” polynomial for a problem of interest), which will establish average-case hardness of a problem related to the orthogonal vector (OV) problem. (Recall the OV problem: Given $X=\{x_1,\dots,x_n\}$, $Y=\{y_1,\dots,y_n\}$, where each $x_i,y_i\in\{0,1\}^{d}$ for $d=\omega(\log n)$, decide if there are $x_i,y_j$ such that $\langle x_i,y_j\rangle=0$. The reader can look up the worst-case hardness of the OV problem in the first post of the series.)

The motivating question is can we show average-case hardness for counting the number of orthogonal pairs in the OV problem by constructing a “good” polynomial? (One motivation is that such average-case hardness result could serve as the source of reductions for proving average-case hardness for many other problems, because OV is a main source of fine-grained hardness.) There is a good reason to believe the answer is no. Specifically, an $O(n^{2-\delta})$-time algorithm for counting orthogonal pairs for average-case OV instances (here “average-case” is Erdős–Rényi random input model, and $\delta$ is a constant that depends on the parameter of the input model) was given in [DLW20], while constructing a “good” polynomial would prove $\Omega(n^{2-\varepsilon})$ average-case hardness for any constant $\varepsilon>0$ assuming randomized SETH. This indicates that sometimes constructing a “good” polynomial might be a little too ambitious goal.

Instead, can we first come up with a nice combinatorial problem that encodes OV on a slightly larger binary input space and then construct a “good” polynomial for counting solutions of this combinatorial problem? (The motivation is that since this combinatorial problems encodes OV, it is at least as hard as OV for worst case, and moreover, if we can construct a “good” polynomial for counting solutions of this combinatorial problem, by the new recipe in the previous post, counting solutions of this combinatorial problem for average case is (almost) as hard as that for worst case. Therefore, counting solutions of this combinatorial problem for average case is at least as hard as OV for worst case. More importantly, this combinatorial problem could serve as the source of reductions thanks to its combinatorial structure.) The factored OV problem introduced in [DLW20] gives a positive answer to this question. Analogously, they proposed factored variants for many other flagship fine-grained hard problems. By reductions to these factored problems, they managed to prove average-case fine-grained hardness for many natural combinatorial problems such as counting regular expression matchings (the featured image of this post is the web of reductions in their paper).

Next, for the purpose of exposition, I briefly sketch the high-level idea behind the factored OV problem (without even explicitly describing its combinatorial interpretation, since I will not show any reduction from this problem).

Counting solutions for factored OV. Given an OV instance $X,Y$ for $d=o((\log n/\log\log n)^2)$ (actually, $d=o(\log^2 n/\log\log n)$ would also work, and the choice here is for simplicity), we encode $x\in\{0,1\}^d$ as $\textrm{Enc}(x):=\textrm{LONG}(x[1:\sqrt{d}])\circ \textrm{LONG}(x[\sqrt{d}+1:2\sqrt{d}])\circ\dots\circ\textrm{LONG}(x[d-\sqrt{d}+1:d])$,
where $x[i:j]$ represents the subvector (block) of $x$ from the $i$-th to the $j$-th coordinate, and $\textrm{LONG}$ is the long code encoding (specifically, $\textrm{LONG}$ maps a $\sqrt{d}$-dimensional binary vector $x'$ to a $2^{\sqrt{d}}$-dimensional vector binary vector $\textrm{LONG}(x')$ of which all the coordinates are zero except the coordinate indexed by $x'$). Namely, $\textrm{Enc}(x)$ is the concatenation of the long code encoding of each block of $x$, and thus, $\textrm{Enc}(x)\in\{0,1\}^{\sqrt{d}\cdot 2^{\sqrt{d}}}$, and for our choice of $d$, we have that $\textrm{Enc}(x)\in\{0,1\}^{n^{o(1)}}$. Therefore, by taking such encoding for the vectors in the OV instance, we blow up the size of input (and hence weaken the worst-case hardness of OV) very mildly.

The key advantage of such encoding is that the indicator function $\mathbf{1}(x[1:\sqrt{d}]\perp y[1:\sqrt{d}])$ (which outputs $1$ if the two vectors are orthogonal and $0$ otherwise) can be represented as the sum of degree- $2$ monomials, of which the variables are the coordinates of $\textrm{LONG}(x[1:\sqrt{d}])$ and $\textrm{LONG}(y[1:\sqrt{d}])$. Indeed, we can first enumerate all the orthogonal pairs of vectors $v_1,v_2\in \{0,1\}^{\sqrt{d}}$, and we check whether $\textrm{LONG}(x[1:\sqrt{d}])$ has value $1$ at coordinate $v_1$, and $\textrm{LONG}(y[1:\sqrt{d}])$ has value $1$ at coordinate $v_2$, by taking product of these two coordinates, and then, we take the sum of all the products.

Using this approach, for each $x\in X, y\in Y$, for each $i\in[\sqrt{d}]$, we get a degree- $2$ polynomial that computes $\mathbf{1}(x[(i-1)\sqrt{d}+1:i\sqrt{d}]\perp y[(i-1)\sqrt{d}+1:i\sqrt{d}])$ on input $\textrm{Enc}(x)$ and $\textrm{Enc}(y)$. The product of these polynomials obviously computes $\mathbf{1}(x\perp y)$, and moreover, this product is a $2\sqrt{d}$-partite polynomial ( $d'$-partite polynomial is defined in the previous post), where each part corresponds to the coordinates of the long code encoding of a block (subvector) of $x$ or $y$. If we sum up these $2\sqrt{d}$-partite polynomials for all pairs $x\in X, y\in Y$, we get a $2\sqrt{d}$-partite polynomial that precisely counts orthogonal pairs for the OV instance. We can let the field size of this polynomial be a prime $p>n^2$ ( $n^2$ is a trivial upper bound of the number of orthogonal pairs) such that the output of this polynomial is indeed the number of orthogonal pairs.

Notice that (i) Since the encoding only blows up the input size mildly, the worst-case hardness of OV (almost) carries over to evaluating this polynomial. (ii) Since $d=o((\log n/\log\log n)^2)$, the $2\sqrt{d}$-partite polynomial is a “good” polynomial (defined in the previous post). It follows from our new recipe in the previous post that evaluating this polynomial on binary input for average case (here “average case” means Erdős–Rényi random input model) is (almost) as hard as worst case. (iii) Last but not least, evaluating this polynomial on any $z\in\{0,1\}^{\sqrt{d}\cdot 2^{\sqrt{d}}}$ (not just $\textrm{Enc}(x)$ for some $x\in\{0,1\}^d$) can be interpreted as counting solutions for a combinatorial problem, which is the factored OV problem in [DLW20]. As I mentioned earlier, I will not explain the combinatorial interpretation in details. The takeaway is that counting solutions of such factored problem is average-case fine-grained hard, and its combinatorial structure allows possible reductions to other natural combinatorial problems.

Finally, I mention two broad research directions in this area: (i) design cryptographic primitives, e.g., one-way functions, based on these fine-grained average-case hardness results (or show complexity barriers) and (ii) prove fine-grained average-case hardness for decision problems (or design more efficient algorithms).

Acknowledgements. I would like to thank my quals committee — Aviad Rubinstein, Tselil Schramm, Li-Yang Tan for valuable feedback to my quals talk.