Quantum DNA sequencing & the ultimate hardness hypothesis

Longest common subsequence (LCS) and edit distance (ED) are fundamental similarity measures of interest in genomic applications (insertions/deletions/substitutions are a pretty good proxy for both mutations and read errors). The fastest known algorithms for computing the LCS/ED between two n-character strings run in nearly quadratic time. Breakthroughs in fine-grained complexity [BI18][AHWW16] from the last few years provide nice formal barriers to further progress: in particular, substantially faster algorithms would refute NC-SETH, the hypothesis that given an NC circuit with n input bits, deciding whether the circuit has any satisfying assignment requires 2^{(1-o(1))n} time. (Note that this is a weaker, i.e. safer, hypothesis than the standard SETH.)
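For concreteness, the quadratic baseline being discussed is the textbook dynamic program. Here is a minimal sketch (function name and space-saving two-row layout are my own choices, not from any of the cited papers):

```python
def edit_distance(a: str, b: str) -> int:
    """Textbook O(n*m)-time dynamic program for edit distance."""
    n, m = len(a), len(b)
    # prev[j] = edit distance between a[:i-1] and b[:j]
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        cur = [i] + [0] * m
        for j in range(1, m + 1):
            cur[j] = min(
                prev[j] + 1,                           # delete a[i-1]
                cur[j - 1] + 1,                        # insert b[j-1]
                prev[j - 1] + (a[i - 1] != b[j - 1]),  # substitute (free if equal)
            )
        prev = cur
    return prev[m]
```

On two length-n strings this fills an n-by-n table, and it is exactly this quadratic barrier that the fine-grained results show cannot be substantially beaten without refuting NC-SETH.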

 

Quantum algorithms for edit distance?

One way to try to circumvent the computational barriers is via approximation algorithms (see my blog post). A different approach is to go with quantum algorithms: Grover’s search can solve NC-SAT in 2^{n/2} time. Even those of us less skeptical than Gil Kalai can probably agree that quadratic quantum speedups won’t be practical anytime soon. But in theory, I find the question of whether we can design subquadratic quantum algorithms for edit distance very interesting (see also [BEG+18]).
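To illustrate where the 2^{n/2} comes from: Grover's algorithm finds a marked item among N = 2^n with only ~sqrt(N) oracle calls. The sketch below simulates the iterations classically by tracking the full state vector (exponential space, so purely illustrative; the speedup is only in the idealized query count). The oracle here plays the role of the NC-SAT circuit:

```python
import math

def grover_search(oracle, n: int) -> int:
    """Classically simulate Grover's algorithm over n-bit inputs.
    `oracle(x)` marks satisfying assignments; the quantum algorithm
    would only invoke it ~sqrt(2^n) times."""
    N = 1 << n
    state = [1 / math.sqrt(N)] * N                 # uniform superposition
    iterations = round(math.pi / 4 * math.sqrt(N))  # optimal for one marked item
    for _ in range(iterations):
        # Oracle step: flip the phase of marked assignments.
        state = [-amp if oracle(x) else amp for x, amp in enumerate(state)]
        # Diffusion step: reflect every amplitude about the mean.
        mean = sum(state) / N
        state = [2 * mean - amp for amp in state]
    # "Measure": return the assignment with the largest probability mass.
    return max(range(N), key=lambda x: state[x] * state[x])
```

With a single satisfying assignment, ~(pi/4)*sqrt(N) iterations boost its probability close to 1, which is the source of the quadratic speedup over brute force.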

Alas, even with the power of quantum computers we don’t know any truly subquadratic algorithms for computing LCS/ED. On the other hand, it is not clear how to rule out even linear-time algorithms. A few weeks ago, Buhrman, Patro, and Speelman posted a paper that gives an n^{3/2} quantum query complexity lower bound for LCS/ED.

Open Problem: Close the n^{3/2} vs n^2 gap for quantum query complexity of ED/LCS.

What is the “query complexity” of ED/LCS? Buhrman et al. consider the following query model: In LCS, we want a maximum monotone matching between the characters of the two strings, where we can match two characters if they’re identical. Now suppose that in the query complexity problem we still want to find a maximum monotone matching, but instead of the character-equality graph (which is a union of disjoint cliques), we have an arbitrary bipartite graph, and given a pair of vertices, the oracle tells us whether there is an edge between them.
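A small sketch of this model: the DP for LCS only ever touches the strings through character-equality tests, so it runs unchanged against an arbitrary edge oracle, using exactly one query per table entry. The `EdgeOracle` wrapper below is a hypothetical helper of mine for counting queries, not anything from the paper:

```python
class EdgeOracle:
    """Hides an arbitrary bipartite graph behind an edge-query interface,
    counting queries (hypothetical illustration of the black-box model)."""
    def __init__(self, edges):
        self.edges = set(edges)   # set of (left_vertex, right_vertex) pairs
        self.queries = 0
    def __call__(self, i, j):
        self.queries += 1
        return (i, j) in self.edges

def max_monotone_matching(oracle, n: int, m: int) -> int:
    """LCS-style DP that accesses the graph only via edge queries;
    it makes exactly n*m queries (the classical quadratic upper bound)."""
    prev = [0] * (m + 1)
    for i in range(1, n + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            best = max(prev[j], cur[j - 1])   # skip left or right vertex
            if oracle(i - 1, j - 1):          # one query per (i, j) pair
                best = max(best, prev[j - 1] + 1)
            cur[j] = best
        prev = cur
    return prev[m]
```

When the oracle encodes the character-equality graph of two strings, this recovers plain LCS; the point of the model is that the lower bound must hold against arbitrary graphs, with n^2 queries as the trivial upper bound.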

This model may seem a bit counter-intuitive at first since the graphs may not correspond to any pair of strings; and indeed other models have been considered before [UAH76][AKO10]. But it turns out that this model is well-motivated by the NC-SETH lower bound (see discussion below).

 

The ultimate hardness hypothesis

What does the n^{3/2} query complexity lower bound mean for algorithms on actual strings? Instead of a black-box oracle, our algorithms have access to an NC circuit that implements it. Intuitively, we don’t know how to do very much with white-box circuits, so it seems plausible to hypothesize that the running time will be lower bounded by the query complexity. In some sense, this is a special case of the following ultimate hardness hypothesis that unifies a lot of the computational hardness assumptions that we like to assume but have no idea how to prove (e.g. P!=NP, P!=BQP, NC-SETH, FP!=PPAD, etc.):

[Ultimate Hardness Hypothesis] For every problem, the white-box computational complexity is lower bounded by the black-box query complexity.

In communication complexity similar statements are known and are called simulation/lifting theorems (see e.g. Mika’s thesis). For computational complexity, there are obvious counterexamples such as “decide if the oracle can be implemented by a small circuit”. So it only makes sense to continue to assume the ultimate hardness hypothesis for “reasonable problems” instead of “every problem”.

But Buhrman et al. identify the following variant of the ultimate hardness hypothesis which I find very interesting. It is defined with respect to a function T:\{0,1\}^{2^n} \rightarrow \{0,1\} which takes as input the truth table of a circuit and outputs True or False. Roughly, they hypothesize that:

[Buhrman et al.’s QSETH, paraphrased] For every T, deciding whether T is true or false is as hard whether we’re given the actual circuit, or only the guarantee that the oracle is implemented by a small circuit.

At first read, I thought that arguments à la the impossibility of obfuscation [BGI+01] should refute this hypothesis, but a few weeks later I still don’t know how to do it. Do you?

 

A tip for the upcoming holidays

During my postdoc, I worked on the quantum query complexity of ED/LCS with Shalev Ben-David, Rolando La Placa, and John Wright. I was a bit bummed to find out that we got scooped by Buhrman et al., but I know of at least three other groups that were also scooped by the same paper, so at least we’re in good company 🙂

At the time, a fellow postdoc from Psychology asked me what I was working on. I resisted the temptation to try to explain the various quantum variants of NC-SETH, and instead told him I was working on “DNA sequencing with quantum computers”. His reaction was priceless. Regardless of what you’re actually working on, try this line during the holidays when your relatives ask you about your work.

 
