The coming Friday we will have the third TOCA-SV meeting in a bit more than a year (here are the details for the first and second meetings). These meetings bring together theoreticians from universities and industry around the Silicon Valley. We welcome and encourage everybody to attend, and have a great program for you.

The event will take place in the Mackenzie Room at the Jen-Hsun Huang Engineering Center (quite close to the CS department). More details, including parking can be found here.

Program:

10:00-10:30 Gathering and coffee

11:30-12 Clément Canonne, Stanford
12:30-1:30pm lunch
2-3 Students’ short talks
3-3:30 break
4:00-5:30 happy hour
Clément Canonne Testing Conditional Independence of Discrete Distributions
We study the problem of testing conditional independence for discrete distributions. Specifically, given samples from a discrete random variable $(X, Y, Z)$ on domain $[\ell_1]\times[\ell_2] \times [n]$,
we want to distinguish, with probability at least $2/3$, between the case that $X$ and $Y$ are conditionally independent given $Z$ from the case that $(X, Y, Z)$ is $\epsilon$-far, in $\ell_1$-distance, from every distribution that has this property. Conditional independence is a concept of central importance in probability and statistics with a range of applications in various scientific domains. As such, the statistical task of testing conditional independence has been extensively studied in various forms within the statistics and econometrics communities for nearly a century. Perhaps surprisingly, this problem has not been previously considered in the framework of distribution property testing  and in particular no tester with sublinear sample complexity is known, even for the important special case that the domains of $X$ and $Y$ are binary.

The main algorithmic result of this work is the first conditional independence tester with \emph{sublinear} sample complexity for
discrete distributions over $[\ell_1]\times[\ell_2] \times [n]$.
To complement our upper bounds, we prove information-theoretic lower bounds establishing that the sample complexity of our algorithm is optimal, up to constant factors, for a number of settings (and in particular for the prototypical setting when $\ell_1, \ell_2 = O(1)$).

Joint work with Ilias Diakonikolas (USC), Daniel Kane (UCSD), and Alistair Stewart (USC).