Machine learning on big data sets takes a significant amount of computational power, so it is often necessary to offload some of the work to external distributed systems, such as an Amazon EC2 cluster. It is useful to be able to utilize external resources for computation tasks while keeping the actual data private and secure. In particular, matrix multiplication is an essential step in many machine learning processes, but the owner of the matrices may have reasons to keep the actual values protected.

In this post, we’ll discuss four works about secure distributed computation. First, we’ll talk about a method of using MDS (maximum distance separable) error correcting codes to add security and privacy to general data storage (“Cross Subspace Alignment and the Asymptotic Capacity of X-Secure T-Private Information Retrieval” by Jia, Sun, Jafar).

Then we’ll discuss a method of adapting a coding strategy for straggler mitigation in matrix multiplication (“Polynomial codes: an optimal design for high-dimensional coded matrix multiplication” by Yu, Maddah-Ali, Avestimehr) to instead add security (“On the capacity of secure distributed matrix multiplication” by Chang, Tandon) or privacy (“Private Coded Matrix Multiplication” by Kim, Yang, Lee).

Throughout this post we will use variations on the following communication model:

The data in the grey box is only given to the master, so workers only have access to what they receive (via green arrows). Later on we will also suppose the workers have a shared library not available to the master. The workers do not communicate with each other as part of the computation, but we want to prevent them from figuring out anything about the data if they do talk to each other.

This model is related to private computation but not exactly the same. We assume the servers are “honest but curious”, meaning they won’t introduce malicious computations. We also only require the master to receive the final result, and don’t need to protect any data from the master. This is close to the BGW scheme ([Ben-Or, Goldwasser, Wigderson ’88]), but we do not allow workers to communicate with each other as part of the computation of the result.

We consider unconditional or information-theoretic security, meaning the data is protected even if the workers have unbounded computational power. Furthermore, we will require perfect secrecy, in which the mutual information between the information revealed to the workers and the actual messages is zero.

## X-Secure T-Private Information Retrieval

Before we get into matrix-matrix multiplication, consider the problem of storing information on the workers to be retrieved by the master, such that it is “protected.” What do we mean by that? [Jia, Sun, and Jafar ’19] define X-secure T-private information retrieval as follows:

Let $W_1,...,W_{K}$ be a data set of messages, such that each $W_i$ consists of $L$ random bits. A storage scheme of $W_{1},...,W_{K}$ on $N$ nodes is

1. X-secure if any set of up to $X$ servers cannot determine anything about any $W_i$ and

2. T-private if given a query from the user to retrieve some data element $W_{\theta}$, any set of up to $T$ servers cannot determine the value of $\theta$.

[Jia, Sun, and Jafar ’19]

Letting $Q_{1}^{[\theta]},...,Q_{N}^{[\theta]}$ be the set of queries sent to each node and $S_1,...,S_N$ be the information stored on each node (all vectors of length L), we depict this as:

The information theoretic requirements of this system to be correct can be summarized as follows (using notation $S_{[1:N]}$ for set $S_1,...,S_N$):

Given these constraints, Jia et al. give bounds on the capacity of the system. Capacity is the maximum achievable rate, where rate is defined as the number of bits requested by the user ($L$, the length of a single message) divided by the number of bits the user downloads. The bounds are in terms of the capacity of T-Private Information Retrieval (the same as the above definition, with only requirement 2).

$\displaystyle C_{XSTPIR}(N,K,X,T) \leq \left(\frac{N-X}{N}\right) C_{TPIR}(N-X,K,T)$

When $N\leq X+T$: $\displaystyle C_{XSTPIR}(N,K,X,T) = \left(\frac{N-X}{N}\right) C_{TPIR}(N-X,K,T) = \frac{N-X}{NK}$

$\displaystyle\lim_{K\to \infty} C_{XSTPIR}(N,K,X,T) = \lim_{K\to \infty} \left(\frac{N-X}{N}\right) C_{TPIR}(N-X,K,T) =\begin{cases} 1-(\frac{X+T}{N}) & N>X+T \\ 0 & N\leq X+T \end{cases}$

[Jia, Sun, and Jafar ’19]
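As a quick numeric sanity check on these expressions, the snippet below evaluates the bound for a few parameter choices. It assumes the closed form for $C_{TPIR}$ from the colluding-PIR literature, $C_{TPIR}(N,K,T) = (1 + T/N + \dots + (T/N)^{K-1})^{-1}$ for $T < N$ and $1/K$ otherwise; the parameter values are illustrative.

```python
from fractions import Fraction

def c_tpir(n, k, t):
    # Assumed capacity of T-private information retrieval:
    # (1 + T/N + ... + (T/N)^(K-1))^(-1) for T < N, and 1/K otherwise.
    if t >= n:
        return Fraction(1, k)
    r = Fraction(t, n)
    return 1 / sum(r**i for i in range(k))

def c_xstpir_bound(n, k, x, t):
    # The upper bound ((N-X)/N) * C_TPIR(N-X, K, T); tight when N <= X+T.
    return Fraction(n - x, n) * c_tpir(n - x, k, t)

# When N <= X+T the bound collapses to (N-X)/(NK):
print(c_xstpir_bound(5, 3, 2, 3))          # N=5, X+T=5 -> 1/5 = (5-2)/(5*3)
# As K grows with N > X+T, the bound approaches 1 - (X+T)/N = 1/2:
print(float(c_xstpir_bound(10, 50, 2, 3)))
```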

Jia et al. give schemes that achieve these bounds while preserving the privacy and security constraints by introducing random noise vectors into how data is stored and queries are constructed. The general scheme for $N>X+T$ uses cross subspace alignment, which essentially chooses how to construct the stored information and the queries such that the added noise mostly “cancels out” when the master combines all the responses from the servers. The scheme for $N\leq X+T$ is straightforward to explain and demonstrates the idea of using error correcting codes that treat the random values as the message and the actual data as the “noise.”

For this scheme, the message length $L$ is set to $N-X$ (the number of nodes $N$, minus the maximum number of colluding servers $X$). First, we generate $K$ random bit vectors of length $X$:

Next, apply an $(N,X)$ MDS code to $Z_1,...,Z_K$ to get $\bar{Z}_1,...,\bar{Z}_K$, which are $K$ encoded vectors of length $N$:

For our data $W_1,...,W_K$, we pad each vector with zeros to get $\bar{W}_1,...,\bar{W}_K$ of length $N$:

Now that the dimensions line up, we can add the two together and store each column $n$ at the $n^{th}$ node:

To access the data, the user downloads all $NK$ bits. The length-$N$ string downloaded from row $i$ can be used to decode $\bar{Z}_i$: since $\bar{W}_{i,L+1},\bar{W}_{i,L+2},...,\bar{W}_{i,N}$ are all zero, columns $L+1 = N-X+1$ through $N$ contain exactly $\bar{Z}_{i,N-X+1},...,\bar{Z}_{i,N}$. This gives the user $X$ values from the MDS code used on each row, so they can decode to recover $Z_1,...,Z_K$ and re-encode to get $\bar{Z}_{1},...,\bar{Z}_K$. A subtraction from the downloaded data then gives $W_1,...,W_{K}$. Because of the MDS property of the code used to produce $\bar{Z}_1,...,\bar{Z}_K$, this scheme is X-secure, and because the user downloads all bits regardless of $\theta$, it is T-private.
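Here is a minimal runnable sketch of this scheme. Two hedges: the paper works with bits, but a binary $(N,X)$ MDS code doesn't generally exist for these parameters, so this sketch uses symbols from a small prime field with a Reed-Solomon code supplying the MDS property, and all parameter values are illustrative.

```python
import random

p = 257                       # small prime field standing in for the paper's bits
N, X = 5, 2                   # N nodes, up to X colluding nodes learn nothing
L = N - X                     # message length forced by this scheme
K = 3                         # number of messages

def poly_eval(coeffs, x):
    return sum(c * pow(x, j, p) for j, c in enumerate(coeffs)) % p

def interpolate(points):
    # Lagrange interpolation over GF(p); returns an evaluator for the
    # unique degree-(len(points)-1) polynomial through the points.
    def at(x):
        total = 0
        for xi, yi in points:
            num, den = 1, 1
            for xj, _ in points:
                if xj != xi:
                    num = num * (x - xj) % p
                    den = den * (xi - xj) % p
            total = (total + yi * num * pow(den, -1, p)) % p
        return total
    return at

W = [[random.randrange(p) for _ in range(L)] for _ in range(K)]  # the data
Z = [[random.randrange(p) for _ in range(X)] for _ in range(K)]  # random noise

# (N, X) Reed-Solomon code: treat Z_i as coefficients of a degree-(X-1)
# polynomial and evaluate it at points 1..N; any X symbols determine it (MDS).
Zbar = [[poly_eval(Z[i], x) for x in range(1, N + 1)] for i in range(K)]
Wbar = [W[i] + [0] * X for i in range(K)]      # pad the data with X zeros

# Node n stores the n-th column of Wbar + Zbar.
store = [[(Wbar[i][n] + Zbar[i][n]) % p for i in range(K)] for n in range(N)]

# Retrieval: download everything. In coordinates L+1..N the data part is
# zero, so those X symbols are pure noise; decode the noise and subtract.
recovered = []
for i in range(K):
    row = [store[n][i] for n in range(N)]
    noise = interpolate([(x, row[x - 1]) for x in range(L + 1, N + 1)])
    recovered.append([(row[x - 1] - noise(x)) % p for x in range(1, L + 1)])

assert recovered == W
```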

## Matrix Multiplication with Polynomial Codes

We now move on to the task of matrix-matrix multiplication. The methods for secure and private distributed matrix multiplication we will discuss shortly are based on polynomial codes, used by [Yu, Maddah-Ali, Avestimehr ’17] for distributed matrix multiplication robust to stragglers. Suppose the master has matrices ${\bf{A}} \in \mathbb{F}_q^{a\times b}$ and ${\bf{B}} \in \mathbb{F}_q^{b \times c}$ for some finite field $\mathbb{F}_q$ and $a,b,c \in \mathbb{Z}^+$. Assume $a$ is divisible by $m$ and $c$ is divisible by $n$, so we can divide $\bf{A}$ into $m$ row blocks and $\bf{B}$ into $n$ column blocks:

${\bf{A}}= \begin{bmatrix}A_1\\ A_2 \\ \vdots \\ A_m\end{bmatrix}$ and ${\bf{B}}= \begin{bmatrix}B_1& B_2 & \dots &B_n\end{bmatrix}$

So to recover $\bf{AB}$, the master needs each entry of:

${\bf{AB}} = \begin{bmatrix}A_1B_1 & A_1B_2 & \dots & A_1B_n\\A_2B_1 & A_2B_2 & \dots & A_2 B_n\\\vdots & \vdots &\vdots &\vdots\\A_mB_1 &A_mB_2 & \dots & A_mB_n \end{bmatrix}.$

The key idea of polynomial codes is to encode $A_1,...,A_m$ and $B_1,...,B_n$ in polynomials $\tilde{A}_i$ and $\tilde{B}_i$ to be sent to the $i^{th}$ worker,  where they are multiplied and the result is returned. The goal of Yu et al. was to create robustness to stragglers, and so they add redundancy in this process so that not all workers need to return a result for the master to be able to determine $\bf{AB}$. In particular, only $mn$ returned values are needed, so $N-mn$ servers can be slow or fail completely without hurting the computation. This method can be thought of as setting up the encodings of $\bf{A}$ and $\bf{B}$ so that the resulting multiplications $\tilde{C}_1=\tilde{A}_1\tilde{B}_1,...,\tilde{C}_N=\tilde{A}_N\tilde{B}_N$ are evaluations of a polynomial with coefficients  $A_1B_1,A_1B_2,...,A_2B_1....,A_mB_n$ at $N$ different points — equivalent to a Reed-Solomon code.
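To make the construction concrete, here is a small runnable sketch (the block counts, field size, and evaluation points are illustrative choices, not from the paper): each worker evaluates the two encoding polynomials at its own point, and the master recovers every block $A_jB_k$ from any $mn$ responses by solving a Vandermonde system.

```python
import numpy as np

p = 65537                  # prime field; all arithmetic is mod p
m, n, N = 2, 2, 5          # m row blocks of A, n column blocks of B, N workers
rng = np.random.default_rng(0)

A = rng.integers(0, p, size=(4, 6))        # row count divisible by m
B = rng.integers(0, p, size=(6, 8))        # column count divisible by n
A_blocks = np.split(A, m, axis=0)
B_blocks = np.split(B, n, axis=1)

# Worker i evaluates A~(x) = sum_j A_j x^j and B~(x) = sum_k B_k x^{km} at
# its own point x_i; the product A~(x)B~(x) has A_j B_k as the coefficient
# of x^{j+km}, a degree mn-1 polynomial (hence a Reed-Solomon code).
xs = list(range(1, N + 1))
responses = {}
for x in xs:
    At = sum(Aj * pow(x, j, p) for j, Aj in enumerate(A_blocks)) % p
    Bt = sum(Bk * pow(x, k * m, p) for k, Bk in enumerate(B_blocks)) % p
    responses[x] = (At @ Bt) % p

def solve_vandermonde(pts, ys):
    # Recover the matrix coefficients of a polynomial from its evaluations
    # over GF(p) by Gaussian elimination on the Vandermonde system.
    k = len(pts)
    M = [[pow(x, d, p) for d in range(k)] for x in pts]
    Y = [y.copy() for y in ys]
    for c in range(k):
        piv = next(r for r in range(c, k) if M[r][c])
        M[c], M[piv] = M[piv], M[c]
        Y[c], Y[piv] = Y[piv], Y[c]
        inv = pow(M[c][c], -1, p)
        M[c] = [v * inv % p for v in M[c]]
        Y[c] = Y[c] * inv % p
        for r in range(k):
            if r != c and M[r][c]:
                f = M[r][c]
                M[r] = [(a - f * b) % p for a, b in zip(M[r], M[c])]
                Y[r] = (Y[r] - f * Y[c]) % p
    return Y

# Any mn = 4 responses suffice; here worker 3 straggles and is ignored.
pts = [x for x in xs if x != 3]
coeffs = solve_vandermonde(pts, [responses[x] for x in pts])
C_full = np.block([[coeffs[j + k * m] for k in range(n)] for j in range(m)])
assert (C_full == (A @ B) % p).all()
```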

This idea is adapted by [Chang, Tandon ’18] to protect the data from colluding servers: noise is incorporated into the encodings such that the number of encoded matrices required to determine anything about the data is greater than the security threshold $X < N$. Since the master receives all $N$ responses it is able to decode the result of $\textbf{AB}$, but no set of $X$ nodes can decode $\textbf{A}$, $\textbf{B}$, or $\textbf{AB}$. Similarly, [Kim, Yang, Lee ’19] adapts this idea to impose privacy on a matrix-matrix multiplication: workers are assumed to have a shared library $\{{\textbf{B}}_i\}_{i=1}^M$, and the user would like to multiply ${\textbf{A}}{\textbf{B}}_{D}$ for some $D \in [1:M]$ without revealing the value of $D$ to the workers. The workers encode the entire library such that when the encoding is multiplied by an encoded input $\tilde{A}$ from the master, the result is useful to the master in decoding ${\textbf{A}}{\textbf{B}}_D$.

Chang and Tandon consider the following two privacy models, where up to $\ell$ servers may collude. The master also has $K^{(A)}$ (and in the second model, $K^{(B)}$), which are matrices of random values with the same dimensions as $\textbf{A}$ (and $\textbf{B}$). These are used in creating the encodings $\tilde{A}_i$ (and $\tilde{B}_i$).

$\textbf{B}$ is public, $\textbf{A}$ is private:

Both private:
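To see the noise-masking mechanism in code, here is a minimal sketch of the first model ($\textbf{B}$ public, $\textbf{A}$ private) with $\textbf{A}$ kept in one piece, so there is no partitioning speedup, just security. This is a sketch of the idea rather than the paper's exact construction: masking $\textbf{A}$ with $\ell$ uniformly random matrices is Shamir-style secret sharing, so any $\ell$ workers learn nothing about $\textbf{A}$, while $\ell+1$ responses let the master interpolate $\textbf{AB}$.

```python
import numpy as np

p = 65537                  # prime field; all arithmetic is mod p
ell = 2                    # up to ell colluding workers learn nothing about A
N = ell + 1                # responses needed when A is kept in one piece
rng = np.random.default_rng(1)

A = rng.integers(0, p, size=(3, 4))   # private matrix
B = rng.integers(0, p, size=(4, 5))   # public matrix, known to everyone

# Shamir-style masking: A~(x) = A + K_1 x + ... + K_ell x^ell with uniformly
# random K_t, so any ell evaluations are statistically independent of A.
Ks = [rng.integers(0, p, size=A.shape) for _ in range(ell)]
def A_tilde(x):
    return (A + sum(K * pow(x, t + 1, p) for t, K in enumerate(Ks))) % p

xs = [1, 2, 3]
responses = [(A_tilde(x) @ B) % p for x in xs]  # worker i returns A~(x_i) B

# A~(x) B is a degree-ell matrix polynomial whose constant term is AB, so
# the master recovers AB by Lagrange interpolation at x = 0.
AB = np.zeros_like(responses[0])
for xi, Ri in zip(xs, responses):
    w = 1
    for xj in xs:
        if xj != xi:
            w = w * (0 - xj) % p * pow(xi - xj, -1, p) % p
    AB = (AB + w * Ri) % p

assert (AB == (A @ B) % p).all()
```

Chang and Tandon combine this masking with the partitioning of the polynomial code, placing the data blocks at low powers of $x$ and the noise at higher powers, so the same interpolation recovers the product blocks while any $\ell$ evaluations remain uniformly random.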

Kim, Yang, and Lee take a similar approach, applying polynomial codes to private matrix multiplication. As before, there are $N$ workers, but now the master wants to multiply $\textbf{A}$ with some ${\textbf{B}}_D$ in the shared library $\{{\textbf{B}}_i\}_{i=1}^M$ available to all the workers.

Since the master isn’t itself encoding $\{{\textbf{B}}_i\}_{i=1}^M$, it has to tell the workers how to encode the library so that it can reconstruct the desired product. This is done by having the master tell the workers what values of $\vec{y}$ they should use to evaluate the polynomial that corresponds to encoding each library matrix. We denote the encoding of the library done by each worker $i$ as the multivariate polynomial $h$, which is evaluated at $\{{\textbf{B}}_i\}_{i=1}^M$ and the node-specific vector $y^{(i)}_{[1:M]}$ to get the node’s encoding $\tilde{B}_i$. The worker multiplies this with the encoding of $\textbf{A}$ it receives, $\tilde{A}_i$, and returns the resulting value $Z_i$. All together, we get the following communication model:
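The exact encoding polynomial $h$ is more involved than we can cover here, but the following toy sketch (not the paper's construction; the parameters and the $T=1$ setting are our simplifications) shows the mechanism: the query vectors $y^{(i)}$ are evaluations of Shamir shares of the indicator of $D$, so any single worker's query looks uniformly random and hides $D$, yet two responses interpolate to ${\textbf{A}}{\textbf{B}}_D$.

```python
import numpy as np

p = 65537                  # prime field; all arithmetic is mod p
M = 4                      # library size
D = 2                      # index the master wants to keep private
rng = np.random.default_rng(2)

library = [rng.integers(0, p, size=(4, 5)) for _ in range(M)]  # shared B_1..B_M
A = rng.integers(0, p, size=(3, 4))

# Master: Shamir-share the indicator of D as q_j(x) = [j == D] + r_j x with
# random r_j, so one worker's query (q_1(x_i), ..., q_M(x_i)) is uniform.
r = [int(v) for v in rng.integers(0, p, size=M)]
xs = [1, 2]                                  # two workers suffice here
queries = [[(int(j == D) + r[j] * x) % p for j in range(M)] for x in xs]

# Worker i: encode the whole library with its query vector, multiply by A.
responses = []
for y in queries:
    B_tilde = sum(yj * Bj for yj, Bj in zip(y, library)) % p
    responses.append((A @ B_tilde) % p)

# Master: A B~(x) = A B_D + (noise) x, so interpolate at x = 0.
x1, x2 = xs
w1 = (0 - x2) * pow(x1 - x2, -1, p) % p
w2 = (0 - x1) * pow(x2 - x1, -1, p) % p
ABD = (w1 * responses[0] + w2 * responses[1]) % p
assert (ABD == (A @ library[D]) % p).all()
```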

## Conclusion

As we’ve seen, coding techniques originally designed to add redundancy and protect against data loss can also be used to intentionally incorporate noise for data protection. In particular, this can be done when out-sourcing matrix multiplications, making it a useful technique in many data processing and machine learning applications.

References: