In our previous post, we discussed the automation of error correcting codes and how formal methods are quite helpful toward this goal. In this post, we will discuss machine learning techniques as well as possible directions for future research.
Machine Learning. The use of machine-learning techniques to study ECCs has increased exponentially in recent years. The main strength of machine learning methods is that they can adapt well to a variety of conditions (c.f., this paper), unlike the codes produced by formal methods which are often designed for a rigid application. On the other hand, it is much more difficult to give any formal guarantees for error-correcting codes designed with machine learning, although such an obstacle may be overcome in the future (see the conclusion).
Error-correcting output codes. One interface between machine learning and error correcting codes which has been studied for decades is the construction of error correcting codes for multiclass learning. Multiclass learning is classifying a group of objects into a number of categories. Instead of trying to distinguish the categories all up front, one can instead try to do a binary classification of some subset of the categories from another subset of the categories.” If an error correcting code (called an error-correcting output code–ECOC) is used for the distinguishing, it can provide a more robust classification.
This technique was studied theoretically for a preset error-correcting code (e.g., [Guruswami, Sahai, 1999]). There are also works which synthesize an ECOC for multiclass learning. For example, the work of Pujol, Radeva, and Vitria  used a heuristic which tries to find tests which reveal the highest amount of information (subject to a prior that each class is a mixture of gaussians).
A related recent work [Xiang, Wang, Kitani, 2018] uses the technique of deep hashing to hash images so that images of a similar class have hashes which are close in Hamming distance, while images of different classes have hashes which are far in edit distance.
Deep Learning. The bulk of recent machine learning research on the automation of ECCs has been using Deep Learning techniques; that is, training a constructed deep learning network to either encode–given a message as input, output a robust encoding–or decode–given a noisy transmission, recover the original message. As deep neural networks pass messages from earlier neurons to deeper neurons, the commonly referred to benchmark is that of belief propagation.
In brief (see also Nachmani, et. al, 2016), the belief propagation (BP) algorithm starts by taking the parity-check matrix defining a given linear code and converts it into a bipartite graph–one side of the vertices are the symbols of the code and the other vertices are the parity checks. Each vertex starts with a prior (i.e., probability that the vertex should be a 0 or 1) based on the received transmission, and then messages are passed back and forth. In odd-numbered rounds, the symbols give their confidence levels to the parity checks, which then update probabilities based on being satisfied or not. In even-numbered rounds the parity checks inform the symbols if they should change their prior to better satisfy the parity checks. After some number of rounds, a decoding is deduced.
Some of the earlier papers using deep learning to automate ECCs [Nachmani, et. al, 2016, 2018; Crammerer, et.al., 2017; Gruber, et.al., 2017] tried to generalize a belief propagation algorithm for decoding ECCs. At a high level, the works of Nachmani, et.al. “unroll” the belief propagation into a deep neural network with many layers. Unlike BP which has a fixed weight for the value of each message in each stage, they have trainable weights which allow one to obtain a lower BER than plain BP. Both Nachmani, et.al., and the works of Crammerer, et.al., and Gruber, et.al., consider more complex neural architectures which use BP as a subroutine. The latter two works show success in these methods for polar codes and random linear codes. Note that all of these works are only training a decoder, as the encodings are fixed.
A later group of papers [Kim, et.al. 2020; Jiang, et.al., 2019a, 2019b, 2020] (see also their blog) introduce what is known as DeepCode. Unlike previous work, the initial DeepCode paper [Kim, et.al., 2020] builds both a custom encoder and decoder. A particular tool which the authors utilized is that of feedback–after each bit is transmitted the noisy received bit is sent back (possibly with more noise added). Intuitively, feedback allows for the encoder to have an approximate understanding of what the decoder doesn’t know, allowing for on-the-fly adjustments to the encoding. The encoder sends the original message in plain-text and then incorporates the feedback with RNN units to add two redundant bits per message bit (and thus the rate is one-third). The decoder incorporates the noisy plain text and these redundant bits using bidirectional-GRUs. Their methods can get a BER of as low as for a block length of 50 bits.
Follow-up work [Jiang, et.al., 2019a, 2019b, 2020] consider variants of the model. In [Jiang, et.al., 2019a], the encoder is fixed to be a turbo code (a type of convolutional code), a “standard” code in many applications, but the goal is to train a deep net to beat the standard decoder in a “non-Gaussian noise” model. The work [Jiang, et.al., 2019b] tries to beat the turbo code at its own game by constructing a turbo code-like deep net where a commonly used finite automata is replaced by a deep net (and a corresponding decoder is constructed as well). Finally, the work [Jiang, et.al, 2020] generalized [Jiang, et.al., 2019b] by also using feedback, like in DeepCode.
Conclusion and Future Directions. Although the technology is still in its infancy, the automation of error correction codes is an exciting domain which could have a variety of applications in the future. As we have seen in this post, both formal methods and machine learning allow for powerful methods for obtaining automating various aspects of ECCs, whether it be constructing error correcting codes with provable guarantees or designing non-standard decodes which are robust to a variety of conditions. Looking ahead, there are a number of exciting directions which could be ripe for theoretical and practical contributions.
- A very active field of research currently is DNA storage–that is, synthesizing DNA molecules which encode data (rather than biological life). The primary theoretical challenge of DNA storage is that data loss is no longer a “bit flip,” rather nucleotides can be inserted, deleted, replicated, etc. Very recently, convolutional codes have proved successful in this setting [Chandak, et.al., 2020], but an analogue to DeepCode does not currently exist in this setting. One barrier is that it is difficult for a fixed-architecture neural network to easily accommodate “index shifting” taking place during insertions and deletions. Automated techniques have been used to give lower bounds on how much redundancy is needed for such codes [Kulkarni, Kiyavash, 2013].
- One potential theoretical contribution in this space is understanding how these automation techniques relate to augmented learning. An example of augmented learning is the problem of picking the “best” algorithm from a class of algorithms for a given problem. A theoretical framework for understanding such questions has recently been developed by Gupta and Roughgarden. Such a framework seems natural in the context of error-correcting codes, as there are many different kinds of codes, and even within one family of codes, such as Reed-Solomon codes, there are many parameter choices whose optimal values can vary significantly from application to application.
- As a final thought, the use of “Formal Methods” in contrast to “Machine Learning” for automating ECCs has been largely separate. Recently, outside the context of ECCs, there have been a number of works (e.g., [Ehlers, 2017], [Raghunathan, Steinhardt, Liang, 2018]) which use formal methods to provably verify that a machine learning model such as a deepnet runs correctly under given conditions. It would be exciting if such methods could be extended to ECCs by showing that machine learning encoders/decoders like DeepCode can be utilized correctly.
Are there other directions for which you would like to see the automation of ECCs extended? If so, please leave a comment.
Acknowledgments. I would like to thank my quals committee, Aviad Rubinstein, Moses Charikar, and Mary Wootters for valuable feedback. I would also like to thank Sivakanth Gopi and Sergey Yekhanin for insightful discussion on the relationship between automation of ECCs and DNA storage as well as Ofir Geri for discussion on augmented learning.