Introduction. For nearly a century, error-correcting codes (ECCs) have made communication possible even when the communication channel is corrupted by noise. Beyond communication, error-correcting codes have found a variety of other uses, from multiclass learning to proving hardness of approximation. As such, understanding the rich world of error-correcting codes is essential for progress in all of these domains.

In our setting, imagine Alice wants to send Bob a message of length $k$, but the channel between them is corrupted by noise. To overcome this, Alice uses an encoder to turn her message into a longer, redundant string of length $n$. Bob receives this transmission and uses a decoder to (hopefully) recover the original message of length $k$. Two important properties are the rate $k/n$ of the code (essentially, what fraction of the transmission is "information") and the bit error rate (BER), the expected number of decoding errors divided by $k$. Ideally, the rate should be as large as possible and the BER as small as possible.
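To make the rate and BER concrete, here is a minimal sketch using the simplest code of all, the $r$-fold repetition code (the function names are my own, not from any of the papers discussed below). Its rate is $1/r$, and majority voting within each block of $r$ copies serves as the decoder:

```python
import random

def encode(bits, r=3):
    """Repeat each message bit r times; the rate k/n is 1/r."""
    return [b for b in bits for _ in range(r)]

def decode(received, r=3):
    """Majority vote within each block of r copies."""
    return [int(sum(received[i:i + r]) > r // 2)
            for i in range(0, len(received), r)]

def bit_error_rate(message, p_flip, r=3, trials=1000, seed=0):
    """Empirical BER: decoding errors divided by k, averaged over trials
    of a binary symmetric channel that flips each bit with prob. p_flip."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        noisy = [b ^ (rng.random() < p_flip) for b in encode(message, r)]
        errors += sum(m != d for m, d in zip(message, decode(noisy, r)))
    return errors / (len(message) * trials)
```

For example, `encode([1, 0], r=3)` yields `[1, 1, 1, 0, 0, 0]`, and a single flipped bit per block is corrected by the majority vote. The tradeoff mentioned above is visible directly: raising $r$ lowers the BER but also lowers the rate $1/r$.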

Since the days of Claude Shannon, many error-correcting codes have been discovered, such as Reed-Solomon codes and BCH codes. Each error-correcting code has its own tradeoffs (e.g., some have higher rate, some are more resistant to special kinds of channel corruption). With so many ECCs to choose from, it can be overwhelming to decide which one is right for a given application. Further, if the application is sufficiently specialized, there may be no known ECC which meets your needs. Such concerns motivate the automation of error-correcting codes, which is the main topic of this blog post.

I’m using the word “automation” to cover a variety of tasks which various computational methods could assist with in the study of ECCs:

1. Existence — Does the code I want even exist?
2. Encoding — What is the “best” way to convert my messages into a code?
3. Decoding — How do I recover from noisy transmissions?
4. Verification — Is the proposed ECC design provably correct?
5. Selection — Which ECC from a given class should I use for a given application?

Each of these facets of the automation of ECCs is a whole field of research! In this and the subsequent post, I will discuss at a high level two types of techniques which have been used to approach these questions: “Formal Methods” and “Machine Learning.” We’ll cover formal methods in this post, and in the next post we will cover machine learning methods.

Formal Methods. The field of Formal Methods strives to give provable guarantees for various computational questions by reducing them to formal logic. Although formal methods are mostly used for software and hardware verification (that is, making sure they are "bug free"), such tools are also used by mathematicians to establish mathematical statements that would be difficult to prove by hand. For example, the Kepler conjecture, which asks for the densest way to pack spheres in three dimensions (essentially, finding an optimal error-correcting code in Euclidean space), was only fully proved when Thomas Hales and his team used automated theorem-proving tools.

A line of work using formal methods to directly construct practical ECCs was initiated by Shamshiri and Cheng in 2010. Their work is motivated by designing error-correcting codes for static random-access memory (SRAM), the kind often used for CPU caches. When using SRAM (pictured below), a worry is that cosmic rays could hit some of the bits, causing them to flip. Further, it is not uncommon for a group of consecutive bits to flip. As such, it is desirable to ensure that the error-correcting code can correct either $g$ global errors (flips anywhere in the word) or $l$ local errors (flips confined to a block of consecutive bits). Correcting global errors is a common property of error-correcting codes, such as BCH codes or the Golay code. However, local error correction is a much less common property to guarantee. Thus, the authors use a SAT solver to construct error-correcting codes with the properties they desire.

Static random-access memory (source: Wikipedia)

Assuming that the code to be constructed is linear (the encoding map is a linear function over the field $\mathbb F_2$), the error-correcting code can be described by a parity check matrix $M \in \{0,1\}^{(n-k) \times n}$. The key observation the authors make is that for $M$ to define a proper error-correcting code, every error pattern $p$ (i.e., every vector of Hamming weight at most $g$, or with errors confined to a block of $l$ consecutive positions) must have a distinct syndrome $Mp$. For instance, $M(0, 1, 0, 0, 1) \neq M(1,1,1,0,0)$ if $g = 2$ and $l = 3$.
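This distinct-syndrome condition is easy to check mechanically for a candidate matrix. The sketch below is my own illustration (not the authors' encoding): it enumerates one simple version of the error model, all patterns of Hamming weight at most $g$ plus all bursts of up to $l$ consecutive flips, and checks that no two patterns collide under $M$:

```python
from itertools import combinations

def weight_patterns(n, g):
    """All length-n 0/1 vectors of Hamming weight at most g (global errors)."""
    pats = set()
    for w in range(g + 1):
        for idx in combinations(range(n), w):
            v = [0] * n
            for i in idx:
                v[i] = 1
            pats.add(tuple(v))
    return pats

def burst_patterns(n, l):
    """All runs of 1..l consecutive flips (one simple model of local errors)."""
    pats = set()
    for length in range(1, l + 1):
        for start in range(n - length + 1):
            v = [0] * n
            v[start:start + length] = [1] * length
            pats.add(tuple(v))
    return pats

def syndrome(M, p):
    """Compute Mp over F2: dot product of each row of M with p, mod 2."""
    return tuple(sum(a * b for a, b in zip(row, p)) % 2 for row in M)

def corrects_all(M, g, l):
    """True iff every correctable error pattern has a distinct syndrome."""
    n = len(M[0])
    seen = set()
    for p in weight_patterns(n, g) | burst_patterns(n, l):
        s = syndrome(M, p)
        if s in seen:          # two patterns share a syndrome: ambiguous
            return False
        seen.add(s)
    return True
```

As a sanity check, the parity check matrix of the $[7,4]$ Hamming code passes for $g = 1$ (its columns are distinct and nonzero) but fails for $g = 2$, matching the fact that Hamming codes correct only single errors.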

These Boolean constraints can be expressed in conjunctive normal form, i.e., as a SAT instance. As such, a SAT solver can be used to determine whether a matrix $M$ with the given properties exists for a given $k$ and $n$. For instance, they are able to find an error-correcting code with parameters $k = 16$, $n = 26$, $g = 2$, and $l = 4$. In their follow-up work [Shamshiri, Ghofrani, Cheng, 2011], they use this error-correcting code to model an "on-chip network" between CPU cores in a multi-core processor.
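To see what such an existence search looks like at toy scale, here is a brute-force stand-in for the SAT solver (my own illustrative sketch, restricted to global errors only for brevity). Each of the $n$ columns of $M$ is represented as an $(n-k)$-bit integer, so the syndrome of an error pattern is simply the XOR of its flipped columns:

```python
from itertools import combinations, product

def find_parity_check(n, k, g):
    """Try every assignment of the n columns of M (each an (n-k)-bit int)
    and return the first one whose syndromes are distinct across all error
    patterns of Hamming weight at most g, or None if no such M exists.
    This enumerates 2^((n-k)*n) candidates; SAT solvers exist precisely
    because this blowup is hopeless at practical sizes like n = 26."""
    patterns = [idx for w in range(g + 1) for idx in combinations(range(n), w)]
    for cols in product(range(2 ** (n - k)), repeat=n):
        seen = set()
        for idx in patterns:
            s = 0
            for i in idx:
                s ^= cols[i]      # syndrome = XOR of the flipped columns
            if s in seen:         # collision: two patterns are ambiguous
                break
            seen.add(s)
        else:
            return cols           # columns of a valid parity check matrix
    return None
```

For $n = 3$, $k = 1$, $g = 1$ this recovers the repetition code's parity check matrix (three distinct nonzero columns), while for $n = 3$, $k = 2$, $g = 1$ it correctly reports that no single-error-correcting code exists.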

Another line of work led by Ben Curtis (see the survey by Curtis, Kotsireas, and Ganesh) seeks to construct ECC-like combinatorial objects. An example of such an object is a Hadamard matrix: a square matrix with $\pm 1$ entries whose rows (and columns) are pairwise orthogonal in $\mathbb R^n$. In fact, the authors search for a special type of Hadamard matrix made up of a quartet of Williamson matrices, which have an intricate algebraic structure. They find these objects using an algorithm which goes back and forth between a SAT solver and a CAS (computer algebra system) to help narrow the search space.

Formal Methods have further applications in error-correcting codes for distributed cloud storage and value-deviation-bounded codes.

This concludes our first post. In the next post, we discuss machine learning methods.