micahlerner.com

Understanding Raft Consensus - Part 1

Published May 08, 2020

Found something wrong? Submit a pull request!

Discussion on Hacker News

Recently I was digging deeper into Raft, an important algorithm in the field of distributed systems. Raft is a consensus algorithm, meaning that it is designed to facilitate a set of computers agreeing on a state of the world (more on exactly how the state of the world is represented later), even when communications between the computers in the set are interrupted (say for example, by someone accidentally unplugging a network cable that connects some of the nodes to the majority).

The problem of reliably storing a state of the world across many computers, keeping the state in sync, and scaling this functionality is required in a number of modern systems - for example, Kubernetes stores all cluster data in etcd, a key-value store library that uses Raft under the hood.

Given how important (and nuanced) the algorithm is, I wanted to attempt to boil it down to its simplest possible components first, then followup with a deeper dive.

It’s worth noting that there are a wealth of resources about Raft. Some of my favorites are:

What’s novel about Raft?

As mentioned above, Raft is an algorithm designed to help computers synchronize state through a process called consensus, although it was not the first system designed to do so.

A main difference between Raft and previous consensus algorithms was the desire to optimize the design with simplicity in mind - a trait that the authors thought was missing from existing research.

In particular, Raft aimed to improve on Paxos, a groundbreaking but (the authors of Raft argue) somewhat complicated set of ideas for achieving distributed consensus.

To attempt to quantify the complexity of Paxos, the Raft authors conducted a survey at NSDI, one of the top conferences for distributed systems academics:

In an informal survey of attendees at NSDI 2012, we found few people who were comfortable with Paxos, even among seasoned researchers. We struggled with Paxos ourselves; we were not able to understand the complete protocol until after reading several simplified explanations and designing our own alternative protocol, a process that took almost a year.

Other engineers also documented difficultes productionizing Paxos. Google implemented a system based off of Paxos called Chubby and documented the “algorithmic and engineering challenges … encountered in moving Paxos from theory to practice. In their paper they note that, “Despite the existing literature on the subject [Paxos], building a production system turned out to be a non-trivial task for a variety of reasons”.

From the above commentary, it might seem that Paxos is a terribly complicated and near-impossible set of ideas to implement, although this isn’t entirely true. Some have argued that Raft trades off understability for a performance hit, although it is unclear whether this is true given the latest etcd benchmarks. For further reading on Paxos vs Raft, this paper is an interesting read.

At a high level, how does Raft work?

Now that we have some context about the why? of Raft, there are a few high level points that are important to understand about Raft:

With that context, we can start breaking Raft down into more concrete sections that try to answer questions about the protocol:

Given that this article is already fairly lengthy, I saved the three topics outlined above for the second part of the series, available here.

Follow me on Twitter or subscribe below to get future paper reviews. Published weekly.

Found something wrong? Submit a pull request!
Subscribe