micahlerner.com

Understanding Raft - Part 2 (Raft leaders, logs, and safety)

Published May 09, 2020

Found something wrong? Submit a pull request!

Discussion on Hacker News

This post is a continuation in the series I wrote about Raft, the first part of which is here. This post focuses on what underlies leader election, log replication, and Raft safety. Enjoy!

Leaders and leader election

The Raft protocol requires a single node (called the Leader) to direct other nodes on how they should change their respective states of the world. There can only be one leader at a time - Raft maintains a representation of time called a term. This term only changes in special situations, like when a node attempts to become a Leader. When a Raft cluster starts, there are no leaders and one needs to be chosen through a process called Leader Election before the cluster can start responding to requests.

How does the leader election process work?

A node starts the leader election process by designating itself to be a Candidate, incrementing its term, voting for itself, and requesting the votes of other nodes in the Raft cluster using RequestVote.

There are a few ways that a node can exit Candidate state:

Once a node becomes a Leader, it begins sending communications in the form of AppendEntries (discussed more in the next section) messages to all other peers, and will continue trying to do so unless it hears about a different Leader with a higher term (you may be wondering how Raft ensures that a Leader with an out of date state of the world doesn’t somehow acquire a higher term, but that topic is covered in the Safety section).

To allow Raft to recover from a Leader failing (maybe because of an ethernet unplugging scenario), an up to date Follower can kick off an election.


Raft States

I found the visualization in the margin (from the original Raft paper) to be helpful for thinking about the ways that a node can transition between the three possible states of Follower, Candidate, and Leader.

Raft logs and replication

What is an AppendEntries request and what information does it contain?

As mentioned above, Leader nodes periodically send AppendEntries messages to Follower nodes to let the Followers know that there is still an active Leader. These AppendEntries calls also serve the purpose of helping to update out of date Followers with correct data to store in their logs. The information that the leader supplies in the calls is as follows:

What happens when a peer receives an AppendEntries request?

Once a peer receives an AppendEntries request from a leader, it evaluates whether it will need to update its state, then responds with its current term as well as whether it successfully processed the request:

Raft Safety

At the core of Raft are guarantees about safety that make sure that data in the log isn’t corrupted or lost. For example, imagine that a Leader starts coordinating changes to the log, does so successfully, then goes offline. While the existing Leader is offline, a new Leader is elected and the system continues updating the log. If the old Leader were to come back online, how can we make sure that it isn’t able to rewind the system’s log?

To account for this situation (and all of the edge cases that can occur in distributed systems), Raft aims to implement several ideas around Safety. A few of these we’ve already touched on (descriptions are from Figure 3 of the original Raft paper):

The other important ideas around Raft Safety are:

Conclusion

If you’ve made it to the end, thanks for following along and until next time!

References

Follow me on Twitter or subscribe below to get future paper reviews. Published weekly.

Found something wrong? Submit a pull request!
Subscribe