Notes on CRDTs

tags: draft programming computer science self study notes public 6.824 MIT distributed systems

Introduction

This week I read Shapiro et al. (2011). A comprehensive study of Convergent and Conflict-Free Replicated Data Types as part of self-study of 6.824 with NUS SoC folks. It may or may not be part of my goal to finish a Stanford CS degree in a year.

Overview

The big picture

A lot of the literature in distributed systems tries to solve the problem of consistency, which (roughly speaking) demands that all nodes must have the same state at every point in time.

Consensus algorithms like 2PC and Raft ensure consistency by guaranteeing that every transaction is either executed on all nodes, or not at all. This ensures that each node will have the same transactions in the same order. The idea is if every node starts from the same initial state, and each node runs the same set of (deterministic) transactions in the same order, then each node will have the same subsequent states. So this is the approach to solving consistency.

Of course we know from the famous CAP theorem that we give up availability as a result. We can try and get back availability by relaxing consistency into eventual consistency: mandating that all nodes must have the same state at some point in time. This is the approach favoured by Amazon's DynamoDB.

But even relaxing consistency to allow eventual consistency doesn't necessarily give us full availability. The biggest problem is that nodes could reach a "broken" or otherwise unrecoverable state if operations come in out of order. A simple example is a stack: obviously the state of the stack depends on the order of operations, and if the pop operations come before the push operations you're broken.

And even if our nodes never "break", their histories can diverge irreconcilably in a way that kind of breaks eventual consistency --- DynamoDB may be always writable (and thus fully available) with W = 1, but it has to return a set of possible states rather than a single canonical one. This is technically not eventual consistency, because obviously no single-threaded execution (non-concurrent history) would ever result in a set of states.

Wouldn't it be nice if we had full availability and (eventual) consistency? That's the promise of CRDTs:

replicas of any CRDT [eventually] converge to a common state that is equivalent to some correct sequential execution.

The key contract of CRDTs is: as long as you design your data structure in a specific way, it doesn't matter what network partitions you have, it doesn't matter in which order operations are applied, all nodes will eventually converge to the same state.

How does it do this? The idea is simple and genius. Recall that the various definitions of consistency are all about rearranging operations in some concurrent history (where calls can interleave) into a valid sequential history (where each call must immediately be followed by its response), subject to some other rules (follow process order, real-time order, etc.)

And the tricky bit is that in many data structures, the order of operations is important. The operations are not commutative.

If we have noncommutativity then getting a valid sequential history may be tough because you might violate an invariant.

This presents the two problems I mentioned earlier: firstly you must rearrange carefully so as not to violate the invariants of the object, and secondly every node must rearrange its operations in the exact same way

But of course, if your operations are commutative, then any arrangement of method calls is perfectly fine! No matter in what order you

set(1) set(3)

You can think of these things as existing in a consistency--availability Cartesian space, where the impossibility theorems constrain the possible spots in that space. Raft and 2PC are in one region: they guarantee strong consistency, but they're not available in the presence of partitions. Amazon's Dynamo with N=3 is in the middle; it (almost always) guarantees eventual consistency, and is (almost always) available. Dynamo with N=1 is further to the left: it's always writable, but forks more easily. And finally CRDTs are better than Dynamo with N=1N=1, fully available, _and _ always eventually consistent.

FAQ

What's the purpose of the compare function?

To map it to a monotonic join semilattice.

What's least upper bound?

What's a monotonic join semilattice?

Monotonic: you can only move "up" state, or don't move at all

a Join Semilattice is some ordered set where a LUB must exit. This ordered set is the set of all possible payload sate

What's a source pre-condition?

It's a "guard statement" that causes the query or update function to fail if a particular condition is not met.

What's a causal history?

The causal history of an object xix_i a set of all update operations applied to the object.

The operation can contain the snapshot or

Causal history op-based:

Downstream: executes after the source-local phase immediately at the source and asynchronously at all other replicas only if the downstream precondition is true

What's an "abstract state"? How does it differ from regular state?

Merge is idempotent and commutative by the properties of... messages may be lost, received out of order, or multiple times.

Why is it not possible to maintain a global invariant?

There is actually no non-negative counter. You can only enforce local invariants by definition

What's the importance of this paper:

The importance is that any data type that

What makes something a CRDT?

Query and add

"Only merge and compare must follow CRDT"

All merges must be idempotent , commutative, and associative to be a state-based CRDT.

What's a PN-Set? Do you need add and remove counter for each replica?

I think so.

Removing stuff is hard without synchronisation

When would you use CRDTs versus a more traditional synchronisation solution?

What's the process of designing or building a new CRDT?

  • I'll start with some primitives, LWW sets
  • Then compose higher level CRDTs on the primitives. E.g. 2P sets can be composed with grow-only sets.

Is it true that any CRDT can be represented as both state-based and op-based? Proof?

What are operational transforms?

What is quiescent consistency?

Quiescent consistency requires non-overlapping operations to appear to take effect in their real-time order, but overlapping operations might be reordered.

Pavel Shved's 2012 post on "Consistency Models Explained Briefly and this SO post are both very useful. Pavel's post needs a bit of background, but if you have it, it is a great post.

Further reading

In comparing well-known CRDTs representing sets that can growand shrink, we find caveats. In one, the removal of an element can-not be reliably undone. In another, undesirable states are attainable,such as when an element is present -1 times (and so must be addedfor the set to become empty). The first lacks a general-purposeundo, while the second acts less like a set and more like a tuple ofcounters, one per possible element.Using some group theory, we show that this trade-off is unavoid-able:every undoable CRDT is a tuple of counters.