Saltzer and Kaashoek (2009). Principles of Computer System Design: An IntroductionMon Feb 15 2021
This week we read the Saltzer and Kaashoek (2009) textbook, which explained atomicity, consistency, locking, two-phase commits, and distributed two-phase commits. Song Tianyi was in charge of explaining this week (click the link for his notes).
I've marked things I don't know with
and bits that still need work with
Please help me clarify or fill them in!
9.1.5 Before-or-After Atomicity
What is sequence coordination?
not sure about this
Sequence coordination ensures some logical order/ semantics of the object, e.g. you cannot dequeue before enqueuing.
What is before-or-after atomicity?
Concurrent actions possess before-or-after atomicity if their effect from the point of view of the invokers is the same as if the actions occurred either completely before or completely after one another.
What's the difference between sequence coordination and before-or-after atomicity?
What's the relationship between before-or-after atomicity and serialisability? What's the difference?
A history H is serialisable if and only if a concurrent history can be rearranged into some (any) serial ordering. "Informally, serializability means that transactions appear to have occurred in some total order."
Before-or-after atomicity implies serialisability. Consider any set of concurrent actions. By before-or-after atomicity (BAA) they must appear to have occurred either completely before or completely after one another. But that directly implies that there exists some serial ordering of these concurrent transactions, which is exactly what we need for serialisability.
What's the difference between consistency models and these atomicity guarantees?
There is another atomicity guarantee called all-or_nothing atomicity. You need before-or-after and all-or-nothing atomicity in order to get serialisability?
I think before-or-after atomicity implies all-or-nothing atomicity. This is because if all-or-nothing atomicity was violated, it would be possible for half of a transaction to fail and half of a transaction to succeed... so what? is that OK as long as both halves lie before or after other transactions?
What is the fundamental definition of atomicity?
Both atomicity guarantees can be thought of as what we really care about:
An action is atomic if there is no way for a higher layer to discover the internal structure of its implementation.
9.1.6 Correctness and serialisation
The textbook gives this general concept of correctness:
if any result is guaranteed to be one that could have been obtained by some purely serial application of those same actions.
This is serialisability!
Before-or-after atomicity leads directly to this concept of correctness
If all transactions are before-or-after atomic, then they must form some purely serial ordering. And if they form some purely serial ordering, this is the definition of serialisability.
Note that serialisabilty places no constraints on the total order, so transactions may appear to execute in any (serial) order. There are correctness requirements stricter than serialisability: we walk through two here.
There are correctness requirements that are stronger than serialisability: external time consistency, or sequential consistency
External time consistency (that transactions must appear to happen in the order that they were called) is equivalent to strict serialisability. Note that strict serialisability is not quite the same (is stricter than) linearisability:
We can view strict serializability as linearizability plus the multi-object transactions of serializability. But in another sense, linearizability is strict serializability with the constraint that transactions are constrained to act on a single object source
What is sequential consistency?
Sequential consistency is stronger than serialisability. It requires that transactions must appear to have occurred in some order, and that transactions must appear to have occurred in order on a per-process basis.
For example, this concurrent history
B READ x A READ x B WRITE x A WRITE x
would be serialisable but not sequentially consistent if the following execution order was permitted:
A WRITE x A READ x B WRITE x B READ x
This respects a serial ordering, but is not sequentially consistent because process A needs to read before writing.
What's the difference between external time consistency and sequential consistency?
External time consistency imposes a time order on all different processes. In sequential consistency all transactions within a process must run sequentially. But there's no requirement for real-time order between processes.
9.5.2 Simple locking
We next consider how to maintain correctness. One way is locking. We first consider simple locking. There's a lock manager that manages the locks. Each transaction has to acquire all the locks that it will need before starting the transaction. The transaction has what is called a lock point: the instant at which it has acquired all of its locks. It may release its locks only after the transaction installs its last update and commits or completely restores its data (aborts).
A lock manager can enforce simple locking by requiring that each transaction supply its intended lock set to the function call.
Simple locking can cause deadlocks
Consider the following scenario:
Mitigate this by assigning each lock a unique ID and mandate that each process tries to acquire each lock in order.
Simple locking is not completely efficient
However, simple locking might not be efficient because it needs to:
- acquire locks on every shared object that it might need to read or write, even if it doesn't need it in the end;
- acquire locks on all objects at the start, even if the transaction is long-running and it may only need the lock towards the end.
So there are some more optimistic approaches. One of them is two-phase locking, which we'll cover next.
9.5.3 Two-phase locking
Two-phase locking improves upon simple locking in two key (ha ha) ways: rather than forcing a transaction to acquire all the locks before it begins, it allows the transaction to acquire locks as it proceeds. The transaction may read or write a data object only after it acquires a lock on that object.
Second, instead of forcing a transaction to release all locks at the end, the transaction may release a lock on an object that it only reads if it will never need to read that object again, even to abort. (Note that it can't release a lock on an object that it writes, because it would be impossible to roll back!)
This is called two-phase locking because the number of locks requested by a transaction increases monotonically at the start, then decreases monotonically.
However, two-phase locking still has its inefficiencies...
How can we handle system recovery with locks?
There are two interactions between locks and logs that require thought: 1) individual transactions that abort, and 2) system recovery.
Aborts are easy, since all aborting transactions must restore their changed data objects to their original values before releasing any locks.
Are volatile locks (locks in memory) fault-tolerant?
What happens if you have processes holding different sets of locks and then they crash? How do you make sure that reconstructing the new state from the logs ensures the correct (serialisable) outcome?
Two-phase locking often results in deadlocks
Two-phase locking results in deadlocks but simple locking does not result in deadlocks (why?) as long as you index locks in order. (Prove this -- also mentioned in GFS)
9.6.2 Two-phase commit
Go through this myself
9.6.3 Multiple-Site Atomicity: Distributed Two-Phase commit
In Chapter 4 we learned of the Remote Procedure Call method and in Chapter 7 we learned how to design an RPC with a persistent sender. Now we combine a two-phase commit protocol with persistent senders, duplicate suppression, and single-site transactions to build a distributed two-phase commit protocol.
Correctness of multiple-site atomicity is when all the sites commit OR all the sites abort; we will have failed if some sites commit their part of a multiple-site transaction while others abort their part of that same transaction.
What is the procedure for 2PC?
First phase: Alice the coordinator assigns
its child jobs to its slaves:
PREPARE TO COMMIT X.
Bob, Charlie and Dawn PREPARE
to commit X (commit tentatively).
They acquire and don't release the locks.
(or abort, if they can't):
I HAVE PREPARED TO COMMIT X or
I HAVE ABORTED X.
Second phase: Only after receiving
all tentative commits,
or any least one abort,
will Alice choose to abort the entire
transaction or tell a different worker site
to carry its component transaction.
Alice marks her own outcome record
COMMIT back to Bob, Charles and Dawn.
Each worker site, upon receiving such a message,
changes its state from
Only three messages are needed: one message to tell the worker to prepare the commit, one message to tell the master that it's prepared, and one message to tell the worker to commit.
Bob, Charlie and Dawn can lose messages arbitrarily, so Alice will never know if they have received her message to commit. However, since Bob, Charlie and Dawn are persistent senders (they keep asking if they haven't received a commit instruction), Alice can be confident that the workers will commit their transactions eventually.
What happens when the coordinator crashes?
What happens when a worker crashes?
There are four places a worker can crash:
- Before it receives the first phase message
- After it receives the first phase message, but before it sends the prepared message
- After it sends the prepared message, but before receiving a commit message
- After receiving a commit message
Only state 3 is nontrivial: recovery procedure must reacquire all locks the PREPARED transaction was holding at the time of the failure. The recovery procedure must also restart the persistent sender.
What are the tradeoffs of 2PC?
2PC gives up availability: nodes can block forever if the coordinator goes down.
The Spinnaker paper mentions a few other tradeoffs: Firstly, the failure of a single node will lead to an abort. Secondly, it suffers in performance: a typical implementation of 2PC requires 2 disk forces and 2 message delays. Lastly, it also blocks when the coordinator fails.
Why must the coordinator must hold completion states of all transactions indefinitely?
The Two Generals problem and impossibility theorem
The Two Generals problem is an impossibility theorem that states that it is impossible for two nodes to reach consensus over a lossy network. There's a lovely formalisation of the problem and the impossibility proof here.
Given the Two Generals impossibility theorem, how does 2PC work?
You'd think the two generals problem is very similar to the two-phase commit protocol, so how come 2PC works despite the impossibility result?
Bernard thinks it has to do with the notion of bounded vs unbounded time. I think that's true, but it might be more useful to think of it more as blocking vs non-blocking: we can't decide
There's a really nice blog post on 2PC by Henry Robinson here. I'll have to read this post (and the rest of his blog!) much more closely, but I think here it solves the consensus problem but a node failure might block indefinitely.
Follow-up: what is the relationship between the FLP and the Two Generals Problem?
When would you want to use 2PC?
If you had independent databases with very non-atomic transactions e.g when you buy something, you have child transactions of contacting the credit card company, decrementing the stock, adding loyalty points, etc. And you want to either commit or roll all of this back atomically.
Should one use 2PC or Raft?
They are solving different problems: Raft solves the problem of replication, 2PC solves the problem of coordination. We use 2PC when we want different computers to do different things. We use Raft when we want to replicate the same state over different machines.
What is the relationship between 2PC, 3PC, and Raft?
The major events in the development of consensus protocols can be summarised below, in hugely oversimplified form:
- Jim Gray (amongst others) proposes 2PC in the 1970s. Problem: blocks on failure of single node even in synchronous networks.
- Dale Skeen (amongst others) shows the need for an extra, third phase to avoid blocking in the early 1980s. 3PC is an obvious corollary, but it is not provably correct, and in fact suffers from incorrectness under certain network conditions.
- Leslie Lamport proposes Paxos, which is provably correct in asynchronous networks that eventually become synchronous, does not block if a majority of participants are available (so withstands faults) and has provably minimal message delays in the best case.
FOLLOW-UP: be clear about the difference between synchronous and asynchronous networks
FOLLOW-UP: add a point number 4. talking about Raft