Spanner is a large distributed relational database deployed within Google. It makes good performance-versus-consistency tradeoffs by treating clock uncertainty as something that has to be accounted for explicitly when reasoning about correctness. It does this through an API that directly exposes clock uncertainty, which Spanner calls the TrueTime API.
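To make "exposing clock uncertainty" concrete, here is a minimal sketch of a clock that returns an interval instead of a single instant. The method names `now`, `after`, and `before` follow the published description of TrueTime; the class itself, the fixed `epsilon`, and the injectable `clock` are assumptions made for illustration, not how Google implements it.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float  # lower bound on the true time, in seconds
    latest: float    # upper bound on the true time, in seconds


class TrueTime:
    """Toy stand-in for a clock API that reports its own uncertainty."""

    def __init__(self, epsilon: float = 0.004, clock=time.time):
        # epsilon is an assumed, fixed uncertainty bound; a real system
        # would derive it from its time sources and let it vary.
        self.epsilon = epsilon
        self.clock = clock

    def now(self) -> TTInterval:
        # The true time is assumed to lie somewhere inside this interval.
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t: float) -> bool:
        # True only if t has definitely passed.
        return self.now().earliest > t

    def before(self, t: float) -> bool:
        # True only if t has definitely not arrived yet.
        return self.now().latest < t
```

The point of the interval is that a caller can tell "definitely in the past" apart from "possibly in the past", and wait out the difference when correctness depends on it.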
Each spanserver serves many tablets. A tablet is a data structure that contains a mapping from a key and a timestamp to a string. The timestamp matters: it makes Spanner more like a multiversion database than a key-value store like Bigtable.
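A rough sketch of what a (key, timestamp) to string mapping buys you: every write keeps its own version, and a read at some timestamp sees the newest version at or before it. `MultiVersionMap` and its methods are hypothetical names for illustration; the real tablet storage format is not modeled here.

```python
import bisect
from collections import defaultdict
from typing import Optional


class MultiVersionMap:
    """In-memory toy of a mapping (key, timestamp) -> string."""

    def __init__(self):
        self._timestamps = defaultdict(list)  # key -> sorted timestamps
        self._values = {}                     # (key, timestamp) -> value

    def write(self, key: str, timestamp: int, value: str) -> None:
        # Each distinct timestamp becomes a separate version of the key.
        if (key, timestamp) not in self._values:
            bisect.insort(self._timestamps[key], timestamp)
        self._values[(key, timestamp)] = value

    def read(self, key: str, timestamp: int) -> Optional[str]:
        # Return the latest version written at or before `timestamp`.
        versions = self._timestamps.get(key, [])
        i = bisect.bisect_right(versions, timestamp)
        if i == 0:
            return None  # no version existed yet at that time
        return self._values[(key, versions[i - 1])]
```

For example, after writing `"x"` at timestamp 10 and `"y"` at timestamp 20, a read at 15 still returns `"x"`, which is the behavior a plain key-value store cannot offer.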
Tablet state is stored in the Colossus distributed filesystem. Each tablet (bag of mappings) is replicated using its own Paxos state machine. Writes initiate the Paxos protocol at the leader, and reads access state directly from the underlying tablet at any replica that is sufficiently up to date.
A spanserver that is the leader of its Paxos group is required to implement a lock table for concurrency control. For this reason, it’s important to have a long-lived leader. Spanner was also designed with long-lived transactions in mind. The lock table lives only in the leader and is not replicated by Paxos, given how volatile it is.
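As a rough picture of what a leader-local lock table might look like, here is a sketch with shared and exclusive locks per key. Everything in it (the `LockTable` name, per-key granularity, the two lock modes, refusing instead of queueing) is an assumption for illustration. Note that it is plain in-memory state, which matches the point above that the table is not replicated.

```python
class LockTable:
    """Toy in-memory lock table: key -> (mode, set of holding txn ids)."""

    def __init__(self):
        self._locks = {}

    def acquire(self, txn: str, key: str, mode: str) -> bool:
        # mode is "shared" or "exclusive"; returns False on conflict
        # instead of blocking, to keep the sketch short.
        held = self._locks.get(key)
        if held is None:
            self._locks[key] = (mode, {txn})
            return True
        held_mode, holders = held
        if holders == {txn}:
            # The sole holder may re-acquire or upgrade its own lock.
            new_mode = "exclusive" if "exclusive" in (mode, held_mode) else "shared"
            self._locks[key] = (new_mode, holders)
            return True
        if mode == "shared" and held_mode == "shared":
            holders.add(txn)
            return True
        return False

    def release_all(self, txn: str) -> None:
        # Drop every lock held by txn, e.g. at commit or abort.
        for key in list(self._locks):
            _, holders = self._locks[key]
            holders.discard(txn)
            if not holders:
                del self._locks[key]
```

If the leader changes, a table like this is simply gone, which is one way to see why a long-lived leader matters.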
Spanservers that are the leader of their Paxos group are also required to implement a transaction manager to support distributed transactions between Paxos groups. The transaction manager can be bypassed if a transaction only involves one Paxos group. But if multiple Paxos groups are involved, then the groups’ leaders coordinate using two-phase commit (2PC). The transaction manager's state is replicated by the underlying Paxos group.
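The single-group bypass versus multi-group 2PC split can be sketched as follows. `GroupLeader` and `run_transaction` are hypothetical names, the `can_commit` flag stands in for whatever would make a group vote no, and Paxos replication of each step is left out entirely; this only shows the shape of the coordination.

```python
from typing import List


class GroupLeader:
    """Hypothetical stand-in for the leader of one Paxos group."""

    def __init__(self, name: str, can_commit: bool = True):
        self.name = name
        self.can_commit = can_commit  # assumed knob to simulate a "no" vote
        self.state = "idle"

    def prepare(self) -> bool:
        # Phase 1: vote on whether this group can commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def run_transaction(leaders: List[GroupLeader]) -> str:
    if len(leaders) == 1:
        # One group only: no cross-group coordination is needed.
        leader = leaders[0]
        if leader.can_commit:
            leader.commit()
            return "committed"
        leader.abort()
        return "aborted"

    # Phase 1: every participating leader must vote yes.
    votes = [leader.prepare() for leader in leaders]

    # Phase 2: commit everywhere or abort everywhere.
    if all(votes):
        for leader in leaders:
            leader.commit()
        return "committed"
    for leader in leaders:
        leader.abort()
    return "aborted"
```

A single "no" vote in phase 1 aborts the transaction in every group, including the ones that voted yes.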