My main interest is to study consensus, but we’ll see where that takes me.
Despite this book being nearly 14 years old, I do believe in the foresight of traditional literature: what was discovered and understood in the past can provide insight into deciphering present-day issues. That is definitely true of distributed systems. My motivation for studying distributed systems and its related technologies (networks, database systems, etc.) came through my initial exposure to blockchain. The recent push toward formally proving the properties of large-scale public blockchains (their underlying consensus and Sybil-control stacks), driven by greater university involvement, has motivated me to dive deep into the traditional academic literature that underlies the space.
The goal of parallel processing: employ all processors to perform one large task. The processors may range from cores on the same chip, to a local-area cluster of workstations, to hosts spread across the Internet.
A means to share resources: e.g. printer at home or in the office.
Lack of adherence to standards, heterogeneous hardware and software.
Three factors/fundamental difficulties: asynchrony, limited local knowledge, and failures.
Framework for specifying algorithms and comparing their performance in distributed systems: identify fundamental problems, state them precisely, design and analyze efficient algorithms to solve them, and prove optimality.
Not only time and (local) space: other complexity measures matter too, e.g. the number of messages.
Increased scope for “negative” results, lower bounds, and impossibility results (scary!), akin to showing that a problem is NP-complete in the sequential world.
Maneuverability in distributed systems: we can change the rules, e.g. by posing a slightly weaker problem statement or by building stronger guarantees into the system.
Essence of distributed systems paradigm (since the 1970s) has been caring about computability issues rather than complexity issues.
Earliest example of a distributed system: an operating system time-sharing a single CPU. Virtual concurrency: mutual exclusion, deadlock detection and prevention.
MIMD machines with shared memory (tightly coupled) are called multiprocessors: separate hardware units running common software, connected by a bus or a switching network. MIMD machines can also be loosely coupled without shared memory, e.g. workstations on a LAN.
Looser still: autonomous hosts connected by a network (Internet, LAN). Separate hardware running separate software, though the entities interact through well-defined interfaces (the Internet stack).
**Compare with PoS synchrony article.**
Two main timing models: synchronous and asynchronous.
Main complexity measures: number of messages and time
Simple algorithms: broadcast/collect information, construct spanning trees (see the sketch below).
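Here is a minimal Python sketch of the broadcast half of broadcast/collect over a fixed spanning tree. The round-by-round simulation, the `tree`-as-dict encoding, and the `broadcast` function name are my own illustrative assumptions, not the book's pseudocode.

```python
def broadcast(tree, root, value):
    """Simulate broadcast of `value` from `root` down a fixed spanning tree.

    `tree` maps each node to the list of its children (an assumed encoding).
    Returns (messages_sent, rounds_taken) for this failure-free run.
    """
    messages = 0
    rounds = 0
    frontier = [root]              # nodes that learned the value last round
    received = {root: value}
    while frontier:
        next_frontier = []
        for node in frontier:
            for child in tree.get(node, []):
                received[child] = value   # one message per tree edge
                messages += 1
                next_frontier.append(child)
        if next_frontier:
            rounds += 1
        frontier = next_frontier
    return messages, rounds

# Usage: 5 processors rooted at 0. Expect n - 1 = 4 messages, time = depth = 2.
tree = {0: [1, 2], 1: [3, 4]}
print(broadcast(tree, 0, "m"))     # (4, 2)
```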
A system or algorithm consists of $n$ processors.
A configuration is a vector of the states of all processors. The messages currently in transit are exactly those in all processors' $outbuf$s.
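To make these definitions concrete, here is a tiny Python rendering of a configuration. The encoding (local state as a dict, per-neighbor $inbuf$/$outbuf$ lists) is my own illustrative assumption, not the book's notation.

```python
from dataclasses import dataclass, field

@dataclass
class ProcessorState:
    local: dict = field(default_factory=dict)   # the processor's local variables
    inbuf: dict = field(default_factory=dict)   # neighbor -> messages delivered, not yet processed
    outbuf: dict = field(default_factory=dict)  # neighbor -> messages sent, not yet delivered

@dataclass
class Configuration:
    # states[i] is the ProcessorState of processor p_i; the union of all
    # outbufs is exactly the set of messages currently in transit.
    states: list
```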
Any sequence satisfying all safety conditions is an execution; any sequence that also satisfies all liveness conditions is admissible.
Asynchronous algorithm: an algorithm independent of any particular timing parameters, e.g. no upper bound on message delay or on the time between processor steps.
An execution segment $\alpha$: an alternating sequence of configurations and events $C_0, \phi_1, C_1, \phi_2, C_2, \dots$, where applying event $\phi_{k+1}$ to configuration $C_k$ yields $C_{k+1}$.
If an execution segment is finite, then it ends with a configuration.
For each execution segment, associate a schedule: the sequence of events in the execution. Not every sequence of events is a schedule for every initial configuration: e.g. a delivery event is not valid in a configuration where that message is not in transit.
$exec(C_0, \sigma)$: the execution obtained by starting from the initial configuration $C_0$ and applying the schedule segment $\sigma$.
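A hedged sketch of how $exec$ might be rendered in code: fold the events of a schedule segment over the starting configuration. The `apply_event` callback (the computation/delivery semantics) is a placeholder I am assuming, not something the book defines.

```python
import copy

def exec_segment(config, schedule, apply_event):
    """Sketch of exec(C0, sigma): apply a schedule segment to an initial
    configuration, recording the alternating sequence C0, phi1, C1, phi2, ...

    `apply_event(config, event)` is an assumed callback that returns the next
    configuration, or raises if `event` is not applicable in `config`.
    """
    trace = [config]
    for event in schedule:
        config = apply_event(copy.deepcopy(config), event)
        trace.append(event)
        trace.append(config)
    return trace
```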
An execution is admissible if each processor has an infinite number of computation events and every message sent is eventually delivered. Requiring infinitely many computation events models that processors do not fail; it does not mean the program is in an infinite loop. “Termination” (an informal notion) can mean that the transition function stops changing the processor's state, i.e. the processor takes dummy steps. A schedule is admissible if it is the schedule of an admissible execution.
Lockstep is generally not achievable in practical distributed systems, but it is a convenient model.
Once an algorithm is designed for the ideal timing model, it can be automatically simulated to work in other, more realistic timing models.
In the synchronous model, an execution is admissible if it is infinite. Due to the round structure, this means every processor takes an infinite number of computation steps and every message sent is eventually delivered.
In a synchronous system with no failures, once the algorithm is fixed, the only relevant aspect of executions that can differ is the initial configuration. In an asynchronous system, there can be many different executions of the same algorithm, even with the same initial configuration and no failures, because the interleaving of processor steps and message delays are not fixed.
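To see how constrained the failure-free synchronous case is, here is a minimal lockstep simulator sketch; the `step` interface is my own assumption. Given the algorithm and the initial states, the whole run is determined, whereas an asynchronous simulator would pick an arbitrary interleaving of computation and delivery events, which is exactly where the many different executions come from.

```python
def run_synchronous(states, step, rounds):
    """Failure-free lockstep simulation: in each round every processor takes
    exactly one step, and every message sent in round r is delivered at the
    start of round r + 1.

    `step(i, state, inbox)` is an assumed interface returning (new_state, outbox),
    where `outbox` maps a destination index to the message sent to it.
    """
    n = len(states)
    inboxes = [[] for _ in range(n)]
    for _ in range(rounds):
        next_inboxes = [[] for _ in range(n)]
        for i in range(n):
            states[i], outbox = step(i, states[i], inboxes[i])
            for dest, msg in outbox.items():
                next_inboxes[dest].append((i, msg))   # delivered next round
        inboxes = next_inboxes
    return states
```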
Number of messages and amount of time.
Termination: all processors are in terminated states and no messages are in transit. An admissible execution is still infinite, but every terminated processor just takes dummy steps.
Message complexity: the maximum, over all admissible executions of the algorithm, of the total number of messages sent.
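As a concrete instance (using the spanning-tree broadcast sketched earlier, and assuming the tree has $n$ nodes and depth $d$):

$$M_{\text{broadcast}} = n - 1 \ \text{(one message per tree edge)}, \qquad T_{\text{broadcast}} = d \ \text{(rounds in the synchronous model)}.$$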