This is the list of papers we read:
- Lamport Time Clocks
 - Spanner: Google’s Globally-Distributed Database
 - The Chubby Lock Service for Loosely-Coupled Distributed Systems
 - A note on distributed computing
 - The Byzantine Generals Problem
 - Your computer is already a distributed system. Why isn't your OS?
 - How Complex Systems Fail
 - Fast and Message-Efficient Global Snapshot Algorithms for Large-Scale Distributed Systems
 - Automatic Management of Partitioned, Replicated Search Services
 - Simple Testing Can Prevent Most Critical Failures
 - Dynamo: Amazon’s Highly Available Key-value Store
 - ZooKeeper: Wait-free coordination for Internet-scale systems
 - Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
 - SEDA: An Architecture for Well-Conditioned, Scalable Internet Services
 - Kafka: a Distributed Messaging System for Log Processing
 - DistributedLog: A high performance replicated log service
 - The Log: What every software engineer should know about real-time data's unifying abstraction
 - Social Hash: an Assignment Framework for Optimizing Distributed Systems Operations on Social Networks
 - Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
 - MapReduce: Simplified Data Processing on Large Clusters
 - Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
 - MillWheel: Fault-Tolerant Stream Processing at Internet Scale
 - Snowflake - Unique ID Generation. “No two snowflakes are alike.”
 - The Hadoop Distributed File System
 - Gorilla: A Fast, Scalable, In-Memory Time Series Database
 - Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial
 - Meltdown
 - Spectre Attacks: Exploiting Speculative Execution
 - Communicating Sequential Processes
 - The Tail at Scale
 - Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web
 - Dapper, A Large Scale Distributed Systems Tracing Infrastructure
 - The many faces of consistency
 - SDPaxos: Building efficient semi-decentralized geo-replicated state machines
 - Dataflow Model
 - How to read a paper
 - Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network
 - Characterizing, Modeling, and Benchmarking RocksDB Key-Value Workloads at Facebook
 - Harvest, Yield, and Scalable Tolerant Systems
 - Lineage stash: fault tolerance off the critical path
 - F1 Query: Declarative Querying at Scale
 - The Architectural Implications of Facebook’s DNN-based Personalized Recommendation
 - EFLOPS: Algorithm and System Co-design for a High Performance Distributed Training Platform
 
Other papers mentioned but not discussed:
Updated 2018-05-31: added additional papers
Updated 2018-06-20: added additional papers. Linkified a few more
Updated 2018-10-06: added additional papers. Linkified a few more
Updated 2019-07-10: added additional papers. Linkified a few more
Updated 2022-04-05: added additional papers. Linkified a few more