Eitan Adler's thoughts: November 2016

Some months ago I started a reading group at my workplace focused on distributed systems. The group is meant to be an informal meeting to discuss a mixture of high-impact, historical, and modern papers.

This is the list of papers we read:

Lamport Time Clocks
Spanner: Google’s Globally-Distributed Database
The Chubby Lock Service for Loosely-Coupled Distributed Systems
A note on distributed computing
The Byzantine Generals Problem
Your computer is already a distributed system. Why isn't your OS?
How Complex Systems Fail
Fast and Message-Efficient Global Snapshot Algorithms for Large-Scale Distributed Systems
Automatic Management of Partitioned, Replicated Search Services
Simple Testing Can Prevent Most Critical Failures
Dynamo: Amazon’s Highly Available Key-value Store
Wait-free coordination for Internet-scale systems
Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
SEDA: An Architecture for Well-Conditioned, Scalable Internet Services
Kafka: a Distributed Messaging System for Log Processing
DistributedLog: A high performance replicated log service
The Log: What every software engineer should know about real-time data's unifying abstraction
Social Hash: an Assignment Framework for Optimizing Distributed Systems Operations on Social Networks
Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications
MapReduce: Simplified Data Processing on Large Clusters
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
MillWheel: Fault-Tolerant Stream Processing at Internet Scale
Snowflake - Unique ID Generation. “No two snowflakes are alike.”
The Hadoop Distributed File System
Gorilla: A Fast, Scalable, In-Memory Time Series Database
Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial
Meltdown
Spectre Attacks: Exploiting Speculative Execution
Communicating Sequential Processes
The Tail at Scale
Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web
Dapper, A Large Scale Distributed Systems Tracing Infrastructure
The many faces of consistency
SDPaxos: Building efficient semi-decentralized geo-replicated state machines
Dataflow Model
How to read a paper
Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network
Characterizing, Modeling, and Benchmarking RocksDB Key-Value Workloads at Facebook
Harvest, Yield, and Scalable Tolerant Systems
Lineage stash: fault tolerance off the critical path
F1 Query: Declarative Querying at Scale
The Architectural Implications of Facebook’s DNN-based Personalized Recommendation
EFLOPS: Algorithm and System Co-design for a High Performance Distributed Training Platform

Other papers mentioned but not discussed:

Updated 2018-05-31: added additional papers
Updated 2018-06-20: added additional papers. Linkified a few more
Updated 2018-10-06: added additional papers. Linkified a few more
Updated 2019-07-10: added additional papers. Linkified a few more
Updated 2022-04-05: added additional papers. Linkified a few more

Sunday, November 27, 2016

Papers We Read