Sunday, November 27, 2016

Papers We Read

Some months ago I started a reading group at my workplace focussed on distributed systems. The goal was an informal meeting to discuss a mixture of high-impact, historical, and modern papers.

This is the list of papers we read:
  1. Time, Clocks, and the Ordering of Events in a Distributed System (Lamport clocks)
  2. Spanner: Google’s Globally-Distributed Database
  3. The Chubby Lock Service for Loosely-Coupled Distributed Systems
  4. A note on distributed computing
  5. The Byzantine Generals Problem
  6. Your computer is already a distributed system. Why isn't your OS?
  7. How Complex Systems Fail
  8. Fast and Message-Efficient Global Snapshot Algorithms for Large-Scale Distributed Systems
  9. Automatic Management of Partitioned, Replicated Search Services
  10. Simple Testing Can Prevent Most Critical Failures
  11. Dynamo: Amazon’s Highly Available Key-value Store
  12. ZooKeeper: Wait-free coordination for Internet-scale systems
  13. Brewer’s Conjecture and the Feasibility of Consistent, Available, Partition-Tolerant Web Services
  14. SEDA: An Architecture for Well-Conditioned, Scalable Internet Services
  15. Kafka: a Distributed Messaging System for Log Processing
  16. DistributedLog: A high performance replicated log service 
  17. The Log: What every software engineer should know about real-time data's unifying abstraction
  18. Social Hash: an Assignment Framework for Optimizing Distributed Systems Operations on Social Networks
  19. Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications 
  20. MapReduce: Simplified Data Processing on Large Clusters
  21. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center
  22. MillWheel: Fault-Tolerant Stream Processing at Internet Scale
  23. Snowflake - Unique ID Generation. “No two snowflakes are alike.”
  24. The Hadoop Distributed File System
  25. Gorilla: A Fast, Scalable, In-Memory Time Series Database
  26. Implementing Fault-Tolerant Services Using the State Machine Approach: A Tutorial
  27. Meltdown
  28. Spectre Attacks: Exploiting Speculative Execution
  29. Communicating Sequential Processes
  30. The Tail at Scale
  31. Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web
  32. Dapper, A Large Scale Distributed Systems Tracing Infrastructure
  33. The many faces of consistency
  34. SDPaxos: Building efficient semi-decentralized geo-replicated state machines
  35. Dataflow Model
  36. How to read a paper
  37. Jupiter Rising: A Decade of Clos Topologies and Centralized Control in Google’s Datacenter Network
  38. Characterizing, Modeling, and Benchmarking RocksDB Key-Value Workloads at Facebook
  39. Harvest, Yield, and Scalable Tolerant Systems
  40. Lineage stash: fault tolerance off the critical path
  41. F1 Query: Declarative Querying at Scale
  42. The Architectural Implications of Facebook’s DNN-based Personalized Recommendation
  43. EFLOPS: Algorithm and System Co-design for a High Performance Distributed Training Platform
Other papers mentioned but not discussed:
Updated 2018-05-31: added additional papers
Updated 2018-06-20: added additional papers. Linkified a few more
Updated 2018-10-06: added additional papers. Linkified a few more
Updated 2019-07-10: added additional papers. Linkified a few more
Updated 2022-04-05: added additional papers. Linkified a few more