A | B | C | D | E | F | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z | AA | AB | AC | AD | AE | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Hey there! Have a paper to suggest? Please submit this form! | ||||||||||||||||||||||||||||
2 | https://forms.gle/s8XGxvSZxWKGSJyi6 | ||||||||||||||||||||||||||||
3 | |||||||||||||||||||||||||||||
4 | Status | Date | Week | Year | Title | Paper Link | |||||||||||||||||||||||
5 | Published! | 1/9/2022 | 3 | 2022 | ghOSt: Fast & Flexible User-Space Delegation of Linux Scheduling | Paper Link | |||||||||||||||||||||||
6 | Published! | 1/16/2022 | 4 | 2022 | Shard Manager: A Generic Shard Management Framework for Geo-distributed Applications | Paper Link | |||||||||||||||||||||||
7 | Published! | 1/23/2022 | 5 | 2022 | The ties that un-bind: decoupling IP from web services and sockets for robust addressing agility at CDN-scale | Paper Link | |||||||||||||||||||||||
8 | 1/30/2022 | 6 | 2022 | Presented at Aleksey Charapko's reading group: https://www.youtube.com/watch?v=H-7OCFnTeMY | |||||||||||||||||||||||||
9 | Published! | 5/10/2022 | 20 | 2022 | Monarch: Google’s Planet-Scale In-Memory Time Series Database | Paper Link | |||||||||||||||||||||||
10 | Published! | 5/17/2022 | 21 | 2022 | Druid: a real-time analytical data store | https://dl.acm.org/doi/pdf/10.1145/2588555.2595631 | |||||||||||||||||||||||
11 | Published! | 6/4/2022 | 23 | 2022 | Data-Parallel Actors: A Programming Model for Scalable Query Serving Systems | https://www.usenix.org/system/files/nsdi22-paper-kraft.pdf | |||||||||||||||||||||||
12 | Published! | 7/4/2022 | 28 | 2022 | Sundial: Fault-tolerant Clock Synchronization for Datacenters | ||||||||||||||||||||||||
13 | Published! | 7/24/2022 | 31 | 2022 | Metastable Failures in the Wild | ||||||||||||||||||||||||
14 | Published! | 8/25/2022 | 35 | 2022 | Automatic Reliability Testing For Cluster Management Controllers | ||||||||||||||||||||||||
15 | Published! | 9/1/2022 | 36 | 2022 | Seven years in the life of Hypergiants' off-nets | ||||||||||||||||||||||||
16 | Published! | 10/8/2022 | 41 | 2022 | SDN in the stratosphere: Loon's aerospace mesh network | SIGCOMM | |||||||||||||||||||||||
17 | Published! | 10/31/2022 | 45 | 2022 | Design and evaluation of IPFS: a storage layer for the decentralized web | SIGCOMM | |||||||||||||||||||||||
18 | Published! | 12/11/2022 | 51 | 2022 | Jupiter evolving: transforming Google's datacenter network via optical circuit switches and software-defined networking | SIGCOMM | |||||||||||||||||||||||
19 | Published! | 1/19/2023 | 3 | 2023 | Elastic cloud services: scaling Snowflake's control plane | SoCC 22 | |||||||||||||||||||||||
20 | Published! | 2/26/2023 | 9 | 2023 | Meta's Next-generation Realtime Monitoring and Analytics Platform | ||||||||||||||||||||||||
21 | Published! | 3/28/2023 | 13 | 2023 | Ambry: LinkedIn’s Scalable Geo-Distributed Object Store | ||||||||||||||||||||||||
22 | Published! | 4/16/2023 | 16 | 2023 | Perseus: A Fail-Slow Detection Framework for Cloud Storage Systems | ||||||||||||||||||||||||
23 | Published! | 6/6/2023 | 23 | 2023 | TelaMalloc: Efficient On-Chip Memory Allocation for Production Machine Learning Accelerators | ||||||||||||||||||||||||
24 | Published! | 6/29/2023 | 26 | 2023 | Towards an Adaptable Systems Architecture for Memory Tiering at Warehouse-Scale | ||||||||||||||||||||||||
25 | Published! | 7/23/2023 | 30 | 2023 | Defcon: Preventing Overload with Graceful Feature Degradation | ||||||||||||||||||||||||
26 | |||||||||||||||||||||||||||||
27 | |||||||||||||||||||||||||||||
28 | |||||||||||||||||||||||||||||
29 | |||||||||||||||||||||||||||||
30 | Networking | Running BGP in Data Centers at Scale | |||||||||||||||||||||||||||
31 | Networking | Orion: Google’s Software-Defined Networking Control Plane | |||||||||||||||||||||||||||
32 | Networking | Taiji: managing global user traffic for large-scale internet services at the edge | |||||||||||||||||||||||||||
33 | Networking | B4: Experience with a Globally-Deployed Software Defined WAN | |||||||||||||||||||||||||||
34 | Networking | Maglev: A Fast and Reliable Network Load Balancer | |||||||||||||||||||||||||||
35 | Databases | Spanner: Google’s Globally Distributed Database | |||||||||||||||||||||||||||
36 | RPCs | Method overloading the circuit | https://dl.acm.org/doi/10.1145/3542929.3563466 | ||||||||||||||||||||||||||
37 | BlobStore | Ambry: LinkedIn’s Scalable Geo-Distributed Object Store | http://dprg.cs.uiuc.edu/data/files/2016/ambry.pdf | ||||||||||||||||||||||||||
38 | DistSys | Understanding and Detecting Software Upgrade Failures in Distributed Systems | |||||||||||||||||||||||||||
39 | DB | Amazon DynamoDB: A Scalable, Predictably Performant, and Fully Managed NoSQL Database Service | https://www.usenix.org/system/files/atc22-elhemali.pdf | ||||||||||||||||||||||||||
40 | DB | CockroachDB: The Resilient Geo-Distributed SQL Database | |||||||||||||||||||||||||||
41 | AI | Using deep learning to annotate the protein universe | https://www.nature.com/articles/s41587-021-01179-w.epdf?sharing_token=q6tRetZ422gIjtPMP4s5a9RgN0jAjWel9jnR3ZoTv0M6G6LioRgZ9bzThQkXRdrB3jzKxuUul1YK61iQvv0TpiY1g-t8hlEEJAPaWoOEQSPqrFygPoSzQFS2EpxMCyl-LsP8mRRne59fwzepXL22aNjligptda4Cl01WNl1U13I%3D | ||||||||||||||||||||||||||
42 | |||||||||||||||||||||||||||||
43 | ByteGraph: A High Performance Distributed Graph Database in ByteDance | VLDB | |||||||||||||||||||||||||||
44 | Velox: Meta’s Unified Execution Engine | VLDB | |||||||||||||||||||||||||||
45 | D3: A Dynamic Deadline-Driven Approach for Building Autonomous Vehicles | https://dl.acm.org/doi/pdf/10.1145/3492321.3519576 | |||||||||||||||||||||||||||
46 | Building An Elastic Query Engine on Disaggregated Storage | ||||||||||||||||||||||||||||
47 | How to fight production incidents?: an empirical study on a large-scale cloud service | ||||||||||||||||||||||||||||
48 | |||||||||||||||||||||||||||||
49 | |||||||||||||||||||||||||||||
50 | Papers I want to read / write about soon! | ||||||||||||||||||||||||||||
51 | |||||||||||||||||||||||||||||
52 | Fail-Slow at Scale: Evidence of Hardware Performance Faults in Large Production Systems | ||||||||||||||||||||||||||||
53 | F1 Query: Declarative Querying at Scale | ||||||||||||||||||||||||||||
54 | Building An Elastic Query Engine on Disaggregated Storage | ||||||||||||||||||||||||||||
55 | Elle: Inferring Isolation Anomalies from Experimental Observations | ||||||||||||||||||||||||||||
56 | Conflict-free Replicated Data Types | Link | |||||||||||||||||||||||||||
57 | Dissecting performance bottlenecks of strongly-consistent replication protocols | Link | |||||||||||||||||||||||||||
58 | Shuffling, Fast and Slow: Scalable Analytics on Serverless Infrastructure | ||||||||||||||||||||||||||||
59 | Toward formally verifying congestion control behavior | ||||||||||||||||||||||||||||
60 | Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks | ||||||||||||||||||||||||||||
61 | Solving Large-Scale Granular Resource Allocation Problems Efficiently with POP | ||||||||||||||||||||||||||||
62 | Evolution of Development Priorities in Key-value Stores Serving Large-scale Applications: The RocksDB Experience | ||||||||||||||||||||||||||||
63 | Lineage stash: fault tolerance off the critical path | ||||||||||||||||||||||||||||
64 | Consensus in the Cloud: Paxos Systems Demystified | https://cse.buffalo.edu/tech-reports/2016-02.orig.pdf | |||||||||||||||||||||||||||
65 | The Snowflake Elastic Data Warehouse | ||||||||||||||||||||||||||||
66 | Debugging the OmniTable Way | ||||||||||||||||||||||||||||
67 | |||||||||||||||||||||||||||||
68 | Backlog (arbitrarily ordered) | ||||||||||||||||||||||||||||
69 | |||||||||||||||||||||||||||||
70 | Amazon Aurora: Design Considerations for High Throughput Cloud-Native Relational Databases | ||||||||||||||||||||||||||||
71 | From cloud computing to sky computing | ||||||||||||||||||||||||||||
72 | In reference to RPC: it's time to add distributed memory | ||||||||||||||||||||||||||||
73 | Blending containers and virtual machines: a study of Firecracker and gVisor | ||||||||||||||||||||||||||||
74 | Gandalf: An Intelligent, End-To-End Analytics Service for Safe Deployment in Cloud-Scale Infrastructure | ||||||||||||||||||||||||||||
75 | Challenges and Opportunities for Autonomous Vehicle Query Systems | ||||||||||||||||||||||||||||
76 | AntMan: Dynamic Scaling on GPU Clusters for Deep Learning | ||||||||||||||||||||||||||||
77 | Overload Control for μs-Scale RPCs with Breakwater | ||||||||||||||||||||||||||||
78 | Taiji: managing global user traffic for large-scale internet services at the edge | ||||||||||||||||||||||||||||
79 | A buffer-based approach to rate adaptation: evidence from a large video streaming service | ||||||||||||||||||||||||||||
80 | Taking the Edge off with Espresso: Scale, Reliability and Programmability for Global Internet Peering | ||||||||||||||||||||||||||||
81 | Andromeda: Performance, Isolation, and Velocity at Scale in Cloud Network Virtualization | ||||||||||||||||||||||||||||
82 | Azure Accelerated Networking: SmartNICs in the Public Cloud | ||||||||||||||||||||||||||||
83 | Unikernels as Processes | ||||||||||||||||||||||||||||
84 | Slim: OS Kernel Support for a Low-Overhead Container Overlay Network | ||||||||||||||||||||||||||||
85 | Keeping Master Green at Scale | ||||||||||||||||||||||||||||
86 | RPCValet: NI-Driven Tail-Aware Balancing of µs-Scale RPCs | ||||||||||||||||||||||||||||
87 | TCP-Fuzz: Detecting Memory and Semantic Bugs in TCP Stacks with Fuzzing | ||||||||||||||||||||||||||||
88 | Site-to-site internet traffic control | ||||||||||||||||||||||||||||
89 | CRDTs for truly concurrent file systems | ||||||||||||||||||||||||||||
90 | |||||||||||||||||||||||||||||
91 | Caerus: NIMBLE Task Scheduling for Serverless Analytics | ||||||||||||||||||||||||||||
92 | Ownership: A Distributed Futures System for Fine-Grained Tasks | ||||||||||||||||||||||||||||
93 | When Cloud Storage Meets RDMA | ||||||||||||||||||||||||||||
94 | ICARUS: Attacking low Earth orbit satellite networks | ||||||||||||||||||||||||||||
95 | FAASNET: Scalable and Fast Provisioning of Custom Serverless Container Runtimes at Alibaba Cloud Function Compute | ||||||||||||||||||||||||||||
96 | P3: Distributed Deep Graph Learning at Scale | ||||||||||||||||||||||||||||
97 | GoJournal: a verified, concurrent, crash-safe journaling system | ||||||||||||||||||||||||||||
98 | Argus: Debugging Performance Issues in Modern Desktop Applications with Annotated Causal Tracing | ||||||||||||||||||||||||||||
99 | Experiences Deploying Multi-Vantage-Point Domain Validation at Let’s Encrypt | ||||||||||||||||||||||||||||
100 | ReDMArk: Bypassing RDMA Security Mechanisms |