Millisort: a distributed sorting for millisecond-scale time budget [slide]
Millisort is a new distributed sorting algorithm that enables massively parallel sorting within milliseconds. Millisort minimizes the cost of coordination among nodes and made the massive parallelization practical. Millisort reduces the time to sort 50 million 100B tuples from 1 second (single node) to 2.7 ms (1000 nodes).
CURP: Consistent Unordered Replication Protocol [paper] [slide]
CURP allows writes to complete in 1 RTT on replicated systems. Traditional approaches to replication require client requests to be ordered before making them durable by copying them to replicas. As a result, clients must wait for two round-trip times (RTTs) before updates complete. This entanglement of ordering and durability is unnecessary for strong consistency. Consistent Unordered Replication Protocol (CURP) allows clients to replicate requests that have not yet been ordered, as long as they are commutative. This strategy allows most operations to complete in 1 RTT (the same as an unreplicated system). We implemented CURP in the Redis and RAMCloud storage systems. In RAMCloud, CURP improved write latency by ~2x (13.8 us -> 7.3 us) and write throughput by 4x. Compared to unreplicated RAMCloud, CURP's latency overhead for 3-way replication is just 0.4 us (6.9 us vs 7.3 us). CURP transformed a non-durable Redis cache into a consistent and durable storage system with only a small performance overhead.
RIFL: Reusable Infrastructure for Linearizability [paper] [video]
Linearizability is the strongest form of consistency for concurrent systems, but most large-scale storage systems settle for weaker forms of consistency. RIFL provides a general-purpose mechanism for converting at-least-once RPC semantics to exactly-once semantics, thereby making it easy to turn non-linearizable operations into linearizable ones. RIFL is designed for large-scale systems and is lightweight enough to be used in low-latency environments. RIFL handles data migration by associating linearizability metadata with objects in the underlying store and migrating metadata with the corresponding objects. It uses a lease mechanism to implement garbage collection for metadata. We have implemented RIFL in the RAMCloud storage system and used it to make basic operations such as writes and atomic increments linearizable; RIFL adds only 530 ns to the 13.5 us base latency for durable writes. This project is integrated into the main branch of RAMCloud [gitHub].
Distributed Transactions in RAMCloud [paper]
Constructed a new distributed transaction mechanism, with which clients can commit a transaction in 1 RTT. Clients send PREPARE requests to servers, and servers respond with votes. The clients can complete the transaction after collecting COMMIT-VOTEs from all participating servers. RIFL's exactly-once mechanism is used to simplify desgin and implementation of the multi-object transaction mechanism; the votes (which is the responses of PREPARE RPCs) are made durable and managed automatically by RIFL. The transaction mechanism can commit simple distributed transactions in about 20 us and it outperforms the H-Store main-memory database system for the TPC-C benchmark. This project is integrated into the main branch of RAMCloud [gitHub].
RAMCloud is a low-latency large-scale distributed key-value storage system. It was designed to provide lowest possible latency while maintaining strong consistency. Working on RAMCloud motivated my other projects, such as CURP, RIFL, and RIFL-based transaction mechanism, which provide the lowest possible latency while keeping strong consistency.
NanoLog: A Nanosecond Scale Logging System
Stephen Yang, Seo Jin Park and John Ousterhout
USENIX ATC 2018
Exploiting Commutativity For Practical Fast Replication
Seo Jin Park and John Ousterhout
preprint on Arxiv.
Implementing Linearizability at Large Scale and Low Latency
Collin Lee*, Seo Jin Park*, Ankita Kejriwal, Satoshi Matsushita, and John Ousterhout (*co-first author)
The 25th ACM Symposium on Operating Systems Principles (SOSP), 2015
The RAMCloud Storage System
John Ousterhout, Arjun Gopalan, Ashish Gupta, Ankita Kejriwal, Collin Lee, Behnam Montazeri, Diego Ongaro, Seo Jin Park, Henry Qin, Mendel Rosenblum, Stephen Rumble, Ryan Stutsman, and Stephen Yang
ACM Transactions on Computer Systems (TOCS) 33, 3, Article 7 (August 2015)
Analyzing performance and usability of broadcast-based inter-core communication (ATAC) on manycore architecture
Master's thesis, Massachusetts Institute of Technology, Cambridge, MA, USA, Jun 2013
Consistency in Asynchronous Replication
- Oral presentation on Exploiting Commutativity For Practical Fast Replication at Samsara, Nov 2017
- Oral presentation on Making RAMCloud Writes Even Faster at PlatformLab retreat, Jun 2016
Linearizability and Transactions
- Oral presentation on Implementing Linearizability at Large Scale and Low Latency [Video] at SOSP15, Oct 2015
- Oral presentation on Performance of Transactions in RAMCloud at SEDCL retreat, May 2015
- Oral presentation on Transactions on RAMCloud at SEDCL forum, Jan 2015
- Oral presentation on Infrastructure for Linearizable RPCs in RAMCloud at SEDCL retreat, Jun 2014