Tuesday, November 3, 2009

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

The Chord paper was fun for me to re-read as it's actually a system I've implemented. Having done a few paper to actual system conversions I will say that the chord paper does an excellent jobs of laying out everything an implementor might need from good pseudo-code to detailed discussion of how the various components of a chord system should interact. The paper does make the system sound quite simple and easy to implement which is not entirely true, but the clear description certainly helps.

At a more general level chord is important as it's an approach to DHTs that is considerably simpler than the other models that where being proposed at the time. The one dimensional ring is much easier to reason about than CANs hypercube for example. The observation that efficient log(n) routing can be performed with such a simple structure is the most important contribution of this paper. The fact the current production DHTs like Cassandra and Dynamo are using ring structures (albeit with different routing algorithms) coupled with the analysis from the other paper we read this week shows points to this being the 'right' choice in the DHT space.

This paper doesn't have a lot to criticize. They present an elegant algorithm, good performance results, and analyze the system in failure/churn situations. Where I think this paper misses the mark, as did all the early DHT papers, is in assessing their target market. Early DHT work saw itself as aimed at the P2P community where it could replace gnutella or napster. This was misguided for two reasons, firstly that bittorrent was already pretty much taking over the P2P space and secondly that exact key lookups do not map well into the P2P space. People want keyword search and DHTs do not provide a simple way to make this happen. Systems like Cassandra or Dynamo show that DHTs are better suited as large data stores in cluster environments where admins can have total control of all nodes in the system and can therefore use more restrictive and efficient routing algorithms.

No comments:

Post a Comment