Tuesday, November 3, 2009

Looking Up Data in P2P Systems

Summary
This paper takes a very high level look at a number of different DHT systems. It first motivates the problem, that we need distributed key lookup in the face of shifting nodes all administered by different users. We want to minimize both the number of hops it takes to find a keys true location as well and minimize the amount of state each node needs to maintain. In general the approach is to have the nodes organize themselves according to some geometry and then have each node know how to route a particular key request 'closer', in the geometry, to the node that's responsible for it. There are a number of possible geometries and the paper looks at some of the most common:

- chord: forms a ring and each node is assigned a position on the ring. nodes maintain information about other nodes 1/2, 1/4, 1/8 of the way around the ring, and so on. This way a key can be found in log(n) lookups. Nodes also maintain their immediate successors on the ring so in the worst case (all the more distant nodes are down) a request can just hop along one node at a time toward its location.

pastry/tapestry/kademlia: these use tree schemes in which each node knows of another nodes with a particular prefix (0,1,00,01...). Pastry maintains a leaf set of close nodes that it can route to and it can always route as long as more than half of those nodes don't fail.

can: forms a hypercube and nodes are responsible for a region of the cube. nodes maintain information about their neighbors in the cube and route along the straight line between their location and the location of the key.

Comments
This paper was very high level. If I didn't already know how chord and can worked I don't think I would now, and I don't feel that my basic understanding of the tree based systems was much improved either. It does present a good quick motivation for the problem, and perhaps as a first introduction to the topic it is a useful paper.

The main thing about multi-hop DHTs is that they seem like an obviously useful structure but they haven't actually found all that many uses. Aside from DNS, which isn't really the sort of thing this paper is describing, I'm not away of any major multi-hop DHT deployments. One-hop DHTs, on the other hand, have proven quite useful in systems like Dynamo or SEATTLE. Perhaps the inherent uncertainty in the number of hops for routing in this style of DHT makes them unattractive for real systems.

While this wasn't a bad paper, I would think that something that took a closer look at a particular type of DHT and some applications for it might be a better fit for the class.

No comments:

Post a Comment