Monday, September 21, 2009

PortLand: A Scalable Fault-Tolerant Layer 2 Data Center Network Fabric

Summary
PortLand is an attempt (much like SEATTLE) to make Ethernet more scalable. The motivations are much the same as the SEATTLE paper (discussed below), although the approach they take is quite different.

PortLand sacrifices some flexibility in exchange for better routing. They do not support arbitrary topologies, but rather assume the network is organized as a "fat tree": end hosts connect to edge switches, which connect up through some number of levels of aggregation switches, and finally to 'core' switches before traffic goes out to the wide area. By making this assumption about topology they are able to maintain much less routing information at each switch, and to avoid many of the broadcasts traditional Ethernet needs to establish a topology.
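
For concreteness, here is a minimal sketch (my own, assuming the standard k-ary fat tree construction rather than anything specific to this summary) of how such a topology scales with the switch port count k:

```python
# Bookkeeping for a standard k-ary fat tree built from k-port switches:
# k pods, each with k/2 edge and k/2 aggregation switches, (k/2)^2 core
# switches, and k^3/4 hosts in total.
def fat_tree_sizes(k):
    """Return switch/host counts for a k-ary fat tree (k must be even)."""
    assert k % 2 == 0, "fat trees are built from even-port-count switches"
    per_pod = k // 2
    return {
        "pods": k,
        "edge_switches": k * per_pod,
        "aggregation_switches": k * per_pod,
        "core_switches": per_pod ** 2,
        "hosts": (k ** 3) // 4,
    }

print(fat_tree_sizes(4))   # tiny example: 16 hosts, 4 core switches
print(fat_tree_sizes(48))  # 27,648 hosts with commodity 48-port switches
```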

Switches go through a quite clever algorithm to determine whether they are 'edge' switches (connected to end hosts), 'aggregation' switches (connected to edge and/or aggregation switches below them, but also to some number of switches above them), or 'core' switches (only down links internally, plus the connection to the wide area). This greatly reduces the administration overhead, although it is unclear how well this algorithm will deal with hardware misconfiguration (e.g., cables plugged into the wrong ports).
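
A rough sketch of the flavor of that inference, heavily simplified from the paper's Location Discovery Protocol (the real protocol uses periodic messages, majority rules, and the fabric manager as a tie-breaker):

```python
# Simplified, illustrative level inference. Hosts do not speak the location
# discovery protocol, so a port that stays silent is presumed to face a host.
def infer_level(neighbors):
    """neighbors: one entry per port -- None/'host' for a silent host-facing
    port, or 'switch:edge' / 'switch:aggregation' for ports that received
    location discovery messages from a switch of that (already inferred) level."""
    # Any host-facing port marks this switch as an edge switch.
    if any(n in (None, "host") for n in neighbors):
        return "edge"
    # A switch that hears from edge switches below it is an aggregation switch.
    if any(n == "switch:edge" for n in neighbors):
        return "aggregation"
    # Otherwise every port faces aggregation switches: a core switch.
    return "core"

print(infer_level([None, None, "switch:aggregation", "switch:aggregation"]))  # edge
print(infer_level(["switch:edge", "switch:edge", "switch:core", "switch:core"]))  # aggregation
```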

End hosts are assigned a 'Pseudo MAC' (PMAC) which not only identifies the target host but also encodes its location in the tree. Switches can use a prefix of the PMAC to route packets in the fat tree and can therefore maintain much less routing state.
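
As an illustration, assuming the 48-bit pod.position.port.vmid layout the paper describes (16 + 8 + 8 + 16 bits), a PMAC and the pod prefix a core switch would route on might look like this (the helper names are mine):

```python
# Pack a host's location into a 48-bit PMAC: pod(16) | position(8) | port(8) | vmid(16).
def make_pmac(pod, position, port, vmid):
    value = (pod << 32) | (position << 24) | (port << 16) | vmid
    return ":".join(f"{(value >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))

def pod_of(pmac):
    """Core switches only need the 16-bit pod prefix to pick a downward port."""
    parts = [int(b, 16) for b in pmac.split(":")]
    return (parts[0] << 8) | parts[1]

pmac = make_pmac(pod=2, position=1, port=0, vmid=1)
print(pmac)          # 00:02:01:00:00:01
print(pod_of(pmac))  # 2
```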

There is also a 'fabric manager' which is responsible for handling ARP requests. Edge switches intercept ARP broadcasts and forward them to the fabric manager, which looks up the PMAC for the requested IP and returns it to the switch; the switch then turns it back into an ARP response for the requester. The fabric manager is replicated and keeps only soft state.
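
A hedged sketch of that proxy-ARP flow; the class and method names here are mine, not an API from the paper:

```python
# Fabric manager keeps a soft-state IP -> PMAC map; edge switches consult it
# instead of flooding ARP requests through the fabric.
class FabricManager:
    def __init__(self):
        self.ip_to_pmac = {}            # soft state, rebuilt from switch reports

    def register(self, ip, pmac):
        self.ip_to_pmac[ip] = pmac

    def resolve(self, ip):
        return self.ip_to_pmac.get(ip)  # None -> fall back to broadcast

class EdgeSwitch:
    def __init__(self, fabric_manager):
        self.fm = fabric_manager

    def handle_arp_request(self, target_ip):
        pmac = self.fm.resolve(target_ip)
        if pmac is not None:
            return ("arp-reply", target_ip, pmac)  # unicast reply to the requester
        return ("broadcast", target_ip, None)      # unknown host: rare fallback

fm = FabricManager()
fm.register("10.2.1.5", "00:02:01:00:00:01")
sw = EdgeSwitch(fm)
print(sw.handle_arp_request("10.2.1.5"))
```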

Piggybacked on top of the auto-configuration protocol is a keepalive system. If a location discovery message (which doubles as a keepalive) is not received from a switch within a certain amount of time, that switch is assumed to be down. The fabric manager aids in recovering from faults by receiving failure notifications and relaying them to the relevant switches, which can then update their routing tables.
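
A small sketch of the keepalive bookkeeping this implies; the timeout value and callback shape are illustrative, not taken from the paper:

```python
import time

TIMEOUT = 3.0  # seconds without a location discovery message -> presumed down

class FaultDetector:
    def __init__(self, notify_fabric_manager):
        self.last_seen = {}
        self.notify = notify_fabric_manager

    def on_ldm(self, switch_id):
        # Location discovery messages double as keepalives.
        self.last_seen[switch_id] = time.monotonic()

    def sweep(self):
        now = time.monotonic()
        for switch_id, seen in list(self.last_seen.items()):
            if now - seen > TIMEOUT:
                # Report to the fabric manager, which relays the failure to the
                # switches that need to route around it.
                self.notify(switch_id)
                del self.last_seen[switch_id]
```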

Comments
I quite liked the model this paper presented. The fat tree architecture is what many data centers already use, and PortLand is a clever way to exploit that topology. I especially liked the auto-configuration of switch positions, since it removes a large amount of the configuration overhead for the network. It was unclear, however, how switches would know they were correctly connected to the wide-area network, and specifically what their default route out should be.

The evaluation section was quite disappointing in this paper. They did not compare themselves to any other systems, nor did I feel they gave a very convincing demonstration of their own scalability. They examine link failures, but do not look at failure of the fabric manager at all. The claim is that replication solves the problem, but it is unclear how inconsistent fabric managers would impact PortLand's ability to route packets.

I did like their thoughts on VM migration and the ability to migrate live TCP connections correctly. In my opinion this will be a useful feature going forward.
