Tuesday, September 29, 2009

Understanding TCP Incast Throughput Collapse in Datacenter Networks

Summary
This paper tackles the same issue as the previous CMU one, but uses a rather different methodology and arrives at some different results.

Firstly, their workload uses a fixed fragment size, rather than a fixed block size. In the CMU paper the fragment size decreases as the # of senders increases, to simulate a fixed block size. Both of these situations seem quite possible in practice.
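To make the difference between the two workloads concrete, here is a minimal sketch. The function names and the 1 MB and 256 KB sizes are my own illustrative assumptions, not values taken from either paper.

```python
def fixed_block_fragment(block_bytes, senders):
    """Fixed-block workload (CMU style): the block size is constant,
    so each sender's fragment shrinks as senders are added."""
    return block_bytes // senders


def fixed_fragment_block(fragment_bytes, senders):
    """Fixed-fragment workload: each sender's fragment is constant,
    so the total data requested grows with the number of senders."""
    return fragment_bytes * senders


if __name__ == "__main__":
    # Hypothetical sizes, chosen only for illustration.
    for n in (1, 4, 16, 64):
        per_sender = fixed_block_fragment(1_000_000, n)
        total = fixed_fragment_block(256_000, n)
        print(f"senders={n:3d}  fixed-block fragment={per_sender:8d} B  "
              f"fixed-fragment total={total:9d} B")
```

With a fixed block, adding senders means each flow has less data to send. With a fixed fragment, the per-flow transfer stays the same and the total load on the switch grows.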

They also look at changing parameters other than RTO, like randomizing the initial RTO value. They found that none of these approaches really seemed to help the incast problem. The hypothesis in the paper is that the switch buffers and requests "resynchronize" the machines, since the buffer overflow affects everyone at the same time.

This paper does not find that a 200µs RTO minimum improves performance as much as the CMU paper did. They argue that this is caused by delayed ACKs. Since the network RTT is 2ms, the very short RTO causes unnecessary retransmits of data.
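The arithmetic behind this can be sketched with a toy check. This is a simplification of my own, not the paper's model: it treats the RTO as a fixed value and ignores the way real TCP adapts the RTO to measured RTTs. The 200 ms and 40 ms figures below are assumed typical defaults, not numbers from the paper.

```python
def is_spurious_timeout(rto_us, rtt_us, ack_delay_us=0):
    """Return True if the retransmit timer expires before the ACK
    for the original segment could possibly arrive.

    All times are in microseconds. ack_delay_us models a receiver
    that holds back its ACK (delayed ACKs).
    """
    return rto_us < rtt_us + ack_delay_us


if __name__ == "__main__":
    rtt_us = 2_000  # 2 ms RTT, as quoted above

    # 200 us RTO: the timer fires long before the ACK can return.
    print(is_spurious_timeout(200, rtt_us))

    # 200 ms RTO (assumed conventional default): no spurious timeout.
    print(is_spurious_timeout(200_000, rtt_us))

    # 10 ms RTO with an assumed 40 ms delayed-ACK timer: still spurious.
    print(is_spurious_timeout(10_000, rtt_us, ack_delay_us=40_000))
```

Under these assumptions, any RTO shorter than the RTT plus the receiver's ACK delay retransmits data that was never lost.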

Another finding of this paper is that disabling delayed ACKs actually reduces performance, as it makes TCP overflow its window, causing congestion in the network.

Finally, the paper proposes a model to explain its results and to predict goodput values for particular configurations. The model explains some of the results, although it isn't clear that it will generalize to other workloads.

Comments
This was an interesting paper in that it challenges some of the results from the CMU paper. In particular, the CMU paper seemed to argue that reducing the RTO would be a silver bullet and, in large part, solve the incast problem. This paper suggests this might not be the case. This is a very interesting result and shows that this area needs more study.

The model struck me as a bit too simple, as I did not see how it would adjust to changing workloads, but it's a good start and, if made richer, could be a very useful tool in studying this area.
