Tuesday, October 27, 2009

On Power-Law Relationships of the Internet Topology

Summary
This paper attempts to derive some power laws for the inter-domain topology of the Internet. These are relationships of the form y ∝ x^a, relating some property of the graph of node connectivity to another. These relationships can be used to check the validity of models of the Internet used in simulation, as well as to predict the growth of the network.

The authors use three data sets for their study, collected by the National Laboratory for Applied Network Research about six months apart, the first in November 1997. The information is gleaned from the BGP routing tables of distributed routers connected to a route server. A router-level graph is also used, to provide data at a different granularity.

There are a few definitions used in the laws:
  • rank - the rank of a node is its place in the list of nodes sorted by decreasing outdegree
  • frequency - the frequency of an outdegree d is the number of nodes with outdegree d

The authors derive three main power-laws:

  1. The rank exponent - This law states that the outdegree of a node is proportional to the rank of the node to the power of a constant R. R is a single number that can characterize an entire graph. Indeed, in the sample data the R value is similar for all three inter-domain graphs. This suggests that a graph that has a quite different value of R is not a good representation of Internet topology.

  2. The outdegree exponent - This law states that the frequency of an outdegree d is proportional to the outdegree to the power of a constant O. Again the inter-domain graphs all exhibit very similar O values, indicating a common structure.

  3. The hop-plot exponent - This law states that the total number of pairs of nodes within h hops of each other is proportional to the number of hops to the power of a constant H. The authors argue that since the inter-domain graphs all have a similar value for H, while the router graph has a different one, H is a good way of characterizing families of graphs.
Finally, the authors argue that power laws are a better way of characterizing graphs of the Internet than previous metrics such as averages, because the underlying distributions are often skewed. For example, in one of their graphs 85% of nodes have an outdegree less than the average.
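
As a concrete illustration of what estimating one of these exponents involves, here is a minimal sketch (my own, not the paper's methodology) of fitting the rank exponent by least-squares regression in log-log space, assuming you already have the outdegree of every node:

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log(y) = a*log(x) + b; returns (a, b).

    For the rank exponent, xs are node ranks (1, 2, 3, ...) after sorting
    nodes by decreasing outdegree, and ys are the corresponding outdegrees.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    var = sum((x - mx) ** 2 for x in lx)
    a = cov / var
    return a, my - a * mx

# Toy data: outdegrees sorted in decreasing order; rank i+1 gets outdegrees[i].
outdegrees = [100, 48, 30, 22, 17, 14, 12, 10, 9, 8]
ranks = range(1, len(outdegrees) + 1)
R, _ = fit_power_law(ranks, outdegrees)
print("estimated rank exponent R =", R)
```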

Comments
I thought this was an interesting paper. It presents a concise and simple derivation of the various laws, and does a good job explaining why their results might be useful. I'm concerned that all the laws are derived from graphs from a single source, however. The laws could reflect fundamental properties of the particular connections that source saw, rather than of the Internet as a whole. One hopes the chances of this are minimized by the collection points being distributed geographically.

Some of their derivations also relied on too little data, in my opinion. For example, the hop-plot exponent is derived from only four points in the inter-domain graphs, which seems far too few to establish any real relationship.

Finally, given the time I'd like to look up the actual figures for the quantities the paper predicts, but my Google skills are failing me at the moment. All in all, an interesting paper, and one I'd keep on the syllabus.

Monday, October 26, 2009

On Inferring Autonomous System Relationships in the Internet

Summary
This paper presents an overview of some BGP concepts, some formal definitions of properties of graphs of Autonomous Systems (ASs), and an algorithm for heuristically inferring relationships between ASs. These relationships are often not made public because they are business relationships, so the ability to infer them helps us gain a deeper understanding of Internet topology.

An AS graph is built from public BGP routing tables. The graph's edges are classified into one of three types:
  • provider-to-customer - the only directed edge, indicates a relationship in which the provider provides service to a paying customer
  • peer-to-peer - edges indicating peering agreements between ASs
  • sibling-to-sibling - edges between siblings
By splitting the graph into trees the authors are able to work out uphill and downhill paths. An uphill path is made up of customer-to-provider or sibling-to-sibling edges; a downhill path is made up of provider-to-customer or sibling-to-sibling edges. By looking at maximal uphill/downhill paths, and noting that providers should have a higher degree in the graph, we can infer what relationships exist between ASs. Information about which ASs provide transit to which other ASs can then be used to discover peering relationships.
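
To make the uphill/downhill idea concrete, here is a rough sketch of the degree-based part of the heuristic (my own simplification; the paper's full algorithm also identifies sibling and peer-to-peer edges):

```python
from collections import defaultdict

def infer_provider_customer(as_paths):
    """Crudely infer provider-to-customer edges from a set of AS paths.

    The AS with the highest degree on each path is treated as the top of
    the 'hill'; edges before it are uphill (customer-to-provider) and
    edges after it are downhill (provider-to-customer).
    """
    neighbors = defaultdict(set)
    for path in as_paths:
        for a, b in zip(path, path[1:]):
            neighbors[a].add(b)
            neighbors[b].add(a)

    provider_to_customer = set()
    for path in as_paths:
        # Index of the highest-degree AS on this path.
        top = max(range(len(path)), key=lambda i: len(neighbors[path[i]]))
        for i, (a, b) in enumerate(zip(path, path[1:])):
            if i < top:
                provider_to_customer.add((b, a))   # uphill edge: b provides to a
            else:
                provider_to_customer.add((a, b))   # downhill edge: a provides to b
    return provider_to_customer

# Toy example with made-up AS numbers: AS 2 comes out as everyone's provider.
paths = [[1, 2, 3], [4, 2, 5], [3, 2, 4]]
print(infer_provider_customer(paths))
```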

The authors test their inferred relationships on Route Views data, and verify it by asking AT&T to confirm their results. They classify 90.5% of edges as provider-to-customer, less than 1.5% as sibling-to-sibling edges, and less than 8% as peer-to-peer edges. The AT&T verification shows that the majority of their edges are classified correctly. 100% of their customer relations are correct, which is already 90.5% of all the edges they classified. However, the provider info is wanting. They infer that AT&T has one provider while in reality it has none, meaning they get 0% on that metric. The sample size isn't really big enough to tell us anything about the efficacy of their provider inference, however.

Comments
While this paper is clearly quite an achievement, I did not find it all that interesting. We've already looked at BGP so that part of the paper didn't give me much, and the rest of the paper is pretty heavy theoretical going with not much of a payoff.

I also found the verification of the result pretty unsatisfactory. While I realize that AT&T cannot release real data to show more detailed results, the very high-level percent-correct figures left me wanting more. The 0% correct result on provider inference is a pretty glaring example of the problem with this verification method.

This felt like a paper more useful to someone deeply involved in this particular area, and less well suited to a general networking class. I would remove it from the syllabus.

Tuesday, October 20, 2009

White Space Networking with Wi-Fi like Connectivity

Summary
This paper tackles the problem of harnessing the newly opened up UHF spectrum for data transfer. This presents a number of problems not present in the 2.4GHz band:
  • The transmission range is bigger, so there is more spatial variation with respect to a single AP.
  • Existing users of the spectrum fragment the space, making the problem of picking a channel for the AP more difficult.
  • Because the FCC requires that devices give up a channel to incumbents (like wireless mics), and because these incumbents can come and go, systems must be able to switch channels quickly, automatically, and without interfering with the incumbent device.
The paper presents two main ideas: WhiteFi, a WiFi-like system that runs in the UHF white spaces, and SIFT, a fast way of locating WhiteFi signals across the entire UHF band.

The idea behind SIFT is to analyze the signal in the time domain, rather than first doing an FFT, which is expensive. By looking for blips in the signal that are temporally spaced like WhiteFi packets and their ACKs, it can quickly detect where APs are likely to be.
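
As a toy illustration of this kind of time-domain detection (entirely my own; the timing constants and threshold here are made up, not WhiteFi's actual parameters):

```python
def detect_packet_ack_pairs(amplitude, sample_rate_hz,
                            data_us=1500.0, gap_us=50.0, tol_us=25.0):
    """Flag pairs of energy bursts spaced like a data packet followed by its ACK.

    `amplitude` is a list of magnitude samples of the raw signal. A burst
    start is the first sample of a run above a crude threshold. Two bursts
    whose start-to-start spacing is roughly (data duration + inter-frame gap)
    are treated as a likely packet/ACK pair.
    """
    threshold = 3.0 * (sum(amplitude) / len(amplitude))  # crude noise-floor estimate
    starts, in_burst = [], False
    for i, a in enumerate(amplitude):
        if a > threshold and not in_burst:
            starts.append(i)
            in_burst = True
        elif a <= threshold:
            in_burst = False

    us_per_sample = 1e6 / sample_rate_hz
    pairs = []
    for s1, s2 in zip(starts, starts[1:]):
        spacing_us = (s2 - s1) * us_per_sample
        if abs(spacing_us - (data_us + gap_us)) <= tol_us:
            pairs.append((s1, s2))
    return pairs
```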

WhiteFi uses SIFT for AP discovery but also establishes a secondary backup channel. If an incumbent is detected on the main channel a client or AP will immediately switch channels and 'chirp' on the backup channel to inform peers of the change. If the chirp is lost SIFT can be used to re-acquire the AP.

Comments
This paper was certainly interesting, and the prospect of using UHF white spaces for networking is quite attractive. A few issues I had:
  • The authors discover that even a single packet can cause interference for wireless mics. If nodes are streaming data, there is no way that detecting an incumbent and then chirping to switch channels will happen fast enough to prevent a few packets from still being transmitted on the old channel. This seems to violate the FCC requirement of no interference.
  • The analysis only tested UDP flows. As we've seen before, TCP is much more sensitive to unusual network environments, and it seems important to evaluate its performance in WhiteFi.
  • Given the long range in the UHF band it would have been nice to see numbers about how much more area can be covered with how many fewer APs compared to traditional WiFi as a motivator for this work.
This was still a cool paper, and gets into some of the signal processing problems we haven't talked about yet, so I would keep it on the reading list.

Monday, October 19, 2009

Interactive WiFi Connectivity For Moving Vehicles

Summary
This paper looks at a scheme for improving WiFi performance when the clients are in motion, for example cars or buses that will pass by a number of base stations. The authors develop ViFi, a protocol that achieves good performance in these situations.

The paper begins by explaining why the mobile environment is difficult for WiFi applications. Handoff, in which a client changes base stations, is the crux of the problem. Typically clients must perform a 'hard handoff', which means picking a new base station and leaving the old one. The authors evaluate a number of methods for switching but show that none perform as well as a scheme that can opportunistically use multiple base stations simultaneously.

From this observation the paper develops ViFi. The basic idea is that clients pick one base station to be an anchor in much the same way a base station is picked normally. 802.11 broadcast is used, which means no link-layer ACKs, but ViFi introduces its own ACKs. If a base station that is not the anchor overhears a packet but not the ACK for that packet, it will relay the packet. The relay is performed over the 'base station backplane', which is either a mesh network or wired, depending on the deployment.
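
A minimal sketch of the relaying rule at an auxiliary base station, as I understand it (this ignores ViFi's actual probabilistic coordination, which it uses to keep the expected number of relayers low; the interfaces here are hypothetical):

```python
import time

class AuxiliaryBaseStation:
    """Non-anchor base station that relays overheard-but-unacked packets."""

    def __init__(self, backplane, ack_wait_s=0.01):
        self.backplane = backplane      # wired or mesh link toward the anchor
        self.ack_wait_s = ack_wait_s
        self.pending = {}               # packet_id -> (packet, time overheard)

    def on_overheard_data(self, packet_id, packet):
        self.pending[packet_id] = (packet, time.time())

    def on_overheard_ack(self, packet_id):
        # The intended receiver got it; no relay needed.
        self.pending.pop(packet_id, None)

    def tick(self):
        """Called periodically: relay anything left unacked for too long."""
        now = time.time()
        for pid, (pkt, t) in list(self.pending.items()):
            if now - t > self.ack_wait_s:
                self.backplane.send(pkt)   # forward over the base-station backplane
                del self.pending[pid]
```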

The authors run a number of experiments in a real deployment at Microsoft, and also simulations based on data from a system in Amherst, MA. These show that ViFi can perform much better than standard 'hard-handoff' schemes.

Comments
I enjoyed this paper. The idea seems a fairly obvious one, but the authors present a rigorous analysis showing why using multiple BSs is a good idea, and how much better their system can perform.

Their workload did seem a bit arbitrary, however, and it would have been nice to see a more varied set of workloads. They also dismiss the problem of packet re-ordering by saying it would be easy to buffer and order packets at the anchor, but we've seen that this can wreak havoc with TCP's RTT estimates, and it's not clear how well this would actually work.

Still, a nice approach and a good paper that I would keep on the reading list.

Wednesday, October 14, 2009

XORs in the Air: Practical Wireless Network Coding

Summary
I really liked this paper, mostly just because it's a very clever idea. It develops COPE, an algorithm that again takes advantage of the broadcast nature of wireless radios, but this time by packing more data into a single transmission. This is done by maintaining a packet buffer at each node, and by knowing (or guessing) what packets nearby nodes have in their buffers. These can be known with certainty through 'reception reports', or guessed based on expected transmissions given the routing metric. Once I know what my neighbors have, I can look at the next packet I need to transmit. If I can find n packets such that each of my neighbors has n - 1 of those packets (not necessarily the same n - 1 packets), then I can XOR those packets together and send multiple packets in the space of one 'native' packet. Any neighbor with n - 1 of the packets can XOR again to recover the native packet it doesn't have. This increases the throughput of the network without changing anything else.
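
The core XOR trick is easy to see in code. A minimal sketch (the padding and header details are mine, not COPE's):

```python
def xor_bytes(packets):
    """XOR a list of equal-length byte strings together."""
    acc = bytes(len(packets[0]))            # all-zero buffer of the right length
    for p in packets:
        acc = bytes(a ^ b for a, b in zip(acc, p))
    return acc

# A node holds three packets, one for each of three neighbors; each neighbor
# already has (or has overheard) the other two packets.
p1, p2, p3 = b"to-A....", b"to-B....", b"to-C...."
coded = xor_bytes([p1, p2, p3])             # one transmission instead of three

# Neighbor A already has p2 and p3, so it recovers p1 by XOR-ing again:
assert xor_bytes([coded, p2, p3]) == p1
```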

Each node maintains a queue of packets to send out. This could be re-ordered to provide more XOR-ing opportunities, but this isn't done since it would create an out-of-order problem for TCP. COPE maintains two virtual queues per neighbor, one of small packets and one of large packets. When it needs to send a packet it tries to find other similarly sized packets to XOR with it (since XOR-ing small packets with large ones doesn't provide as much bandwidth savings). If appropriate packets are found, they are XOR-ed in and sent.

Since standard 802.11 MAC broadcast doesn't do ACKs, the authors also introduce pseudo-broadcast, which uses 802.11 unicast but adds an XOR header listing all intended recipients so that any overhearing node can figure out whether it was also an intended recipient. The packet is retransmitted until the primary recipient ACKs it.

The authors give proofs of the bandwidth gain that can be achieved for certain network topologies. Most importantly they show that in the presence of opportunistic listening the gain is unbounded. This explains the 3x-4x increase in performance they see when running UDP tests on Roofnet. Interestingly they don't see much of a performance boost when running TCP tests. They believe this is because TCP backs off too much when collisions occur, and that collisions are common due to hidden terminals. They validate this by showing that TCP performs well with COPE when there are no hidden terminals.

Comments
I thought this was a well argued paper. It gives a good overview of a clever design, proves some interesting theoretical properties of the design, and then runs actual experiments on an implementation.

A few gripes:

I didn't like the graphs they chose to use; they were rather difficult to read and not particularly illuminating. Bar charts of throughput feel like they would have been much more useful.

The hidden terminal problem feels a lot like Incast to me. Would reducing the RTOmin be an effective way to deal with this problem? RTTs are probably higher in the wireless environment, but there might be some gain to be had. It's all well and good to say that TCP is helped when there are no hidden terminals, but hidden terminals will always be a problem in the real world.

ExOR: Opportunistic Multi-Hop Routing for Wireless Networks

Summary
The ExOR paper presents an approach that leverages the broadcast nature of wireless radios to achieve better performance. For each hop, nodes compute which other nodes they think should forward the packet, in priority order. The priority is based on a metric similar to ETX, that is, the expected number of transmissions (including retransmissions) from that node to the destination. When nodes receive a packet, they look at the list of nodes that are supposed to be its next transmitter. If they are in the list, they schedule it for transmission. Each node waits the amount of time it expects higher-priority nodes to take to transmit their packets (packets are batched for better performance), and then, if it hasn't already heard the packet(s) transmitted, sends its batch. The intuition here is that all nodes in the forwarder list should be 'closer' to the destination, so they should all be able to overhear each other if they heard the previous hop.
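
A sketch of the forwarding decision at a single node, assuming the priority list and the gossiped batch map are already known (the timer value and data structures are illustrative, not ExOR's actual ones):

```python
def packets_to_forward(my_id, my_packets, forwarder_list, batch_map,
                       per_forwarder_slot_s=0.05):
    """Decide what this node should forward and how long to wait first.

    `forwarder_list` is ordered from highest to lowest priority (the
    destination is assumed to be its highest-priority entry).
    `my_packets` is the set of packet ids this node has received.
    `batch_map` maps packet_id -> the highest-priority node this node
    believes has received that packet.
    Returns (delay_in_seconds, packet_ids_to_send), or None if this node
    is not a forwarder for the batch.
    """
    if my_id not in forwarder_list:
        return None

    rank = {node: i for i, node in enumerate(forwarder_list)}
    my_rank = rank[my_id]

    # Wait long enough for every higher-priority forwarder to take its turn.
    delay = my_rank * per_forwarder_slot_s

    # Send only packets that, as far as this node knows, no higher-priority
    # forwarder has already received.
    to_send = [pid for pid in my_packets
               if rank.get(batch_map.get(pid, my_id), my_rank) >= my_rank]
    return delay, to_send
```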

A map of successfully transmitted packets is kept at each node, which is updated in response to overhearing neighbors and to receiving maps from them. This acts as a gossip mechanism that keeps maps fairly synchronized between nodes, and it can also prevent unnecessary retransmissions when the standard MAC ACKs aren't received. When the sender discovers that 90% or more of its packets have reached the destination, it sends the remainder using traditional routing, because sending those few packets would require running a whole batch, which is expensive for so few packets.

This method allows ExOR to take advantage of transmissions that go unexpectedly far and/or that fall unexpectedly short. A sender might compute that a distant node close to the receiver is a great hop, but packets might not always make it that far. The sender puts that node at the top of the priority list, so if packets do make it that far they will be quickly sent on to the destination, but if they don't, nodes closer to the sender will transmit the packets too. Likewise, if a packet doesn't make it as far as expected, nodes closer to the sender will send the packets after a short delay spent waiting for the higher-priority nodes to transmit.

By running experiments on Roofnet the authors show that ExOR provides an increase in throughput for most node pairs. For short-hop pairs ExOR's map acknowledgements prevent retransmissions and boost throughput by almost 35%. For more distant pairs the improvement is more like a factor of two to four, due to the many path options ExOR has available.

Comments
I liked the idea in this paper. Taking advantage of the inherent broadcast nature of wireless is a great way to improve performance. The priority lists are a nice low-overhead way of reaching consensus on which packets have been received where.

A few thoughts:
  • Given that an ETX-style metric is used to rank nodes in the priority list, and that ETX is based on probabilities, the highest-priority node will usually receive the packet. This will often not be the node closest to the target (since that node will have a poor ETX rating, i.e. a high expected number of retransmissions). Thus ExOR seems like it will often simply route as ETX would, with added delays for the batching. It would be interesting to see whether simply using minimum hop count as the metric would actually perform better here, since the fall-back to lower-priority nodes would naturally handle the major issue with min-hop-count metrics.
  • ExOR's poor interaction with TCP is unfortunate, as it prevents ExOR from being plugged in seamlessly. The reduction in variance of transmit times due to ExOR seems like it would boost TCP performance (lower RTOs) if the window-size problem could be handled.
I enjoyed this paper and would keep it on the reading list.

Thursday, October 8, 2009

A Performance Comparison of Multi-Hop Wireless Ad Hoc Network Routing Protocols

Summary
This paper aims to make a fair comparison between four ad-hoc wireless routing protocols. The paper describes each protocol, their simulation environment, the workload they test with, and then the results they got for each protocol.

The protocols tested are:
  • DSDV, which does "next hop" routing by maintaining a routing table at each node with a sequence number for each route. Nodes advertise routes with increasing sequence numbers.
  • TORA, which uses a link-reversal algorithm, wherein a Query is sent through the network and an Update comes back with the best route. A flow model is used where nodes have a height, and unreachable nodes are modeled as having a greater height than the local height.
  • DSR, which floods the network with Route Request packets when it needs a route. Nodes cache routes and aggressively stop Route Request packets with Route Reply packets for routes they know about. DSR also snoops on traffic to hear or compute better routes.
  • AODV, which is a combination of DSDV and DSR. Routes are discovered as in DSR but stored all the way along the path to the destination. Sequence numbers are used as in DSDV.

The results of the paper look at the % of packets delivered, routing overhead (i.e. the # of route-related packets transmitted), and path optimality (i.e. how many more hops I took than I really needed).

At a high level, DSR performs much better than most of the other protocols, with DSDV performing very poorly, especially with a lot of mobility, which basically causes TORA to go into congestion collapse. All protocols perform decently well with low mobility, although TORA and AODV send many more routing-related packets. DSDV does have the advantage that it sends a fairly constant amount of routing-related information.

Comments
I enjoyed this paper. It gives a fairly clear description of each of the protocols, as well as issues the authors encountered while actually trying to implement each one. The analysis goes to great lengths to try and make the comparison fair, and it seems they do a decent job of that (for example, not using TCP, since it varies transmission times and rates based on the current state of the network). However, any real deployment would use some transport protocol, and it would be nice to see how each of these performed over TCP, the most likely choice.

It was interesting that DSR, probably the simplest of the protocols, performs so much better than all the others. Perhaps the added complexity of 'clever' routing protocols in mesh networks simply isn't worth it.

Wednesday, October 7, 2009

A High-Throughput Path Metric for Multi-Hop Wireless Routing

Summary
This paper looks at improving routing in wireless networks that often use routes with more than one hop. Their metric, expected transmission count (ETX), tries to find the route with the lowest expected number of transmissions on the path to the destination. In a network with no loss, this is clearly the same as a minimum hop-count algorithm, but in real networks with expected losses, this is not the case. As the paper points out, minimizing hop count will tend to favor long hops, which in the wireless world are often weak links with poor throughput and high loss rates. This leads to poor path selection in practice.

The paper shows simulated results that indicate that, in networks where paths are typically 3 or more hops, ETX performs much better than a minimum hop-count algorithm.

The ETX metric considers expected losses in both the forward and reverse directions. If df is the probability that a packet is successfully delivered in the forward direction and dr the probability in the reverse direction, then ETX = 1 / (df × dr). The delivery ratios are computed using probe packets.
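
In code, the per-link metric and the path metric built from it are straightforward; here is a minimal sketch assuming delivery ratios measured from probes over a fixed window (the window and probe interval below are just example values):

```python
def link_etx(forward_ratio, reverse_ratio):
    """ETX = 1 / (df * dr): expected transmissions (including retries)
    for one successful data + ACK exchange over a single link."""
    if forward_ratio == 0 or reverse_ratio == 0:
        return float("inf")
    return 1.0 / (forward_ratio * reverse_ratio)

def path_etx(links):
    """The path metric is the sum of the link ETX values along the route."""
    return sum(link_etx(df, dr) for df, dr in links)

def delivery_ratio(probes_received, window_s=10, probe_interval_s=1):
    """Ratio of probes actually heard to probes expected in the window."""
    return probes_received / (window_s / probe_interval_s)

# Example: a lossy two-hop route loses to a clean three-hop route under ETX,
# even though min hop count would prefer it.
lossy_two_hop = [(0.5, 0.6), (0.7, 0.7)]
clean_three_hop = [(0.95, 0.9), (0.9, 0.95), (0.95, 0.95)]
print(path_etx(lossy_two_hop), path_etx(clean_three_hop))
```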

The authors implement ETX in the DSDV and DSR routing schemes. Their results show that it works well in both cases with some minor modifications.

Finally, the authors point out that their metric is not perfect: the probe packets are always the same size (and smaller than typical data packets), but links might behave differently for packets of different sizes. It seems like this could be addressed by feeding metrics from actual packet transmissions into the algorithm, and also by varying the probe packet size.

Thoughts
I liked this paper. It gives a clear description of the problem with minimum hop-count routing, and nicely explains their new metric.

A few issues I had:
  • The graph style for the throughput measurements is terrible and makes extracting data from the graph difficult. This is mostly just a formatting gripe however.
  • There is a 'probe window' in which a certain number of probes are supposed to have been sent. The delivery ratios are calculated using this expected value and the actual # of received probes. This requires all nodes in the network to have the same configuration for window size and probe frequency and could make incremental upgrade difficult.
  • Probe packets contain the # of probes received from each of the node's neighbors. In a dense network this could make the probe packets quite big and hurt overall throughput.
  • DSR with failure feedback does almost as well as ETX. Does this indicate that really all we care about is failed links and we don't need to do all the extra work ETX is doing?

Still, I enjoyed this paper and think it's worth keeping on the reading list.

Tuesday, October 6, 2009

Architecture and Evaluation of an Unplanned 802.11b Mesh Network

Summary
This paper looks at a deployment of Roofnet, an unplanned mesh network in an urban environment. The basic architecture is that the nodes connect to each other and use a shortest-path routing algorithm to find paths to gateway nodes, that is, the small percentage of Roofnet nodes that are also connected to the Internet.

The nodes maintain a database of link metrics and do Dynamic Source Routing, in which each TCP session computes the best route by running Dijkstra's algorithm over the link DB. The metric used combines the link throughput with the probability that packets will get through. Using it sometimes means putting the WiFi cards into high-loss but also high-throughput modes, which required setting a custom policy on the wireless NIC. To discover the link metrics, nodes can flood the network and listen for replies, or snoop on other nodes' floods.
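
A minimal sketch of route selection over such a link database (the metric function is just a placeholder of my own; Roofnet's actual metric combines per-bit-rate throughput and delivery probability as described above):

```python
import heapq

def best_route(link_db, src, dst, metric):
    """Dijkstra over the link database.

    `link_db` maps node -> {neighbor: link_info}; `metric(link_info)` returns
    a cost where lower is better (e.g. estimated transmission time).
    Returns the list of nodes on the best path, or None if unreachable.
    """
    heap = [(0.0, src, [src])]
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nbr, info in link_db.get(node, {}).items():
            if nbr not in visited:
                heapq.heappush(heap, (cost + metric(info), nbr, path + [nbr]))
    return None

# Placeholder metric: estimated time to push one packet across the link.
def est_tx_time(info):
    return info["packet_bits"] / (info["throughput_bps"] * info["delivery_prob"])
```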

The paper evaluates the performance and the robustness of Roofnet, noting that they get a mean throughput of 627 kbits/sec. They also look at simulated results of handling failures, noting that most nodes choose a wide variety of neighbors for their first hop, so the failure of a single neighbor shouldn't affect performance too much. They do find, however, that throughput can drop significantly, as there are often only one or two 'fast' neighbors.

Thoughts
I thought this was an interesting paper. They manage to get a working network set up with minimal effort from the volunteers, and it actually appears to work. They get decent throughput (although the median isn't as good as the mean and might have been a better metric to report).

One major concern I have is that the paper does not seem to take the performance of the gateway's connection to the WAN into account. It's all well and good if I can get to my gateway at 600 kbps, but if that gateway is overloaded on its WAN link I won't see good performance overall. This does not seem to be captured by anything in their algorithm and should really be addressed.

Still, an interesting paper and one well worth reading.

Modeling Wireless Links for Transport Protocols

Summary:
This paper looks at a number of current models of wireless links and the reasons they are inadequate. The authors show that modeling a duplex connection gives quite different results than one-way traffic, which is the model often used in simulations.

The paper also highlights a number of features of wireless links: how they affect the performance of transport protocols, what types of links they appear in, and how to model them. These features are:
  • Error losses and corruption. These losses can make TCP think there is congestion when there isn't, leading to poor performance. While FEC and link-layer retransmission ease this somewhat, there can still be delays and re-ordering due to limited retransmit buffers at the end-points of the wireless links. This can be modeled by dropping packets, but we need to consider whether our losses are correlated or not.
  • Delay Variation - Link delay can change suddenly in a wireless world due to mobility or scheduling at the wireless end-points. If the TCP connection assumes this is due to congestion (i.e. it's using delay-based congestion control), it will back off its sending unnecessarily.
  • Packet Reordering - This can result from link-layer recovery mechanisms and can trigger retransmissions from the end host if it delays delivery enough.
  • On-Demand Resource Allocation - In cellular networks a channel has to be re-allocated when a client sends data after being idle for longer than a threshold amount of time, which can cause delays. The authors show how to model this by introducing a delay when a packet arrives at a queue that has been empty for longer than the threshold time (see the sketch after this list).
  • Bandwidth Variation - Different channels can have different bandwidths and clients might be moved from channel to channel. This can cause spurious timeouts in TCP if the RTT suddenly goes up.
  • Asymmetry in Bandwidth and Latency - With an asymmetric link, TCP can see congestion on the ACK path. This can easily be modeled by setting the right parameters.
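
Here is a minimal sketch of the on-demand allocation model described in the list above, following the rule of adding an allocation delay when a packet arrives at a queue that has been idle past the threshold (the class name, threshold, and delay values are mine, purely for illustration):

```python
class OnDemandLinkModel:
    """Adds a channel-allocation delay to packets that arrive after the
    link has been idle for longer than `idle_threshold_s`."""

    def __init__(self, idle_threshold_s=2.0, alloc_delay_s=0.5):
        self.idle_threshold_s = idle_threshold_s
        self.alloc_delay_s = alloc_delay_s
        self.queue_len = 0
        self.last_busy_time = None

    def enqueue(self, now_s):
        """Return the extra delay (seconds) to apply to this packet."""
        idle_too_long = (self.queue_len == 0 and self.last_busy_time is not None
                         and now_s - self.last_busy_time > self.idle_threshold_s)
        # The very first packet, or a packet after a long idle period, pays
        # the cost of (re)allocating the channel.
        extra = self.alloc_delay_s if (idle_too_long or self.last_busy_time is None) else 0.0
        self.queue_len += 1
        self.last_busy_time = now_s
        return extra

    def dequeue(self, now_s):
        self.queue_len -= 1
        self.last_busy_time = now_s
```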

The paper also looks at queue management and how drop-tail queuing can lead to delay and problems due to spurious retransmission timeouts. They argue that drop-tail modeling is okay for cell and WLAN links, but that RED is perhaps a better model for satellite links and for future WLAN links. Finally the paper looks at ways transport protocols could handle the various problems modeled in the paper.

Thoughts:
I can't say I loved this paper. It felt a bit like a laundry list of wireless link issues, which might be very useful to designers but wasn't particularly illuminating from a theoretical perspective. Some of the experiments showing the real problems with past models were interesting.

I could imagine a very interesting follow-up to this paper in which solutions to the various issues they have identified here are proposed and evaluated. This is an important starting point toward that paper.

Thursday, October 1, 2009

A Comparison of Mechanisms for Improving TCP Performance over Wireless Links

Summary
This paper looks at a number of approaches for improving the performance of TCP over wireless links. TCP's assumption that a lost packet indicates congestion is not particularly valid in the wireless environment, where packet loss is more often due to interference of some sort. This leads to poor performance, as TCP needs to wait for duplicate ACKs before retransmitting, has long timeouts (200ms), and will reduce its congestion window even though that's not necessary.

The paper looks at three broadly different approaches to improving the performance of TCP over wireless links.

  • Link-layer protocols: The tools here are forward error correction and/or automatic repeat requests. The paper argues that the advantage of this approach is that it layers nicely in the network stack, operating independently of higher-level protocols and thereby shielding those higher-level protocols from the lossy nature of the wireless link. The challenge here is to make sure the link-layer protocol plays nicely with TCP. One proposed scheme that works quite well is that the base station snoops on the TCP traffic, and buffers some number of packets. When the station sees a duplicate ACK it can suppress it, and simply retransmit the packet itself. In the experiments this protocol is shown to perform well.
  • Split connection protocols: In this case the connection is 'split' on either side of the wireless link and a highly tuned (for wireless) protocol is used over the wireless hop. The concern here is that the end-to-end semantics of the link are not preserved. In particular the congestion window is independent for each side of the connection and this is shown to lead to poor performance in the experimentation section. The authors conclude that split connections are not necessary for good performance of TCP over wireless links.
  • End-to-end protocols: This approach modifies TCP itself to deal with wireless links. Techniques include adding selective ACKs and/or Explicit Loss Notification. This approach also performs quite well in practice.
The authors run a number of experiments with PC base stations and laptop nodes. They use an exponential drop model and test a number of different loss rates. They notice that link-level protocols that are not TCP-aware perform quite poorly, because packet order is not preserved by the link layer and so oftentimes both the link layer and the end host end up retransmitting packets. Adding TCP awareness and SMART recovery leads to much better performance in this case.
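
As an aside, a minimal sketch of the TCP-aware snoop idea described in the list above, as I understand it (the real Snoop protocol tracks timers and sequence numbers more carefully; the interfaces here are hypothetical):

```python
class SnoopAgent:
    """Base-station agent that caches TCP data heading to the wireless host
    and retransmits locally on duplicate ACKs instead of letting them reach
    the fixed sender."""

    def __init__(self, wireless_link):
        self.wireless_link = wireless_link
        self.cache = {}          # seq -> packet (seq abstracted as a packet number)
        self.last_ack = -1

    def on_data_from_sender(self, seq, packet):
        self.cache[seq] = packet
        self.wireless_link.send(packet)

    def on_ack_from_mobile(self, ack):
        if ack > self.last_ack:
            # New data acknowledged: clean the cache and pass the ACK along.
            for seq in [s for s in self.cache if s <= ack]:
                del self.cache[seq]
            self.last_ack = ack
            return "forward_ack"
        # Duplicate ACK: retransmit locally and suppress it so the fixed
        # sender never sees it (and never shrinks its congestion window).
        nxt = ack + 1
        if nxt in self.cache:
            self.wireless_link.send(self.cache[nxt])
        return "suppress_ack"
```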

The authors also note that split-connection protocols are not necessary. They can be made to perform almost as well as some of the link-layer protocols, but the added complexity does not really buy you anything.

Comments
I enjoyed this paper. It gives a very nice summary of the various techniques for dealing with packet loss in a wireless network and why each protocol behaves the way it does. I also thought the snoop technique was particularly clever and was gratified to see it work well in practice.

As always with a paper whose results are based on experiments, more and more diverse experiments would have been nice. In particular, it would have been nice to run some long-running experiments in a real environment (the paper models packet losses) and see if the results looked similar to the modeled ones. It was nice that the network itself was real, however, and not simulated.

I also think there is more work that could be done on improving split-connection protocols. Rather than running TCP over the wireless hop, why not use some highly optimized wireless protocol? The problem of violating end-to-end semantics seems difficult to get around in this case, however.