Tuesday, November 24, 2009

BotGraph: Large Scale Spamming Botnet Detection

Summary
A method for sending spam that has become quite prevalent of late is to have a bot running on an end host sign up for a web-based email account (Hotmail, Gmail, etc.) and then use that account to send spam from the end host. The traditional CAPTCHA protection for new account signups is being subverted either by routing the CAPTCHA to some other 'sketchy' site (e.g. asking users to solve a CAPTCHA to download pirated software) or by simply paying people a trivial amount to solve them.

This paper looks at a way of identifying these bot accounts using the signup and access patterns gleaned from the Hotmail logs. They identify three main signals they consider suspicious:

  1. A sudden increase in the number of signups from a particular IP. The authors use an exponentially-weighted moving average (EWMA) to detect these sudden increases and limit the sign-up rate, making it harder for spammers to obtain new accounts.
  2. The authors assume that there will be more accounts than machines available. Thus if multiple accounts are accessed from the same IP they stand a higher chance of being bot-created accounts. To reduce false positives (i.e. from machines changing IPs due to DHCP) they actually require the IPs to be from different ASes to count.
  3. Finally they look at the number of messages sent by each account. While this isn't used in their detection, they use it as part of the evaluation. The observation is that human accounts don't often send more than 3 emails a day (although most people I know send far more than that), and so accounts that send more are candidates for bot accounts.
The paper focuses on finding accounts that are accessed from IPs in different ASes. This is a huge computation, so they resort to distributed processing a la MapReduce to solve it. The idea is that accounts accessed with bot patterns will form large connected components in the user-user graph (one vertex per user). Edge weights between accounts are assigned based on the number of shared logins across ASes. To begin they look at connected components with edge weights of at least 2, and then keep increasing the threshold by 1 until no component has more than 100 members. Large connected components are considered likely to be groups of bot accounts.
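To make the threshold-raising step concrete, here is a toy sketch in Python (my own reconstruction, not the paper's code; the user names, AS numbers, and weight-2 starting point are illustrative):

```python
from collections import defaultdict
from itertools import combinations

def shared_as_weights(logins):
    """logins: {user: set of ASes the account was accessed from}.
    Edge weight = number of ASes two accounts share."""
    weights = {}
    for u, v in combinations(sorted(logins), 2):
        w = len(logins[u] & logins[v])
        if w >= 2:  # start from the weight-2 graph
            weights[(u, v)] = w
    return weights

def components(users, edges):
    """Connected components via iterative DFS over the given edge set."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for u in users:
        if u in seen:
            continue
        stack, comp = [u], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def bot_groups(logins, max_size=100):
    """Raise the edge-weight threshold until no component exceeds max_size;
    the surviving multi-account components are the bot-group candidates."""
    weights = shared_as_weights(logins)
    t = 2
    while True:
        edges = [e for e, w in weights.items() if w >= t]
        comps = components(set(logins), edges)
        if all(len(c) <= max_size for c in comps):
            return [c for c in comps if len(c) > 1], t
        t += 1
```

Using a small `max_size` here stands in for the paper's 100-member cutoff.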

The authors compare two methods for computing the above. The first partitions on client IP and looks at which accounts were logged into from that IP. This unfortunately requires sending lots of weight-one edges over the network, which is inefficient since those edges are later discarded anyway. The second partitions on account names and looks at which IPs logged into each account. This allows weight-one edges to be pruned locally, which makes it more efficient.
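A toy single-machine sketch of the two strategies (my own simplification: in the real system each grouping step is a distributed shuffle, and 'AS' stands in for the AS an IP maps to):

```python
from collections import defaultdict
from itertools import combinations

def edges_by_ip_partition(login_events):
    """Strategy 1: group login events by AS. Every account pair sharing an
    AS yields a weight-one partial edge that would have to be shipped over
    the network and summed before thresholding."""
    by_as = defaultdict(set)
    for user, asn in login_events:
        by_as[asn].add(user)
    partial = defaultdict(int)
    for users in by_as.values():
        for pair in combinations(sorted(users), 2):
            partial[pair] += 1  # one partial (weight-one) edge per shared AS
    return {p: w for p, w in partial.items() if w >= 2}

def edges_by_user_partition(login_events, min_weight=2):
    """Strategy 2: group by account, so each account's full AS set is in
    one place; edge weights are computed directly and weight-one edges
    are dropped locally, never crossing the network."""
    as_of = defaultdict(set)
    for user, asn in login_events:
        as_of[user].add(asn)
    edges = {}
    for u, v in combinations(sorted(as_of), 2):
        w = len(as_of[u] & as_of[v])
        if w >= min_weight:
            edges[(u, v)] = w
    return edges
```

Both produce the same edge set; the difference that matters is how many intermediate records cross the network.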

Comments
I thought this was an interesting approach, but I'm not sure how difficult it would be to subvert if spammers knew it was running. For example, it would be easy to throttle the number of account signups and slowly increase them to defeat the EWMA detection. Secondly, it seems fairly trivial to tie accounts to the AS that created them, thereby defeating the other line of defence. Obviously spammers don't do this at the moment, as this system isn't deployed, but were it to be deployed I'm sure they would adapt quickly. Perhaps a system based on IP prefixes (to not bias against DHCP) would be more effective. There is also the problem of many computers sitting behind a single IP, which this analysis will miss.

Still, I thought this was an interesting approach, and a nice use of a distributed framework for graph processing (which is basically what MR was originally designed for). Perhaps more subtle patterns and metrics could be used for a more difficult-to-circumvent system. I would keep this paper on the syllabus.

Not-a-Bot: Improving Service Availability in the Face of Botnet Attacks

Summary
This paper tackles the problem of separating user generated activity (emails, web requests and clicks) from the same activities being performed by a bot infesting the computer. The authors leverage the Trusted Platform Module (TPM) to start verified and trusted code to perform the necessary operations on the end hosts for their system to work.

The basic idea of the system is that actual human users can't generate very many events over time. For example, it is very unlikely that a human will want to send more than one email per second (although emails with huge CC lists seem like a problem for the system). Building on this observation, the Not-A-Bot system has software request an attest (a token signed with a private key from the TPM) when it wants to perform an action like sending an email. The attester checks whether an input-device event (keyboard or mouse) has occurred within a threshold number of seconds (1 in the paper) and, if so, grants the attest, which can be included in a header of the request. A verifier on the receiving end can use this attest to be more confident that the message or request is from a human and not a bot.

For added security, the content of the message is part of the attest, so it is impossible for an attest to be re-used by a bot to send different messages. Further, even if bots simulated input actions, they could still only send at the reduced threshold rate.
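A rough sketch of the attester logic as I understand it (hedged: I'm substituting a symmetric HMAC with a made-up key for the TPM's private-key signature, and using the paper's 1-second threshold):

```python
import hashlib
import hmac
import time

TPM_KEY = b"demo-key"  # stand-in for the TPM-protected private key
THRESHOLD = 1.0        # seconds since the last keyboard/mouse event

def request_attest(message, last_input_time, now=None):
    """Grant an attestation only if a human input event happened recently.
    Binding the message hash into the token prevents replaying it on
    other content."""
    now = time.time() if now is None else now
    if now - last_input_time > THRESHOLD:
        return None  # no recent human activity: refuse the attest
    digest = hashlib.sha256(message).digest()
    return hmac.new(TPM_KEY, digest, hashlib.sha256).hexdigest()

def verify_attest(message, attest):
    """Verifier side: recompute the token over the message hash."""
    digest = hashlib.sha256(message).digest()
    expected = hmac.new(TPM_KEY, digest, hashlib.sha256).hexdigest()
    return attest is not None and hmac.compare_digest(attest, expected)
```

The key property is visible even in this toy: an attest granted for one message fails verification on any other message.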

The authors implement NAB using the Xen hypervisor to prevent access from the untrusted OS into the attester's address space. They strip down Xen to reduce the size of the codebase that needs to be trusted. The paper also covers (in somewhat laborious detail) the usage of the TPM to verify the code size of the attester and to obtain the private key.

To evaluate their system the authors collected activity data from a number of users at Intel and overlaid honeypot-collected spam data. They show that NAB can suppress 92% of spam email, 89% of bot-generated DDoS requests, and 87% of fraudulent click-throughs. This seems to assume verifiers running everywhere.

Comments
I thought this paper took an interesting approach to solving the bot problem, but in the end I remain somewhat unconvinced of their methodology.

Firstly, even if bots could only send at (say) 1 email per second, there would still be a huge amount of bot generated email flying around given the sheer number of computers infected by bots.

Secondly, incremental deployment of this system seems difficult. If only a few end hosts are using NAB then all traffic needs to be treated as 'normal' anyway, since biasing against unsigned requests would certainly have very high false positive rates. If low numbers of servers are using verifiers then there is little gain to be had by running NAB on an end host and paying the extra costs.

Finally, the whole architecture seems a bit hard to swallow. It seems a bit much to ask users to run a hypervisor under their OS for this purpose and without the hypervisor the system is open to attack. The attester's size is verified at launch, before the host OS, but if the host OS can get at the address space of the attester it could easily be replaced with something that always grants attests, making the system useless.

This wasn't a bad paper, but given that the system seems a bit weak, and the lack of a serious networking component to the paper, I would probably remove this paper.

Thursday, November 19, 2009

Cutting the Electric Bill for Internet-Scale Systems

This was an interesting paper, and a bit of a shift from the types of papers we have read so far in the class. There are two main motivators for the paper:
  1. Internet-Scale systems are already widely replicated. This means major providers (the googles and amazons and so on) already have data centers in varied geographic locations and can route traffic for a particular application to any of a number of those data centers. In short, a request can be serviced (almost) equally by any of a number of data centers.
  2. Power does not cost the same amount in different geographic locations. The paper presents a lot of data on this front and determines that there is a large hour-by-hour price differential between energy markets.
Given the above motivators, it makes sense for an Internet-scale system to route requests to data centers where power is as cheap as possible. This of course assumes that keeping a data center at low load saves power. The paper calls this energy elasticity, and the authors claim it is about 60% at the moment (that is, systems draw about 60% of their under-load power when they are idle). This is not a great number, but this is an active area of research and one can probably safely assume it will only get better. Systems that can actually turn off machines under low load could help approach perfect elasticity (0% of peak power when idle).
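A toy model of the routing idea (the numbers and the greedy policy are mine, not the paper's formulation; the 0.6 idle fraction mirrors the elasticity figure above):

```python
def energy_cost(load, capacity, price, idle_fraction=0.6):
    """Power cost of one data center: idle_fraction of peak power is burned
    regardless of load, and the rest scales linearly with utilization."""
    utilization = load / capacity
    return price * (idle_fraction + (1 - idle_fraction) * utilization)

def route_greedy(total_load, centers):
    """Send load to the cheapest centers first, respecting capacity.
    centers: list of (name, capacity, price_per_unit_power) tuples."""
    assignment = {}
    for name, cap, price in sorted(centers, key=lambda c: c[2]):
        take = min(total_load, cap)
        assignment[name] = take
        total_load -= take
        if total_load <= 0:
            break
    return assignment
```

Note how the idle fraction limits the win: even a fully drained data center still burns 60% of its peak power in this model, which is exactly why elasticity matters.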

The authors use numbers from Akamai to run some simulations and show that significant savings are possible. One roadblock is the 95/5 billing model, where traffic is sampled in 5-minute intervals and the 95th percentile is used for billing. This inhibits how much work the system can actually shift around. By relaxing this constraint the authors do quite a bit better.

I thought this was an interesting paper and one I would keep on the syllabus.

Scalable Application Layer Multicast

The (at least currently correct) assumption in this paper is that we are not going to have in-network multicast anytime soon. Given this, the authors develop an application-level multicast scheme that tries to get as close as possible to 'ideal' in-network performance.

The approach this paper takes is to build an overlay network out of participating applications and route multicast traffic using this overlay. The protocol they develop for this purpose is called NICE.

In NICE, nodes are arranged in levels. Each level is partitioned into groups of nodes that are 'near' each other (distance defined by network distance and latency), and each group is assigned a 'leader': the graph-theoretic center of the nodes in the group. This leader also participates in the next layer up, alongside all the leaders of the groups below, and a leader of that group is picked in the same manner. In this recursive fashion a tree is built which can be used to forward packets. To multicast a packet, the leader of a group simply forwards it to everyone else in the group. If any of those nodes are leaders of groups below them, they take the same action in those groups, until everyone has the packet.
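A tiny sketch of the recursive forwarding (my own model: `clusters` maps each leader to the other members of the cluster it leads, and leaders of lower layers appear as members one layer up):

```python
def multicast(source, clusters, delivered=None):
    """Recursive NICE-style dissemination: a leader forwards to its cluster
    peers, and any peer that itself leads a lower-layer cluster repeats the
    process there. `delivered` prevents re-sending to nodes already covered."""
    delivered = set() if delivered is None else delivered
    delivered.add(source)
    for member in clusters.get(source, ()):
        if member not in delivered:
            delivered.add(member)
            multicast(member, clusters, delivered)  # recurse a layer down
    return delivered
```

With a two-layer tree, one call starting at the top leader reaches every node.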

The authors evaluate the stress (how many extra packets need to cross a link compared to an in-network scheme) and stretch (how many more hops packets take vs. in-network routing), and also compare against Narada, another application-level multicast scheme. They come out ahead of Narada, but impose significant overhead vs. in-network schemes. The stress of roughly 1.6 in steady state would be a non-trivial performance hit. Even worse, the path length is much longer in NICE than in IP multicast (about 23 hops vs. 15), which means many more chances to hit bottleneck links in the wide area.

I was also concerned about the join protocol. When a new node joins it finds a cluster close to it to join. If it is the center of that cluster, it becomes the new leader. In the face of churn, many new leaders could generate a large amount of control traffic and an unstable tree. There was not really any evaluation of the robustness of NICE under churn.

Wednesday, November 18, 2009

A Reliable Multicast Framework for Light-weight Sessions and Application Level Framing

The reliable multicast framework paper is interesting in its development of the model. The authors begin with an application (wb, a multiuser whiteboard) and from there attempt to factor out traits they deem common to many applications that need multicast. Their overall approach is to keep as much functionality as possible in the client (as dictated by the end-to-end argument), doing only best-effort delivery and requiring the clients to implement reliability and packet-ordering enforcement.

The key contribution of this paper seems to be the mechanism for retransmitting lost packets. Losses are detected by the clients, and the idea is to minimize the amount of extra data that needs to be sent. A client detects a lost packet when it receives a packet with a higher sequence number than expected. When this happens, a timer is set based on the client's distance from the source (further away = longer timer). When the timer expires, a request for retransmission is sent; if the missing packet arrives while waiting, the timer is canceled. In this way the first host after a lossy link is expected to request retransmission, and to get a prompt reply from the host just upstream of the loss. A small amount of randomness is added to the timer so that only one request is sent even when two hosts equidistant from the source both miss a packet: the randomness lets one host request retransmission first, and since requests are multicast, the other host will see it and cancel its own timer.
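The suppression timers can be sketched like this (the constants and uniform-random choice are illustrative; the real protocol's timer parameters are adaptive):

```python
import random

def request_timers(distances, c1=1.0, c2=1.0, rng=random.random):
    """SRM-style NACK suppression: each host that detects a loss picks a
    timer uniformly in [c1*d, (c1+c2)*d], where d is its distance from the
    source. The host whose timer fires first multicasts the repair request;
    everyone else hears it and cancels their own timer."""
    timers = {h: (c1 + c2 * rng()) * d for h, d in distances.items()}
    requester = min(timers, key=timers.get)
    suppressed = [h for h in timers if h != requester]
    return requester, suppressed
```

Scaling the timer by distance is what makes the host nearest the loss (and thus nearest a cached copy) the likely requester.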

This means hosts need to cache old data so they can respond to retransmission requests. The amount of state needed to make this all work is not completely clear, nor is the issue addressed in the paper. Still, it's an interesting approach, and the authors run a number of simulations showing that if all state is kept the protocol performs well.

Thursday, November 12, 2009

X-Trace: A Pervasive Network Tracing Framework

Summary
The X-Trace paper presents a system for tracing causal paths through multi-host network architectures to help with debugging distributed applications. The authors identify the problem that modern systems are often made up of many, heterogeneous hosts and that a request must traverse a number of these hosts in its lifetime. Finding out why a given request is not performing as expected requires investigating all the hosts along that path, a difficult and time-consuming task.

X-Trace proposes to deal with this problem by adding a small amount of data in-band with each request. Each participating host must be modified to append its own data to the request: for example, an HTTP proxy adds some metadata, then pushes the request to the appropriate host, which adds its own data, and so on. The recursive nature of this propagation assumes no particular architecture a priori. Once the request is finished, the collected metadata can be examined and used to build the resulting task tree. This tree can be used to discover how much time is being spent at each component, and/or to identify failures along the path at a particular component.
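A minimal sketch of the metadata propagation and offline tree reconstruction (this is my simplification: real X-Trace metadata carries task and operation IDs with pushDown/pushNext semantics, and reports go to a separate collection service):

```python
import uuid

REPORTS = []  # stand-in for the out-of-band report collection service

def new_task():
    """Start a trace: fresh task id, with a root operation."""
    return {"task_id": uuid.uuid4().hex, "op_id": "root"}

def propagate(metadata, component):
    """Each component logs a report linking its new operation id to its
    parent, then passes the updated metadata along with the request."""
    child = {"task_id": metadata["task_id"],
             "op_id": f"{component}:{len(REPORTS)}",
             "parent": metadata["op_id"]}
    REPORTS.append(child)
    return child

def task_tree(task_id):
    """Reassemble the causal tree offline: op id -> parent op id."""
    return {r["op_id"]: r["parent"] for r in REPORTS if r["task_id"] == task_id}
```

The point is that each hop only needs local knowledge (its parent's metadata), yet the collected reports reconstruct the full causal path.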

The paper then evaluates X-Trace in three example situations
  1. A web request with recursive DNS queries. Using X-Trace the authors are able to follow the recursive DNS query and isolate faults along the way. This also let them see which cache a result was coming from, and hence why addresses might still be stale after an update.
  2. A web hosting provider. Using X-Trace allowed for determining that a request was failing, say, in a PHP script, not in a DB query.
  3. An overlay network. Probably the most complex scenario, X-Trace allows for distinguishing between a number of possible faults in Chord including the receiving node failing, or a middle-box failure, either a particular process crashing, or the whole host going down.
Comments
I like this paper a lot. It gives a clear outline of the problem and proposes a useful and practical solution. I've used X-Trace in a couple of projects and it most certainly helps the debugging of a distributed application. In one case we were confused about why a request was taking so long. X-Trace allowed us to pinpoint the location in some client code that was doing a slow type conversion. There were at least 4 other systems involved in the request and each one would have to have been examined without the X-Trace data.

There is some concern that requiring modification to the client will hurt adoption. While this might be true, it's hard to see how something like X-Trace could work without such modification. I also think inter-node communication libraries like Thrift might help the situation. We've added X-Trace support to the Thrift protocol, making the actual client modifications a trivial one-line change.

I would keep this paper on the syllabus.

NetFPGA: A Tool for Network Research and Education

Summary
Falling somewhere between pure software solutions like Click and completely custom hardware, NetFPGA gives students and researchers a programmable hardware platform for implementing network designs. This lets users actually interact with the hardware and deal with any associated issues, without having to build the hardware itself.

Users program the NetFPGA remotely by uploading programs which they can then send data to and receive data from. The devices can also be linked into the Stanford network so they can carry real data. Using this platform students have built an IP router, which makes it clearly a good learning tool. To enable more complex research, the new version has an on-board CPU (the previous version did not, and was controlled by sending special packets from a remote controller), which is being used, for example, to build a 'shunt' that enables low-cost intrusion detection by offloading to the detector only those packets that really need to be examined.

Comments
There doesn't seem to be a whole lot of downside to this project. I do wonder what performance is like compared to a custom hardware router and Click, but this doesn't seem like a huge issue on a research platform.

Tuesday, November 10, 2009

A Policy-aware Switching Layer for Data Centers

Summary
This paper aims to solve a problem in data-center design. Currently, if an admin wants to ensure that traffic flows through a particular set of boxes (say a firewall then a load balancer) there isn't a standard and simple way to do this. Either the boxes have to be physically interposed, which often isn't possible, or nasty tricks like messing with link weights need to be used. The latter doesn't enforce a flow policy either, since a link failure could still allow packets to avoid flowing through a particular box.

To solve this problem the authors propose a policy-aware switching layer. The basic idea is that a central location specifies declaratively what route packets should take. This specification is compiled into a set of rules that give the next hop for a particular type of packet, given its previous hop. Switches are modified to check this information, consult their rule table, and forward packets appropriately. Middleboxes (firewalls, load balancers, etc.) are taken off the physical network path. This both prevents unnecessary traffic from hitting those boxes, and makes the addition of new boxes fairly simple.

To allow unmodified middleboxes to plug in, frames are encapsulated and decapsulated at the switches so the boxes only see standard Ethernet frames.
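The rule-table idea can be sketched as follows (the policy, box names, and traffic classes here are hypothetical, not from the paper):

```python
def next_hop(rules, prev_hop, packet_type):
    """Policy table lookup: (previous hop, traffic class) -> next middlebox.
    A policy-aware switch consults this on every frame it forwards."""
    return rules.get((prev_hop, packet_type))

# Hypothetical policy: web traffic must cross firewall, then load balancer.
RULES = {
    ("ingress", "http"): "firewall",
    ("firewall", "http"): "load_balancer",
    ("load_balancer", "http"): "web_server",
}

def route(packet_type, start="ingress"):
    """Follow the compiled rules hop by hop until no rule applies."""
    hops, here = [], start
    while True:
        nxt = next_hop(RULES, here, packet_type)
        if nxt is None:
            return hops
        hops.append(nxt)
        here = nxt
```

Because the next hop depends on the *previous* hop, the sequence firewall-then-load-balancer is enforced even though neither box sits physically on the path.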


Comments
I liked the motivation for this paper. The current situation is clearly problematic and this paper proposes an elegant solution to the problem. The main issue I see is the increase in overhead and latency this introduces. While the numbers from the paper might be acceptable for external applications requesting data from inside the data center, intra-data-center apps, especially network-intensive ones, would probably be quite hurt by the added overhead of this system.

Still, this was an interesting paper and one I would keep on the reading list.

Internet Indirection Infrastructure

Summary
This paper develops a nice use of a DHT: Internet-wide indirection routing. The core problem this addresses is that the point-to-point nature of IP makes it a poor fit for mobile hosts, as well as for multicast or anycast. The i3 architecture allows these to be expressed quite naturally.

The basic idea of i3 is that senders send data to an identifier rather than to a particular address. Receivers express interest in data by publishing triggers on a particular identifier into the i3 layer. When data is sent, i3 can quickly look up all receivers interested in that identifier and route the data to their IP addresses.

This routing is achieved by maintaining routing tables at nodes in a Chord ring. Identifiers are matched by prefix, and the paper requires that all identifiers sharing a prefix be stored at the same node (thus allowing fast local prefix matching). When data is sent to a particular id, it is forwarded to the responsible Chord node, the target hosts are determined locally at that node using longest-prefix matching, and the data is then forwarded on.
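A toy version of trigger matching (my simplification: I match on a fixed-length character prefix instead of i3's k-bit longest-prefix match, and skip the Chord routing to the responsible node):

```python
def insert_trigger(triggers, ident, addr):
    """A receiver publishes a (identifier, address) trigger into i3."""
    triggers.setdefault(ident, set()).add(addr)

def send(triggers, ident, data, prefix_len=5):
    """Deliver data to every trigger whose identifier matches the packet's
    identifier on the first prefix_len characters. Multiple matching
    triggers naturally give multicast semantics."""
    matches = [addrs for tid, addrs in triggers.items()
               if tid[:prefix_len] == ident[:prefix_len]]
    return [(addr, data) for addrs in matches for addr in sorted(addrs)]
```

Mobility falls out for free in this model: a receiver that moves simply inserts a trigger with the same identifier and its new address.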

To support mobility, mobile receivers simply publish their new address under the same identifier and packets will automatically be routed to them. For multicast, many hosts can publish triggers on the same prefix and the responsible node replicates the packet to each of them. Likewise for anycast.

For more flexibility i3 allows data to be sent to stacks of identifiers, allowing packets to go to a number of i3 nodes before their final destination. This allows, for example, load balancing multicast requests by having them go through a tree of nodes rather than overloading the single node responsible for that prefix.

Comments
I liked this paper. It is a clever use of a DHT, and the architecture is explained well. The motivations and uses for i3 are also well covered. However, I was disappointed with the evaluation. There was no actual comparison with any other system. Clearly packets have to travel further in i3 than in IP, so a comparison against standard IP routing to determine the overhead of i3 should really have been performed.

I also feel the multicast/anycast should behave a bit differently than simple packet routing. i3 could be used almost like DNS for simply unicasting packets, where end clients cache IPs and send directly, while multicast/anycast would always need to go through the i3 layer.

This was an interesting paper and I would keep it on the syllabus.

Thursday, November 5, 2009

DNS Performance and the Effectiveness of Caching

Summary
This paper looks at how well DNS performs, and at the effectiveness of caching in the system. They use three nice network traces, two from MIT and one from KAIST in Korea, in which the authors collected all outbound DNS queries and their responses, along with all TCP SYN, FIN and RST packets.

The main findings of the paper are:
  • Host names fall into a Zipfian distribution (with the popular hostnames being looked up most of the time), which means that short TTLs are not very harmful: the popular names that dominate lookups will be looked up again soon anyway. It also means that sharing a cache between clients doesn't help much, as the cache contents will be largely the same anyway.
  • Caching helps latency and works well since there is not much churn in the mappings of hostnames to IPs
  • Resolvers re-try too often and overload the network. A whopping 63% of the DNS query packets from the traces were from queries that got no answer, which indicates a much too aggressive re-try policy
Comments
It would have been nice to have a few more traces in this study. Also, why was UDP traffic ignored? The rise of file sharing since the writing of this paper has also almost certainly skewed the distribution of traffic away from web traffic and it would be interesting to revisit some of the results in this paper with that context.

The Development of the Domain Name System

Summary
This paper describes the design of DNS, which was needed as the old hosts.txt way of doing name resolution was becoming unwieldy. A decision was made to keep the new system simple (rather than a full ACID distributed DB) as the designers felt this would ease adoption.

In DNS each name is associated with a group of resource records with known types (today I imagine everything is IP). To find these records there are resolvers and name servers. Resolvers know how to query the system to resolve a name, name servers store the actual data.

DNS forms a tree, with the root nameservers at the root(s). This tree can be split into zones, where zones are usually simple subtrees in the overall structure. The administration of the tree starts at the top (so with root admins) and then at the 'root' of a zone, admin responsibilities change.
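The delegation walk down the tree can be sketched like so (the zone contents, server names, and the address are hypothetical):

```python
# Toy zone data: each name server knows either delegations or final records.
ZONES = {
    "root":   {"edu.": "edu-ns"},
    "edu-ns": {"mit.edu.": "mit-ns"},
    "mit-ns": {"www.mit.edu.": "18.0.0.1"},  # hypothetical address record
}

def resolve(name, server="root"):
    """Walk the delegation tree from the root: each server either answers
    for the full name or delegates to the server for a longer suffix."""
    zone = ZONES[server]
    if name in zone:
        return zone[name]
    for suffix, ns in zone.items():
        if name.endswith(suffix):  # follow the delegation covering the name
            return resolve(name, ns)
    return None
```

A resolver's job is essentially this walk, plus caching each intermediate answer so the next query can skip levels.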

Comments
I enjoyed this paper. The authors give a good description of the system, and the rationale behind it. However, the focus on keeping things simple doesn't really seem to have paid off, and DNS today is actually quite complex and error-prone. Perhaps a more robust structure at design time would have slowed adoption, but given the inevitability of a distributed name service it might have been worth having a system that was easier to administer.

I also found it interesting that in 1983 each root server only handled about 1 query per second. This is an absurdly low query rate, and it actually reflects well on the DNS design that it can still work today with much much higher query rates.

Tuesday, November 3, 2009

Looking Up Data in P2P Systems

Summary
This paper takes a very high-level look at a number of different DHT systems. It first motivates the problem: we need distributed key lookup in the face of churning nodes, all administered by different users. We want to minimize both the number of hops it takes to find a key's true location and the amount of state each node needs to maintain. In general the approach is to have the nodes organize themselves according to some geometry and then have each node know how to route a particular key request 'closer', in that geometry, to the node that's responsible for it. There are a number of possible geometries and the paper looks at some of the most common:

- Chord: forms a ring and each node is assigned a position on the ring. Nodes maintain information about other nodes 1/2, 1/4, 1/8 of the way around the ring, and so on. This way a key can be found in log(n) lookups. Nodes also maintain their immediate successors on the ring, so in the worst case (all the more distant nodes are down) a request can just hop along one node at a time toward its location.

- Pastry/Tapestry/Kademlia: these use tree schemes in which each node knows of other nodes with particular prefixes (0, 1, 00, 01, ...). Pastry maintains a leaf set of close nodes that it can route to, and it can always route as long as more than half of those nodes don't fail.

- CAN: forms a hypercube and nodes are responsible for a region of the cube. Nodes maintain information about their neighbors in the cube and route along the straight line between their location and the location of the key.
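A small sketch of Chord's finger structure (the identifiers and ring size are made up; real Chord hashes nodes and keys with SHA-1 into a 160-bit space):

```python
def successor(ring, ident, m):
    """First node clockwise from ident on a 2**m-point identifier circle,
    wrapping around past zero if necessary."""
    ident %= 2 ** m
    candidates = [n for n in sorted(ring) if n >= ident]
    return candidates[0] if candidates else min(ring)

def finger_table(ring, n, m):
    """Node n's finger i is the successor of n + 2**i: log-many pointers
    covering exponentially growing arcs of the ring, which is what lets a
    lookup halve the remaining distance each hop."""
    return [successor(ring, n + 2 ** i, m) for i in range(m)]
```

With a 16-point ring and nodes {1, 4, 9, 11, 14}, node 1's fingers point at the successors of 2, 3, 5, and 9.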

Comments
This paper was very high level. If I didn't already know how chord and can worked I don't think I would now, and I don't feel that my basic understanding of the tree based systems was much improved either. It does present a good quick motivation for the problem, and perhaps as a first introduction to the topic it is a useful paper.

The main thing about multi-hop DHTs is that while they seem like an obviously useful structure, they haven't actually found all that many uses. Aside from DNS, which isn't really the sort of system this paper is describing, I'm not aware of any major multi-hop DHT deployments. One-hop DHTs, on the other hand, have proven quite useful in systems like Dynamo or SEATTLE. Perhaps the inherent uncertainty in the number of routing hops in this style of DHT makes them unattractive for real systems.

While this wasn't a bad paper, I would think that something that took a closer look at a particular type of DHT and some applications for it might be a better fit for the class.

Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications

The Chord paper was fun for me to re-read as it's actually a system I've implemented. Having done a few paper-to-system conversions, I will say that the Chord paper does an excellent job of laying out everything an implementor might need, from good pseudo-code to a detailed discussion of how the various components of a Chord system should interact. The paper does make the system sound quite simple and easy to implement, which is not entirely true, but the clear description certainly helps.

At a more general level Chord is important as an approach to DHTs that is considerably simpler than the other models being proposed at the time. The one-dimensional ring is much easier to reason about than CAN's hypercube, for example. The observation that efficient log(n) routing can be performed with such a simple structure is the most important contribution of this paper. The fact that current production DHTs like Cassandra and Dynamo use ring structures (albeit with different routing algorithms), coupled with the analysis from the other paper we read this week, points to this being the 'right' choice in the DHT space.

This paper doesn't leave a lot to criticize. They present an elegant algorithm, good performance results, and an analysis of the system under failure and churn. Where I think this paper misses the mark, as did all the early DHT papers, is in assessing its target market. Early DHT work saw itself as aimed at the P2P community, where it could replace Gnutella or Napster. This was misguided for two reasons: first, BitTorrent was already pretty much taking over the P2P space, and second, exact key lookups do not map well onto P2P usage. People want keyword search, and DHTs do not provide a simple way to make that happen. Systems like Cassandra or Dynamo show that DHTs are better suited as large data stores in cluster environments, where admins have total control of all nodes in the system and can therefore use more restrictive and efficient routing algorithms.