We demonstrate how simple and common failures in a network application may lead to loss of the control layer, and in effect, loss of network control. To address these concerns we present the ROSEMARY controller, which implements a network application containment and resilience strategy based around the notion of spawning applications independently within a micro-NOS.
Among the leading reference implementations of the Software Defined Networking (SDN) paradigm is the OpenFlow framework, which decouples the control plane into a centralized application. In this paper, we consider two aspects of OpenFlow that pose security challenges, and we propose two solutions that could address these concerns. The first challenge is the inherent communication bottleneck that arises between the data plane and the control plane, which an adversary could exploit by mounting a “control plane saturation attack” that disrupts network operations. Indeed, even well-mined adversarial models, such as scanning or denial-of-service (DoS) activity, can produce more potent impacts on OpenFlow networks than traditional networks. To address this challenge, we introduce an extension to the OpenFlow data plane called “connection migration”, which dramatically reduces the amount of data-to-control-plane interactions that arise during such attacks. The second challenge is that of enabling the control plane to expedite both detection of, and responses to, the changing flow dynamics within the data plane. For this, we introduce “actuating triggers” over the data plane’s existing statistics collection services. These triggers are inserted by control layer applications to both register for asynchronous call backs, and insert conditional flow rules that are only activated when a trigger condition is detected within the data plane’s statistics module. We present Avant-Guard, an implementation of our two data plane extensions, evaluate the performance impact, and examine its use for developing more scalable and resilient SDN security services.
The performance and operational characteristics of the DNS protocol are of deep interest to the research and network operations community. In this paper, we present measurement results from a unique dataset containing more than 26 billion DNS query-response pairs collected from more than 600 globally distributed recursive DNS resolvers. We use this dataset to reaffirm findings in published work and notice some significant differences that could be attributed both to the evolving nature of DNS traffic and to our differing perspective. For example, we find that although characteristics of DNS traffic vary greatly across networks, the resolvers within an organization tend to exhibit similar behavior. We further find that more than 50% of DNS queries issued to root servers do not return successful answers, and that the primary cause of lookup failures at root servers is malformed queries with invalid TLDs. Furthermore, we propose a novel approach that detects malicious domain groups using temporal correlation in DNS queries. Our approach requires no comprehensive labeled training set, which can be difficult to build in practice. Instead, it uses a known malicious domain as anchor, and identifies the set of previously unknown malicious domains that are related to the anchor domain. Experimental results illustrate the viability of this approach, i.e. , we attain a true positive rate of more than 96%, and each malicious anchor domain results in a malware domain group with more than 53 previously unknown malicious domains on average.