I have wanted to write this post for a while. Unlike on other occasions, the delay was not due to lack of time, but to a long period of study.
Coming from a networking background built mostly on the OSI layered model, my understanding of computer communications was standards-based. I thought networks should be implemented in rigid compliance with standards and protocols, and that any room for “network applications” or centralized control was the job (or, let’s be fair, the business) of major vendors.
Centralized overview, control and configuration of the network enables scalable deployments. This has been proven many times over with, for instance, Cisco solutions like Unified Communications. I remember Cisco Aironet controllers centralizing the configuration, AAA and monitoring of WiFi deployments. Leaving the wireless side, we now see how VMware’s early efforts to increase flexibility in virtualization solutions led to what we now know as Software Defined Networking, or SDN (many other things have shaped, and are still shaping, the evolution of this concept, but that is not the point of this post).
SDN proposes a major paradigm shift in what networks should be and how they should react to applications and traffic demands. In conventional networking, forwarding decisions are based on each packet’s header information. Control protocols, like routing protocols, exchange information with neighbors in order to “discover” the network topology. The gathered information is then processed by a routing algorithm that (very often) returns the best next hop towards the intended receiver. That information is then used to reach the receiver, encoded in each data packet in the form of headers (IP headers, for instance), which are read (and often rewritten) by each hop along the way.
I won’t trouble you with more details about conventional networking. In the end, the takeaway should be:
- Forwarding decisions are based on tables. The information that fills these tables is gathered by each device (through flooding, packet inspection, or routing algorithms).
- When a chunk of data arrives at a forwarding device, the device switches it to the corresponding output port based on the header information and the matches found in its tables.
- Forwarding tables are keyed by Medium Access Control (MAC) and IP addresses (there are other tables, but they are seldom, if ever, used for forwarding decisions). MAC addresses are used by switches to forward frames inside the same IP network, while IP addresses allow routers to route packets across different networks, like the Internet.
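As a toy illustration of the first two takeaways (my own sketch, not any real switch’s code), conventional forwarding reduces to a table lookup with a flood fallback:

```python
# Toy model of table-based forwarding: the device looks up the destination
# address in a locally built table and floods when no entry matches.
FLOOD = "flood"

def forward(table, dst_addr):
    """Return the output port for dst_addr, or FLOOD on a table miss."""
    return table.get(dst_addr, FLOOD)

# A switch that has already learned two MAC addresses:
mac_table = {"aa:bb:cc:00:00:01": 1, "aa:bb:cc:00:00:02": 2}

print(forward(mac_table, "aa:bb:cc:00:00:02"))  # known destination -> port 2
print(forward(mac_table, "aa:bb:cc:00:00:99"))  # unknown destination -> flood
```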
These are the pillars upon which conventional packet forwarding rests. That is, in conventional networking.
The shift towards SDN removes the task of creating and maintaining tables from the forwarding devices. Instead, all tables are filled, edited and managed by a centralized SDN Controller, while the forwarding devices cache forwarding information for faster processing.
Key concept: when a forwarding device, or datapath, cannot decide what to do with an incoming packet (based on the header information and its cached tables), it triggers an event by sending a message to the Controller over a TLS tunnel (be it over the same data network, or over a completely separate control network). The network application running on the centralized Controller handles this event and inserts the appropriate table entries into the datapath’s cache.
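A minimal sketch of this miss-and-install cycle, with hypothetical class and method names (this is the control flow only, not the Ryu or OpenFlow API):

```python
# On a table miss, the datapath punts to the controller; the controller's
# application computes an action and installs it into the datapath's cache,
# so subsequent packets of the same flow never leave the datapath.

class Controller:
    def packet_in(self, datapath, match):
        out_port = self.compute_path(match)    # network application logic
        datapath.flow_cache[match] = out_port  # "FlowMod": install the entry

    def compute_path(self, match):
        return 1  # placeholder forwarding decision

class Datapath:
    def __init__(self, controller):
        self.flow_cache = {}         # cached table: match -> out_port
        self.controller = controller

    def handle_packet(self, match):
        if match not in self.flow_cache:
            # Table miss: notify the controller (over TLS in a real deployment)
            self.controller.packet_in(self, match)
        return self.flow_cache[match]

dp = Datapath(Controller())
print(dp.handle_packet(("10.0.0.2", 5000)))  # first packet triggers packet_in
print(dp.handle_packet(("10.0.0.2", 5000)))  # later packets hit the cache
```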
OpenFlow is the (for now) agreed-upon interface between the Controller and the datapath’s tables. That is, we can safely say it provides an Application Programming Interface (API) to modify the forwarding tables, and lets the datapaths generate events that can be handled at the Controller. Furthermore, it provides different match fields (beyond just source and destination MAC or IP addresses), so forwarding decisions can also depend on other packet characteristics/headers, such as: input port, output port, VLAN id, TCP/UDP source or destination ports, ICMP type, VPN tunnel id, MPLS tag, and many more. This not only provides a fine-grained description of the types of communication traversing the datapaths, or flows, but also allows intricate conditions to be composed before determining the forwarding path (for example: if packet P arrived at in_port=2, with ip_dest=x and udp_source=y, then rewrite the source and destination MAC address headers and forward P through out_port=1).
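That closing example can be sketched as a rule with several match fields and a list of actions (illustrative structures of my own, not the OpenFlow wire format):

```python
# A multi-field match in the spirit of OpenFlow: the rule fires only when
# every listed field equals the packet's, then rewrites MACs and outputs.
rule = {
    "match":   {"in_port": 2, "ipv4_dst": "10.0.0.7", "udp_src": 4000},
    "actions": [("set_eth_src", "aa:aa:aa:00:00:01"),
                ("set_eth_dst", "bb:bb:bb:00:00:02"),
                ("output", 1)],
}

def matches(rule, pkt):
    """True when every field in the rule equals the packet's field."""
    return all(pkt.get(f) == v for f, v in rule["match"].items())

pkt = {"in_port": 2, "ipv4_dst": "10.0.0.7", "udp_src": 4000,
       "eth_src": "cc:cc:cc:00:00:03", "eth_dst": "dd:dd:dd:00:00:04"}

if matches(rule, pkt):
    for action, arg in rule["actions"]:
        print(action, arg)  # apply the action list in order
```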
It is the role of the Controller application to catch and correctly process the events generated by any datapath in the network. After all, the Controller is now the brain of the network.
I found it natural to ask: “if all this information is readily available, why can’t we build more efficient forwarding schemes?”. Well, we can. GossipMaximus, for instance, collects network metrics at the Controller and conditions forwarding decisions on that information.
Suppose the following network:
In the figure:
- All devices are Linux computers. As indicated by their names, some of them run the OpenFlow-enabled Open vSwitch (OVSK) soft switch.
- Only the devices labeled OVSK are managed by Ryu (the Controller).
- GW1 and GW2 represent two different paths from Source to OVSK-Server. Nothing more.
- Mgmt. is a conventional Layer-2 switch. Its sole purpose is to provide a separate network for communication between the datapaths and Ryu (as well as a path for out-of-band configuration and monitoring).
Using the experiment topology (or simply the topology from here on), I designed a test where flows from Source need to reach OVSK-Server, either through GW1 or GW2.
In conventional networking, one must configure every device’s routing table one by one until communication is achieved (or wait for a routing protocol to converge, fill the ARP tables, and so on). Load balancing between these two paths would require other tools, or we could use the Link Aggregation Control Protocol (LACP) by removing GW1 and GW2 and plugging OVSK and OVSK-Server directly together. As more paths towards OVSK-Server are introduced, more configuration is required, and providing load balancing or congestion-aware forwarding becomes increasingly complex.
On the other hand, we can write a Ryu SDN application that periodically queries OVSK’s ports, catches the responses, and processes the information before taking any forwarding action. These queries are specified in OpenFlow and can be port-based or flow-based. A typical PortStatsReply message provides counters for successful tx and rx, dropped tx and rx, tx and rx errors, and even collisions (on half-duplex Ethernet). From this information one can derive metrics to steer the forwarding path. For example, knowing the period between queries and the number of bits successfully transmitted during that period, one can derive an indicator of congestion for a given link.
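One possible indicator along those lines (my assumption of what such a metric could look like; GossipMaximus’s actual formula may differ) is the fraction of link capacity used between two counter samples:

```python
# Turn two successive PortStats tx-byte samples into a link-utilization
# indicator in [0, 1], given the query interval and the link capacity.
def utilization(tx_bytes_prev, tx_bytes_now, interval_s, capacity_bps):
    """Fraction of link capacity used between two counter samples."""
    bits_sent = (tx_bytes_now - tx_bytes_prev) * 8
    return bits_sent / (interval_s * capacity_bps)

# 2.5 MB sent in 1 s on a 100 Mbps link -> 20% utilization
print(utilization(0, 2_500_000, 1.0, 100_000_000))
```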
Now, suppose we generate 20 Mbps constant bit-rate (CBR) uplink flows for 100 seconds from Source towards OVSK-Server (link capacity is 100 Mbps). With conventional networking (using a single path from transmitter to destination), we expect the average throughput per flow to decrease once the aggregate throughput traversing the path exceeds the link capacity, as shown in the figure below.
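The expected saturation point can be worked out directly: a 100 Mbps path carries five 20 Mbps flows before capacity sharing kicks in. A quick sketch:

```python
# Ideal average per-flow throughput on a single path: flat at the offered
# rate until the link saturates, then the capacity is shared equally.
def avg_throughput_mbps(n_flows, flow_rate=20, capacity=100):
    return min(flow_rate, capacity / n_flows)

for n in (1, 5, 6, 10):
    print(n, avg_throughput_mbps(n))
# 1 -> 20, 5 -> 20, 6 -> ~16.7, 10 -> 10
```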
The question then is, can we keep the average throughput per flow flat by using both paths (through GW1 and GW2)? I attempted this using GossipMaximus, as shown in the figure below.
GossipMaximus queries the datapaths every second and derives an estimation of link congestion. In the experiment, every time OVSK received a new flow (purposefully, each flow pointed to a different UDP destination port), it generated an event. Ryu catches the event and looks at the congestion indicator across all of OVSK’s paths. It then picks the least congested path and injects the corresponding forwarding information into OVSK’s table cache. All subsequent traffic of the matched flow follows the same path.
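The selection step can be sketched as taking the minimum over the per-path indicators (illustrative names of my own, not GossipMaximus’s actual code):

```python
# On a new-flow event, pick the gateway whose congestion indicator is
# lowest and pin the flow to it.
def pick_path(congestion):
    """Return the path with the smallest congestion estimate."""
    return min(congestion, key=congestion.get)

congestion = {"GW1": 0.65, "GW2": 0.20}  # per-path indicators from the queries
print(pick_path(congestion))  # a FlowMod would now pin the new flow to GW2
```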
As shown in the figure, by conditioning the forwarding decision on GossipMaximus’s congestion estimation metric, I was able to keep the average throughput per flow at its maximum for a greater number of flows.
I think the ability to perform OpenFlow matches on header fields other than IP or MAC addresses is, by itself, a great tool (there are people working on arbitrary bit matching). On top of that, the centralized control and configuration of the network allows the creation of network slices: forwarding paths designed for certain flows (matching certain characteristics) that may require lower delay or higher throughput.
I encourage you to keep looking at this SDN thing, especially because it enables virtually anyone to create network applications that provide enhanced infrastructure services to upper layers. For instance:
- Adapt forwarding paths according to special conditions.
- Devise new and efficient ways to query the network and derive interesting indicators.
- Build monitoring applications.
- Since (Ryu) applications/classes can be instantiated, build complex network event handlers and actuators.
Right now I’m working on collecting other kinds of metrics to derive Round-Trip Time (RTT) estimations for both paths towards OVSK-Server. I plan to analyze how this metric changes as the number of flows traversing the network increases, hopefully deriving an (educated-guess) indicator/threshold that may further condition the forwarding decision. This could be useful for time-sensitive flows.
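One common way to smooth such RTT samples (a hypothetical sketch of where this could go, not something implemented in GossipMaximus yet) is a TCP-style exponentially weighted moving average, with a threshold flagging a congested path:

```python
# Fold per-path RTT samples into an exponentially weighted moving average
# (alpha=0.125 is the classic TCP SRTT gain), then compare to a threshold.
def ewma_rtt(samples, alpha=0.125):
    """Return the smoothed RTT after folding in each sample."""
    est = samples[0]
    for s in samples:
        est = (1 - alpha) * est + alpha * s
    return est

samples_ms = [10, 11, 10, 40, 42]   # congestion kicks in at the end
smoothed = ewma_rtt(samples_ms)
print(round(smoothed, 2), smoothed > 15)  # smoothed estimate, over-threshold?
```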
I will keep writing about the inner workings of GossipMaximus in a later post.