Cumulus Networks

Cumulus Networks Technical Guides

Cumulus Networks has assembled a number of long-form technical guides that provide deep dives into various features. These guides include deployment guides, solution briefs and white papers. Most of these documents are on our main website, but we are migrating them to this page.

Currently, the available documents here include the following:

Campus Architecture Solution Guide

Executive Summary

Cumulus Networks, with the potent combination of the Cumulus Linux network operating system and the Cumulus NetQ real-time visibility and telemetry platform, brings the proven success of open and disaggregated networking from the data center into campus networks. The Cumulus Networks vision of open standards driving open networking hardware with software disaggregation creates powerful opportunities to significantly reduce both CapEx and OpEx and streamline network operations.

This guide walks a network architect or administrator through the various technology options, design considerations and recommended choices for building a highly available modern campus network.

Cumulus Linux Network Operating System

Cumulus Linux is the industry-leading open network operating system (NOS) that drives open networking switches built on commodity hardware and switching silicon. It is the key that brings the same hardware and software disaggregation advantages to the network that IT departments already enjoy for their application servers and storage resources.

Incumbent and traditional networking vendors create lock-in by bundling a proprietary operating system on custom built and proprietary networking hardware. Open networking switches often contain the same hardware forwarding chipsets as traditional vendors. When you need to run Linux on a server today, you have your choice of any bare metal server from any manufacturer that meets your needs and budget. The commitment to the operating system is a separate choice from the hardware that it runs on.

Cumulus Linux is an open, native Linux distribution that allows for choice in hardware to provide a wide range of speeds, feeds and features. By being an open NOS, administrators are provided with maximum hardware and application choice. The same monitoring and operational tools already in use for Linux systems in the enterprise today can be easily reused to operate and monitor the network infrastructure.

Cumulus Linux is fully featured networking software providing the entire suite of enterprise networking features, including:

Since Cumulus Linux is a full Linux distribution based on Debian, administrators have the full power of a Linux system at their fingertips. This provides the ability to run containers or on-box agents for functions like streaming telemetry via NetQ, Telegraf or Prometheus or native automation support for tools like Ansible, Saltstack or Puppet.
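For example, because the underlying system is standard Debian, everyday server workflows apply directly on the switch. The commands below are a minimal sketch; exact package and service names (such as the netq-agent service) vary by release:

# Install familiar Linux tooling straight from the Debian package manager
sudo apt-get update && sudo apt-get install -y tcpdump

# Inspect switch ports and routes with native Linux commands
ip -brief link show
ip route show

# Check the status of an on-box telemetry agent (service name assumed)
sudo systemctl status netq-agent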

Cumulus Hardware Partners

Cumulus Linux runs on over 100 open switch platforms from more than 10 hardware manufacturers, providing the widest choice and ultimate control over supply chain, procurement, speeds and features. Networking hardware from multiple vendors can become unified and simple to operate with Cumulus Linux running as a standard NOS. Cumulus Linux supports the widest range of platforms in the industry from open hardware vendors and is now extending this freedom and choice to the modern campus.

Refer to the hardware compatibility list for complete and detailed hardware platform support.

Cumulus NetQ Streaming Telemetry Platform

Built using the same open, modern, web-scale principles that inspire Cumulus Linux, Cumulus NetQ is a highly scalable, modern network operations toolset that provides unprecedented visibility into your network operations in real time. Cumulus NetQ is designed from the ground up to process large amounts of data to deliver real-time insights and perform complex network-wide troubleshooting and validation tasks at nearly any scale.

Adding Cumulus NetQ to a network powered by Cumulus Linux heightens visibility, eases troubleshooting, and enriches existing monitoring and operational strategies with an all-in-one tool that provides information such as:

Cumulus NetQ provides a rich graphical UI, a command line interface and a RESTful API, offering maximum flexibility to retrieve data about the real-time state of the entire network, and even to go back in time.
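As a brief illustration, the NetQ CLI can answer both current-state and historical questions from any managed node. The commands below are a hedged sketch; exact syntax varies by NetQ release:

netq show agents         # list switches and hosts streaming telemetry
netq check bgp           # validate BGP sessions across the entire network
netq show bgp around 5m  # show BGP state as it was five minutes ago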

In addition to monitoring Cumulus Linux network nodes, Cumulus NetQ agents can be installed on other Linux servers to extend visibility beyond the edge of the network. Cumulus NetQ agents monitor the Linux kernel and stream timestamped event and health data to a telemetry server either on premises or in the cloud. With this system-wide, timestamped telemetry, Cumulus NetQ delivers rich analytics and verifications, even against historical data points, providing another dimension to troubleshooting and operations capabilities. The Cumulus NetQ agent can be installed on bare metal Linux hosts or inside Linux VMs.

The Cumulus NetQ agent currently supports:

All data from the Cumulus NetQ agent is streamed to the Cumulus NetQ appliance. This appliance is available as either a bare metal appliance from Cumulus Networks or as a virtual appliance for use on nearly any hypervisor.

Campus Networks Overview

Traditionally, a campus network refers to the portion of a network architecture that delivers connectivity to end users and connected devices across a single geographic location. The size and other physical characteristics of the single geographic location can vary significantly. A campus network may be a single floor, wing, building or cluster of buildings. Campus networks are built by connecting groups of smaller LANs over small physical areas.

The enterprise that owns the campus network normally owns and manages the physical wires and data infrastructure. Often, these physical characteristics guide and constrain design and architecture options in addition to business and application requirements. As a matter of practice, campus networks usually aim to build the fastest functional network that the physical infrastructure can support.

Enterprise IT is consistently under pressure to reduce costs while maintaining a highly available infrastructure to drive business functions. Automation of the network and its operations is proving to be the most useful strategy to deliver a range of benefits. The momentum of automation and its success in data centers and cloud environments are creating new opportunities for similar CapEx and OpEx reductions throughout the rest of the enterprise. Those that take advantage of automation enjoy benefits such as:

The networking industry is being revolutionized and more fully enabled by automation. By providing the full and untethered power of Linux to its users, Cumulus Linux is the original “automation first” network operating system, allowing you to manage your network nodes as if they were any other Linux server.
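As a simple illustration, a switch running Cumulus Linux can sit in the same Ansible inventory as any server. The inventory file, group name and module arguments below are hypothetical:

# Reachability check over SSH, exactly as for a fleet of Linux servers
ansible -i inventory.ini campus_switches -m ping

# Use standard modules to manage the switch like any Debian host
ansible -i inventory.ini campus_switches -b -m apt -a "name=ntp state=present"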

Principles of Campus Network Design

As network designs and architectures are being planned, it is important to consider emerging networking trends and future needs of the enterprise. Several dimensions influence and guide campus design, such as:

The structure of the campus network is based on the concepts of hierarchy and modularity. Like all complex systems, networks are designed, built and grow as discrete sets of smaller building blocks. This allows for flexibility and customization depending on needs local to the users or business applications in that area and the ability to scale or expand by adding additional building blocks. A hierarchy allows for each layer or tier to specialize and optimize for its unique role.

Classically, a campus network hierarchy is modeled as three tiers:

  1. Core
  2. Distribution (Aggregation)
  3. Access

Three-tiered hierarchical model

The three-tiered model is useful for understanding the distinct roles of network devices to provide a framework to customize and tailor for your own most effective use. In practice, there are no absolute rules that govern how a campus network must be physically built. Many larger campus networks are constructed with three distinct tiers, while a smaller network may find enough overlap and cost savings in the distribution and core tier functions that they can be collapsed into a single tier. A hierarchical design describes each tier’s unique set of functions, services and specific role in the design.

Core Tier

In large and multi-site campus environments, where there are multiple sets of distribution layer switches, a core layer is often used as an interconnection. The chief design goals for the core layer are based on delivering a nonstop forwarding service and ensuring there is an appropriate level of redundancy to provide seamless failover in the event of a failure of any component. The core must also be designed to allow for the occasional hardware or software upgrade or change without disrupting enterprise applications.

These design requirements drive a very simple and reliable design built on layer 3 routed point-to-point links. The core should not implement any complex policy services such as load balancers and firewalls, nor should it provide any direct user or device access. Connectivity into the core from the distribution layer and connections between core nodes is layer 3 only. A strictly layer 3 core allows for the elimination of layer 2 control plane protocols such as MLAG and spanning tree and affords a much simpler, reliable and scalable core.

Distribution (Aggregation) Tier

The distribution layer has a unique role by serving as a boundary between the access and core layers. A critical function of the distribution layer is segmenting the network to contain disruptions and changes to a smaller area. It supports a number of important services that fall outside of the highly specialized goals of the access and core layers. First, this layer serves as the aggregation point for all of the access switches in a segment of the campus. Often, routing, firewalls, load balancing, and other policy-based enforcement services occur in the distribution layer. It plays a role in the core layer routing design, acting as a summarization and redistribution point.

A discrete distribution layer is usually required for sites with more than a few access layer switches. It quickly becomes impractical to interconnect all access switches together directly, compared to aggregating at a distribution layer. The configuration and platform choice in this layer are often driven by requirements in the access and core layers and the needs of other network and policy implementation services such as wireless or security.

Access Tier

The access layer is the edge of the wired network and is where end user devices are attached to gain connectivity to the rest of the campus. Some devices attached at the access layer create a micro-extension beyond an access switch’s physical port. Devices such as wireless access points and IP phones are two notable examples that provide small extensions to the access layer.

Wireless clients and workstations behind IP phones are not directly connected to access switch ports


The wide variety of devices that connect to the access layer, along with the large security attack surface, requires the access layer to be one of the most feature rich parts of the campus network. It provides security, QoS, device discovery and even physical infrastructure services like power over Ethernet.

Building Campus Networks

Segmentation

Designing a high speed, highly available and scalable network requires careful attention to many important details, which are covered in this guide. Attention to detail becomes increasingly challenging as the size of the campus network grows. To optimize design for local needs and as a means to be thorough and deliberate, modularizing or compartmentalizing a campus network is usually required when there is more than a handful of network devices in the topology.

Logically segmenting or dividing a network into sections is not always straightforward with clear boundaries. Many portions of the network rely on each other to function properly, so boundaries can be difficult to determine, especially when considering physical geography and proximity. The logical segment boundaries do not necessarily have to fall perfectly along the lines of the three-tier hierarchy model. There are no absolute rules that govern how a particular campus should be segmented. Below are some examples of common divisions of campus networks.

Core

Sharing a name with and also mapping to the hierarchical model tier, the core is the layer 3 backbone that interconnects the other segments of the campus network. A dedicated core segment consists of at least two nodes that are distinctly different from the layer 3 devices that terminate a WAN or service provider connection. Not all campus networks require a dedicated core.

The core segment optimizes for speed, simplicity and high availability. Native layer 2 should not extend into or across the core. As will be discussed in a later section, overlay networks and tunneling protocols such as VXLAN are now the de facto solution to extend layer 2 connectivity across layer 3 boundaries such as a campus core.

A campus core network interconnecting several other campus segments, including a large server room


Aggregation and Access

This is the most familiar segment of a campus network. This includes the access and distribution layers of the traditional three-tiered hierarchical model. For large scale design, often with multiple buildings and multiple sets of distribution layer switches, it is useful to logically group a distribution switch and all of its southbound devices as a logical unit connected to the core.

Different aggregation and access segments can be designed differently to meet local device and user needs. Some business applications create much more east-west traffic (within a segment) than north-south traffic (in and out of the network segment), which may drive the need for higher speeds or a higher number of uplinks from the access layer.

High port density deployments may take advantage of breakout or fan-out cabling at the distribution layer to support extremely large numbers of access switches. For example, a 32x100G Trident3-based switch can have each 100G port split into 4x25G, allowing up to 128x25G ports or up to 128 switches to be interconnected. With 48-port access switches, 128 switches in this example provide 6144 access ports for end device connectivity.1

An access and aggregation block


Depending on each use case, services such as security, wireless and VoIP may be implemented inside of this segment, but can also warrant their own logical segmentation depending on size and complexity.

Services

Some designs may require a set of enterprise network applications and services that, with enough scale or with unique enough requirements, are best suited as their own segment. A call center, for example, may require a large number of voice gateway and call processing nodes along with tight security compliance requirements for payment processing or patient health records.

A fully-featured enterprise campus may include wireless controllers, security appliances, voice and video services, server load balancers, identity and access control services, video surveillance and more. Depending on the geography and deployment models of these services, it can be useful to centralize them into a logical network segment of their own to be able to optimize platform and design choices.

Data Center or Server Racks

Some campus networks provide connectivity into a data center, server room or a set of server racks. Like other specialized segments, the connectivity requirements and needs of data centers and servers drive different platform and feature choices. Depending on the size and use case, a data center may connect to a distribution layer or directly into the core.

Data center networks are designed to accommodate high volumes of east-west traffic and optimize for speed, low latency and high availability. Data centers and server rooms with more than one rack are normally built using a leaf/spine or Clos style design. The close proximity of racks keeps the increased cabling complexity of this design manageable.

Clos or leaf/spine-style network segments. Large server rooms with two or more full racks should deploy a leaf/spine network design.


WAN/Edge

Almost all campus networks have a connection to a WAN of some form. Similar to the data center, a WAN/edge segment has unique requirements that drive specific platform and feature choices. Circuit speed changes from the LAN to the WAN create the need for rich QoS with deep buffering capabilities to smooth out bursty traffic flows. Serving as the demarcation and handoff from a service provider can drive the need for features such as MPLS or may require large hardware forwarding table sizes to accommodate internet routing tables.

A WAN/edge segment may also include dedicated security appliances to perform the first layer of policy enforcement and monitoring. VPN and tunneling services are often implemented here to connect geographically disparate sites together or provide remote access for field and remote workers. The WAN/edge section may also include a DMZ (demilitarized zone) to help quarantine necessary inbound public internet traffic.

Design Considerations

Collapsing or Consolidating Layers

In cases where the network is contained to a single floor or even with multiple adjacent buildings and a sufficient number of fiber runs, it is possible to collapse the core into distribution switches as shown in the figure below.

A cluster of buildings with interconnected distribution switches and no dedicated core layer


Similarly, it may be possible in some instances such as a very small building or sparse connectivity needs to combine the access and distribution layers into a single switch or pair of switches and route directly into a core layer.

A large campus network with collapsed distribution and access segments that route directly into the layer 3 core


Layer 2 vs Layer 3 ECMP Device Connections

Often, application or business requirements create a need for layer 2 (VLAN) adjacency between endpoints. VLANs are used to group together devices that are in the same geographic area and have similar network service and policy enforcement needs. Switches use layer 2 links (trunks) to extend layer 2 connectivity to other devices across the network. At reasonably small scales, this works well with little reason for concern.

VLAN extension can become problematic with large numbers of switches or when a layer 2 adjacency needs to span a wide geographic area. A layer 2 segment is a shared failure domain, and the behaviors of spanning tree can lead to network brownouts, or temporary periods of reduced bandwidth or increased latency. The nature of broadcast domains and shared trunk links exposes a risk that loops or storms in one VLAN may affect all other VLANs on those trunks.

In addition, in layer 2 networks running spanning tree, topology changes or failures in one area have ripple effects, triggering reconvergence across the entire domain. This reconvergence can cause temporary traffic outages as spanning tree blocks ports to prevent loops. In large layer 2 networks, MAC address tables also grow larger, especially at the distribution layer, so hardware ASIC limits may need to be closely monitored. Further, scaling a distribution layer beyond two switches becomes particularly difficult for spanning tree to reliably converge quickly. Most layer 2 designs rely heavily on proprietary MLAG protocols to form layer 2 trunks to a pair of switches. This dependency creates the undesirable side effects of increasing control plane complexity, introducing an additional point of failure and adding another dimension of vendor lock-in.

By contrast, interconnecting network devices using layer 3 routed equal cost multipath (ECMP) links provides a few distinct advantages over layer 2 trunks. By keeping layer 2 broadcast domains smaller, spanning tree scale and convergence issues are kept in check. In some campus network designs, routing can occur at the access layer switches. This removes the need for uplinks into the distribution to be layer 2 bonds. What follows is a reduction or elimination of MLAG as a control plane dependency and the peer links or inter-switch links that are required to support the pairing. Additionally, since distribution layer switches only perform layer 3 routing in this design, it addresses many of the layer 2 (MAC address) hardware table size concerns.

The emergence of EVPN for VXLAN enables network architects and designers to realize the benefits of a layer 3 ECMP network while still providing layer 2 extension across the campus. The section below on EVPN and VXLAN discusses more details about how to achieve layer 2 extension over a layer 3 network to achieve the best of both layer 2 and layer 3.

BGP Unnumbered

BGP Unnumbered is a Cumulus Linux feature that simplifies two major tasks in configuring BGP, drastically reducing configuration complexity and easing deployment.

Traditionally, when configuring BGP between two devices, an IPv4 address is needed to serve as the binding point for the session. For large numbers of point-to-point layer 3 links, like in a core, it can be burdensome to allocate and manage internal IPv4 subnet space for large numbers of point-to-point links.

The IPv6 standard prescribes that all IPv6-enabled interfaces automatically generate and assign a link local IPv6 address in the fe80::/10 address space. BGP Unnumbered takes advantage of this feature and eliminates the need to configure any IPv4 addresses on routed links for BGP.

The other major problem that BGP Unnumbered solves is that traditionally, BGP configuration requires an explicit peer IP address to be configured. This is the address that the BGP peer will open a TCP socket to, for BGP session establishment. BGP Unnumbered also solves this problem by listening for IPv6 router advertisements and dynamically building a BGP session to the IPv6 link local address in the router advertisement that it receives from its peer on the other side of the point-to-point link.

Another important component, RFC5549, provides for IPv4 routes to be advertised with IPv6 next hops. This enables an IPv6 network, even one built entirely out of automatically generated link local addresses, to be able to advertise IPv4 reachability from end to end without a need for IPv4 address assignment.

Because of this, BGP is much simpler to configure. Instead of requiring a peer address for each BGP neighbor, we can refer to the physical interface identifier and only need to specify if it is an internal (iBGP) neighbor or an external (eBGP) neighbor. This makes building and configuring a layer 3 routed network nearly plug and play.
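A minimal sketch of what this looks like with Cumulus Linux NCLU commands is shown below; the AS number and interface names are illustrative only:

net add bgp autonomous-system 65101
net add bgp neighbor swp1 interface remote-as external
net add bgp neighbor swp2 interface remote-as external
net add bgp ipv4 unicast redistribute connected
net commit

No peer IP addresses or point-to-point subnets are defined anywhere; the sessions form over the auto-generated IPv6 link local addresses on swp1 and swp2.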

For more information on BGP Unnumbered and eBGP inside the network, download the free ebook here: https://cumulusnetworks.com/lp/bgp-ebook/

EVPN and VXLAN

EVPN as a control plane for VXLAN has emerged from the data center as an ideal solution to provide the best of both layer 2 and layer 3 designs. Network designers and architects are beginning to apply the proven successes of building a mostly layer 3 routed network using BGP Unnumbered then applying EVPN and VXLAN to provide VLAN extension and layer 2 connectivity to anywhere in the campus it is required.

The key benefit of VXLAN is that layer 2 VLANs can be extended over a layer 3 routed network. This provides many of the same VLAN and MAC mobility benefits that drive layer 2 designs, while allowing for the network infrastructure to inherit the reliability and scale benefits of a routed layer 3 design.

While MAC address learning and move (mobility) convergence for EVPN with VXLAN is fast and satisfactory for most enterprise applications, use cases that require seamless mobility with nonstop packet forwarding across an entire layer 2 domain cannot currently be satisfied with a BGP EVPN-controlled VXLAN topology.

EVPN is an extension to the BGP routing protocol and natively supports multi-tenancy with VRFs. Using BGP to manage routing for the layer 3 network and also EVPN for VXLAN further reduces operational complexity by using the same control plane protocol to drive the entire network.

VXLAN is an encapsulation protocol that tunnels Ethernet frames between network endpoints over layer 3 IP networks. A network device that performs VXLAN encapsulation/decapsulation for the network is referred to as a VTEP (VXLAN tunnel endpoint). When a VTEP sends a frame into a VXLAN tunnel, it wraps the original frame in a VXLAN header that includes the VNI (virtual network identifier). The VNI maps to the original VLAN that the packet belongs to at the ingress switch. After applying a VXLAN header, it is encapsulated into a UDP/IP packet for transmission to another VTEP over an IP network.

When the VXLAN packet arrives at the remote VTEP, it reads the VNI from the VXLAN header. The VNI identifies which VLAN or interface the tunneled frame should be processed on after stripping off and discarding the VXLAN/UDP/IP encapsulation. After decapsulation and reclassification to an interface by the VNI, the original Ethernet frame is able to be bridged or routed again as if it were a regular inbound packet.
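As a rough sketch of this VLAN-to-VNI mapping on a Cumulus Linux VTEP, the NCLU commands below map VLAN 10 to VNI 10 and enable the EVPN address family toward an uplink peer. The loopback address, VNI number and interface name are assumptions:

net add loopback lo ip address 10.0.0.11/32
net add vxlan vni10 vxlan id 10                  # create the VXLAN interface for VNI 10
net add vxlan vni10 bridge access 10             # associate VNI 10 with local VLAN 10
net add vxlan vni10 vxlan local-tunnelip 10.0.0.11
net add bgp l2vpn evpn neighbor swp51 activate   # carry EVPN routes over the existing BGP session
net add bgp l2vpn evpn advertise-all-vni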

Multi-tenancy

Multi-tenancy is the concept that describes the logical separation of the network inside of shared network devices and across shared links. Isolation is achieved in a multi-tenant network by the use of VLANs at layer 2 and VRFs (virtual routing and forwarding) at layer 3. BGP EVPN natively carries VRF-specific IP prefix routes. It is possible to achieve granular isolation and traffic control in several different ways.

The ideal multi-tenancy solution depends mostly on the size and scale of each tenant and the network architecture. Small networks built with layer 2 links are usually able to achieve isolation by using simple layer 2 VLANs. Isolation is enforced at the layer 3 boundary, or default gateway, where policy and security enforcement can take place.

In most traditional multi-tenancy deployments a policy device like a firewall isolates security zones, but this requires two subnets within the same security zone to still traverse the policy device, either creating a need for a larger firewall or limiting throughput of intra-zone traffic.

When using a layer 3 network, VRFs provide isolation by keeping IP traffic between two VRFs entirely isolated from each other. VRFs implement an entirely different and independent routing and IP forwarding table in the Linux kernel. Routing between VRFs is achieved through route leaking or by forwarding traffic through VRF-unaware paths. As a result, two hosts in different subnets but still within the same policy zone can use their local top of rack switch as a default gateway and only traverse the firewall or policy device when crossing between security zones (VRFs).

To provide information about VRFs and tenant reachability throughout the network, BGP-EVPN is used as a control plane protocol between network devices. EVPN speakers exchange their list of locally configured VRFs as well as the associated hosts within that VRF. It is through this exchange that routing policy is learned and traffic can be either routed locally within a VRF or through a policy device between VRFs.
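A hedged NCLU sketch of a tenant VRF in this model is shown below; the VRF name, VLAN, layer 3 VNI and AS number are hypothetical:

net add vrf DATA vni 104001                             # layer 3 VNI used for routed traffic within the DATA tenant
net add vlan 13 vrf DATA                                # place the VLAN 13 SVI into the DATA VRF
net add bgp vrf DATA autonomous-system 65011
net add bgp vrf DATA l2vpn evpn advertise ipv4 unicast  # advertise tenant prefixes to other EVPN speakers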

Switch Stacking

Stacking is the concept of combining multiple smaller switches together to operate and be managed as a single unit. On the surface, it provides a number of benefits, but like any other aspect of networking, carries a few tradeoffs. At the time when switch stacking was conceived, network operations were a very manual CLI process. As networks scaled, recurring operating expenses also increased with the need for more manual operators. In response, network vendors looked for ways to consolidate points of management. A switch stack is a means to that end.

In addition, most hardware ASICs at the time were unable to deliver wire-speed performance on all ports with a single chip. To achieve the required port density and attain wire-speed forwarding on all ports, multiple ASICs had to be linked to create switch stacks and chassis. By contrast, commodity switching ASICs that power today’s open networking switches can support up to 128 ports on a single chip in a 1U form factor.

The other hidden risk in stacking is the complexity of the stacking management plane. A highly complex and coordinated effort and state progression between the individual switches is required for them to join together and act as a single logical stack. This creates rigid constraints in platform choice, since all switches in a stack must be from the same vendor, the same model or family and usually the same code versions. Adding greater management plane complexity by implementing stacking is at odds with the broad goal of keeping the management and control planes simple and reducing dependencies on proprietary protocols. Complex features tend to fail in complex and difficult to predict ways.

Automation tools such as Ansible, Saltstack and Puppet address the need to consolidate and reduce management points. Cloud scale operators are able to use automation to manage 40,000 servers with a single operator. Enterprises are noticing the success and impact of automation in their data centers and are applying these strategies into the rest of their campus networks.

Over time, switch stacks have become ingrained into legacy campus design. As a modern network operating system that follows the network design philosophies and principles we believe are best, Cumulus Linux does not support switch stacking. In instances where switch stacks currently exist or are being considered, multiple 1U devices acting as standalone devices can be used to provide the connectivity that a switch stack would offer.

The recommended designs in the section below illustrate how to build campus networks without needing to configure and group switches using stacking. For a more detailed discussion about stacking, visit our blog.

Chassis Switches

Similar to the concept of stacking, chassis switches have traditionally been a way to scale port density while maintaining a single point of device management. Chassis improve on a switch stack in a few minor ways, but tend to carry a higher cost. A chassis switch can also provide for embedded and on box services like security or wireless. Chassis switches tend to be used more often in the core and distribution layers while switch stacks tend to be used in the access layer.

Chassis create lock-in and a certain inflexibility in the network devices and services they can support. Cards and blades are proprietary form factors that cannot act as standalone devices. A chassis is usually a long term investment due to its high initial cost combined with the idea that components of the chassis can be upgraded as capabilities and speeds improve over time. This continues to drive lock-in and forces long term decisions in other adjacent areas such as operations and management.

Open networking switches are almost exclusively built in fixed and non-modular form factors and support a wide range of pluggables. The Recommended Designs section below illustrates how campus networks can be built at scale without the need for large port-dense chassis switches.

For more information about switch platforms that run Cumulus Linux, refer to our hardware compatibility list.

Small Office - Remote Office

Small campus network sites such as remote and satellite offices are often able to collapse all LAN switching functions into a single switch or single pair of switches. A dedicated WAN/edge router is required and security devices may also be implemented near the WAN edge depending on business and compliance needs.

For these small networks, inter-subnet routing can be performed on either the LAN switch or on a layer 3 device upstream. Performing routing upstream provides the opportunity for more robust and granular security and policy enforcement beyond what a LAN switch can provide in its hardware forwarding path, but may not be able to meet performance requirements. Inter-subnet routing can be performed at wire speed and takes the most direct path when performed on the LAN switch.

Platform choice is flexible and includes options with or without Power over Ethernet (PoE). To see all 1G switch options, refer to the hardware compatibility list.

Small campus networks with consolidated network layers. Left: A small campus site with a single LAN switch. Right: One collapsed layer with one MLAG switch pair and security services in the WAN edge.


Single Site - Small Building

Campus networks that require more than two access switches also require a distribution layer. Interconnecting access switches in a mesh, ring or daisy chain fashion is not recommended. Access switches should be arranged in a hierarchy such that traffic between access switches flows through a distribution layer switch.

Exception: Two access switches configured as an MLAG pair with a peer link.

Connections between access layer and distribution layer switches can be built by using either layer 2 or layer 3 links. Designs using layer 2 links are usually selected if a VLAN must be extended beyond a single switch or MLAG pair, but EVPN and VXLAN discussed below should be considered to address this use case. At small scale, layer 2 designs are easy to configure and operate and are generally reliable.

However, most modern business applications do not require direct layer 2 adjacency. Enterprises without a need for native layer 2 extension across access switches and those that adopt EVPN and VXLAN are able to build a more reliable next generation campus network using layer 3 ECMP links.

This provides several key benefits:

Cumulus Networks recommends using eBGP Unnumbered to build and maintain the set of layer 3 links between the access and distribution switches. BGP Unnumbered uses auto-generated IPv6 link local addresses and several intuitive configuration optimizations to drastically reduce configuration and deployment complexity. For more information on eBGP Unnumbered, refer to the Cumulus Linux user guide.

Campus networks with a distribution layer. Left: A classic campus network with layer 2 bonds interconnecting the access and distribution layer. Routing for access VLANs occurs at the distribution layer. Right: An alternative design using layer 3 ECMP uplinks from the access layer and BGP Unnumbered. Routing for access VLANs occurs at the access layer switches.


An optional improvement to the layer 3 ECMP design in the figure above includes an additional pair of switches to transition in and out of the broader aggregation and access network segment. Often, this serves as a multi-purpose services segment or block. This design option more closely adheres to Clos leaf/spine network design principles that have proven success in modern data centers.

In this configuration, distribution layer switches only route IP packets between access switches and participate in BGP to facilitate route advertisement. This simplicity has several inherent benefits. Troubleshooting is easier since only routing operations, and no switching operations, need to be examined. This also keeps the distribution layer configuration much simpler, as there are no layer 2 control plane protocols needed, and all of the complexity of route filtering, summarization or redistribution into another routing protocol for this segment occurs on the border switches.

Layer 3 ECMP design with optional border switches.


Supported platforms by tier:

Access: Any model on the Cumulus HCL. See 1G models here.
Distribution: Any Broadcom Trident3, Trident II+ or Mellanox Spectrum-based platform is preferred. Broadcom Tomahawk family is acceptable.
Border: Any Broadcom Trident3, Trident II+ or Mellanox Spectrum-based platform is preferred. Broadcom Tomahawk family is acceptable.

Redistribute Neighbor

The Cumulus Linux layer 3 Redistribute Neighbor feature helps modernize your campus network to make it more resilient, maximize utilization, and eliminate the use of the Spanning Tree Protocol (STP).

Redistribute neighbor provides a way for IP subnets to span racks without forcing the end hosts to run a routing protocol. It works by redistributing the Address Resolution Protocol (ARP) table (the Linux IP neighbor table) into a dynamic routing protocol, such as OSPF or BGP. The host routes are advertised into the routing domain as /32 prefix routes, and routing protocols achieve reachability through longest prefix match (LPM) routing on these /32 host routes.

Benefits

Using Redistribute neighbor provides the following benefits:

Example Configuration

The following example shows a core network fabric connected to building A. The design closely reflects a multi data center spine and leaf fabric design where there is a spine and leaf fabric for the network core and for each building:

In the above example:

The routing configuration for the access switches (Acc-Sw1 and Acc-Sw3) is shown here:

cumulus@Acc-Sw1:mgmt-vrf:~# net show configuration commands
net add bgp autonomous-system 65011
net add routing import-table 10 route-map REDIST_NEIGHBOR
net add routing route-map LOOPBACK_ROUTES permit 10 match interface lo
net add routing route-map REDIST_NEIGHBOR permit 10 match interface vlan13
net add bgp router-id 10.0.0.11
net add bgp bestpath as-path multipath-relax
net add bgp neighbor swp19 interface remote-as external
net add bgp neighbor swp20 interface remote-as external
net add bgp neighbor peerlink.4094 interface remote-as internal
net add bgp ipv4 unicast redistribute connected route-map LOOPBACK_ROUTES
net add bgp ipv4 unicast redistribute table 10
cumulus@Acc-Sw3:mgmt-vrf:~# net show configuration commands
net add bgp autonomous-system 65013
net add routing import-table 10 route-map REDIST_NEIGHBOR
net add routing route-map LOOPBACK_ROUTES permit 10 match interface lo
net add routing route-map REDIST_NEIGHBOR permit 10 match interface vlan13
net add bgp router-id 10.0.0.13
net add bgp bestpath as-path multipath-relax
net add bgp neighbor swp19 interface remote-as external
net add bgp neighbor swp20 interface remote-as external
net add bgp neighbor peerlink.4094 interface remote-as internal
net add bgp ipv4 unicast redistribute connected route-map LOOPBACK_ROUTES
net add bgp ipv4 unicast redistribute table 10

To verify the addressing of Host A (10.1.3.101) and Host C (10.1.3.103), which are on the same subnet, check eth1 on each host:

cumulus@Host_A:~$ ip addr show eth1
4: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 44:38:39:00:00:75 brd ff:ff:ff:ff:ff:ff
    inet 10.1.3.101/24 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::4638:39ff:fe00:75/64 scope link
       valid_lft forever preferred_lft forever
cumulus@Host_C:~$ ip addr show eth1
7: eth1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
    link/ether 44:38:39:00:00:73 brd ff:ff:ff:ff:ff:ff
    inet 10.1.3.103/24 brd 10.1.3.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::4638:39ff:fe00:73/64 scope link
       valid_lft forever preferred_lft forever

In this design, Proxy ARP is configured on the VLAN attached to the host so that the switch responds when a host sends an ARP request for another system on its subnet.
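Proxy ARP itself is a standard Linux kernel setting on the SVI. A minimal sketch, assuming the host-facing SVI is vlan13:

sudo sysctl -w net.ipv4.conf.vlan13.proxy_arp=1

To persist across reboots, the same setting can be placed in a file under /etc/sysctl.d/.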

If you have many switches and need the VLAN across all the switches, you can specify a unique IP address on all the SVIs in the subnet, or you can use the anycast gateway with VRR. To conserve IP addresses, repeat physical IP addresses on a switch or switch pair (if you use MLAG).
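If the anycast gateway approach is chosen, VRR defines a virtual MAC and IP address shared by the switches. A minimal sketch; the physical address, virtual address and MAC below are illustrative:

net add vlan 13 ip address 10.1.3.2/24
net add vlan 13 ip address-virtual 00:00:5e:00:01:01 10.1.3.1/24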

The IP neighbor table on the hosts and the switches looks like this:

cumulus@Host_A:~$ sudo ip neighbor show | grep 10.1.3.103
10.1.3.103 dev eth1 lladdr 44:39:39:ff:00:13 REACHABLE
cumulus@Host_C:~$ sudo ip neigh show | grep 10.1.3.101
10.1.3.101 dev uplink lladdr 44:38:39:00:00:74 REACHABLE
cumulus@Acc-Sw1:mgmt-vrf:~# sudo ip neigh show 10.1.3.101
10.1.3.101 dev vlan13 lladdr 44:38:39:00:00:75 REACHABLE
cumulus@Acc-Sw3:mgmt-vrf:~# sudo ip neigh show 10.1.3.103
10.1.3.103 dev vlan13 lladdr 44:38:39:00:00:73 REACHABLE

The routing tables on the access and aggregation switches look like this:

cumulus@Acc-Sw1:mgmt-vrf:~# net show bgp 10.1.3.103
BGP routing table entry for 10.1.3.103/32
Paths: (2 available, best #2, table default)
  Advertised to non peer-group peers:
  BAagg-Sw1(swp19) BAagg-Sw2(swp20)
  65021 65013
    fe80::4638:39ff:fe00:4d from BAagg-Sw2(swp20) (10.0.0.22)
    (fe80::4638:39ff:fe00:4d) (used)
      Origin incomplete, valid, external, multipath
      AddPath ID: RX 0, TX 20
      Last update: Mon May  4 06:03:10 2020

  65021 65013
    fe80::4638:39ff:fe00:7b from BAagg-Sw1(swp19) (10.0.0.21)
    (fe80::4638:39ff:fe00:7b) (used)
      Origin incomplete, valid, external, multipath, bestpath-from-AS 65021, best
      AddPath ID: RX 0, TX 10
      Last update: Fri May  1 06:14:53 2020
cumulus@Acc-Sw1:mgmt-vrf:~# net show route 10.1.3.103
RIB entry for 10.1.3.103
========================
Routing entry for 10.1.3.103/32
  Known via "bgp", distance 20, metric 0, best
  Last update 00:00:11 ago
  * fe80::4638:39ff:fe00:7b, via swp19
  * fe80::4638:39ff:fe00:4d, via swp20

FIB entry for 10.1.3.103
========================
10.1.3.103  proto bgp  metric 20
        nexthop via 169.254.0.1  dev swp19 weight 1 onlink
        nexthop via 169.254.0.1  dev swp20 weight 1 onlink
cumulus@BAagg-Sw1:mgmt-vrf:~$ net show bgp 10.1.3.101
BGP routing table entry for 10.1.3.101/32
Paths: (1 available, best #1, table default)
  Advertised to non peer-group peers:
  Acc-Sw1(swp19) Acc-Sw2(swp20) Acc-Sw3(swp21) Acc-Sw4(swp22)
  65011
    fe80::4638:39ff:fe00:7c from Acc-Sw1(swp19) (10.0.0.11)
    (fe80::4638:39ff:fe00:7c) (used)
      Origin incomplete, metric 0, valid, external, bestpath-from-AS 65011, best
      AddPath ID: RX 0, TX 24
      Last update: Fri May  1 06:14:51 2020
cumulus@BAagg-Sw1:mgmt-vrf:~$ net show bgp 10.1.3.103
BGP routing table entry for 10.1.3.103/32
Paths: (2 available, best #1, table default)
  Advertised to non peer-group peers:
  Acc-Sw1(swp19) Acc-Sw2(swp20) Acc-Sw3(swp21) Acc-Sw4(swp22)
  65013
    fe80::4638:39ff:fe00:a2 from Acc-Sw3(swp21) (10.0.0.13)
    (fe80::4638:39ff:fe00:a2) (used)
      Origin incomplete, metric 0, valid, external, multipath, bestpath-from-AS 65013, best
      AddPath ID: RX 0, TX 19
      Last update: Fri May  1 06:04:32 2020

  65014
    fe80::4638:39ff:fe00:26 from Acc-Sw4(swp22) (10.0.0.14)
    (fe80::4638:39ff:fe00:26) (used)
      Origin incomplete, metric 0, valid, external, multipath, bestpath-from-AS 65014
      AddPath ID: RX 0, TX 18
      Last update: Fri May  1 06:04:32 2020
cumulus@BAagg-Sw1:mgmt-vrf:~$ net show route 10.1.3.101
RIB entry for 10.1.3.101
========================
Routing entry for 10.1.3.101/32
  Known via "bgp", distance 20, metric 0, best
  Last update 3d01h54m ago
  * fe80::4638:39ff:fe00:7c, via swp19


FIB entry for 10.1.3.101
========================
10.1.3.101 via 169.254.0.1 dev swp19  proto bgp  metric 20 onlink
cumulus@BAagg-Sw1:mgmt-vrf:~$ net show route 10.1.3.103
RIB entry for 10.1.3.103
========================
Routing entry for 10.1.3.103/32
  Known via "bgp", distance 20, metric 0, best
  Last update 3d02h04m ago
  * fe80::4638:39ff:fe00:a2, via swp21
  * fe80::4638:39ff:fe00:26, via swp22


FIB entry for 10.1.3.103
========================
10.1.3.103  proto bgp  metric 20
        nexthop via 169.254.0.1  dev swp21 weight 1 onlink
        nexthop via 169.254.0.1  dev swp22 weight 1 onlink
cumulus@Acc-Sw3:mgmt-vrf:~# net show bgp 10.1.3.101  
BGP routing table entry for 10.1.3.101/32
Paths: (2 available, best #2, table default)
  Advertised to non peer-group peers:
  BAagg-Sw1(swp19) BAagg-Sw2(swp20)
  65021 65011
    fe80::4638:39ff:fe00:b7 from BAagg-Sw2(swp20) (10.0.0.22)
    (fe80::4638:39ff:fe00:b7) (used)
      Origin incomplete, valid, external, multipath
      AddPath ID: RX 0, TX 22
      Last update: Mon May  4 06:03:06 2020

  65021 65011
    fe80::4638:39ff:fe00:a1 from BAagg-Sw1(swp19) (10.0.0.21)
    (fe80::4638:39ff:fe00:a1) (used)
      Origin incomplete, valid, external, multipath, bestpath-from-AS 65021, best
      AddPath ID: RX 0, TX 19
      Last update: Fri May  1 06:14:51 2020
cumulus@Acc-Sw3:mgmt-vrf:~# net show route 10.1.3.101
RIB entry for 10.1.3.101
========================
Routing entry for 10.1.3.101/32
  Known via "bgp", distance 20, metric 0, best
  Last update 00:02:55 ago
  * fe80::4638:39ff:fe00:a1, via swp19
  * fe80::4638:39ff:fe00:b7, via swp20


FIB entry for 10.1.3.101
========================
10.1.3.101  proto bgp  metric 20
        nexthop via 169.254.0.1  dev swp19 weight 1 onlink
        nexthop via 169.254.0.1  dev swp20 weight 1 onlink

In this deployment, you can perform segmentation in one of two ways:

EVPN VXLAN

Layer 3 ECMP-based designs can be further improved to add layer 2 extension across the network. As discussed in the EVPN and VXLAN section, it’s possible to realize the reliability and scalability of a layer 3 design and also deliver layer 2 connectivity to devices across access switches and across the layer 3 ECMP network. Cumulus Linux makes EVPN and VXLAN easy to use, especially when combined with a network built using BGP Unnumbered. EVPN also uses BGP, so the same protocol seamlessly drives both functions.

Routing between VLANs in a VXLAN topology presents a few choices. For performance and best scalability, symmetric mode with IRB (integrated routing and bridging) is the recommended inter-subnet routing strategy. In this mode, routing always occurs at the access switch. For inter-subnet traffic local to the switch (or pair), symmetric mode routes the packet at the access switch and the packet takes the optimal path. For more information on EVPN symmetric mode, see the documentation here.

The design in the figure below prescribes that a VTEP be configured at every switch in the access and distribution layers. Both access layer and distribution layer switches must perform VXLAN encapsulation and decapsulation and support routing in and out of VXLAN tunnels.

An example modern campus design. EVPN VXLAN provides VLAN extension and layer 2 connectivity across the campus. Distribution and access layer switches are both VTEPs and perform VXLAN encapsulation for the network.


In the design above, northbound links from the distribution layer switches are ideally discrete layer 3 routed links. Since MLAG is not required for the southbound access layer connectivity, introducing MLAG strictly for this purpose weakens some of the design’s inherent advantages. A design that requires connectivity using MLAG at the distribution layer limits the distribution layer to a strict pair of switches, increases control plane dependencies, and increases troubleshooting complexity.

A similar design as the previous figure with distribution layer switches configured with MLAG.


Platform choice in these designs is more rigid with the need for VXLAN encapsulation and VXLAN routing support. For access layer switches, Helix4 and Hurricane2-based platforms do not offer VXLAN support. As a result, we recommend Trident3 or Mellanox Spectrum-based platforms, which support VXLAN routing.

Supported platforms by tier:

Access: Broadcom Trident3, Trident II+ or Mellanox Spectrum-based platform.
Distribution: Broadcom Trident3, Trident II+ or Mellanox Spectrum-based platform is preferred. Broadcom Tomahawk family is acceptable, although routing in and out of VXLAN tunnels (RIOT) on Tomahawk platforms introduces a few limitations and extra configuration.

The non-MLAG design two figures above can be improved with the addition of dedicated border switches in the network segment. This option provides a number of significant benefits by removing complexity at the distribution layer switches. In contrast to the design above, the distribution layer switches do not need to serve as VTEPs for the network. VTEP switches are only in the access and border layers, which leaves the distribution layer to perform IP routing and deliver VXLAN packets between VTEPs.

Layer 3 design using EVPN VXLAN. Access switches act as VTEPs and support VXLAN encapsulation. VLANs can be extended across the campus to other access switches across the routed distribution layer.


This design option also provides more flexibility for northbound access to the core or the rest of the campus network. Border switch segments can be designed more flexibly and may be implemented more easily as an MLAG pair to retain the inherent design benefits of a modern layer 3 network.

Platform choice in this design follows closely with other VXLAN options. Access and border switches must be able to support VXLAN encapsulation/decapsulation and also routing in and out of VXLAN tunnels (RIOT) to support symmetric mode. Helix4, Hurricane2 and other native 1Gbps switch ASICs do not offer VXLAN support. As a result, we recommend Trident3 or Mellanox Spectrum-based platforms, which support VXLAN routing. Distribution switches do not perform any VXLAN encapsulation/decapsulation in this design and have fewer feature driven requirements.

Supported platforms by tier:

Access: Any Broadcom Trident3, Trident II+ or Mellanox Spectrum-based platform.
Distribution: Any supported platform on the Cumulus HCL.
Border: Any Broadcom Trident3, Trident II+ or Mellanox Spectrum-based platform is preferred. Broadcom Tomahawk family is acceptable, although routing in and out of VXLAN tunnels (RIOT) on Tomahawk platforms introduces a few limitations and extra configuration.

Multiple Distribution Layers

The maximum size of a single access layer is bound by the number of ports on the distribution layer switches. For example, a 32 port distribution layer switch can interconnect 30 access layer switches, leaving two uplinks. To scale out access layer connectivity, more port dense distribution switches can be used, or an additional access aggregation block can be added. Designs built using layer 2 bonds for switch uplinks should limit the size of the access layer to avoid problems with spanning tree and layer 2 scaling as discussed in an earlier section.

When adding or designing for more than two aggregation and access network segments, the addition of a dedicated core should be considered. As discussed earlier, a dedicated layer 3 core provides several advantages and, most importantly, reduces cabling complexity for interconnections to other network blocks. A multi-building campus may not be able to connect all buildings and network segments to each other due to geographical and cabling challenges.

Multiple campus segments connected with a collapsed core design. Full link redundancy creates a mesh and does not scale.


Multiple campus segments interconnected with a dedicated core. A network segment only needs to connect to the core instead of every other segment.


A campus core layer can be easily built using BGP Unnumbered. In the design above, the distribution layer switches and core layer switches peer with each other using eBGP Unnumbered to manage IP addressing and also create the BGP neighbor relationships to exchange routes. With a Cumulus Linux data center or server room, all of the routed links included in the grey box in the figure above can be built and maintained using BGP Unnumbered. The WAN/edge devices can also run BGP and peer with the rest of the campus core.

The addition of a dedicated core layer with multiple interconnected campus network segments also introduces additional implementation options with EVPN and VXLAN. In the design in the figure above, layer 2 VLANs cannot be extended across the core layer. The access aggregation segment at the bottom of the figure is a layer 2 access aggregation design, so VLANs can be natively trunked through the distribution layer to other access switches. Since a dedicated core layer is layer 3 only, VLAN extension stops at this network segment and cannot cross the core.

EVPN with VXLAN is the solution that enables campus network designers to build a truly dedicated layer 3 core network while still providing VLAN and layer 2 extension to users and devices across the entire campus, beyond what was traditionally possible.

There are two main variations of a large scale EVPN VXLAN campus with a dedicated layer 3 campus core; the main difference is the layer at which the VTEPs are implemented. A traditional layer 2 access and aggregation block can be constructed with layer 2 trunks and MLAG bonding, with VTEPs at the distribution layer. However, use caution to avoid creating large layer 2 domains extended over large numbers of switches with this design.

Cumulus Networks does not recommend implementing VTEP functionality in the core layer. Core switches should only perform simple IP routing to more closely adhere to core layer design principles. This may require that a set of border switches be used to provide VXLAN encapsulation/decapsulation to other areas of the campus depending upon the particular needs and use cases.

An EVPN VXLAN-enabled campus with VTEPs at the distribution layer. Access switches use layer 2 bonds to an MLAG pair of distribution switches. Access VLANs can be extended through the core to other distribution access blocks or to an EVPN VXLAN-enabled data center.


This design increases platform flexibility and choice on access switches as they do not need to perform VXLAN encapsulation. Inside of a distribution-access segment, VLANs are extended and trunked using normal 802.1q VLAN tagging through the distribution layer. Routing for access VLANs occurs in the distribution layer. BGP Unnumbered is recommended to configure the layer 3 links to the core and also carry the EVPN address family to enable VXLAN. For performance and best scalability, symmetric mode with IRB (integrated routing and bridging) is the recommended inter-subnet routing strategy.

Symmetric mode with distributed anycast gateways provides for devices to move between access switches and always perform routing at the local switch. All routing for access VLANs occurs in the context of a tenant (VRF). The global routing table is used to carry and advertise routing information for the network endpoints (such as VTEP addresses and loopbacks). All IP subnets for access VLANs exist as a member of a VRF. More information about VRFs in this design is discussed below.

Data center network segments that implement VXLAN and EVPN can also be folded into the broader EVPN VXLAN-enabled campus. This provides opportunities to consolidate and centralize server and compute resources that had been traditionally co-located in the access or distribution layer. For more information about using EVPN and VXLAN in a data center network, visit our EVPN solution page.

Distribution layer or border switches serve as VTEPs; thus, they carry the most stringent platform requirements. Distribution layer switches must be able to support VXLAN encapsulation/decapsulation and also routing in and out of VXLAN tunnels (RIOT). Core switches do not serve as VTEPs and only perform IP routing to more closely adhere to core layer design principles.

Tier          Supported Platforms
------------  --------------------------------------------------------------------------
Access        Any model on the Cumulus HCL. See 1G models here.
Distribution  Any Broadcom Trident3, Trident II+ or Mellanox Spectrum-based platform is
              preferred. Broadcom Tomahawk family is acceptable, although routing in and
              out of VXLAN tunnels (RIOT) on Tomahawk platforms introduces a few
              limitations and extra configuration.
Border        Any Broadcom Trident3, Trident II+ or Mellanox Spectrum-based platform is
              preferred. Broadcom Tomahawk family is acceptable, although routing in and
              out of VXLAN tunnels (RIOT) on Tomahawk platforms introduces a few
              limitations and extra configuration.
Core          Any model on the Cumulus HCL.

Another variation of a campus EVPN VXLAN design is a more modern and scalable version that uses BGP Unnumbered from the core down to the access layer switches. Access layer switches are VTEPs and perform VXLAN encapsulation and decapsulation as well as routing in and out of VXLAN tunnels (RIOT). For performance and best scalability, symmetric mode with IRB (integrated routing and bridging) is the recommended inter-subnet routing strategy.

A more scalable and robust EVPN VXLAN campus network. Not all VXLAN connections are indicated. Access switches serve as VTEPs leaving the distribution and core layers to perform only IP routing. BGP Unnumbered is used to build the core, distribution and access layers and also carries the EVPN address family to enable VXLAN across the entire campus.

Symmetric mode with distributed anycast gateways allows devices to move between access switches while routing is always performed at the local switch. All routing for access VLANs occurs in the context of a tenant (VRF). The global routing table carries and advertises routing information for the network endpoints (such as VTEP addresses and loopbacks). All IP subnets for access VLANs exist as members of a VRF.

To provide the highest level of isolation and traffic control, VRFs can be implemented natively with EVPN to organize and classify IP subnets across the network; this is the concept of multi-tenancy. A VRF can be used to organize subnets across a layer 3 network much as VLANs organize devices in flat layer 2 networks. VRFs are implemented as separate instances of a layer 3 routing table. Route lookups are performed only within a specific table, providing isolation between layer 3 IP networks and even allowing overlapping address space.
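
For example, per-VRF routing tables can be inspected directly on a switch; the VRF name below is an assumption for illustration:

# Show only the routes installed in the "DATA" VRF
cumulus@switch:~$ net show route vrf DATA
cumulus@switch:~$ ip route show vrf DATA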

In the figure below, wireless devices and workstations are in different VLANs and different IP subnets, but are in the same VRF named “Data.” The VTEP access switch can route directly between subnets that are within the same VRF. For traffic between VRFs such as “Voice” and “Data”, there is no direct path due to VRF isolation. Packets must flow to a centralized location where VRF route leaking occurs or travel “out-and-back” through a VRF-unaware device. Notice in the figure below that inter-VRF traffic from the “Voice” VRF to the “Data” VRF must go through a centralized point and take a different path to reach the completely separate “Data” IP network.

Intra-VRF and inter-VRF routing paths. Not all VXLAN tunnel connections are indicated.

Platform requirements in this design differ from the previous example, where VTEPs are at the distribution layer. In this configuration, access layer switches must support VXLAN, but distribution layer switches do not. The distribution layer switches perform only IP routing and inherit the same simplicity and ease-of-use benefits as layer 3 core designs.

Tier          Supported Platforms
------------  --------------------------------------------------------------------------
Access        Any Broadcom Trident3, Trident II+ or Mellanox Spectrum-based platform.
Distribution  Any model on the Cumulus HCL.
Border        Any Broadcom Trident3, Trident II+ or Mellanox Spectrum-based platform.
              Broadcom Tomahawk family is acceptable, although routing in and out of
              VXLAN tunnels (RIOT) on Tomahawk platforms introduces a few limitations
              and extra configuration.
Core          Any model on the Cumulus HCL.

Network Telemetry and Validation with Cumulus NetQ

Cumulus NetQ is a telemetry and monitoring platform that uses lightweight agents to deliver pertinent, real-time telemetry data for campus deployments of any size. As data is collected and processed by the telemetry server, Cumulus NetQ is able to provide a deep, rich set of network health status and validation checks.

For a single-pane-of-glass view, NetQ provides a modern graphical user interface. Important information such as network service health, IP addresses, MAC addresses, events and alarms are all pulled together into an easy-to-consume, browser-based view. A command line interface is also provided that can be used from any node that runs a NetQ agent. This provides access to any information, from any node, from anywhere an operator may be logged in.

Cumulus NetQ GUI

NetQ also provides an API and structured data outputs that make it possible to integrate with nearly any third-party monitoring and alerting tool. Every NetQ CLI command has a JSON output option, which provides a structured response that is easy to parse and interpret programmatically. Cumulus NetQ also provides a full RESTful API with a standard OpenAPI definition that can be found in the API User Guide.

cumulus@oob-mgmt-server:~$ netq leaf01 show ip routes json
{
    "routes":[
        {
            "origin":"no",
            "lastChanged":1571689245.9370000362,
            "hostname":"leaf01",
            "prefix":"10.0.0.22/32",
            "vrf":"default",
            "nexthops":"169.254.0.1: swp52"
        },
        {
            "origin":"yes",
            "lastChanged":1571689245.9360001087,
            "hostname":"leaf01",
            "prefix":"10.0.0.11/32",
            "vrf":"default",
            "nexthops":"lo"
        },
        {
            "origin":"yes",
            "lastChanged":1571689245.9360001087,
            "hostname":"leaf01",
            "prefix":"10.2.4.1/32",
            "vrf":"vrf1",
            "nexthops":"vlan24-v0"
        },
        {
            "origin":"no",
            "lastChanged":1571689245.9360001087,
            "hostname":"leaf01",
            "prefix":"10.0.0.12/32",
            "vrf":"default",
            "nexthops":"169.254.1.2: peerlink.4094"
        },
        {
            "origin":"no",
            "lastChanged":1571689245.9360001087,
            "hostname":"leaf01",
            "prefix":"0.0.0.0/0",
            "vrf":"mgmt",
            "nexthops":"Blackhole"
        },

...
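
Because the output is plain JSON, it can be piped into standard tooling; for example, assuming jq is installed on the management server, the default-VRF prefixes from the output above can be extracted like this:

cumulus@oob-mgmt-server:~$ netq leaf01 show ip routes json | jq -r '.routes[] | select(.vrf == "default") | .prefix'
10.0.0.22/32
10.0.0.11/32
10.0.0.12/32
...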

NetQ delivers advanced path tracing, even through a VXLAN overlay, by either layer 3 IP address or layer 2 MAC address. NetQ agents stream forwarding information from every node, which provides a unique opportunity to perform a more complete path trace. A NetQ trace discovers all possible paths and checks each path hop by hop for errors and possible problems. As shown in the example below, the Cumulus NetQ CLI provides layer 2, layer 3 and VXLAN overlay trace capabilities.

cumulus@oob-mgmt-server:~$ netq trace 10.2.4.104 from server01 pretty
Number of Paths: 8
Number of Paths with Errors: 0
Number of Paths with Warnings: 0
Path MTU: 1500

 server01 bond0 -- swp1  leaf02  vni: 104001 swp52 -- swp2 spine02 swp4 -- swp52 vni: 104001  leaf04  bond04 -- bond0 server04
                                             swp52 -- swp2 spine02 swp3 -- swp52 vni: 104001  leaf03  bond04 -- bond0 server04
          bond0 -- swp1  leaf02  vni: 104001 swp51 -- swp2 spine01 swp4 -- swp51 vni: 104001  leaf04  bond04 -- bond0 server04
                                             swp51 -- swp2 spine01 swp3 -- swp51 vni: 104001  leaf03  bond04 -- bond0 server04
          bond0 -- swp1  leaf01  vni: 104001 swp52 -- swp1 spine02 swp4 -- swp52 vni: 104001  leaf04  bond04 -- bond0 server04
                                             swp52 -- swp1 spine02 swp3 -- swp52 vni: 104001  leaf03  bond04 -- bond0 server04
          bond0 -- swp1  leaf01  vni: 104001 swp51 -- swp1 spine01 swp4 -- swp51 vni: 104001  leaf04  bond04 -- bond0 server04
                                             swp51 -- swp1 spine01 swp3 -- swp51 vni: 104001  leaf03  bond04 -- bond0 server04

Fabric-wide service validations and health checks bring accurate alerting and diagnostics, and can provide prebuilt, comprehensive testing for CI/CD automated workflows. NetQ agents stream a wide breadth of both control plane and data plane information that can be cross-referenced and checked to represent the status of the entire network rather than individual pieces. Services such as BGP, EVPN, MLAG and NTP can be validated across all nodes working together, in addition to physical interface settings, licenses and environmental sensors.

On-demand validation of a BGP fabric from the NetQ GUI

NetQ CLI Examples

netq show macs

cumulus@switch:~$ netq leaf01 show macs

Matching mac records:
Origin MAC Address        VLAN   Hostname          Egress Port          Remote Last Changed
------ ------------------ ------ ----------------- -------------------- ------ -------------------------
no     00:03:00:11:11:01  13     leaf01            bond01               no     Mon Oct 21 21:55:52 2019
no     00:03:00:22:22:01  24     leaf01            bond02               no     Mon Oct 21 21:55:52 2019
no     00:03:00:33:33:01  13     leaf01            vni13::10.0.0.134    yes    Mon Oct 21 21:55:52 2019
no     00:03:00:44:44:01  24     leaf01            vni24::10.0.0.134    yes    Mon Oct 21 21:55:52 2019
no     02:03:00:11:11:01  13     leaf01            bond01               no     Mon Oct 21 21:55:52 2019
no     02:03:00:11:11:02  13     leaf01            bond01               no     Mon Oct 21 21:55:51 2019
no     02:03:00:22:22:01  24     leaf01            bond02               no     Mon Oct 21 21:55:51 2019
no     02:03:00:22:22:02  24     leaf01            bond02               no     Mon Oct 21 21:55:51 2019
no     02:03:00:33:33:01  13     leaf01            vni13::10.0.0.134    yes    Mon Oct 21 21:55:52 2019
no     02:03:00:33:33:02  13     leaf01            vni13::10.0.0.134    yes    Mon Oct 21 21:55:51 2019
no     02:03:00:44:44:01  24     leaf01            vni24::10.0.0.134    yes    Mon Oct 21 21:55:52 2019
no     02:03:00:44:44:02  24     leaf01            vni24::10.0.0.134    yes    Mon Oct 21 21:55:51 2019
no     44:38:39:00:00:15  13     leaf01            peerlink             no     Mon Oct 21 21:55:51 2019
no     44:38:39:00:00:15  24     leaf01            peerlink             no     Mon Oct 21 21:55:51 2019
no     44:38:39:00:00:23  13     leaf01            vni13::10.0.0.134    yes    Mon Oct 21 21:55:52 2019
no     44:38:39:00:00:23  24     leaf01            vni24::10.0.0.134    yes    Mon Oct 21 21:55:52 2019
no     44:38:39:00:00:5c  13     leaf01            vni13::10.0.0.134    yes    Mon Oct 21 21:55:52 2019
no     44:38:39:00:00:5c  24     leaf01            vni24::10.0.0.134    yes    Mon Oct 21 21:55:52 2019
no     44:39:39:ff:40:95  4001   leaf01            vxlan4001::10.0.0.13 yes    Mon Oct 21 21:55:51 2019
                                                   4
no     c2:10:71:7d:82:de  4001   leaf01            vxlan4001::10.0.0.41 yes    Mon Oct 21 21:55:52 2019
no     f6:6b:a7:96:8c:b2  4001   leaf01            vxlan4001::10.0.0.42 yes    Mon Oct 21 21:55:51 2019
yes    44:38:39:00:00:03  13     leaf01            bridge               no     Mon Oct 21 21:55:52 2019
yes    44:38:39:00:00:03  24     leaf01            bridge               no     Mon Oct 21 21:55:52 2019
yes    44:38:39:00:00:03  4001   leaf01            bridge               no     Mon Oct 21 21:55:52 2019
yes    44:39:39:ff:00:13  13     leaf01            bridge               no     Mon Oct 21 21:55:52 2019
yes    44:39:39:ff:00:24  24     leaf01            bridge               no     Mon Oct 21 21:55:52 2019
yes    44:39:39:ff:40:94  4001   leaf01            bridge               no     Mon Oct 21 21:55:52 2019

netq show ip routes

cumulus@switch:~$ netq leaf01 show ip routes

Matching routes records:
Origin VRF             Prefix                         Hostname          Nexthops                            Last Changed
------ --------------- ------------------------------ ----------------- ----------------------------------- -------------------------
no     default         10.0.0.12/32                   leaf01            169.254.1.2: peerlink.4094          Mon Oct 21 20:20:45 2019
no     default         10.0.0.13/32                   leaf01            169.254.0.1: swp51,                 Mon Oct 21 20:20:45 2019
                                                                        169.254.0.1: swp52
no     default         10.0.0.134/32                  leaf01            169.254.0.1: swp51,                 Mon Oct 21 20:20:45 2019
                                                                        169.254.0.1: swp52
no     default         10.0.0.14/32                   leaf01            169.254.0.1: swp51,                 Mon Oct 21 20:20:45 2019
                                                                        169.254.0.1: swp52
no     default         10.0.0.21/32                   leaf01            169.254.0.1: swp51                  Mon Oct 21 20:20:45 2019
no     default         10.0.0.22/32                   leaf01            169.254.0.1: swp52                  Mon Oct 21 20:20:45 2019
no     default         10.0.0.41/32                   leaf01            169.254.0.1: swp51,                 Mon Oct 21 20:20:45 2019
                                                                        169.254.0.1: swp52
no     default         10.0.0.42/32                   leaf01            169.254.0.1: swp51,                 Mon Oct 21 20:20:45 2019
                                                                        169.254.0.1: swp52
no     mgmt            0.0.0.0/0                      leaf01            Blackhole                           Mon Oct 21 20:20:45 2019
no     vrf1            0.0.0.0/0                      leaf01            Blackhole                           Mon Oct 21 20:20:45 2019
no     vrf1            10.0.0.0/8                     leaf01            10.0.0.41: vlan4001,                Mon Oct 21 20:20:45 2019
                                                                        10.0.0.42: vlan4001
no     vrf1            10.0.0.253/32                  leaf01            10.0.0.41: vlan4001,                Mon Oct 21 20:20:45 2019
                                                                        10.0.0.42: vlan4001
no     vrf1            10.1.3.103/32                  leaf01            10.0.0.134: vlan4001                Mon Oct 21 20:20:45 2019
no     vrf1            10.2.4.104/32                  leaf01            10.0.0.134: vlan4001                Mon Oct 21 20:20:45 2019
yes    default         10.0.0.11/32                   leaf01            lo                                  Mon Oct 21 20:20:45 2019
yes    default         10.0.0.112/32                  leaf01            lo                                  Mon Oct 21 20:20:45 2019
yes    default         169.254.1.0/30                 leaf01            peerlink.4094                       Mon Oct 21 20:20:45 2019
yes    default         169.254.1.1/32                 leaf01            peerlink.4094                       Mon Oct 21 20:20:45 2019
yes    mgmt            192.168.0.0/24                 leaf01            eth0                                Mon Oct 21 20:20:45 2019
yes    mgmt            192.168.0.11/32                leaf01            eth0                                Mon Oct 21 20:20:45 2019
yes    vrf1            10.1.3.0/24                    leaf01            vlan13-v0                           Mon Oct 21 20:20:45 2019
yes    vrf1            10.1.3.1/32                    leaf01            vlan13-v0                           Mon Oct 21 20:20:45 2019
yes    vrf1            10.1.3.11/32                   leaf01            vlan13                              Mon Oct 21 20:20:45 2019
yes    vrf1            10.2.4.0/24                    leaf01            vlan24-v0                           Mon Oct 21 20:20:45 2019
yes    vrf1            10.2.4.1/32                    leaf01            vlan24-v0                           Mon Oct 21 20:20:45 2019
yes    vrf1            10.2.4.11/32                   leaf01            vlan24                              Mon Oct 21 20:20:45 2019

netq show ip address

cumulus@switch:~$ netq leaf01 show ip address

Matching address records:
Address                   Hostname          Interface                 VRF             Last Changed
------------------------- ----------------- ------------------------- --------------- -------------------------
10.0.0.11/32              leaf01            lo                        default         Mon Oct 21 20:20:45 2019
10.0.0.112/32             leaf01            lo                        default         Mon Oct 21 20:20:45 2019
10.1.3.1/24               leaf01            vlan13-v0                 vrf1            Mon Oct 21 20:20:45 2019
10.1.3.11/24              leaf01            vlan13                    vrf1            Mon Oct 21 20:20:45 2019
10.2.4.1/24               leaf01            vlan24-v0                 vrf1            Mon Oct 21 20:20:45 2019
10.2.4.11/24              leaf01            vlan24                    vrf1            Mon Oct 21 20:20:45 2019
169.254.1.1/30            leaf01            peerlink.4094             default         Mon Oct 21 20:20:45 2019
192.168.0.11/24           leaf01            eth0                      mgmt            Mon Oct 21 20:20:45 2019

netq check

Checks are supported for bgp, evpn, clag, interfaces, license, mtu, ntp, ospf/ospf6, sensors, vlan and vxlan.

cumulus@oob-mgmt-server:~$ netq check sensors
Total Nodes: 13, Failed Nodes: 5, Checked Sensors: 136, Failed Sensors: 0
Hostname          Name            Description                         State      Value      Message
----------------- --------------- ----------------------------------- ---------- ---------- -----------------------------------
edge01            -               -                                   unknown    -          No sensor info present
server01          -               -                                   unknown    -          No sensor info present
server02          -               -                                   unknown    -          No sensor info present
server03          -               -                                   unknown    -          No sensor info present
server04          -               -                                   unknown    -          No sensor info present

netq trace

netq trace provides layer 2, layer 3 and VXLAN overlay support.

cumulus@oob-mgmt-server:~$ netq trace 10.2.4.104 from server01 pretty
Number of Paths: 8
Number of Paths with Errors: 0
Number of Paths with Warnings: 0
Path MTU: 1500

 server01 bond0 -- swp1  leaf02  vni: 104001 swp52 -- swp2 spine02 swp4 -- swp52 vni: 104001  leaf04  bond04 -- bond0 server04
                                             swp52 -- swp2 spine02 swp3 -- swp52 vni: 104001  leaf03  bond04 -- bond0 server04
          bond0 -- swp1  leaf02  vni: 104001 swp51 -- swp2 spine01 swp4 -- swp51 vni: 104001  leaf04  bond04 -- bond0 server04
                                             swp51 -- swp2 spine01 swp3 -- swp51 vni: 104001  leaf03  bond04 -- bond0 server04
          bond0 -- swp1  leaf01  vni: 104001 swp52 -- swp1 spine02 swp4 -- swp52 vni: 104001  leaf04  bond04 -- bond0 server04
                                             swp52 -- swp1 spine02 swp3 -- swp52 vni: 104001  leaf03  bond04 -- bond0 server04
          bond0 -- swp1  leaf01  vni: 104001 swp51 -- swp1 spine01 swp4 -- swp51 vni: 104001  leaf04  bond04 -- bond0 server04
                                             swp51 -- swp1 spine01 swp3 -- swp51 vni: 104001  leaf03  bond04 -- bond0 server04

Notes


  1. A design this large must be built using layer 3 links.

Cumulus Linux Security Guide

Cumulus Linux is a powerful operating system for routers that comes with secure defaults and is ready to use. This document discusses additional security measures that enable you to further secure your switch to meet corporate, regulatory, and governmental standards. It focuses on three types of security measures:

Not all security measures are created equal; certain measures can ruin a user experience without adding a significant amount of security. Cumulus Networks recommends that you focus on security measures with the greatest benefits and the fewest drawbacks. Then, expand to include other security mitigations as your comfort level increases.

High Impact to Security and Low Impact to Usability

This section discusses issues that have the biggest security impacts with the least impact to management and user experiences.

Secure Switch Hardware

Securing the switch hardware is vital because an attacker with physical access to the hardware can eventually gain access to the entire device, allowing them to change configurations, capture all the traffic moving through the switch, and even steal the switch itself. If the hardware is breached, the entire system is compromised, so protecting the switch from physical attacks and misconfiguration is essential.

Prevent Denial of Service

Denial of Service (DOS) attacks aim to disrupt normal use of a service or device. In a DOS attack, an attacker floods a target system with a very large number of redundant and unnecessary requests to overwhelm it and block intended users from accessing the service or application it provides. DOS attacks commonly target routers and firewalls. Cumulus Linux includes built-in checks for these types of attacks. When enabled, the switch intelligently analyzes packets coming into the system and drops packets that match specific criteria.

To enable automatic checks on your switch, open the /etc/cumulus/datapath/traffic.conf file in a text editor and change the value of the dos_enable setting to true:

cumulus@switch:~$ sudo nano /etc/cumulus/datapath/traffic.conf
...
dos_enable = true

Restart the switchd service for the changes to take effect.

cumulus@switch:~$ sudo systemctl restart switchd.service

Restarting the switchd service causes all network ports to reset, interrupting network services, in addition to resetting the switch hardware configuration.

To specify which DOS checks you want to enable, open the /usr/lib/python2.7/dist-packages/cumulus/__chip_config/bcm/datapath.conf file and enable the desired DOS checks by setting the corresponding values to true:

cumulus@switch:~$ sudo nano /usr/lib/python2.7/dist-packages/cumulus/__chip_config/bcm/datapath.conf
...
# Enabling/disabling Denial of service (DOS) prevention checks
# To change the default configuration:
# enable/disable the individual DOS checks.
dos.sip_eq_dip = true
dos.smac_eq_dmac = true
dos.tcp_hdr_partial = true
dos.tcp_syn_frag = true
dos.tcp_ports_eq = true
dos.tcp_flags_syn_fin = true
dos.tcp_flags_fup_seq0 = true
dos.tcp_offset1 = true
dos.tcp_ctrl0_seq0 = true
dos.udp_ports_eq = true
dos.icmp_frag = true
dos.icmpv4_length = true
dos.icmpv6_length = true
dos.ipv6_min_frag = true

Restart the switchd service for the changes to take effect.

cumulus@switch:~$ sudo systemctl restart switchd.service

Restarting the switchd service causes all network ports to reset, interrupting network services, in addition to resetting the switch hardware configuration.

Configure Switch Ports

Cyber attackers often steal information through vulnerable switch ports. Many companies with VLANs use VLAN 1 instead of choosing a custom VLAN ID because VLAN 1 is the default VLAN ID on most network devices. Because this default is very well known, it is the first place attackers look to gain VLAN access.

By default, the router configuration protects against VLAN hopping attacks, in which an attacker tries to fool the target switch into forwarding traffic from other networks by using generic tags or dynamic VLAN negotiation protocols. If successful, the attacker gains access to networks that are connected to the switch but otherwise unavailable to the attacker. Cumulus Linux mitigates this threat by ignoring packets carrying such generic tags.

To protect your ports, ensure that access ports are not assigned to VLAN 1 or any unused VLAN ID. The following shows an example of how you configure switch ports 1 to 48 as access ports for VLAN 99:

cumulus@switch:~$ net add bridge bridge ports swp1-48
cumulus@switch:~$ net add interface swp1-48 bridge access 99
cumulus@switch:~$ net commit

Ensure that no trunk ports use VLAN 1 and be thoughtful when assigning and removing VLANs to and from ports. In this example, Cumulus Linux adds swp3 as a trunk port and assigns it to VLANs 100 and 200.

cumulus@switch:~$ net add interface swp3 bridge vids 100,200
cumulus@switch:~$ net commit

Customize Control Plane Policies

Cumulus Linux comes out of the box with a default control plane security policy that is located in the /etc/cumulus/acl/policy.d/ directory. You can see a full list of the default rules here.

Best practices dictate that:

Be sure to apply changes to the default control plane policies with cl-acltool so that they are accelerated in hardware correctly.

The following command installs all the ACLs and control plane policy rules in the /etc/cumulus/acl/policy.d/ directory:

cumulus@switch:~$ sudo cl-acltool -i

You can verify that the rules are applied with the following command:

cumulus@switch:~$ sudo cl-acltool -L all

Disable Insecure SSL and TLS Protocol Versions in NGINX

Cumulus Linux is packaged with NGINX, an open source web server that supports the Cumulus Linux RESTful HTTP API through HTTPS. By default, NGINX is enabled and listening on localhost (127.0.0.1) port 8080.

For backward compatibility, NGINX natively supports the outdated and vulnerable SSLv3 and TLSv1 protocols. To secure against potential exploits using those protocols, you must disable them. Open the /etc/nginx/nginx.conf file in a text editor and edit the ssl_protocols line to allow only the TLSv1.1 and TLSv1.2 protocols:

cumulus@switch:~$ sudo nano /etc/nginx/nginx.conf
...
ssl_protocols TLSv1.1 TLSv1.2; # Dropping SSLv3, ref: POODLE

Restart the nginx service for the setting to take effect:

cumulus@switch:~$ sudo systemctl restart nginx.service

If you are not using the Cumulus HTTP API, you can uninstall NGINX to completely eliminate the TLS attack vector:

cumulus@switch:~$ sudo apt-get remove python-cumulus-restapi nginx-common
cumulus@switch:~$ sudo apt-get autoremove

Management Security

Enable Management VRF

Management virtual routing and forwarding (VRF) separates out-of-band management network traffic from the in-band data plane network. Management VRF increases network security by separating the management routing table from the main routing tables. It also protects the management network from interference or attacks directed at the routing infrastructure of the main network, which might otherwise prevent you from managing the router or switch.

To take advantage of management VRF, connect interface eth0 to the out-of-band management network. Management VRF is enabled by default in Cumulus Linux 4.0 and later. For earlier versions of Cumulus Linux, use the following commands to configure a management VRF and eth0 to be a member of that VRF:

cumulus@switch:~$ net add vrf mgmt
cumulus@switch:~$ net commit

For more information about working with a management VRF, such as running services in a specific VRF, refer to the Management VRF documentation.

Be sure to enable all the network services inside each VRF, including:
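
For example, NTP and SSH can be run inside the management VRF using the @mgmt service instances referenced later in this guide; a minimal sketch for NTP:

# Move the NTP daemon from the default VRF into the management VRF
cumulus@switch:~$ sudo systemctl stop ntp.service
cumulus@switch:~$ sudo systemctl disable ntp.service
cumulus@switch:~$ sudo systemctl enable ntp@mgmt.service
cumulus@switch:~$ sudo systemctl start ntp@mgmt.service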

Customize the Management ACL

The management access control list (ACL) is the main list of user permissions for Cumulus Linux. Review and customize the management ACL as soon as possible during the installation process to help prevent user errors or malicious behavior by restricting the abilities of administrative users. Due to many unique needs and environments, the management ACL is highly customizable; it is important that you change the defaults.

Use the following guidelines as a starting point to build your management ACL. These guidelines reference example IP addresses and ports that are shown in detail in the firewall rules in the next section.

Create iptables rules to restrict traffic flow in both directions — inbound and outbound. In the following example, the switch is assigned the management IP address 192.168.200.29.

To create the sample iptables firewall rules, create the /etc/cumulus/acl/policy.d/50management-acl.rules file, then add the following contents to the file:

cumulus@switch:~$ sudo nano /etc/cumulus/acl/policy.d/50management-acl.rules
[iptables]
INGRESS_INTF = swp+
-A INPUT -i eth0 -p udp --dport 68 -j ACCEPT
-A INPUT -i eth0 -s 192.168.200.0/24 -d 192.168.200.29 -p tcp --dport 22 -j ACCEPT
-A INPUT -i eth0 -s 192.168.200.10 -d 192.168.200.29 -p udp --dport 161 -j ACCEPT
-A INPUT -i eth0 -s 192.168.200.1 -d 192.168.200.29 -p udp --dport 123 -j ACCEPT
-A INPUT -i eth0 -s 192.168.200.1 -d 192.168.200.29 -p udp --dport 53 -j ACCEPT
-A INPUT -i eth0 -s 192.168.200.0/24 -d 192.168.200.29 -p udp --dport 5342 -j ACCEPT
-A INPUT -i eth0 -s 192.168.200.0/24 -d 192.168.200.29 -p udp --sport 5342 -j ACCEPT
-A INPUT -i eth0 -j LOG
-A INPUT -i eth0 -j DROP
-A OUTPUT -o eth0 -p udp --dport 67 -j ACCEPT
-A OUTPUT -o eth0 -s 192.168.200.29 -d 192.168.200.0/24 -p tcp --sport 22 -j ACCEPT
-A OUTPUT -o eth0 -s 192.168.200.29 -d 192.168.200.0/24 -p udp --sport 5342 -j ACCEPT
-A OUTPUT -o eth0 -s 192.168.200.29 -d 192.168.200.0/24 -p udp --dport 5342 -j ACCEPT
-A OUTPUT -o eth0 -s 192.168.200.29 -d 192.168.200.1 -p udp --dport 161 -j ACCEPT
-A OUTPUT -o eth0 -s 192.168.200.29 -d 192.168.200.1 -p udp --dport 123 -j ACCEPT
-A OUTPUT -o eth0 -s 192.168.200.29 -d 192.168.200.1 -p udp --dport 53 -j ACCEPT
-A OUTPUT -o eth0 -s 192.168.200.29 -d 192.168.200.1 -p udp --dport 514 -j ACCEPT
-A OUTPUT -o eth0 -s 192.168.200.29 -d 192.168.200.1 -p udp --dport 6343 -j ACCEPT
-A OUTPUT -o eth0 -s 192.168.200.29 -d 192.168.200.250 -p tcp --dport 31980 -j ACCEPT
-A OUTPUT -o eth0 -j LOG
-A OUTPUT -o eth0 -j DROP
-A INPUT --in-interface $INGRESS_INTF -s 10.0.0.0/8 -d 10.0.0.0/8 -j ACCEPT
-A INPUT --in-interface $INGRESS_INTF -j LOG
-A INPUT --in-interface $INGRESS_INTF -j DROP

This ACL does not take effect until you apply it with the cl-acltool -i command, as described above.

User Security

Lock the root Account

Safeguarding access to the root account is imperative to a secure system. Users should not have direct access to the root account. This helps to prevent universal changes from being made to the system accidentally. By default, the root account is locked.

To verify the root account status, run the following command:

cumulus@switch:~$ sudo passwd -S root

In the command output, the L (immediately following the word root) indicates the account is locked:

cumulus@switch:~$ sudo passwd -S root
root L 08/07/2019 0 99999 7 -1

The P (immediately following the word root) indicates the account is not locked:

cumulus@switch:~$ sudo passwd -S root
root P 09/10/2019 0 99999 7 -1

To lock the root account, run the following command:

cumulus@switch:~$ sudo passwd -l root

Harden sudo Access

The sudo command allows you to execute programs in Linux with the security privileges of a superuser. It is important to enforce sudo rules to avoid abuse. By default, sudo credentials are cached for a set amount of time after you execute a command with privileges using sudo.

To increase security, configure Cumulus Linux to prompt for a password every time a command is run with sudo. Run the visudo command to edit the default settings and change the timestamp_timeout option to 0:

cumulus@switch:~$ sudo visudo
...
Defaults        env_reset,timestamp_timeout=0

Ensure that there are no occurrences of NOPASSWD or !authenticate in the /etc/sudoers file or in files in the /etc/sudoers.d directory:

cumulus@switch:~$  sudo grep NOPASSWD /etc/sudoers
cumulus@switch:~$  sudo grep NOPASSWD /etc/sudoers.d/*
cumulus@switch:~$  sudo grep \!authenticate /etc/sudoers.d/*
cumulus@switch:~$  sudo grep \!authenticate /etc/sudoers

Limit User Sessions

It is important to limit the number of sessions a user can run at the same time so that a single user or account does not make contradictory changes. Good practice is to set the limit to 10.

Add the following line at the top of the /etc/security/limits.conf file:

cumulus@switch:~$ sudo nano /etc/security/limits.conf
* hard maxlogins 10

Add the following lines to the /etc/profile.d/autologout.sh script to set the inactivity timeout to 10 minutes (600 seconds). Create the file if it does not exist:

cumulus@switch:~$ sudo nano /etc/profile.d/autologout.sh
TMOUT=600
readonly TMOUT
export TMOUT

Add the following line to the /etc/profile file to set a console timeout:

cumulus@switch:~$ sudo nano /etc/profile
export TMOUT=600

Remote Access Security

Secure Shell (SSH) is a protocol for secure remote login and other secure network services over an insecure network. It is most commonly used to allow administrators to securely access remote systems, such as Linux.

By default, Cumulus Linux includes the following SSH security settings:

SSH public host key files are world-readable with mode 0644:

cumulus@switch:~$ ls -l /etc/ssh/*.pub
-rw-r--r-- 1 root root 602 Apr 29  2017 /etc/ssh/ssh_host_dsa_key.pub
-rw-r--r-- 1 root root 174 Apr 29  2017 /etc/ssh/ssh_host_ecdsa_key.pub
-rw-r--r-- 1 root root  94 Apr 29  2017 /etc/ssh/ssh_host_ed25519_key.pub
-rw-r--r-- 1 root root 394 Apr 29  2017 /etc/ssh/ssh_host_rsa_key.pub

Verify that the SSH private host key files under /etc/ssh are readable and writable by the root account only:

cumulus@switch:~$ ls -alL /etc/ssh/ssh_host*key
-rw------- 1 root root  668 Apr 29  2017 /etc/ssh/ssh_host_dsa_key
-rw------- 1 root root  227 Apr 29  2017 /etc/ssh/ssh_host_ecdsa_key
-rw------- 1 root root  399 Apr 29  2017 /etc/ssh/ssh_host_ed25519_key
-rw------- 1 root root 1679 Apr 29  2017 /etc/ssh/ssh_host_rsa_key

To secure SSH further, consider enabling or reviewing the following options in the /etc/ssh/sshd_config file:
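
The options below are a representative, hedged sketch of commonly hardened settings rather than a definitive list from this guide; review each against your organizational policy before applying:

cumulus@switch:~$ sudo nano /etc/ssh/sshd_config
...
# Disallow direct root logins over SSH and require key-based authentication
PermitRootLogin no
PasswordAuthentication no
# Limit authentication attempts and drop idle sessions after 5 minutes
MaxAuthTries 3
ClientAliveInterval 300
ClientAliveCountMax 0

Restart the ssh service (for example, sudo systemctl restart ssh@mgmt.service when management VRF is enabled) for the changes to take effect.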

For more information about the options in sshd, read the man page.

Secure Network Protocols

Enable NTP Authentication

Network Time Protocol (NTP) synchronizes the clock of a client or server with another time source or server. Time synchronization is critical for authentication and log management. To mitigate attacks involving forged time synchronization, connect your Cumulus Linux switch to an authenticated NTP server.

Add authentication keys to the /etc/ntp.keys file:

cumulus@switch:~$ sudo nano /etc/ntp.keys
#
# PLEASE DO NOT USE THE DEFAULT VALUES HERE.
#
#65535  M  akey
#1      M  pass
1  M  cumulus123

After you add the keys to the file, add at least two servers and their associated trusted keys to the /etc/ntp.conf file:

cumulus@switch:~$ sudo nano /etc/ntp.conf
server 192.168.0.254 key 1
keys /etc/ntp.keys
trustedkey 1
controlkey 1
requestkey 1

To configure time synchronization to occur at least once every 24 hours, add or correct the following lines in the /etc/ntp.conf file. Replace [source] in the following line with an authoritative time source:

cumulus@switch:~$ sudo nano /etc/ntp.conf
server [source] iburst maxpoll 17

Restart NTP for the changes to take effect:

cumulus@switch:~$ sudo systemctl restart ntp@mgmt.service

Secure Routing Protocols

Open Shortest Path First (OSPF) and Border Gateway Protocol (BGP) are dynamic routing protocols that allow routers to negotiate with each other to determine the best path to send traffic through the network. During this negotiation, routers learn about the networks connected to the other routers. The routers then use this information to determine where to send traffic on the network.

If left unsecured, an attacker can exploit a dynamic routing protocol such as OSPF or BGP to reroute packets to a rogue system instead of its intended destination. To help mitigate this threat, enable authentication on these protocols.

To configure OSPF authentication, two NCLU commands are required: one to add the key and a second to enable authentication on an interface (swp1 in the command example):

cumulus@switch:~$ net add interface swp1 ospf message-digest-key 1 md5 thisisthekey
cumulus@switch:~$ net add interface swp1 ospf authentication message-digest
cumulus@switch:~$ net commit

For more information, read the OSPF chapter of the Cumulus Linux user guide.

To configure BGP authentication for an existing BGP neighbor, only a single password is required:

cumulus@switch:~$ net add bgp neighbor 1.1.1.1 password BGPPWD
cumulus@switch:~$ net commit

For more information, read the BGP chapter of the Cumulus Linux user guide.

Remove File Transfer Services

Trivial File Transfer Protocol (TFTP) is a simple and unauthenticated alternative to File Transfer Protocol (FTP) and is often used to update the configuration of network devices. By nature, TFTP contains no means to authenticate the user. If using TFTP is not mandatory within your organization, disable or uninstall it.

To uninstall the TFTP service:

cumulus@switch:~$ sudo apt-get remove atftp atftpd
cumulus@switch:~$ sudo apt-get remove tftpd-hpa

To uninstall the FTP service:

cumulus@switch:~$ sudo apt-get remove vsftpd

To verify that neither TFTP nor FTP is installed:

cumulus@switch:~$ sudo dpkg -l | grep ftp

Medium Impact to Security, Medium Impact to Usability

This section discusses items that have a moderate impact on security as well as on management and user experience.

Hardware Security

Configure 802.1X

802.1X is a popular technology because it authenticates devices that physically attach to the switch. It can also assign these devices different levels of access to the network after they authenticate. There are many use cases for this technology and each configuration varies widely. For additional details, see the 802.1X Interfaces chapter in the Cumulus Linux user guide.

The following example is a starting point to build on. This is a base 802.1X configuration:

  1. Run the following commands:

    cumulus@switch:~$ net add dot1x radius server-ip 192.168.200.254 vrf mgmt
    cumulus@switch:~$ net add dot1x radius client-source-ip 192.168.200.29
    cumulus@switch:~$ net add dot1x radius shared-secret supersecret
    cumulus@switch:~$ net add dot1x send-eap-request-id
    cumulus@switch:~$ net add dot1x dynamic-vlan
    cumulus@switch:~$ net commit
    
  2. Open the hostapd.conf file in a text editor and change the last two values in the file from 1 to 0:

    cumulus@switch:~$ sudo nano hostapd.conf
    ...
    radius_das_require_event_timestamp=0
    radius_das_require_message_authenticator=0
    
  3. Restart hostapd.service.

    cumulus@switch:~$ sudo systemctl restart hostapd.service
    
  4. Enable 802.1X on switch ports (swp1 through swp4 in this example):

    cumulus@switch:~$ net add interface swp1-4 dot1x
    
  5. Configure the 802.1X reauthentication period:

    cumulus@switch:~$ net add dot1x eap-reauth-period 3600
    
  6. Configure the maximum number of stations:

    cumulus@switch:~$ net add dot1x max-number-stations 1
    cumulus@switch:~$ net commit
    

For a deeper dive into campus features, read the campus architecture guide.

Disable USB Ports

The Cumulus Linux switch comes with several USB ports as part of the external hardware. USB drives are standard among many industries and therefore easily accessible to those who want to do harm to the switch. While a best practice for any switch, disabling the USB ports is especially important if Cumulus Linux is set up in a publicly available area.
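
This guide does not prescribe a specific method; one common approach on Debian-based systems, offered here only as a hedged sketch, is to blacklist the usb-storage kernel module so USB drives cannot be mounted:

# Prevent the usb-storage module from loading, then unload it if it is already loaded
cumulus@switch:~$ echo "install usb-storage /bin/false" | sudo tee /etc/modprobe.d/disable-usb-storage.conf
cumulus@switch:~$ sudo modprobe -r usb-storage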

Management Security

Set Password Policies

User passwords are the easiest way to break into any system. After a hacker steals a password, they have access to whatever the user has and can obtain information without raising too much suspicion. Therefore, many companies enforce specific user password requirements.

By default, Cumulus Linux hashes user passwords with a strong cryptographic hash (SHA-512), and no accounts are configured with nullok in the files under the /etc/pam.d directory.

Password configurations should be consistent with NIST password complexity guidelines, but companies can set their own individual requirements for users.
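
As an illustrative sketch only (the values below are examples, not requirements from this guide, and assume the libpam-pwquality package is installed), password complexity can be tuned in /etc/security/pwquality.conf:

cumulus@switch:~$ sudo nano /etc/security/pwquality.conf
...
# Require at least 14 characters with upper case, lower case, digit and symbol
minlen = 14
ucredit = -1
lcredit = -1
dcredit = -1
ocredit = -1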

Remove Unnecessary Services

Unnecessary services that remain installed can leave open sockets that become attack vectors. These services can also be misused accidentally and cause malfunctions. It is important to uninstall any programs or services that are not in use.
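
As a hedged starting point, list the services currently enabled and remove any packages your deployment does not need (the package name below is a placeholder):

# Review enabled services, then remove unneeded packages and their dependencies
cumulus@switch:~$ sudo systemctl list-unit-files --state=enabled
cumulus@switch:~$ sudo apt-get remove <unneeded-package>
cumulus@switch:~$ sudo apt-get autoremove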

Enable an Emergency User Account

If your organization relies on a central authentication system such as TACACS or RADIUS, consider enabling an emergency administration account to access Cumulus Linux during times when the authentication systems are unavailable. Create the emergency admin account with its age set to never expire.

Run the following command to set the password policy for the emergency administrator account to never expire. Replace [Emergency_Administrator] with the correct emergency administrator account. You can create an emergency administrator user account with your standard user creation workflow.

cumulus@switch:~$ sudo chage -I -1 -M 99999 [Emergency_Administrator]

Configure a Login Banner

To prominently disclose the security and restrictions in place on your switch, enable login banners for all users so that they see your message upon login. Proper disclosure of your security policies upon login can help rule out legal defenses for inappropriate use of the equipment. Consult with the legal representative in your organization to obtain the proper wording of the login banner message.

To enable a login banner for all SSH login sessions:

  1. Edit the /etc/issue.net file and add the login banner text approved by your organizational security policy:

    cumulus@switch:~$ sudo nano /etc/issue.net
    You are accessing an Information System (IS) that is provided for authorized use only.
    
  2. Edit the /etc/ssh/sshd_config file to enable the login banner:

    cumulus@switch:~$ sudo nano /etc/ssh/sshd_config
    ...
    Banner /etc/issue.net
    
  3. Restart the ssh service:

    cumulus@switch:~$ sudo systemctl restart ssh@mgmt.service
    

Configure System Audits

Configure your system to log administrative events, then periodically audit those logs to ensure your system security policies are working as desired, as well as to detect any unauthorized attempts to gain access to your system. Auditing the system can also be helpful when troubleshooting issues. By looking at specific log events, you can identify consistent problems.

Cumulus Linux has many audit logs enabled by default. Make sure that the overall level of audit logging required conforms to your organizational security policy, as well as its need to track performance information about the system.
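
For example, the standard auditd tooling can be used to review collected events; the queries below are illustrative, not specific recommendations from this guide:

# Summarize audit activity and search today's events for login records
cumulus@switch:~$ sudo aureport --summary
cumulus@switch:~$ sudo ausearch -m USER_LOGIN --start today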

To view the default configurations:

cumulus@switch:~$ sudo grep log_file /etc/audit/auditd.conf
log_file = /var/log/audit/audit.log
max_log_file = 6
max_log_file_action = ROTATE

To view the size of the audit logs:

cumulus@switch:~$ df /var/log/audit/ -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda4       5.8G  931M  4.6G  17% /var/log

Secure Network Protocols

Prevent Source Routing

Source routing is a common security threat that allows attackers to send packets to your network, then use the returned information to break into your network. If your organization is not purposefully using source routing, disable it.

To disable IPv4 source-routed packets, set a runtime configuration with the following command:

cumulus@switch:~$ sudo sysctl -w net.ipv4.conf.default.accept_source_route=0

Check the default (boot up) setting:

cumulus@switch:~$ sudo sysctl net.ipv4.conf.default.accept_source_route

If the default value is not 0, add or update the following line in the /etc/sysctl.conf file so the setting persists after a reboot:

cumulus@switch:~$ sudo nano /etc/sysctl.conf
...
net.ipv4.conf.default.accept_source_route=0

Alternatively, you can create a new file in the /etc/sysctl.d directory, then add the net.ipv4.conf.default.accept_source_route = 0 line to the file.

Prevent ICMP Redirects

Internet Control Message Protocol (ICMP) is a great troubleshooting tool, but can be a security threat if your router automatically accepts an ICMP redirect message. Attackers can use this to their advantage by sending unrecognized redirects to either capture your traffic or create a DOS attack.

To prevent IPv4 ICMP redirect messages from being accepted, set a runtime configuration with the following command. A runtime configuration does not persist when you reboot the switch.

cumulus@switch:~$ sudo sysctl -w net.ipv4.conf.default.accept_redirects=0

Check the default (boot up) setting:

cumulus@switch:~$ sudo sysctl net.ipv4.conf.default.accept_redirects

If the default value is not 0, you can make the setting persist after a reboot. Add or update the following line in the /etc/sysctl.conf file:

cumulus@switch:~$ sudo nano /etc/sysctl.conf
...
net.ipv4.conf.default.accept_redirects = 0

Alternatively, you can create a new file in the /etc/sysctl.d directory and add the net.ipv4.conf.default.accept_redirects = 0 line.

To prevent IPv4 ICMP redirect messages from being sent, set a runtime configuration with the following commands. A runtime configuration does not persist when you reboot the switch.

cumulus@switch:~$ sudo sysctl -w net.ipv4.conf.default.send_redirects=0
cumulus@switch:~$ sudo sysctl -w net.ipv4.conf.all.send_redirects=0

Check the default (boot up) setting:

cumulus@switch:~$ sudo sysctl net.ipv4.conf.default.send_redirects
net.ipv4.conf.default.send_redirects = 1

cumulus@switch:~$ sudo sysctl net.ipv4.conf.all.send_redirects
net.ipv4.conf.all.send_redirects = 1

If the default value is not 0, you can make the setting persist after a reboot. Add or update the following lines in the /etc/sysctl.conf file:

cumulus@switch:~$ sudo nano /etc/sysctl.conf
...
net.ipv4.conf.default.send_redirects = 0
net.ipv4.conf.all.send_redirects = 0

Alternatively, you can create a new file in the /etc/sysctl.d directory and add the net.ipv4.conf.default.send_redirects = 0 and net.ipv4.conf.all.send_redirects = 0 lines.

Low Impact to Security, High Impact to Usability

This section discusses items that have a relatively low impact on security but have the potential to greatly disturb the user experience.

Password Protect the Bootloader

A bootloader is a program that launches the operating system when the switch is powered on or rebooted. Adding a password to the bootloader does not significantly improve the security of a system but it can cause accidental outages.

For example, suppose you configure the switch with a bootloader password, then work on the switch remotely and make a change that requires a system reboot. When the system begins to boot, it halts and waits for the bootloader password, which you cannot enter unless you are physically at the switch. Instead of a quick reboot, the switch sits offline until someone can enter the bootloader password at the console and allow it to launch the operating system and bring the switch back online.

Debian Packages

One of the most tempting services to configure is the Debian package manager that controls the software and updates installed on your switch. For example, you might think it is a good idea to configure the package manager to remove all outdated software packages after a new update is completed. While it makes more disk space available, it also prevents you from quickly rolling back to a previous version if a software glitch causes the system to malfunction or stop communicating.