This chapter discusses the various architectures and strategies available from the top of rack (ToR) switches all the way down to the server hosts.

Contents

 This chapter covers ...

Layer 2 - Architecture


Traditional Spanning Tree - Single Attached

Summary

A bond (Etherchannel) is not configured from the host to multiple switches (bonds can still be used, but only to one switch at a time), so leaf01 and leaf02 each see a different MAC address.

Configurations


leaf01 Config

auto bridge
iface bridge
  bridge-vlan-aware yes
  bridge-ports swp1 peerlink
  bridge-vids 1-2000
  bridge-stp on

auto bridge.10
iface bridge.10
  address 10.1.10.2/24

auto peerlink
iface peerlink
    bond-slaves glob swp49-50

auto swp1
iface swp1
  mstpctl-portadminedge yes
  mstpctl-bpduguard yes

Example Host Config (Ubuntu)

auto eth1
iface eth1 inet manual

auto eth1.10
iface eth1.10 inet manual

auto eth2
iface eth2 inet manual

auto eth2.20
iface eth2.20 inet manual

auto br-10
iface br-10 inet manual
  bridge-ports eth1.10 vnet0

auto br-20
iface br-20 inet manual
  bridge-ports eth2.20 vnet1

Benefits

  • Established technology
    • Interoperability with other vendors
    • Easy configuration for customer
    • Immense documentation from multiple vendors and industry
  • Ability to use spanning tree commands
  • Layer 2 reachability to all VMs

Caveats

  • The load balancing mechanism on the host can cause problems. Pinning hosts to a single NIC works fine, but bonding links across both switches requires an MLAG solution.
  • No active-active host links. Some operating systems allow HA (NIC failover), but this still does not utilize all the bandwidth: VMs use one NIC, not two.
Active-Active Mode

  • None (not possible with traditional spanning tree)

Active-Passive Mode

  • VRR

L2 to L3 Demarcation

  • ToR layer (recommended)
  • Spine layer
  • Core/edge/exit

More Information

VRR can be configured on a pair of switches at any level in the network. However, the higher up the network you configure it, the larger the L2 domain becomes. The benefit is L2 reachability; the drawbacks are that the L2 domain is more difficult to troubleshoot, does not scale as well, and the pair of switches running VRR must carry the entire MAC address table of everything below them in the network. Cumulus Professional Services recommends minimizing the L2 domain as much as possible.
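As a sketch, VRR uses the same address-virtual attribute shown in the MLAG example later in this chapter: both switches answer for a shared virtual address, which hosts use as their default gateway. Addresses and the virtual MAC here are illustrative:

leaf01 /etc/network/interfaces (fragment)

auto bridge.10
iface bridge.10
  address 10.1.10.2/24
  address-virtual 44:38:39:00:00:10 10.1.10.1/24

leaf02 /etc/network/interfaces (fragment)

auto bridge.10
iface bridge.10
  address 10.1.10.3/24
  address-virtual 44:38:39:00:00:10 10.1.10.1/24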

MLAG

Summary

MLAG (multi-chassis link aggregation) allows a host to bond its links to two different switches so that both uplinks are utilized at the same time. VRR gives both switches the ability to act as gateways simultaneously, for HA (high availability) and active-active mode (both are being used at the same time).

Configurations


leaf01 Config

auto bridge
iface bridge
  bridge-vlan-aware yes
  bridge-ports host-01 peerlink
  bridge-vids 1-2000
  bridge-stp on

auto bridge.10
iface bridge.10
  address 172.16.1.2/24
  address-virtual 44:38:39:00:00:10 172.16.1.1/24

auto peerlink
iface peerlink
    bond-slaves glob swp49-50

auto peerlink.4094
iface peerlink.4094
    address 169.254.1.2/30
    clagd-enable yes
    clagd-peer-ip 169.254.1.1
    clagd-sys-mac 44:38:39:FF:40:94

auto host-01
iface host-01
  bond-slaves swp1
  clag-id 1
  {bond-defaults removed for brevity}

Example Host Config (Ubuntu)

auto bond0
iface bond0 inet manual
  bond-slaves eth0 eth1
  {bond-defaults removed for brevity}

auto bond0.10
iface bond0.10 inet manual

auto vm-br10
iface vm-br10 inet manual
  bridge-ports bond0.10 vnet0

Benefits

  • 100% of links utilized

Caveats

  • More complicated (more moving parts)
  • More configuration
  • No interoperability between vendors
  • ISL (inter-switch link) required

Active-Active Mode

  • VRR

Active-Passive Mode

  • None

L2 to L3 Demarcation

  • ToR layer (recommended)
  • Spine layer
  • Core/edge/exit

Layer 3 Architecture

Single-attached Hosts

SummaryMore Information

The server (physical host) has only one link to one ToR switch.

Configurations


leaf01 Config

/etc/network/interfaces

auto swp1
iface swp1
  address 172.16.1.1/30

/etc/frr/frr.conf

router ospf
  router-id 10.0.0.11
interface swp1
  ip ospf area 0

leaf02 Config

/etc/network/interfaces

auto swp1
iface swp1
  address 172.16.2.1/30

/etc/frr/frr.conf

router ospf
  router-id 10.0.0.12
interface swp1
  ip ospf area 0

host1 Example Config (Ubuntu)

auto eth1
iface eth1 inet static
  address 172.16.1.2/30
  up ip route add 0.0.0.0/0 via 172.16.1.1

host2 Example Config (Ubuntu)

auto eth1
iface eth1 inet static
  address 172.16.2.2/30
  up ip route add 0.0.0.0/0 via 172.16.2.1

Benefits

  • Relatively simple network configuration
  • No STP
  • No MLAG
  • No L2 loops
  • No crosslink between leafs
  • Greater route scaling and flexibility

Caveats

  • No redundancy for the ToR; upgrades cause downtime
  • Many customers do not have software to support application layer redundancy

Additional Comments

  • For additional bandwidth, links between the host and leaf may be bonded

FHR (First Hop Redundancy)

  • No redundancy; a single ToR is used as the gateway.

Redistribute Neighbor

Summary

The redistribute neighbor daemon learns ARP entries dynamically and installs them into a kernel routing table; FRRouting then uses redistribute table to pick up these dynamic entries and redistribute them into the fabric.
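A minimal FRRouting sketch of the mechanism described above, assuming the daemon is installed and writes to its default kernel table (table 10); the BGP AS number and route-map name are hypothetical:

/etc/frr/frr.conf (fragment)

ip import-table 10

route-map REDIST-NEIGHBOR permit 10

router bgp 65011
  redistribute table 10 route-map REDIST-NEIGHBOR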

 

Benefits

  • Configuration in FRRouting is simple (route-map + redist table)
  • Supported by Cumulus Networks

Caveats

  • Silent hosts don't receive traffic (depending on ARP).
  • IPv4 only.
  • If two VMs are in the same L2 domain, they can learn about each other directly rather than using the gateway, which causes problems (for example, with VM migration or having their traffic routed). Put hosts on /32s with no other L2 adjacency.
  • A VM move does not trigger a route withdrawal from the original leaf (4-hour timeout).
  • Clearing ARP impacts routing, which may not be obvious.
  • No L2 adjacency between servers without VXLAN.

FHR (First Hop Redundancy)

  • Equal cost route installed on server/host/hypervisor to both ToRs to load balance evenly.
  • For host/VM/container mobility, use the same default route on all hosts (such as x.x.x.1) but don't distribute or advertise the .1 on the ToR into the fabric. This allows the VM to use the same gateway no matter which pair of leafs it is cabled to.
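As a sketch, the host side might look like the following, with the host's address on a /32 and an equal-cost default route toward both ToRs. All addresses here are hypothetical; the .1 gateway is the same on every host and is not advertised into the fabric:

Example Host Config (Ubuntu)

auto lo:1
iface lo:1 inet static
  address 10.1.1.10/32
  up ip route add 0.0.0.0/0 nexthop via 10.1.1.1 dev eth1 onlink nexthop via 10.1.1.1 dev eth2 onlink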

Routing on the Host

Summary

Routing on the host means there is a routing application (such as FRRouting) either on the bare metal host (no VMs/containers) or the hypervisor (for example, Ubuntu with KVM). This is highly recommended by the Cumulus Networks Professional Services team.
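A minimal sketch of FRRouting on the host, using BGP unnumbered to two ToRs and advertising the host's loopback address. The AS number and addresses are hypothetical, and 10.0.0.101/32 is assumed to be configured on lo:

/etc/frr/frr.conf (fragment)

router bgp 65101
  bgp router-id 10.0.0.101
  neighbor eth1 interface remote-as external
  neighbor eth2 interface remote-as external
  address-family ipv4 unicast
    network 10.0.0.101/32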

Benefits

  • No requirement for MLAG
  • No spanning-tree or layer 2 domain
  • No loops
  • Three or more ToRs can be used instead of the usual two
  • Host and VM mobility
  • Traffic engineering can be used to migrate traffic from one ToR to another for upgrading both hardware and software

Caveats

  • Certain hypervisors or host OSes might not support a routing application like FRRouting and will require a virtual router on the hypervisor
  • No L2 adjacency between servers without VXLAN

 

FHR (First Hop Redundancy)

  • The first hop is still the ToR, just like redistribute neighbor
  • A default route can be advertised by all leaf/ToRs for dynamic ECMP paths

Routing on the VM

Summary

Instead of routing on the hypervisor, each virtual machine utilizes its own routing stack.

 

Benefits

  • In addition to routing on host:
    • Hypervisor/base OS does not need to be able to do routing
    • VMs can be authenticated into routing fabric

Caveats

  • All VMs must be capable of routing
  • Scale considerations might need to be taken into account: instead of one routing process, there are as many as there are VMs
  • No L2 adjacency between servers without VXLAN

FHR (First Hop Redundancy)

  • The first hop is still the ToR, just like redistribute neighbor 
  • Multiple ToRs (2+) can be used

Virtual Router

Summary

A virtual router (vRouter) runs as a VM on the hypervisor/host and sends routes to the ToR using BGP or OSPF.

 

Benefits

In addition to routing on a host:

  • Multi-tenancy can work (multiple customers sharing same racks)
  • Base OS does not need to be routing capable

Caveats

  • ECMP might not work correctly (load balancing to multiple ToRs); older Linux kernels balance per packet rather than per flow
  • No L2 adjacency between servers without VXLAN
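The ECMP caveat above is kernel-version dependent: modern Linux kernels hash IPv4 multipath traffic per flow rather than per packet, and kernels 4.12 and later expose the hash policy as a sysctl. A sketch (the file path is illustrative):

/etc/sysctl.d/99-ecmp.conf

# Hash IPv4 multipath routes on the L4 5-tuple (per flow, not per packet)
net.ipv4.fib_multipath_hash_policy = 1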

 

FHR (First Hop Redundancy)

  • The gateway would be the vRouter, which has two routes out (two ToRs)
  • Multiple vRouters could be used

Anycast with Manual Redistribution

Summary

In contrast to routing on the host (preferred), this method allows a user to route to the host. The ToRs are the gateway, as with redistribute neighbor, except that because there is no daemon running, the networks must be manually configured under the routing process. There is a potential to black hole traffic unless a script removes the routes when the host no longer responds.

Configurations


leaf01 Config

/etc/network/interfaces

auto swp1
iface swp1
  address 172.16.1.1/30

/etc/frr/frr.conf

router ospf
  router-id 10.0.0.11
interface swp1
  ip ospf area 0

leaf02 Config

/etc/network/interfaces

auto swp1
iface swp1
  address 172.16.1.1/30

/etc/frr/frr.conf

router ospf
  router-id 10.0.0.12
interface swp1
  ip ospf area 0

Example Host Config (Ubuntu)

auto lo
iface lo inet loopback

auto lo:1
iface lo:1 inet static
  address 172.16.1.2/32
  up ip route add 0.0.0.0/0 nexthop via 172.16.1.1 dev eth1 onlink nexthop via 172.16.1.1 dev eth2 onlink

auto eth1
iface eth1 inet static
  address 172.16.1.2/32

auto eth2
iface eth2 inet static
  address 172.16.1.2/32

 

 

Benefits

  • Most benefits of routing on the host 
  • No requirement for host to run routing
  • No requirement for redistribute neighbor

Caveats

  • Removing a subnet from one ToR and re-adding it to another (that is, moving the network statements in the routing process) is a manual process
  • The network team and server team must stay in sync, the server team must control the ToR, or automation must handle every VM migration
  • When using VMs/containers it is very easy to black hole traffic, as the leafs continue to advertise prefixes even when a VM is down
  • No L2 adjacency between servers without VXLAN

 


FHR (First Hop Redundancy)

  • The gateways would be the ToRs, exactly like redistribute neighbor with an equal cost route installed
 

Network Virtualization

LNV with MLAG

Summary

The host runs LACP (Etherchannel/bond) to the pair of ToRs. LNV (Lightweight Network Virtualization) then transports the L2 bridges across an L3 fabric.

Configurations


leaf01 Config

/etc/network/interfaces

auto lo
iface lo inet loopback 
  address 10.0.0.11/32
  vxrd-src-ip 10.0.0.11
  vxrd-svcnode-ip 10.10.10.10
  clagd-vxlan-anycast-ip 36.0.0.11

auto vni-10
iface vni-10 
  vxlan-id 10 
  vxlan-local-tunnelip 10.0.0.11

auto br-10 
iface br-10
  bridge-ports swp1 vni-10

leaf02 Config

/etc/network/interfaces

auto lo
iface lo inet loopback 
  address 10.0.0.12/32
  vxrd-src-ip 10.0.0.12
  vxrd-svcnode-ip 10.10.10.10
  clagd-vxlan-anycast-ip 36.0.0.11

auto vni-10
iface vni-10 
  vxlan-id 10 
  vxlan-local-tunnelip 10.0.0.12

auto br-10 
iface br-10
  bridge-ports swp1 vni-10

Benefits

  • Layer 2 domain is reduced to the pair of ToRs
  • Aggregation layer is all L3 (VLANs do not have to exist on spine switches)
  • Greater route scaling and flexibility
  • High availability

Caveats

 

Active-Passive Mode

  • None

Demarcation

  • ToR layer or exit leafs