LNV active-active mode allows a pair of MLAG switches to act as a single VTEP, providing active-active VXLAN termination for bare metal as well as virtualized workloads. 


Terminology and Definitions

vxrd: VXLAN registration daemon. Runs on the switch that maps VLANs to VXLANs. The vxrd daemon must be configured to register with a service node; this turns the switch into a VTEP.

VTEP: Virtual tunnel endpoint. This is the encapsulation and decapsulation point for VXLANs.

active-active VTEP: A pair of switches acting as a single VTEP.

ToR: Top of rack switch. Also referred to as a leaf or access switch.

Spine: The aggregation switch for multiple leafs, specifically in a data center using a Clos network architecture.

vxsnd: VXLAN service node daemon, which accepts registrations from multiple VTEPs.

exit leaf: A switch dedicated to peering the Clos network to an outside network. Also referred to as a border leaf, service leaf, or edge leaf.

anycast: An IP address advertised from multiple locations, which allows multiple devices to share the same IP and effectively load balance traffic across them. With LNV, anycast is used in two places:
  1. To share a VTEP IP address between a pair of MLAG switches.
  2. To load balance traffic across service nodes (the service nodes share an IP address).

ASIC: Application-specific integrated circuit. Also referred to as hardware, as in hardware-accelerated. For best performance, VXLAN encapsulation and decapsulation should be performed by an ASIC that supports VXLAN.

RIOT: Broadcom feature for routing in and out of tunnels. Allows a VXLAN bridge to have a switch VLAN interface associated with it, so traffic can exit a VXLAN into the layer 3 fabric. Also called VXLAN Routing.

VXLAN Routing: Industry-standard term for the ability to route in and out of a VXLAN. Equivalent to the Broadcom RIOT feature.

Configuring LNV Active-Active Mode

LNV requires the following underlying technologies to work correctly.

MLAG: Refer to the MLAG chapter for more detailed configuration information.

OSPF or BGP: Refer to the OSPF chapter or the BGP chapter for more detailed configuration information.

LNV: Refer to the LNV chapter for more detailed configuration information.

STP: Enable BPDU filter and BPDU guard on the VXLAN interfaces if STP is enabled on the bridge that contains them (see the example below).

Configurations for the demonstration are provided below.
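
For example, the leaf configurations later in this chapter enable both settings on each bridge that contains a VXLAN interface:

auto vlan10
iface vlan10
  bridge-ports peerlink.10 bond0.10 vxlan10
  bridge-stp on
  mstpctl-portbpdufilter vxlan10=yes
  mstpctl-bpduguard vxlan10=yes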

Active-Active VTEP Anycast IP Behavior

Each individual switch within an MLAG pair should be provisioned with a virtual IP address in the form of an anycast IP address for VXLAN data-path termination. The VXLAN termination address is an anycast IP address that you configure as a clagd parameter (clagd-vxlan-anycast-ip) under the loopback interface. clagd dynamically adds and removes this address as the loopback interface address as follows:

  1. When the switches boot up, ifupdown2 places all VXLAN interfaces in a PROTO_DOWN state. The anycast addresses are not yet configured.
  2. MLAG peering takes place and a successful VXLAN interface consistency check between the switches occurs.
  3. clagd (the daemon responsible for MLAG) adds the anycast address to the loopback interface. It then changes the local IP address of each VXLAN interface from its unique address to the anycast virtual IP address and puts the interface in an UP state.
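
As an illustrative check, once step 3 completes the anycast address appears on the loopback interface alongside the unique address. Assuming leaf01's addressing from the examples below (output abbreviated, not verbatim):

cumulus@leaf01$ ip addr show lo
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 state UNKNOWN
    inet 127.0.0.1/8 scope host lo
    inet 10.0.0.11/32 scope global lo
    inet 10.10.10.20/32 scope global lo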

Failure Scenario Behaviors

The peer link goes down: The primary MLAG switch keeps all VXLAN interfaces up with the anycast IP address, while the secondary switch brings down all VXLAN interfaces and places them in a PROTO_DOWN state. The secondary MLAG switch removes the anycast IP address from the loopback interface and changes the local IP address of each VXLAN interface back to its configured unique IP address.

One of the switches goes down: The other operational switch continues to use the anycast IP address.

clagd is stopped: All VXLAN interfaces are placed in a PROTO_DOWN state. The anycast IP address is removed from the loopback interface and the local IP addresses of the VXLAN interfaces are changed from the anycast IP address back to their unique non-virtual IP addresses.

MLAG peering cannot be established between the switches: clagd brings up all the VXLAN interfaces with the configured anycast IP address after the reload timer expires. This allows the VXLAN interfaces to be up and running on both switches even though peering is not established.

The peer link goes down but the peer switch is up (that is, the backup link is active): All VXLAN interfaces are placed in a PROTO_DOWN state on the secondary switch.

There is a configuration mismatch between the MLAG switches: The mismatched VXLAN interface is placed in a PROTO_DOWN state on the secondary switch.

Checking VXLAN Interface Configuration Consistency

The LNV active-active configuration for a given VXLAN interface must be consistent between the MLAG switches for traffic to be forwarded correctly. MLAG verifies this consistency before bringing up the VXLAN interfaces.

The consistency checks include:

  • The anycast virtual IP address for VXLAN termination must be the same on both switches in the pair.
  • A VXLAN interface with the same VXLAN ID must be configured and administratively up on both switches.

You can use the clagctl command to check whether any VXLAN interfaces are in a PROTO_DOWN state, as shown below.
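
For example (full clagctl sample output appears in the Troubleshooting section at the end of this chapter):

cumulus@leaf01$ clagctl
cumulus@leaf01$ ip -d link show vxlan10

In the clagctl output, an interface that failed the consistency check is listed with a Proto-Down Reason. With a reasonably recent iproute2, ip -d link show also displays protodown in the interface flags.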

Configuring the Anycast IP Address

With MLAG peering, both switches use an anycast IP address for VXLAN encapsulation and decapsulation. This allows remote VTEPs to learn the host MAC addresses attached to the MLAG switches against one logical VTEP, even though the switches independently encapsulate and decapsulate layer 2 traffic originating from the host. The anycast address under the loopback interface can be configured as shown below.

leaf01: /etc/network/interfaces snippet

auto lo
iface lo inet loopback
  address 10.0.0.11/32
  vxrd-src-ip 10.0.0.11
  vxrd-svcnode-ip 10.10.10.10
  clagd-vxlan-anycast-ip 10.10.10.20

leaf02: /etc/network/interfaces snippet

auto lo
iface lo inet loopback
  address 10.0.0.12/32
  vxrd-src-ip 10.0.0.12
  vxrd-svcnode-ip 10.10.10.10
  clagd-vxlan-anycast-ip 10.10.10.20

Explanation of Variables

vxrd-src-ip: The unique IP address for vxrd to bind to.

vxrd-svcnode-ip: The service node anycast IP address in the topology. In this demonstration, this is an anycast IP address shared by both spine switches.

clagd-vxlan-anycast-ip: The anycast address for the MLAG pair to share and bind to when MLAG is up and running.

Example VXLAN Active-Active Configuration

 

Note the configuration of the local IP addresses in the VXLAN interfaces below: they are configured as individual IP addresses, which clagd changes to the anycast address upon MLAG peering.

Layer 3 Fabric Configuration

The layer 3 fabric can be configured using BGP or OSPF. The following example uses BGP Unnumbered. The MLAG switch configuration for the topology above is shown below.

Layer 3 IP Addressing

The IP address configuration for this example:

spine01: /etc/network/interfaces

auto lo
iface lo inet loopback
    address 10.0.0.21/32
    address 10.10.10.10/32
    
auto eth0
iface eth0 inet dhcp

# downlinks
auto swp1
iface swp1

auto swp2
iface swp2

auto swp3
iface swp3

auto swp4
iface swp4

auto swp29
iface swp29

auto swp30
iface swp30

spine02: /etc/network/interfaces

auto lo
iface lo inet loopback
    address 10.0.0.22/32
    address 10.10.10.10/32

auto eth0
iface eth0 inet dhcp

# downlinks
auto swp1
iface swp1

auto swp2
iface swp2

auto swp3
iface swp3

auto swp4
iface swp4

auto swp29
iface swp29

auto swp30
iface swp30

leaf01: /etc/network/interfaces

auto lo
iface lo inet loopback
    address 10.0.0.11/32
    vxrd-src-ip 10.0.0.11
    vxrd-svcnode-ip 10.10.10.10
    clagd-vxlan-anycast-ip 10.10.10.20

auto eth0
iface eth0 inet dhcp

# peerlinks
auto swp49
iface swp49

auto swp50
iface swp50

auto peerlink
iface peerlink
  bond-slaves swp49 swp50
  bond-mode 802.3ad
  bond-miimon 100
  bond-use-carrier 1
  bond-lacp-rate 1
  bond-min-links 1
  bond-xmit-hash-policy layer3+4
      
auto peerlink.4094
iface peerlink.4094
  address 169.254.1.1/30
  clagd-peer-ip 169.254.1.2
  clagd-backup-ip 10.0.0.12 
  clagd-sys-mac 44:38:39:FF:40:94

# Downlinks
auto swp1
iface swp1

  
auto bond0 
iface bond0
    bond-slaves swp1 
    clag-id 1
    bond-miimon 100
    bond-min-links 1
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    bond-lacp-rate 1       

# bridges / vlan that contain peerlink and downlinks for L2 connectivity

auto native
iface native
  bridge-ports peerlink bond0 vxlan1
  bridge-stp on
  mstpctl-portbpdufilter vxlan1=yes
  mstpctl-bpduguard vxlan1=yes 
     
auto vlan10
iface vlan10
  bridge-ports peerlink.10 bond0.10 vxlan10
  bridge-stp on
  mstpctl-portbpdufilter vxlan10=yes
  mstpctl-bpduguard vxlan10=yes      

auto vlan20
iface vlan20
  bridge-ports peerlink.20 bond0.20 vxlan20
  bridge-stp on
  mstpctl-portbpdufilter vxlan20=yes
  mstpctl-bpduguard vxlan20=yes

#vxlan config
auto vxlan1
iface vxlan1
  vxlan-id 1
  vxlan-local-tunnelip 10.0.0.11
  
auto vxlan10
iface vxlan10
  vxlan-id 10
  vxlan-local-tunnelip 10.0.0.11
    
auto vxlan20
iface vxlan20
  vxlan-id 20
  vxlan-local-tunnelip 10.0.0.11
  
# uplinks
auto swp51
iface swp51

auto swp52
iface swp52  

leaf02: /etc/network/interfaces

auto lo
iface lo inet loopback
    address 10.0.0.12/32
    vxrd-src-ip 10.0.0.12
    vxrd-svcnode-ip 10.10.10.10
    clagd-vxlan-anycast-ip 10.10.10.20

auto eth0
iface eth0 inet dhcp

# peerlinks
auto swp49
iface swp49

auto swp50
iface swp50

auto peerlink
iface peerlink
  bond-slaves swp49 swp50
  bond-mode 802.3ad
  bond-miimon 100
  bond-use-carrier 1
  bond-lacp-rate 1
  bond-min-links 1
  bond-xmit-hash-policy layer3+4
      
auto peerlink.4094
iface peerlink.4094
  address 169.254.1.2/30
  clagd-peer-ip 169.254.1.1
  clagd-backup-ip 10.0.0.11
  clagd-sys-mac 44:38:39:FF:40:94

# Downlinks
auto swp1
iface swp1

  
auto bond0 
iface bond0
    bond-slaves swp1 
    clag-id 1
    bond-miimon 100
    bond-min-links 1
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    bond-lacp-rate 1       

# bridges / vlan that contain peerlink and downlinks for L2 connectivity

auto native
iface native
  bridge-ports peerlink bond0 vxlan1
  bridge-stp on
  mstpctl-portbpdufilter vxlan1=yes
  mstpctl-bpduguard vxlan1=yes    
   
auto vlan10
iface vlan10
  bridge-ports peerlink.10 bond0.10 vxlan10
  bridge-stp on
  mstpctl-portbpdufilter vxlan10=yes
  mstpctl-bpduguard vxlan10=yes      

auto vlan20
iface vlan20
  bridge-ports peerlink.20 bond0.20 vxlan20
  bridge-stp on
  mstpctl-portbpdufilter vxlan20=yes
  mstpctl-bpduguard vxlan20=yes

#vxlan config
auto vxlan1
iface vxlan1
  vxlan-id 1
  vxlan-local-tunnelip 10.0.0.12
  
auto vxlan10
iface vxlan10
  vxlan-id 10
  vxlan-local-tunnelip 10.0.0.12
    
auto vxlan20
iface vxlan20
  vxlan-id 20
  vxlan-local-tunnelip 10.0.0.12
  
# uplinks
auto swp51
iface swp51

auto swp52
iface swp52  

leaf03: /etc/network/interfaces

auto lo
iface lo inet loopback
  address 10.0.0.13/32
  vxrd-src-ip 10.0.0.13
  vxrd-svcnode-ip 10.10.10.10
  clagd-vxlan-anycast-ip 10.10.10.30

auto eth0
iface eth0 inet dhcp

# peerlinks
auto swp49
iface swp49

auto swp50
iface swp50

auto peerlink
iface peerlink
  bond-slaves swp49 swp50
  bond-mode 802.3ad
  bond-miimon 100
  bond-use-carrier 1
  bond-lacp-rate 1
  bond-min-links 1
  bond-xmit-hash-policy layer3+4
      
auto peerlink.4094
iface peerlink.4094
  address 169.254.1.1/30
  clagd-peer-ip 169.254.1.2
  clagd-backup-ip 10.0.0.14
  clagd-sys-mac 44:38:39:FF:40:95

# Downlinks
auto swp1
iface swp1
  
auto bond0 
iface bond0
    bond-slaves swp1 
    clag-id 1
    bond-miimon 100
    bond-min-links 1
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    bond-lacp-rate 1       

# bridges / vlan that contain peerlink and downlinks for L2 connectivity

auto native
iface native
  bridge-ports peerlink bond0 vxlan1
  bridge-stp on
  mstpctl-portbpdufilter vxlan1=yes
  mstpctl-bpduguard vxlan1=yes    
   
auto vlan10
iface vlan10
  bridge-ports peerlink.10 bond0.10 vxlan10
  bridge-stp on
  mstpctl-portbpdufilter vxlan10=yes
  mstpctl-bpduguard vxlan10=yes      

auto vlan20
iface vlan20
  bridge-ports peerlink.20 bond0.20 vxlan20
  bridge-stp on
  mstpctl-portbpdufilter vxlan20=yes
  mstpctl-bpduguard vxlan20=yes

#vxlan config
auto vxlan1
iface vxlan1
  vxlan-id 1
  vxlan-local-tunnelip 10.0.0.13
    
auto vxlan10
iface vxlan10
  vxlan-id 10
  vxlan-local-tunnelip 10.0.0.13
    
auto vxlan20
iface vxlan20
  vxlan-id 20
  vxlan-local-tunnelip 10.0.0.13
  
# uplinks
auto swp51
iface swp51

auto swp52
iface swp52    

leaf04: /etc/network/interfaces

auto lo
iface lo inet loopback
  address 10.0.0.14/32
  vxrd-src-ip 10.0.0.14
  vxrd-svcnode-ip 10.10.10.10
  clagd-vxlan-anycast-ip 10.10.10.30

auto eth0
iface eth0 inet dhcp

# peerlinks
auto swp49
iface swp49

auto swp50
iface swp50

auto peerlink
iface peerlink
  bond-slaves swp49 swp50
  bond-mode 802.3ad
  bond-miimon 100
  bond-use-carrier 1
  bond-lacp-rate 1
  bond-min-links 1
  bond-xmit-hash-policy layer3+4
      
auto peerlink.4094
iface peerlink.4094
  address 169.254.1.2/30
  clagd-peer-ip 169.254.1.1
  clagd-backup-ip 10.0.0.13
  clagd-sys-mac 44:38:39:FF:40:95

# Downlinks
auto swp1
iface swp1
  
auto bond0 
iface bond0
    bond-slaves swp1 
    clag-id 1
    bond-miimon 100
    bond-min-links 1
    bond-mode 802.3ad
    bond-xmit-hash-policy layer3+4
    bond-lacp-rate 1       

# bridges / vlan that contain peerlink and downlinks for L2 connectivity

auto native
iface native
  bridge-ports peerlink bond0 vxlan1
  bridge-stp on
  mstpctl-portbpdufilter vxlan1=yes
  mstpctl-bpduguard vxlan1=yes    
   
auto vlan10
iface vlan10
  bridge-ports peerlink.10 bond0.10 vxlan10
  bridge-stp on
  mstpctl-portbpdufilter vxlan10=yes
  mstpctl-bpduguard vxlan10=yes      

auto vlan20
iface vlan20
  bridge-ports peerlink.20 bond0.20 vxlan20
  bridge-stp on
  mstpctl-portbpdufilter vxlan20=yes
  mstpctl-bpduguard vxlan20=yes

#vxlan config
auto vxlan1
iface vxlan1
  vxlan-id 1
  vxlan-local-tunnelip 10.0.0.14
  
auto vxlan10
iface vxlan10
  vxlan-id 10
  vxlan-local-tunnelip 10.0.0.14
    
auto vxlan20
iface vxlan20
  vxlan-id 20
  vxlan-local-tunnelip 10.0.0.14
  
# uplinks
auto swp51
iface swp51

auto swp52
iface swp52    

Quagga Configuration

The service nodes and registration nodes must all be routable to each other. The layer 3 fabric on Cumulus Linux can be built with either BGP or OSPF; this example uses BGP unnumbered, matching the interface configuration above, to provide full reachability.

The Quagga configuration using BGP unnumbered:

spine01: /etc/quagga/Quagga.conf

!
interface swp1
 no ipv6 nd suppress-ra
 ipv6 nd ra-interval 3
!
interface swp2
 no ipv6 nd suppress-ra
 ipv6 nd ra-interval 3
!
interface swp3
 no ipv6 nd suppress-ra
 ipv6 nd ra-interval 3
!
interface swp4
 no ipv6 nd suppress-ra
 ipv6 nd ra-interval 3
!
interface swp29
 no ipv6 nd suppress-ra
 ipv6 nd ra-interval 3
!
interface swp30
 no ipv6 nd suppress-ra
 ipv6 nd ra-interval 3
!
router bgp 65020
  bgp router-id 10.0.0.21
  network 10.0.0.21/32
  network 10.10.10.10/32
  bgp bestpath as-path multipath-relax
  bgp bestpath compare-routerid
  bgp default show-hostname  
  neighbor FABRIC peer-group
  neighbor FABRIC remote-as external
  neighbor FABRIC description Internal Fabric Network
  neighbor FABRIC advertisement-interval 0
  neighbor FABRIC timers 1 3
  neighbor FABRIC timers connect 3
  neighbor FABRIC capability extended-nexthop
  neighbor FABRIC prefix-list dc-spine in
  neighbor FABRIC prefix-list dc-spine out
  neighbor swp1 interface
  neighbor swp1 peer-group FABRIC
  neighbor swp2 interface
  neighbor swp2 peer-group FABRIC
  neighbor swp3 interface
  neighbor swp3 peer-group FABRIC
  neighbor swp4 interface
  neighbor swp4 peer-group FABRIC
  neighbor swp29 interface
  neighbor swp29 peer-group FABRIC
  neighbor swp30 interface
  neighbor swp30 peer-group FABRIC      
!
ip prefix-list dc-spine seq 10 permit 0.0.0.0/0
ip prefix-list dc-spine seq 15 permit 10.0.0.0/24 le 32
ip prefix-list dc-spine seq 20 permit 10.10.10.0/24 le 32
ip prefix-list dc-spine seq 30 permit 172.16.1.0/24
ip prefix-list dc-spine seq 40 permit 172.16.2.0/24
ip prefix-list dc-spine seq 50 permit 172.16.3.0/24
ip prefix-list dc-spine seq 60 permit 172.16.4.0/24
ip prefix-list dc-spine seq 500 deny any
!

spine02: /etc/quagga/Quagga.conf

!
interface swp1
 no ipv6 nd suppress-ra
 ipv6 nd ra-interval 3
!
interface swp2
 no ipv6 nd suppress-ra
 ipv6 nd ra-interval 3
!
interface swp3
 no ipv6 nd suppress-ra
 ipv6 nd ra-interval 3
!
interface swp4
 no ipv6 nd suppress-ra
 ipv6 nd ra-interval 3
!
interface swp29
 no ipv6 nd suppress-ra
 ipv6 nd ra-interval 3
!
interface swp30
 no ipv6 nd suppress-ra
 ipv6 nd ra-interval 3
!
router bgp 65020
  bgp router-id 10.0.0.22
  network 10.0.0.22/32
  network 10.10.10.10/32
  bgp bestpath as-path multipath-relax
  bgp bestpath compare-routerid
  bgp default show-hostname  
  neighbor FABRIC peer-group
  neighbor FABRIC remote-as external
  neighbor FABRIC description Internal Fabric Network
  neighbor FABRIC advertisement-interval 0
  neighbor FABRIC timers 1 3
  neighbor FABRIC timers connect 3
  neighbor FABRIC capability extended-nexthop
  neighbor FABRIC prefix-list dc-spine in
  neighbor FABRIC prefix-list dc-spine out
  neighbor swp1 interface
  neighbor swp1 peer-group FABRIC
  neighbor swp2 interface
  neighbor swp2 peer-group FABRIC
  neighbor swp3 interface
  neighbor swp3 peer-group FABRIC
  neighbor swp4 interface
  neighbor swp4 peer-group FABRIC
  neighbor swp29 interface
  neighbor swp29 peer-group FABRIC  
  neighbor swp30 interface
  neighbor swp30 peer-group FABRIC  
!
ip prefix-list dc-spine seq 10 permit 0.0.0.0/0
ip prefix-list dc-spine seq 15 permit 10.0.0.0/24 le 32
ip prefix-list dc-spine seq 20 permit 10.10.10.0/24 le 32
ip prefix-list dc-spine seq 30 permit 172.16.1.0/24
ip prefix-list dc-spine seq 40 permit 172.16.2.0/24
ip prefix-list dc-spine seq 50 permit 172.16.3.0/24
ip prefix-list dc-spine seq 60 permit 172.16.4.0/24
ip prefix-list dc-spine seq 500 deny any
!

leaf01: /etc/quagga/Quagga.conf

!
interface swp51
 no ipv6 nd suppress-ra
 ipv6 nd ra-interval 3
!
interface swp52
 no ipv6 nd suppress-ra
 ipv6 nd ra-interval 3
!
router bgp 65011
  bgp router-id 10.0.0.11
  network 10.0.0.11/32 
  network 172.16.1.0/24
  network 10.10.10.20/32
  bgp bestpath as-path multipath-relax
  bgp bestpath compare-routerid
  bgp default show-hostname  
  neighbor FABRIC peer-group
  neighbor FABRIC remote-as external
  neighbor FABRIC description Internal Fabric Network
  neighbor FABRIC advertisement-interval 0
  neighbor FABRIC timers 1 3
  neighbor FABRIC timers connect 3
  neighbor FABRIC capability extended-nexthop
  neighbor FABRIC filter-list dc-leaf-out out
  neighbor swp51 interface
  neighbor swp51 peer-group FABRIC
  neighbor swp52 interface
  neighbor swp52 peer-group FABRIC
!
ip as-path access-list dc-leaf-out permit ^$
!

leaf02: /etc/quagga/Quagga.conf

!
interface swp51
 no ipv6 nd suppress-ra
 ipv6 nd ra-interval 3
!
interface swp52
 no ipv6 nd suppress-ra
 ipv6 nd ra-interval 3
!
router bgp 65012
  bgp router-id 10.0.0.12
  network 10.0.0.12/32
  network 172.16.1.0/24
  network 10.10.10.20/32
  bgp bestpath as-path multipath-relax
  bgp bestpath compare-routerid
  bgp default show-hostname  
  neighbor FABRIC peer-group
  neighbor FABRIC remote-as external
  neighbor FABRIC description Internal Fabric Network
  neighbor FABRIC advertisement-interval 0
  neighbor FABRIC timers 1 3
  neighbor FABRIC timers connect 3
  neighbor FABRIC capability extended-nexthop
  neighbor FABRIC filter-list dc-leaf-out out
  neighbor swp51 interface
  neighbor swp51 peer-group FABRIC
  neighbor swp52 interface
  neighbor swp52 peer-group FABRIC
!
ip as-path access-list dc-leaf-out permit ^$
!

leaf03: /etc/quagga/Quagga.conf

!
interface swp51
 no ipv6 nd suppress-ra
 ipv6 nd ra-interval 3
!
interface swp52
 no ipv6 nd suppress-ra
 ipv6 nd ra-interval 3
!
router bgp 65013
  bgp router-id 10.0.0.13
  network 10.0.0.13/32
  network 172.16.3.0/24
  network 10.10.10.30/32
  bgp bestpath as-path multipath-relax
  bgp bestpath compare-routerid
  bgp default show-hostname  
  neighbor FABRIC peer-group
  neighbor FABRIC remote-as external
  neighbor FABRIC description Internal Fabric Network
  neighbor FABRIC advertisement-interval 0
  neighbor FABRIC timers 1 3
  neighbor FABRIC timers connect 3
  neighbor FABRIC capability extended-nexthop
  neighbor FABRIC filter-list dc-leaf-out out
  neighbor swp51 interface
  neighbor swp51 peer-group FABRIC
  neighbor swp52 interface
  neighbor swp52 peer-group FABRIC
!
ip as-path access-list dc-leaf-out permit ^$
!

leaf04: /etc/quagga/Quagga.conf

!
interface swp51
 no ipv6 nd suppress-ra
 ipv6 nd ra-interval 3
!
interface swp52
 no ipv6 nd suppress-ra
 ipv6 nd ra-interval 3
!
router bgp 65014
  bgp router-id 10.0.0.14
  network 10.0.0.14/32
  network 172.16.3.0/24
  network 10.10.10.30/32
  bgp bestpath as-path multipath-relax
  bgp bestpath compare-routerid
  bgp default show-hostname  
  neighbor FABRIC peer-group
  neighbor FABRIC remote-as external
  neighbor FABRIC description Internal Fabric Network
  neighbor FABRIC advertisement-interval 0
  neighbor FABRIC timers 1 3
  neighbor FABRIC timers connect 3
  neighbor FABRIC capability extended-nexthop
  neighbor FABRIC filter-list dc-leaf-out out
  neighbor swp51 interface
  neighbor swp51 peer-group FABRIC
  neighbor swp52 interface
  neighbor swp52 peer-group FABRIC
!
ip as-path access-list dc-leaf-out permit ^$
!

Host Configuration

In this example, the servers are running Ubuntu 14.04. A layer 2 bond must be configured from server01 and server03 to their respective switch pairs. On Ubuntu, the VLANs riding over the bond are configured as subinterfaces.

server01

auto lo
iface lo inet loopback

iface lo inet static
  address 10.0.0.31/32
  
auto eth0
iface eth0 inet dhcp

auto eth1
iface eth1 inet manual
    bond-master bond0
        
auto eth2
iface eth2 inet manual
    bond-master bond0
    
auto bond0
iface bond0 inet static
  bond-slaves none
  bond-miimon 100
  bond-min-links 1
  bond-mode 802.3ad
  bond-xmit-hash-policy layer3+4
  bond-lacp-rate 1
  address 172.16.1.101/24

auto bond0.10
iface bond0.10 inet static
  address 172.16.10.101/24
  
auto bond0.20
iface bond0.20 inet static
  address 172.16.20.101/24

server03

auto lo
iface lo inet loopback

iface lo inet static
  address 10.0.0.33/32
  
auto eth0
iface eth0 inet dhcp

auto eth1
iface eth1 inet manual
    bond-master bond0
        
auto eth2
iface eth2 inet manual
    bond-master bond0
    
auto bond0
iface bond0 inet static
  bond-slaves none
  bond-miimon 100
  bond-min-links 1
  bond-mode 802.3ad
  bond-xmit-hash-policy layer3+4
  bond-lacp-rate 1
  address 172.16.1.103/24

auto bond0.10
iface bond0.10 inet static
  address 172.16.10.103/24
  
auto bond0.20
iface bond0.20 inet static
  address 172.16.20.103/24

Enable the Registration Daemon

The registration daemon (vxrd) must be enabled on each ToR switch that acts as a VTEP and participates in LNV. The daemon is installed by default.

  1. Open the /etc/default/vxrd configuration file in a text editor.
  2. Enable the daemon, then save the file:

    START=yes
  3. Restart the vxrd daemon.

    cumulus@leaf:~$ sudo systemctl restart vxrd.service
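
To confirm the daemon is running and to inspect its active settings (the vxrdctl get config output is shown in full later in this chapter):

cumulus@leaf:~$ sudo systemctl status vxrd.service
cumulus@leaf:~$ vxrdctl get config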

Configuring a VTEP

The registration node was configured earlier in /etc/network/interfaces; no additional configuration is typically needed. Alternatively, the configuration can be placed in /etc/vxrd.conf, which offers additional configuration knobs (a minimal sketch follows).
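
A minimal sketch of such a /etc/vxrd.conf, using leaf01's values from this demonstration (the key names here match the vxrdctl get config output shown later in this chapter; treat this as an illustration, not a complete reference):

[common]
# Anycast address of the service node(s); matches vxrd-svcnode-ip above
svcnode_ip = 10.10.10.10
# Unique local address for vxrd to bind to; matches vxrd-src-ip above
src_ip = 10.0.0.11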

Enable the Service Node Daemon

  1. Open the /etc/default/vxsnd configuration file in a text editor.
  2. Enable the daemon, then save the file:

    START=yes
  3. Restart the daemon.

    cumulus@spine:~$ sudo systemctl restart vxsnd.service

Configuring the Service Node

To configure the service node daemon, edit the /etc/vxsnd.conf configuration file:

spine01: /etc/vxsnd.conf

svcnode_ip = 10.10.10.10
 
src_ip = 10.0.0.21
 
svcnode_peers = 10.0.0.21 10.0.0.22

The full vxsnd.conf configuration:
[common]
# Log level is one of DEBUG, INFO, WARNING, ERROR, CRITICAL
#loglevel = INFO

# Destination for log message.  Can be a file name, 'stdout', or 'syslog'
#logdest = syslog

# log file size in bytes. Used when logdest is a file
#logfilesize = 512000

# maximum number of log files stored on disk. Used when logdest is a file
#logbackupcount = 14

# The file to write the pid. If using monit, this must match the one
# in the vxsnd.rc
#pidfile = /var/run/vxsnd.pid

# The file name for the unix domain socket used for mgmt.
#udsfile = /var/run/vxsnd.sock

# UDP port for vxfld control messages
#vxfld_port = 10001

# This is the address to which registration daemons send control messages for
# registration and/or BUM packets for replication
svcnode_ip = 10.10.10.10

# Holdtime (in seconds) for soft state. It is used when sending a
# register msg to peers in response to learning a <vni, addr> from a
# VXLAN data pkt
#holdtime = 90

# Local IP address to bind to for receiving inter-vxsnd control traffic
src_ip = 10.0.0.21

[vxsnd]
# Space separated list of IP addresses of vxsnd to share state with
svcnode_peers = 10.0.0.21 10.0.0.22

# When set to true, the service node will listen for vxlan data traffic
# Note: Use 1, yes, true, or on, for True and 0, no, false, or off,
# for False
#enable_vxlan_listen = true

# When set to true, the svcnode_ip will be installed on the loopback
# interface, and it will be withdrawn when the vxsnd is no longer in
# service.  If set to true, the svcnode_ip configuration
# variable must be defined.
# Note: Use 1, yes, true, or on, for True and 0, no, false, or off,
# for False
#install_svcnode_ip = false

# Seconds to wait before checking the database to age out stale entries
#age_check = 90

spine02: /etc/vxsnd.conf

svcnode_ip = 10.10.10.10

src_ip = 10.0.0.22

svcnode_peers = 10.0.0.21 10.0.0.22

The full vxsnd.conf configuration:
[common]
# Log level is one of DEBUG, INFO, WARNING, ERROR, CRITICAL
#loglevel = INFO

# Destination for log message.  Can be a file name, 'stdout', or 'syslog'
#logdest = syslog

# log file size in bytes. Used when logdest is a file
#logfilesize = 512000

# maximum number of log files stored on disk. Used when logdest is a file
#logbackupcount = 14

# The file to write the pid. If using monit, this must match the one
# in the vxsnd.rc
#pidfile = /var/run/vxsnd.pid

# The file name for the unix domain socket used for mgmt.
#udsfile = /var/run/vxsnd.sock

# UDP port for vxfld control messages
#vxfld_port = 10001

# This is the address to which registration daemons send control messages for
# registration and/or BUM packets for replication
svcnode_ip = 10.10.10.10

# Holdtime (in seconds) for soft state. It is used when sending a
# register msg to peers in response to learning a <vni, addr> from a
# VXLAN data pkt
#holdtime = 90

# Local IP address to bind to for receiving inter-vxsnd control traffic
src_ip = 10.0.0.22

[vxsnd]
# Space separated list of IP addresses of vxsnd to share state with
svcnode_peers = 10.0.0.21 10.0.0.22

# When set to true, the service node will listen for vxlan data traffic
# Note: Use 1, yes, true, or on, for True and 0, no, false, or off,
# for False
#enable_vxlan_listen = true

# When set to true, the svcnode_ip will be installed on the loopback
# interface, and it will be withdrawn when the vxsnd is no longer in
# service.  If set to true, the svcnode_ip configuration
# variable must be defined.
# Note: Use 1, yes, true, or on, for True and 0, no, false, or off,
# for False
#install_svcnode_ip = false

# Seconds to wait before checking the database to age out stale entries
#age_check = 90


 

Considerations for Virtual Topologies Using Cumulus VX

Node ID

vxrd requires a unique node_id for each individual switch. The node_id is derived from the MAC address of the switch's first interface. In certain virtual topologies, such as Vagrant, both leaf switches within an MLAG pair can end up generating the same node_id; in that case, configure one of the node_ids manually, or ensure that the first interface of each switch has a unique MAC address.

To verify the node_id configured on your switch, use the vxrdctl get config command:

cumulus@leaf01$ vxrdctl get config
{
    "concurrency": 1000,
    "config_check_rate": 60,
    "debug": false,
    "eventlet_backdoor_port": 9000,
    "head_rep": true,
    "holdtime": 90,
    "logbackupcount": 14,
    "logdest": "syslog",
    "logfilesize": 512000,
    "loglevel": "INFO",
    "max_packet_size": 1500,
    "node_id": 13,
    "pidfile": "/var/run/vxrd.pid",
    "refresh_rate": 3,
    "src_ip": "10.2.1.50",
    "svcnode_ip": "10.10.10.10",
    "udsfile": "/var/run/vxrd.sock",
    "vxfld_port": 10001
}

To set the node_id manually:

  1. Open /etc/vxrd.conf in a text editor.
  2. Set the node_id value within the common section, then save the file:

    [common]
    node_id = 13

Ensure that each leaf has a separate node_id so that LNV can function correctly.
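
A quick way to compare the values on the two leafs (the node_id of 13 comes from the sample output above; leaf02's value here is hypothetical):

cumulus@leaf01$ vxrdctl get config | grep node_id
    "node_id": 13,

cumulus@leaf02$ vxrdctl get config | grep node_id
    "node_id": 14,

If both commands print the same value, set one of them manually as described above.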

Bonds with Vagrant

Bonds (LACP EtherChannels) do not work in a Vagrant setup unless the links are set to promiscuous mode. This limitation applies only to virtual topologies; promiscuous mode is not needed on real hardware.

auto swp49
iface swp49
  #for vagrant so bonds work correctly
  post-up ip link set $IFACE promisc on

auto swp50
iface swp50
  #for vagrant so bonds work correctly
  post-up ip link set $IFACE promisc on

For more information on using Cumulus VX and Vagrant, refer to the Cumulus VX documentation.

Troubleshooting with LNV Active-Active

In addition to the troubleshooting steps for single-attached LNV, the MLAG daemon (clagd) must now be considered. The clagctl command shows the state of MLAG and any inconsistencies that arise between an MLAG pair.

cumulus@leaf03$ clagctl
The peer is alive
     Our Priority, ID, and Role: 32768 44:38:39:00:00:35 primary
    Peer Priority, ID, and Role: 32768 44:38:39:00:00:36 secondary
          Peer Interface and IP: peerlink.4094 169.254.1.2
               VxLAN Anycast IP: 10.10.10.30
                      Backup IP: 10.0.0.14 (inactive)
                     System MAC: 44:38:39:ff:40:95
CLAG Interfaces
Our Interface      Peer Interface     CLAG Id   Conflicts              Proto-Down Reason
----------------   ----------------   -------   --------------------   -----------------
           bond0   bond0              1         -                      -
         vxlan20   vxlan20            -         -                      -
          vxlan1   vxlan1             -         -                      -
         vxlan10   vxlan10            -         -                      -

The additions to the normal clagctl output are the following:

VXLAN Anycast IP: 10.10.10.30
  The anycast IP address shared by the MLAG pair for VTEP termination is in use and is 10.10.10.30.

Conflicts: -
  There are no conflicts for this MLAG interface.

Proto-Down Reason: -
  The VXLAN interface is up and running (there is no Proto-Down reason).

In the next example, the vxlan-id on vxlan10 was changed to the wrong value. When the clagctl command is run, you can see that vxlan10 went down: this switch is the secondary switch, and the peer switch took over the VXLAN. The reason code vxlan-single indicates a vxlan-id mismatch on vxlan10.

cumulus@leaf02$ clagctl
The peer is alive
    Peer Priority, ID, and Role: 32768 44:38:39:00:00:11 primary
     Our Priority, ID, and Role: 32768 44:38:39:00:00:12 secondary
          Peer Interface and IP: peerlink.4094 169.254.1.1
               VxLAN Anycast IP: 10.10.10.20
                      Backup IP: 10.0.0.11 (inactive)
                     System MAC: 44:38:39:ff:40:94
CLAG Interfaces
Our Interface      Peer Interface     CLAG Id   Conflicts              Proto-Down Reason
----------------   ----------------   -------   --------------------   -----------------
           bond0   bond0              1         -                      -
         vxlan20   vxlan20            -         -                      -
          vxlan1   vxlan1             -         -                      -
         vxlan10   -                  -         -                      vxlan-single

Caveats and Errata

  • The VLAN used for the peer link layer 3 subinterface should not be reused for any other interface in the system. A high VLAN ID value is recommended. For more information on VLAN ID ranges, refer to the section above.
  • Active-active mode only works with LNV in this release. Integration with controller-based VXLANs such as VMware NSX and Midokura MidoNet will be supported in the future.
