VXLAN routing, sometimes referred to as inter-VXLAN routing, provides IP routing between VXLAN VNIs in overlay networks. The routing of traffic is based on the inner header or the overlay tenant IP address.
Because VXLAN routing is fundamentally routing, it is most commonly deployed with a control plane, such as Ethernet Virtual Private Network (EVPN). You can also set up static routing, with or without Cumulus Lightweight Network Virtualization (LNV) for MAC distribution and BUM handling.
This topic describes the platform and hardware considerations for VXLAN routing. For a detailed description of different VXLAN routing models and configuration examples, refer to EVPN.
VXLAN routing supports full layer 3 multi-tenancy; all routing occurs in the context of a VRF. Also, VXLAN routing is supported for dual-attached hosts where the associated VTEPs function in active-active mode.
The following chipsets support VXLAN routing:
- Broadcom Trident II+, Trident3, and Maverick
- Broadcom Tomahawk and Tomahawk+, using an internal loopback on one or more switch ports
- Broadcom Trident II, static VXLAN routing only, using an external loopback on one or more switch ports
- Mellanox Spectrum
Note the following restrictions:
- ECMP with VXLAN routing is supported only on RIOT-capable Broadcom switches (Trident II+, Trident3, and Maverick), on Tomahawk and Tomahawk+ switches, and on Mellanox Spectrum-A1 switches.
- For additional restrictions and considerations for VXLAN routing with EVPN, refer to the EVPN chapter.
VXLAN Routing Data Plane and the Broadcom Trident II+, Trident3, Maverick, Tomahawk, and Tomahawk+ Platforms
Trident II+, Trident3, and Maverick
The Trident II+, Trident3, and Maverick ASICs provide native support for VXLAN routing, also referred to as Routing In and Out of Tunnels (RIOT).
You can specify a VXLAN routing profile in the vxlan_routing_overlay.profile field of the /usr/lib/python2.7/dist-packages/cumulus/__chip_config/bcm/datapath.conf file to control the maximum number of overlay next hops (adjacency entries). The profile is one of the following:
- default: 15% of the underlay next hops are set apart for overlay (8k next hops are reserved)
- mode-1: 25% of the underlay next hops are set apart for overlay
- mode-2: 50% of the underlay next hops are set apart for overlay
- mode-3: 80% of the underlay next hops are set apart for overlay
- disable: disables VXLAN routing
The following shows an example of the VXLAN Routing Profile section of the datapath.conf file where the default profile is enabled.
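As a minimal sketch, the relevant line might look like the following. The `key = value` layout is an assumption based on the field name and profile values given above; the comment line is illustrative, and other settings in the shipped file are omitted.

```
# VXLAN routing profile: default, mode-1, mode-2, mode-3, or disable
vxlan_routing_overlay.profile = default
```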
The Trident II+ and Trident3 ASICs support a maximum of 48k underlay next hops.
For any profile you specify, you can allocate a maximum of 2K (2048) VXLAN SVI interfaces.
To disable the VXLAN routing capability on a Trident II+ or Trident3 switch, set the vxlan_routing_overlay.profile field to disable.
Tomahawk and Tomahawk+
The Tomahawk and Tomahawk+ ASICs do not support RIOT natively; you must configure the switch ports for VXLAN routing to use internal loopback (also referred to as internal hyperloop). The internal loopback facilitates the recirculation of packets through the ingress pipeline to achieve VXLAN routing.
For routing into a VXLAN tunnel, the first pass through the ASIC routes the packet and rewrites its source and destination MAC addresses and VLAN. The packet then recirculates through the internal hyperloop for VXLAN encapsulation and underlay forwarding on the second pass.
For routing out of a VXLAN tunnel, the first pass performs VXLAN decapsulation, then packets recirculate through the hyperloop for routing on the second pass.
The number of switch ports you need to place in internal loopback mode depends only on the amount of bandwidth required. No additional configuration is necessary.
To configure one or more switch ports for loopback mode, edit the /etc/cumulus/ports.conf file and change the port speed to loopback. In the example below, swp8 and swp9 are configured for loopback mode:
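A sketch of the corresponding ports.conf entries, assuming the file's usual `<port number>=<speed>` syntax. The entries for ports 7 and 10 and their speeds are hypothetical and shown only for context.

```
# Ports 8 and 9 are dedicated to internal loopback (hyperloop)
7=100G
8=loopback
9=loopback
10=100G
```

Each port set to loopback is no longer available as a front-panel port, so the number of entries changed this way is a trade-off between routed VXLAN bandwidth and usable ports.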
After you save your changes to the ports.conf file, restart switchd for the changes to take effect.
VXLAN Routing Data Plane and Broadcom Trident II Platforms
The Trident II ASIC does not support RIOT natively or VXLAN routing using internal loopback. To achieve VXLAN routing in a deployment using Trident II switches, use an external gateway. For routing without an external gateway, you must loop back one or more switch ports using an external loopback cable; this is also referred to as external hyperloop.
On Broadcom Trident II switches, only static VXLAN routing is supported with the use of external loopback.
An external hyperloop is set up so that the port at one end of the loopback is a layer 2 port attached to the bridge, while the port at the other end is configured with a layer 3 interface. The layer 3 interface is configured with the gateway IP address for the corresponding VLAN/VNI. Traffic that exits a VXLAN tunnel and needs to be routed is bridged out of the layer 2 port, exactly as it would be if it were going to an external gateway. At the other end of the loopback, because the traffic is addressed to the gateway IP address, it receives regular routing treatment. For redundancy and increased bandwidth, two or more pairs of ports are typically put into an external hyperloop and bonded together.
The following diagram illustrates the configuration and operation of an external hyperloop.
In the above diagram, VTEPs exit01 and exit02 are acting as VXLAN layer 3 gateways. On exit01, two pairs of ports are externally looped back (swp45, swp46) and (swp47, swp48). The ports swp46 and swp48 are bonded together and act as the layer 2 end; therefore, this bond interface (named inside) is a member of the bridge. The ports swp45 and swp47 are bonded together (named outside) and act as the layer 3 end with SVIs configured for VLANs 100 and 200 with the corresponding gateway IP addresses. Because the two layer 3 gateways are in an MLAG configuration, they use a virtual IP address as the gateway IP. The relevant interface configuration on exit01 is as follows:
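A rough sketch of such a configuration in ifupdown2 syntax follows. The bond names, port pairs, and VLAN IDs come from the description above. The VNI interface names, the gateway bridge names, and all IP and MAC addresses are hypothetical placeholders, and modeling each gateway SVI as a traditional bridge on a subinterface of the outside bond is one possible approach, not a confirmed one. MLAG peer link and host-facing ports are omitted.

```
# Layer 2 end of the hyperloop: member of the VLAN-aware bridge
auto inside
iface inside
    bond-slaves swp46 swp48

# Layer 3 end of the hyperloop: carries the gateway interfaces
auto outside
iface outside
    bond-slaves swp45 swp47

auto bridge
iface bridge
    bridge-vlan-aware yes
    # vni-100 and vni-200 are hypothetical VXLAN interface names
    bridge-ports inside vni-100 vni-200
    bridge-vids 100 200

# Gateway for VLAN 100 (placeholder addresses; the virtual
# address is the gateway IP shared by the MLAG pair)
auto vlan100gw
iface vlan100gw
    bridge-ports outside.100
    address 172.16.100.2/24
    address-virtual 44:38:39:ff:01:90 172.16.100.1/24

# Gateway for VLAN 200 (placeholder addresses)
auto vlan200gw
iface vlan200gw
    bridge-ports outside.200
    address 172.16.200.2/24
    address-virtual 44:38:39:ff:02:90 172.16.200.1/24
```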
For the external hyperloop to work correctly, you must set hal.bcm.per_vlan_router_mac_lookup = TRUE in the switchd.conf file.
After you save your changes to the switchd.conf file, restart switchd for the change to take effect.
Setting hal.bcm.per_vlan_router_mac_lookup = TRUE limits the Trident II switch to a maximum of 512 configurable local IP addresses (SVIs and so on), so use it only as a last resort. This limitation applies only to the Trident II ASIC.
VXLAN Routing Data Plane and the Mellanox Spectrum Platform
There is no special configuration required for VXLAN routing on the Mellanox Spectrum platform.