Protocol Independent Multicast - PIM

Protocol Independent Multicast (PIM) is a multicast control plane protocol that advertises multicast sources and receivers over a routed layer 3 network. Layer 3 multicast relies on PIM to advertise information about multicast-capable routers and the location of multicast senders and receivers. Without PIM, multicast traffic cannot be forwarded through a routed network.

PIM has two modes of operation: Sparse Mode (PIM-SM) and Dense Mode (PIM-DM).

Cumulus Linux only supports PIM Sparse Mode.

PIM Overview

The following network elements play a role in a PIM network:

First Hop Router (FHR): The router attached closest to the multicast source. The FHR is responsible for the PIM register process. Each multicast source has a single FHR.

Last Hop Router (LHR): The last router in the path, attached to an interested multicast receiver. There is a single LHR for each network subnet with an interested receiver, but a multicast group can have multiple LHRs throughout the network.

Rendezvous Point (RP): The RP allows for the discovery of multicast sources and multicast receivers, and is responsible for sending PIM Register Stop messages to FHRs.

PIM Shared Tree (RP Tree) or (*,G) Tree: The multicast tree rooted at the RP. When receivers want to join a multicast group, Join messages are sent along the shared tree towards the RP.

PIM Shortest Path Tree (SPT) or (S,G) Tree: The multicast tree rooted at the multicast source for a given group. Each multicast source has a unique SPT. The SPT can match the RP Tree, but this is not a requirement. The SPT represents the most efficient way to send multicast traffic from a source to the interested receivers.

Outgoing Interface (OIF): The interface on which a PIM or multicast packet should be sent; OIFs are the interfaces towards the multicast receivers.

Incoming Interface (IIF): The interface on which a PIM or multicast packet should be received; IIFs can be the interfaces towards the multicast source or towards the RP.

Reverse Path Forwarding Interface (RPF Interface): The interface with the unicast route back towards a source or receiver.

Multicast Route (mroute): A multicast route indicates the multicast source and multicast group, as well as the associated OIFs, IIFs, and RPF information.

Star-G mroute (*,G): The (*,G) mroute represents the RP Tree. The * is a wildcard indicating any multicast source; the G is the multicast group. An example (*,G) is (*, 239.1.2.9).

S-G mroute (S,G): The mroute representing the SPT. The S is the multicast source IP; the G is the multicast group. An example (S,G) is (10.1.1.1, 239.1.2.9).

PIM Messages

PIM Hello

PIM hellos announce the presence of a multicast router on a segment. PIM hellos are sent every 30 seconds by default.

22.1.2.2 > 224.0.0.13: PIMv2, length 34
 Hello, cksum 0xfdbb (correct)
 Hold Time Option (1), length 2, Value: 1m45s
 0x0000: 0069
 LAN Prune Delay Option (2), length 4, Value: 
 T-bit=0, LAN delay 500ms, Override interval 2500ms
 0x0000: 01f4 09c4
 DR Priority Option (19), length 4, Value: 1
 0x0000: 0000 0001
 Generation ID Option (20), length 4, Value: 0x2459b190
 0x0000: 2459 b190

PIM Join/Prune (J/P)

PIM J/P messages indicate the groups that a multicast router wants to receive or no longer receive. Join and Prune are often described as distinct message types, but they are actually a single PIM message type that carries one list of groups to join and a second list of groups to leave. PIM J/P messages can join or prune from either the SPT or the RP tree (also called (S,G) Joins or (*,G) Joins, respectively).

PIM Join/Prune messages are sent to PIM neighbors on individual interfaces. Join/Prune messages are never unicast.


This PIM Join/Prune is for group 225.1.0.0, with 0 Joins and 1 Prune for the group. Joins and Prunes for multiple groups can exist in a single packet.

21:49:59.470885 IP (tos 0x0, ttl 255, id 138, offset 0, flags [none], proto PIM (103), length 54)
 22.1.2.2 > 224.0.0.13: PIMv2, length 34
 Join / Prune, cksum 0xb9e5 (correct), upstream-neighbor: 22.1.2.1
 1 group(s), holdtime: 3m30s
 group #1: 225.1.0.0, joined sources: 0, pruned sources: 1
 pruned source #1: 33.1.1.1(S)

PIM Register

PIM Register messages are unicast packets sent from an FHR to the RP to advertise a new multicast source. The FHR fully encapsulates the original multicast packet in a PIM Register message. The RP is responsible for decapsulating the PIM Register message and forwarding it along the (*,G) tree towards the receivers.
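
Because PIM Register messages are unicast from the FHR to the RP, they can be observed with a packet capture on either device. A minimal sketch, assuming swp51 is the interface facing the RP (PIM is IP protocol 103, as the captures on this page show):

cumulus@switch:~$ sudo tcpdump -i swp51 'ip proto 103'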

PIM Null Register

PIM Null Register is a special type of PIM Register message in which the Null-Register flag is set. An FHR uses Null Register messages to signal to an RP that a source is still sending multicast traffic. Unlike normal PIM Register messages, Null Register messages do not encapsulate the original data packet.

PIM Register Stop

PIM Register Stop messages are sent by an RP to the FHR to indicate that PIM Register messages should no longer be sent.

21:37:00.419379 IP (tos 0x0, ttl 255, id 24, offset 0, flags [none], proto PIM (103), length 38)
 100.1.2.1 > 33.1.1.10: PIMv2, length 18
 Register Stop, cksum 0xd8db (correct) group=225.1.0.0 source=33.1.1.1

IGMP Membership Report (IGMP Join)

IGMP Membership Reports are sent by multicast receivers to tell multicast routers of their interest in a specific multicast group. IGMP Join messages trigger PIM *,G Joins. IGMP version 2 messages are sent to the All Hosts multicast address, 224.0.0.1. IGMP version 3 messages are sent to the IGMPv3-specific multicast address, 224.0.0.22.
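
For testing, some FRRouting builds provide an ip igmp join interface command that makes the router itself join a group, which is useful for exercising the PIM signaling path without a real receiver. A sketch, assuming your FRR version supports the command; the interface swp1, group 239.1.1.1, and source 10.1.1.100 are placeholders:

cumulus@switch:~$ sudo vtysh
switch# configure terminal
switch(config)# interface swp1
switch(config-if)# ip igmp join 239.1.1.1 10.1.1.100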

IGMP Leave

IGMP Leaves tell a multicast router that a multicast receiver no longer wants the multicast group. IGMP Leave messages trigger PIM *,G Prunes.

PIM Neighbors

When PIM is configured on an interface, PIM Hello messages are sent to the link-local multicast group 224.0.0.13. Any other PIM-enabled router on the segment that hears the PIM Hello messages builds a PIM neighbor relationship with the sending device.

PIM neighbors are stateless. No confirmation of neighbor relationship is exchanged between PIM endpoints.
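
To confirm that hellos are being exchanged, list the PIM neighbors learned on each PIM-enabled interface from the FRRouting CLI:

cumulus@switch:~$ sudo vtysh
switch# show ip pim neighbor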

PIM Sparse Mode (PIM-SM)

PIM Sparse Mode (PIM-SM) is a “pull” multicast distribution method. This means that multicast traffic is only sent through the network if receivers explicitly ask for it. When a receiver “pulls” multicast traffic, the network must be periodically notified that the receiver wishes to continue the multicast stream.

This behavior is in contrast to PIM Dense Mode (PIM-DM), where traffic is flooded, and the network must be periodically notified that the receiver wishes to stop receiving the multicast stream.

PIM-SM has three configuration options: Any-source Multicast (ASM), Bi-directional Multicast (BiDir), and Source Specific Multicast (SSM):

  • Any-source Multicast (ASM) is the traditional, and most commonly deployed, PIM implementation. ASM relies on Rendezvous Points to connect multicast senders and receivers, which then dynamically determine the shortest path through the network between source and receiver so that multicast traffic is sent efficiently.
  • Bidirectional PIM (BiDir) forwards all traffic through the multicast Rendezvous Point (RP) rather than tracking multicast source IPs, allowing for greater scale at the cost of less efficient forwarding.
  • Source Specific Multicast (SSM) requires multicast receivers to know exactly which source they wish to receive multicast traffic from, rather than relying on multicast Rendezvous Points. SSM requires IGMPv3 on the multicast clients.

Cumulus Linux only supports ASM and SSM. PIM BiDir is not currently supported.

Any-source Multicast Routing

Multicast routing behaves differently depending on whether the source is sending before receivers request the multicast stream, or if a receiver tries to join a stream before there are any sources.

Receiver Joins First

When a receiver joins a group, it sends an IGMP Membership Report (IGMP Join) to the IGMPv3 multicast group, 224.0.0.22. The PIM multicast router on the segment, listening to the IGMPv3 group, receives the report and becomes an LHR for this group.


This creates a (*,G) mroute, with an OIF of the interface on which the IGMP Membership Report was received and an IIF of the RPF interface for the RP.

The LHR generates a PIM (*,G) Join message, and sends it from the interface towards the RP. Each multicast router between the LHR and the RP will build a (*,G) mroute with the OIF being the interface on which the PIM Join message was received and an Incoming Interface of the Reverse Path Forwarding interface for the RP.


When the RP receives the (*,G) Join message, it will not send any additional PIM Join messages. The RP will maintain a (*,G) state as long as the receiver wishes to receive the multicast group.

Unlike multicast receivers, multicast sources do not send IGMP (or PIM) messages to the FHR. A multicast source begins sending and the FHR will receive the traffic and build both a (*,G) and an (S,G) mroute. The FHR will then begin the PIM Register process.

PIM Register Process

When a First Hop Router (FHR) receives a multicast data packet from a source, the FHR does not know whether there are any interested multicast receivers in the network. The FHR encapsulates the data packet in a unicast PIM Register message, sourced from the FHR and destined to the RP address. The RP builds an (S,G) mroute, decapsulates the multicast packet, and forwards it along the (*,G) tree.

The unencapsulated multicast packet travels down the (*,G) tree towards the interested receivers. At the same time, the RP sends a PIM (S,G) Join towards the FHR. This builds (S,G) state on each multicast router between the RP and the FHR.

When the FHR receives a PIM (S,G) Join, it will continue encapsulating and sending PIM Register messages, but will also make a copy of the packet and send it along the (S,G) mroute.

The RP then receives the multicast packet along the (S,G) tree and sends a PIM Register Stop to the FHR to end the register process.

PIM SPT Switchover

When the LHR receives the first multicast packet, it sends a PIM (S,G) Join towards the FHR in order to forward traffic through the network efficiently. This builds the Shortest Path Tree (SPT), the tree along the shortest path to the source.

When the traffic arrives over the SPT, a PIM (S,G) RPT Prune is sent up the shared tree towards the RP. This removes multicast traffic from the shared tree; multicast data is only sent over the SPT.

The LHR now sends both (*,G) Joins and (S,G) RPT Prune messages towards the RP.

To prevent SPT switchover for specific groups, define a prefix-list covering those groups and apply it with the spt-switchover infinity-and-beyond command:

  1. Create the necessary prefix-lists using the FRRouting CLI:

    cumulus@switch:~$ sudo vtysh
    switch# configure terminal
    switch(config)# ip prefix-list ssm-range permit 232.0.0.0/8 ge 32
    switch(config)# ip prefix-list ssm-range permit 238.0.0.0/8 ge 32
    
  2. Configure SPT switchover for the ssm-range prefix-list:

    switch(config)# ip pim spt-switchover infinity-and-beyond prefix-list ssm-range
    switch(config)# ip prefix-list ssm-range seq 5 deny 238.0.0.0/32
    

The effect of the configured prefix-list can be seen in the show ip mroute output:

switch# show ip mroute
Source          Group           Proto  Input      Output     TTL  Uptime
*               232.0.0.0       IGMP   swp31s0    pimreg     1    00:03:38
                                IGMP              br1        1    00:03:38
*               238.0.0.0       IGMP   swp31s0    br1        1    00:02:08

In the example above, 232.0.0.0 has been configured for SPT switchover, identified by pimreg.

Sender Starts Before Receivers Join

As previously mentioned, a multicast sender can send multicast data without any additional IGMP or PIM signaling. When the FHR receives the multicast traffic, it encapsulates it and sends a PIM Register to the Rendezvous Point (RP).

When the RP receives the PIM Register, it will build an (S,G) mroute; however, there is no (*,G) mroute and no interested receivers.

The RP will drop the PIM Register message and immediately send a PIM Register Stop message to the FHR.

Receiving a PIM Register Stop without any associated PIM Joins leaves the FHR without any outgoing interfaces. The FHR will drop this multicast traffic until a PIM Join is received.

PIM Register messages are sourced from the interface that received the multicast traffic and are destined to the RP address. The PIM Register is not sourced from the interface towards the RP.

PIM Null-Register

To notify the RP that multicast traffic is still flowing when the RP has no receivers, or when the RP is not on the SPT, the FHR periodically sends PIM Null Register messages. The FHR sends a PIM Register with the Null-Register flag set, but without any data. This special PIM Register notifies the RP that a multicast source is still sending, in case any new receivers come online.

After receiving a PIM Null-Register, the RP immediately sends a PIM Register Stop to acknowledge the reception of the PIM Null Register message.
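
On the FHR, this cycle can be observed in the show ip pim upstream output (shown in the verification examples later on this page), where the RSTimer column tracks the register and Register Stop state for each S,G:

cumulus@switch:~$ sudo vtysh
switch# show ip pim upstream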

PIM and ECMP

PIM uses the RPF procedure to choose an upstream interface when building forwarding state. If equal-cost multipath (ECMP) routes are configured, PIM can choose the RPF interface based on an ECMP hash.

The ip pim ecmp command enables PIM to use all the available nexthops for the installation of mroutes. For example, if you have 4-way ECMP, PIM spreads the S,G and *,G mroutes across the 4 different paths.

cumulus@switch:~$ sudo vtysh
switch# configure terminal
switch(config)# ip pim ecmp

The ip pim ecmp rebalance command rebalances traffic across the available paths, allowing all RPFs to be recomputed, even if their current nexthop is still valid.

cumulus@switch:~$ sudo vtysh
switch# configure terminal
switch(config)# ip pim ecmp rebalance

The show ip pim nexthop command shows which nexthop will be selected for a specific source/group:

cumulus@switch:~$ sudo vtysh
switch# show ip pim nexthop 
Number of registered addresses: 3 
Address         Interface      Nexthop
-------------------------------------------
6.0.0.9         swp31s0        169.254.0.9 
6.0.0.9         swp31s1        169.254.0.25 
6.0.0.11        lo             0.0.0.0 
6.0.0.10        swp31s0        169.254.0.9 
6.0.0.10        swp31s1        169.254.0.25 

Configuration

Getting Started

PIM is included in the FRRouting package. To configure PIM on a switch:

  1. Open /etc/frr/daemons in a text editor.

  2. Add the following lines to the end of the file to enable pimd, and save the file:

     zebra=yes
     pimd=yes
    
  3. Run the systemctl restart command to restart FRRouting:

     cumulus@switch:~$ sudo systemctl restart frr
    
  4. In a terminal, run the vtysh command to start the FRRouting CLI on the switch.

     cumulus@switch:~$ sudo vtysh
     cumulus# 
    
  5. Run the following commands to configure the PIM interfaces:

     cumulus# configure terminal
     cumulus(config)# int swp1
     cumulus(config-if)# ip pim sm
    

    PIM must be enabled on all interfaces facing multicast sources or multicast receivers, as well as on the interface where the RP address is configured.

  6. Run the following commands to enable IGMP (either version 2 or 3) on the interfaces with hosts attached. IGMP version 3 is the default, so you only need to specify the version if you want to use IGMP version 2:

     cumulus# configure terminal
     cumulus(config)# int swp1
     cumulus(config-if)# ip igmp
     cumulus(config-if)# ip igmp version 2
    

    IGMP must be configured on all interfaces where multicast receivers exist.

  7. Configure a group mapping for a static RP:

     cumulus# configure terminal 
     cumulus(config)# ip pim rp 192.168.0.1
    

    Each PIM-SM enabled device must configure a static RP to a group mapping, and all PIM-SM enabled devices must have the same RP to group mapping configuration.
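
    You can also map different group ranges to different RPs, as long as every PIM router carries identical mappings. A sketch, with hypothetical RP addresses and group ranges:

     cumulus(config)# ip pim rp 192.168.0.1 239.0.0.0/8
     cumulus(config)# ip pim rp 192.168.0.2 238.0.0.0/8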

Complete Multicast Network Configuration Example

The following is an example configuration:

RP# show run
Building configuration...
Current configuration:
!
log syslog
ip multicast-routing
ip pim rp 192.168.0.1 224.0.0.0/4
username cumulus nopassword
!
!
interface lo
 description RP Address interface
 ip ospf area 0.0.0.0
 ip pim sm
!
interface swp1
 description interface to FHR
 ip ospf area 0.0.0.0
 ip ospf network point-to-point
 ip pim sm
!
interface swp2
 description interface to LHR
 ip ospf area 0.0.0.0
 ip ospf network point-to-point
 ip pim sm
!
router ospf
 ospf router-id 192.168.0.1
!
line vty
!
end

FHR# show run
!
log syslog
ip multicast-routing
ip pim rp 192.168.0.1 224.0.0.0/4
username cumulus nopassword
!
interface bridge10.1
 description Interface to multicast source
 ip ospf area 0.0.0.0
 ip ospf network point-to-point
 ip pim sm
!
interface lo
 ip ospf area 0.0.0.0
 ip pim sm
!
interface swp49
 description interface to RP
 ip ospf area 0.0.0.0
 ip ospf network point-to-point
 ip pim sm
!
interface swp50
 description interface to LHR
 ip ospf area 0.0.0.0
 ip ospf network point-to-point
 ip pim sm
!
router ospf
 ospf router-id 192.168.1.1
!
line vty
!
end

LHR# show run
!
log syslog
ip multicast-routing
ip pim rp 192.168.0.1 224.0.0.0/4
username cumulus nopassword
!
interface bridge10.1
 description interface to multicast receivers
 ip igmp
 ip ospf area 0.0.0.0
 ip ospf network point-to-point
 ip pim sm
!
interface lo
 ip ospf area 0.0.0.0
 ip pim sm
!
interface swp49
 description interface to RP
 ip ospf area 0.0.0.0
 ip ospf network point-to-point
 ip pim sm
!
interface swp50
 description interface to FHR
 ip ospf area 0.0.0.0
 ip ospf network point-to-point
 ip pim sm
!
router ospf
 ospf router-id 192.168.2.2
!
line vty
!
end

Source Specific Multicast Mode (SSM)

The source-specific multicast method uses prefix-lists to configure a receiver to accept traffic for a multicast address only from a single source. This removes the need for an RP, as the source must be known before traffic can be accepted. The default group range is 232.0.0.0/8, and can be changed by configuring a prefix-list.

The example process below configures a prefix-list named ssm-range that permits traffic for 232.0.0.0/8 and 238.0.0.0/8, matching only exact /32 group entries (ge 32).

PIM considers 232.0.0.0/8 the default range if the SSM range is not configured. If this default is overridden with a prefix-list, all ranges that should be considered must be in the prefix-list.

cumulus@switch:~$ net add routing prefix-list ipv4 ssm-range permit 232.0.0.0/8 ge 32
cumulus@switch:~$ net add routing prefix-list ipv4 ssm-range permit 238.0.0.0/8 ge 32
cumulus@switch:~$ net add pim ssm prefix-list ssm-range
cumulus@switch:~$ net pending
cumulus@switch:~$ net commit

This configuration can also be done with the FRRouting CLI:

cumulus@switch:~$ sudo vtysh
switch# conf t
switch(config)# ip prefix-list ssm-range seq 5 permit 238.0.0.0/8 ge 32
switch(config)# ip pim ssm prefix-list ssm-range
switch(config)# exit
switch# write mem

To view the existing prefix-lists, use the net show ip command:

cumulus@switch:~$ net show ip prefix-list ssm-range
ZEBRA: ip prefix-list ssm-range: 2 entries
    seq 5 permit 232.0.0.0/8 ge 32
    seq 10 permit 238.0.0.0/8 ge 32
OSPF: ip prefix-list ssm-range: 2 entries
    seq 5 permit 232.0.0.0/8 ge 32
    seq 10 permit 238.0.0.0/8 ge 32
PIM: ip prefix-list ssm-range: 2 entries
    seq 5 permit 232.0.0.0/8 ge 32
    seq 10 permit 238.0.0.0/8 ge 32

Multicast Source Discovery Protocol (MSDP)

The Multicast Source Discovery Protocol (MSDP) is used to connect multiple PIM-SM multicast domains together, using the PIM-SM RPs. By configuring anycast RPs with the same IP address on multiple multicast switches (primarily on the loopback interface), the PIM-SM limitation of one RP per multicast group is relaxed. This provides both failover and load balancing throughout the network.

When an RP discovers a new source (typically via a PIM-SM register message), it sends a source-active (SA) message over the TCP-based MSDP peering to each MSDP peer. Each peer then determines whether any of its receivers are interested.

Cumulus Linux MSDP support is primarily for anycast-RP configurations, rather than for connecting multiple multicast domains. MSDP peers must be configured in a full mesh, as received SA messages are not re-forwarded.

Cumulus Linux currently only supports one MSDP mesh-group.

Configuration

The steps below cover configuring a Cumulus Linux switch for MSDP.

  1. Add an anycast IP address to the loopback interface for each RP in the domain:

     cumulus@switch:$ net add loopback lo ip address 100.1.1.1/32
     cumulus@switch:$ net add loopback lo ip address 100.1.1.100/32
     cumulus@switch:$ net pending
     cumulus@switch:$ net commit
    
  2. On every multicast switch, configure the group to RP mapping using the anycast address:

     cumulus@switch:$ net add pim rp 100.1.1.100 224.0.0.0/4
     cumulus@switch:$ net pending
     cumulus@switch:$ net commit
    
  3. Log into the FRRouting CLI:

     cumulus@switch:$ sudo vtysh
    
  4. Configure the MSDP mesh group for all active RPs:

    The mesh group should include all RPs in the domain as members, with a unique address as the source. This configuration will result in MSDP peerings between all RPs.

    switch# conf t
    switch(config)# ip msdp mesh-group cumulus member 100.1.1.2
    switch(config)# ip msdp mesh-group cumulus member 100.1.1.3
    
  5. Pick the local loopback address as the source of the MSDP control packets:

    switch# conf t
    switch(config)# ip msdp mesh-group cumulus source 100.1.1.1
    
  6. Inject the anycast IP address into the domain’s IGP.

If the network is unnumbered and uses unnumbered BGP as the IGP, avoid using the anycast IP address for establishing unicast or multicast peerings. For PIM-SM, ensure that the unique address is used as the PIM hello source by setting the source:

cumulus@switch:$ sudo vtysh
switch# conf t
switch(config)# interface lo
switch(config-if)# ip pim use-source 100.1.1.1

Verifying PIM

The following outputs are based on the Cumulus Reference Topology with cldemo-pim.

Source Starts First

On the FHR, an mroute is built, but the upstream state is “Prune”. The FHR flag is set on the interface receiving multicast.

Use the show ip mroute command to review detailed output for the FHR:

exit01# show ip mroute
Source          Group           Proto  Input      Output     TTL  Uptime
172.16.5.105    239.1.1.1       none   br0        none       0    --:--:--
!
exit01# show ip pim upstream
Iif Source Group State Uptime JoinTimer RSTimer KATimer RefCnt
br0 172.16.5.105 239.1.1.1 Prune 00:07:40 --:--:-- 00:00:36 00:02:50 1
!
exit01# show ip pim upstream-join-desired
Interface Source          Group           LostAssert Joins PimInclude JoinDesired EvalJD
!
exit01# show ip pim interface
Interface  State          Address  PIM Nbrs           PIM DR  FHR
br0           up       172.16.5.1         0            local    1
swp51         up        10.1.0.17         1            local    0
swp52         up        10.1.0.19         0            local    0
!
exit01# show ip pim state
Source           Group            IIF    OIL
172.16.5.105     239.1.1.1        br0
!
exit01# show ip pim int detail
Interface : br0
State     : up
Address   : 172.16.5.1
Designated Router
-----------------
Address   : 172.16.5.1
Priority  : 1
Uptime    : --:--:--
Elections : 2
Changes   : 0
 
FHR - First Hop Router
----------------------
239.1.1.1 : 172.16.5.105 is a source, uptime is 00:27:43

On the spine, no mroute state is created, but the show ip pim upstream output includes the S,G:

spine01# show ip mroute
Source          Group           Proto  Input      Output     TTL  Uptime
!
spine01# show ip pim upstream
Iif       Source          Group           State       Uptime   JoinTimer RSTimer   KATimer   RefCnt
swp30     172.16.5.105    239.1.1.1       Prune       00:00:19 --:--:--  --:--:--  00:02:46       1

As a receiver joins the group, the mroute output interface on the FHR transitions from “none” to the RPF interface of the RP:

exit01# show ip mroute
Source          Group           Proto  Input      Output     TTL  Uptime
172.16.5.105    239.1.1.1       PIM    br0        swp51      1    00:05:40
!
exit01# show ip pim upstream
Iif       Source          Group           State       Uptime   JoinTimer RSTimer   KATimer   RefCnt
br0       172.16.5.105    239.1.1.1       Prune       00:48:23 --:--:--  00:00:00  00:00:37       2
!
exit01# show ip pim upstream-join-desired
Interface Source          Group           LostAssert Joins PimInclude JoinDesired EvalJD
swp51     172.16.5.105    239.1.1.1       no         yes   no         yes         yes
!
exit01# show ip pim state
Source           Group            IIF    OIL
172.16.5.105     239.1.1.1        br0    swp51

spine01# show ip mroute
Source          Group           Proto  Input      Output     TTL  Uptime
*               239.1.1.1       PIM    lo         swp1       1    00:09:59
172.16.5.105    239.1.1.1       PIM    swp30      swp1       1    00:09:59
!
spine01# show ip pim upstream
Iif       Source          Group           State       Uptime   JoinTimer RSTimer   KATimer   RefCnt
lo        *               239.1.1.1       Joined      00:10:01 00:00:59  --:--:--  --:--:--       1
swp30     172.16.5.105    239.1.1.1       Joined      00:00:01 00:00:59  --:--:--  00:02:35       1
!
spine01# show ip pim upstream-join-desired
Interface Source          Group           LostAssert Joins PimInclude JoinDesired EvalJD
swp1      *               239.1.1.1       no         yes   no         yes         yes
!
spine01# show ip pim state
Source           Group            IIF    OIL
*                239.1.1.1        lo     swp1
172.16.5.105     239.1.1.1        swp30  swp1

Receiver Joins First

On the LHR attached to the receiver:

leaf01# show ip mroute
Source          Group           Proto  Input      Output     TTL  Uptime
*               239.2.2.2       IGMP   swp51      br0        1    00:01:19
!
leaf01# show ip pim local-membership
Interface Address         Source          Group           Membership
br0       172.16.1.1      *               239.2.2.2       INCLUDE
!
leaf01# show ip pim state
Source           Group            IIF    OIL
*                239.2.2.2        swp51  br0
!
leaf01# show ip pim upstream
Iif       Source          Group           State       Uptime   JoinTimer RSTimer   KATimer   RefCnt
swp51     *               239.2.2.2       Joined      00:02:07 00:00:53  --:--:--  --:--:--       1
!
leaf01# show ip pim upstream-join-desired
Interface Source          Group           LostAssert Joins PimInclude JoinDesired EvalJD
br0       *               239.2.2.2       no         no    yes        yes         yes
!
leaf01# show ip igmp groups
Interface Address         Group           Mode Timer    Srcs V Uptime
br0       172.16.1.1      239.2.2.2       EXCL 00:04:02    1 3 00:04:12
!
leaf01# show ip igmp sources
Interface Address         Group           Source          Timer Fwd Uptime
br0       172.16.1.1      239.2.2.2       *               03:54   Y 00:04:21

On the RP:

spine01# show ip mroute
Source          Group           Proto  Input      Output     TTL  Uptime
*               239.2.2.2       PIM    lo         swp1       1    00:00:03
!
spine01# show ip pim state
Source           Group            IIF    OIL
*                239.2.2.2        lo     swp1
!
spine01# show ip pim upstream
Iif       Source          Group           State       Uptime   JoinTimer RSTimer   KATimer   RefCnt
lo        *               239.2.2.2       Joined      00:05:17 00:00:43  --:--:--  --:--:--       1
!
spine01# show ip pim upstream-join-desired
Interface Source          Group           LostAssert Joins PimInclude JoinDesired EvalJD
swp1      *               239.2.2.2       no         yes   no         yes         yes

PIM in a VRF

VRFs divide the routing table on a per-tenant basis, ultimately providing for separate layer 3 networks over a single layer 3 infrastructure. With a VRF, each tenant has its own virtualized layer 3 network, so IP addresses can overlap between tenants.

PIM in a VRF enables PIM trees and multicast data traffic to run inside a layer 3 virtualized network, with a separate tree per domain or tenant. Each VRF has its own multicast tree with its own RP(s), sources, and so forth. Thus you can have one tenant per corporate division, client or product, for example.

VRFs on different switches typically connect or are peered over subinterfaces, where each subinterface is in its own VRF, provided MP-BGP VPN is not enabled or supported.
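
In FRR, per-VRF PIM settings such as the RP mapping live under a vrf block, as the generated frr.conf below shows. A minimal vtysh sketch for a single tenant, assuming the VRF blue already exists:

cumulus@switch:~$ sudo vtysh
switch# configure terminal
switch(config)# vrf blue
switch(config-vrf)# ip pim rp 192.168.0.1 224.0.0.0/4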

To configure PIM in a VRF, run the following commands. First, add the VRFs and associate them with switch ports:

cumulus@spine01:~$ net add vrf blue
cumulus@spine01:~$ net add vrf purple
cumulus@spine01:~$ net add interface swp1 vrf blue
cumulus@spine01:~$ net add interface swp2 vrf purple

Then add the PIM configuration to FRR, review and commit the changes:

cumulus@spine01:~$ net add interface swp1 pim sm
cumulus@spine01:~$ net add interface swp2 pim sm
cumulus@spine01:~$ net add bgp vrf blue auto 65001
cumulus@spine01:~$ net add bgp vrf purple auto 65000
cumulus@spine01:~$ net add bgp vrf blue router-id 10.1.1.1
cumulus@spine01:~$ net add bgp vrf purple router-id 10.1.1.2
cumulus@spine01:~$ net add bgp vrf blue neighbor swp1 interface remote-as external
cumulus@spine01:~$ net add bgp vrf purple neighbor swp2 interface remote-as external
cumulus@spine01:~$ net pending
cumulus@spine01:~$ net commit

These commands create the following configuration in the /etc/network/interfaces file and the /etc/frr/frr.conf file:

auto purple
iface purple
     vrf-table auto
 
auto blue
iface blue
    vrf-table auto
 
auto swp1
iface swp1
      vrf blue
 
auto swp49.1
iface swp49.1
     vrf purple
 
auto swp2
iface swp2
      vrf purple
 
auto swp49.2
iface swp49.2
     vrf blue
 
...

ip pim rp 192.168.0.1 224.0.0.0/4
 
vrf purple
  ip pim rp 192.168.0.1 224.0.0.0/4
!
vrf blue 
  ip pim rp 192.168.0.1 224.0.0.0/4 
!
 
int swp1 vrf blue
   ip pim sm
   ip igmp version 2
 
int swp2 vrf purple
   ip pim sm
   ip igmp version 3
 
int swp49.1 vrf purple
   ip pim sm
 
int swp49.2 vrf blue
   ip pim sm
 
router bgp 65000 vrf purple
    bgp router-id 10.1.1.2
    neighbor PURPLE peer-group
    neighbor PURPLE remote-as external
    neighbor swp49.1 interface peer-group PURPLE
 
router bgp 65001 vrf blue
    bgp router-id 10.1.1.1
    neighbor BLUE peer-group
    neighbor BLUE remote-as external
    neighbor swp49.2 interface peer-group BLUE

In FRR, you can use show commands to display VRF information:

spine01# show ip mroute vrf blue
Source          Group           Proto  Input      Output     TTL  Uptime
11.1.0.1        239.1.1.1       IGMP   swp32s0    swp32s1    1    00:01:13
                                IGMP              br0.200    1    00:01:13
*               239.1.1.2       IGMP   mars       pimreg1001 1    00:01:13
                                IGMP              swp32s1    1    00:01:12
                                IGMP              br0.200    1    00:01:13

spine01# show ip mroute
Source          Group           Proto  Input      Output     TTL  Uptime
11.1.0.1        239.1.1.1       IGMP   swp31s0    swp31s1    1    00:01:15
                                IGMP              br0.100    1    00:01:15
*               239.1.1.2       IGMP   lo         pimreg     1    00:01:15
                                IGMP              swp31s1    1    00:01:14
                                IGMP              br0.100    1    00:01:15

BFD for PIM Neighbors

You can use bidirectional forwarding detection (BFD) for PIM neighbors to quickly detect link failures. When you configure an interface, include the pim bfd option:

cumulus@switch:~$ net add interface swp31s3 pim bfd
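
The equivalent FRRouting configuration applies ip pim bfd at the interface level; a sketch, assuming swp31s3 is the PIM-enabled interface:

cumulus@switch:~$ sudo vtysh
switch# configure terminal
switch(config)# interface swp31s3
switch(config-if)# ip pim bfd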

Troubleshooting PIM

FHR Stuck in Registering Process

When a multicast source starts, the FHR sends unicast PIM Register messages towards the RP, sourced from the RPF interface towards the source. After the RP receives the PIM Register, it sends a PIM Register Stop message to the FHR to end the register process. If there is an issue with this communication, the FHR becomes stuck in the registering process, which can result in high CPU usage, because PIM Register packets are generated and processed by the FHR and RP CPUs.

To assess this issue, review the mroute table on the FHR. The output interface below is pimreg; if it does not change to a physical interface within a few seconds, the FHR is likely stuck in the registering process.

exit01# show ip mroute
Source          Group           Proto  Input      Output     TTL  Uptime
172.16.5.105    239.2.2.3       PIM    br0        pimreg     1    00:03:59

To troubleshoot the issue:

  1. Validate that the FHR can reach the RP. If the RP and FHR cannot communicate, the registration process fails:

     cumulus@exit01:~$ ping 10.0.0.21 -I br0
     PING 10.0.0.21 (10.0.0.21) from 172.16.5.1 br0: 56(84) bytes of data.
     ^C
     --- 10.0.0.21 ping statistics ---
     4 packets transmitted, 0 received, 100% packet loss, time 3000ms
    
  2. On the RP, use tcpdump to see if the PIM Register packets are arriving:

     cumulus@spine01:~$ sudo tcpdump -i swp30
     tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
     listening on swp30, link-type EN10MB (Ethernet), capture size 262144 bytes
     23:33:17.524982 IP 172.16.5.1 > 10.0.0.21: PIMv2, Register, length 66
    
  3. If PIM Register packets are being received, verify that the PIM process sees them by issuing debug pim packets from within FRRouting:

     cumulus@spine01:~$ sudo vtysh -c "debug pim packets"
     PIM Packet debugging is on
         
     cumulus@spine01:~$ sudo tail /var/log/frr/frr.log
     2016/10/19 23:46:51 PIM: Recv PIM REGISTER packet from 172.16.5.1 to 10.0.0.21 on swp30: ttl=255 pim_version=2 pim_msg_size=64 checksum=a681
    
  4. Repeat the process on the FHR to see if PIM Register Stop messages are being received on the FHR and passed to the PIM process:

     cumulus@exit01:~$ sudo tcpdump -i swp51
     23:58:59.841625 IP 172.16.5.1 > 10.0.0.21: PIMv2, Register, length 28
     23:58:59.842466 IP 10.0.0.21 > 172.16.5.1: PIMv2, Register Stop, length 18
         
     cumulus@exit01:~$ sudo vtysh -c "debug pim packets"
     PIM Packet debugging is on
         
     cumulus@exit01:~$ sudo tail -f /var/log/frr/frr.log
     2016/10/19 23:59:38 PIM: Recv PIM REGSTOP packet from 10.0.0.21 to 172.16.5.1 on swp51: ttl=255 pim_version=2 pim_msg_size=18 checksum=5a39
    

No *,G Is Built on LHR

The most common reason a *,G is not built on an LHR is that PIM and IGMP are not both enabled on the interface facing the receiver. Both must be configured, as in this example:

leaf01# show run
!
interface br0
 ip igmp
 ip ospf area 0.0.0.0
 ip pim sm

To troubleshoot this issue:

  1. If both PIM and IGMP are enabled, ensure that IGMPv3 Joins are being sent by the receiver:

     cumulus@leaf01:~$ sudo tcpdump -i br0 igmp
     tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
     listening on br0, link-type EN10MB (Ethernet), capture size 262144 bytes
     00:03:55.789744 IP 172.16.1.101 > igmp.mcast.net: igmp v3 report, 1 group record(s)
    

No mroute Created on FHR

To troubleshoot this issue:

  1. Verify that multicast traffic is being received:

     cumulus@exit01:~$ sudo tcpdump -i br0
     tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
     listening on br0, link-type EN10MB (Ethernet), capture size 262144 bytes
     00:11:52.944745 IP 172.16.5.105.51570 > 239.2.2.9.1000: UDP, length 9
    
  2. Verify that PIM is configured on the interface facing the source:

     exit01# show run
     !
     interface br0
      ip ospf area 0.0.0.0
      ip pim sm
    
    1. If PIM is configured, verify that the RPF interface for the source matches the interface the multicast traffic is received on:

       exit01# show ip rpf 172.16.5.105
       Routing entry for 172.16.5.0/24 using Multicast RIB
         Known via "connected", distance 0, metric 0, best
         * directly connected, br0
      
  3. Verify that an RP is configured for the multicast group:

     exit01# show ip pim rp-info
     RP address       group/prefix-list   OIF         I am RP
     10.0.0.21        224.0.0.0/4         swp51       no
    

No S,G on RP for an Active Group

An RP does not build an mroute when there are no active receivers for a multicast group, even though the mroute exists on the FHR:

spine01# show ip mroute
Source          Group           Proto  Input      Output     TTL  Uptime
spine01#

exit01# show ip mroute
Source          Group           Proto  Input      Output     TTL  Uptime
172.16.5.105    239.2.2.9       none   br0        none       0    --:--:--

This is expected behavior. The active source can be seen on the RP with the show ip pim upstream command:

spine01# show ip pim upstream
Iif       Source          Group           State       Uptime   JoinTimer RSTimer   KATimer   RefCnt
swp30     172.16.5.105    239.2.2.9       Prune       00:08:03 --:--:--  --:--:--  00:02:20       1
!
spine01# show ip mroute
Source          Group           Proto  Input      Output     TTL  Uptime
spine01#

No mroute Entry Present in Hardware

Verify whether the hardware IP multicast table has already reached its maximum size, using the cl-resource-query command:

cumulus@switch:~$ cl-resource-query  | grep Mcast
   Total Mcast Routes:         450,   0% of maximum value    450

For Mellanox chipsets, refer to TCAM Resource Profiles for Mellanox Switches.

Verify MSDP Session State

Run the following commands to verify the state of MSDP sessions:

switch# show ip msdp mesh-group 
Mesh group : pod1
  Source : 100.1.1.1
  Member                 State
  100.1.1.2        established
  100.1.1.3        established
spine-1# show ip msdp peer       
Peer                       Local        State    Uptime   SaCnt
100.1.1.2              100.1.1.1  established  00:07:21       0
100.1.1.3              100.1.1.1  established  00:07:21       0
spine-1# 

View the Active Sources

Review the active sources learned locally (via PIM registers) and from MSDP peers:

switch# show ip msdp sa   
Source                     Group               RP  Local  SPT    Uptime
44.1.11.2              239.1.1.1        100.1.1.1      n    n  00:00:40
44.1.11.2              239.1.1.2        100.1.1.1      n    n  00:00:25
spine-2# 

Caveats and Errata

  • Cumulus Linux 3.2.0 only supports PIM Sparse Mode - Any-source Multicast (PIM-SM ASM) and Source-specific Multicast (SSM). Dense Mode and Bidirectional Multicast are not supported.
  • S,G mroutes are not built on routers that are not the Rendezvous Point (RP) or the First-hop Router (FHR). S,G PIM Joins will be sent, but only *,G mroutes are built. As a result, all traffic will flow over the *,G tree, similar to PIM Bidirectional Multicast.
  • Non-native forwarding (register decapsulation) is not supported. Initial packet loss is expected while the PIM *,G tree is built from the Rendezvous Point (RP) to the First-hop Router (FHR) to trigger native forwarding.
  • Cumulus Linux does not currently build an S,G mroute when forwarding over a *,G tree.