Cumulus Networks

Cumulus NetQ

Cumulus® NetQ is a highly-scalable, modern network operations tool set that provides visibility into your overlay and underlay networks in real-time. NetQ delivers actionable insights and operational intelligence about the health of your data center - from the container, virtual machine, or host, all the way to the switch and port.

Cumulus NetQ Deployment Guide

This guide is intended for network administrators who are responsible for installation, setup, and maintenance of Cumulus NetQ in their data center environment. NetQ offers the ability to monitor and manage your data center network infrastructure and operational health with simple tools based on open source Linux. This guide provides instructions and information about installing NetQ core capabilities, configuring optional capabilities, and upgrading an existing NetQ installation. This guide assumes you have already installed Cumulus Linux on your network switches and you are ready to add these NetQ capabilities.

For information about monitoring and troubleshooting your network, refer to the Cumulus NetQ CLI User Guide or the Cumulus NetQ UI User Guide.

Before you get started, you should review the release notes for this version.

Cumulus NetQ Overview

Cumulus® NetQ is a highly-scalable, modern network operations tool set that provides visibility and troubleshooting of your overlay and underlay networks in real-time. NetQ delivers actionable insights and operational intelligence about the health of your data center - from the container, virtual machine, or host, all the way to the switch and port. NetQ correlates configuration and operational status, and instantly identifies and tracks state changes while simplifying management for the entire Linux-based data center. With NetQ, network operations change from a manual, reactive, box-by-box approach to an automated, informed and agile one.

Cumulus NetQ performs three primary functions:

NetQ is available as an on-site or in-cloud deployment.

Unlike other network operations tools, NetQ delivers significant operational improvements to your network management and maintenance processes. It simplifies the data center network by providing real-time visibility into hardware and software status, and it eliminates the guesswork associated with investigating issues by analyzing and presenting detailed, focused data.

Demystify Overlay Networks

While overlay networks provide significant advantages in network management, it can be difficult to troubleshoot issues that occur in the overlay one box at a time. You are unable to correlate what events (configuration changes, power outages, etc.) may have caused problems in the network and when they occurred. Only a sampling of data is available to use for your analysis. By contrast, with Cumulus NetQ deployed, you have a network-wide view of the overlay network, can correlate events with what is happening now or in the past, and have real-time data to fill out the complete picture of your network health and operation.

In summary:

Without NetQ | With NetQ
Difficult to debug overlay network | View network-wide status of overlay network
Hard to find out what happened in the past | View historical activity with time-machine view
Periodically sampled data | Real-time collection of telemetry data for a more complete data set

Protect Network Integrity with NetQ Validation

Network configuration changes can cause numerous trouble tickets because you are not able to test a new configuration before deploying it. When the tickets start pouring in, you are stuck with a large amount of data collected and stored in multiple tools, making it difficult at best to correlate the events with their resolution. Isolating faults in the past is challenging. By contrast, with Cumulus NetQ deployed, you can proactively verify a configuration change, as inconsistencies and misconfigurations can be caught prior to deployment. And historical data is readily available to correlate past events with current issues.

In summary:

Without NetQ | With NetQ
Reactive to trouble tickets | Catch inconsistencies and misconfigurations prior to deployment with integrity checks/validation
Large amount of data and multiple tools to correlate the logs/events with the issues | Correlate network status, all in one place
Periodically sampled data | Readily available historical data for viewing and correlating changes in the past with current issues

Troubleshoot Issues Across the Network

Troubleshooting networks is challenging in the best of times, but trying to do so manually, one box at a time, while digging through a series of long and ugly logs, makes the job harder than it needs to be. Cumulus NetQ provides rolled-up and correlated network status on a regular basis, enabling you to get to the root of the problem quickly, whether it occurred recently or over a week ago. The graphical user interface presents this information visually to speed the analysis.

In summary:

Without NetQ | With NetQ
Large amount of data and multiple tools to correlate the logs/events with the issues | Rolled-up and correlated network status, view events and status together
Past events are lost | Historical data gathered and stored for comparison with current network state
Manual, box-by-box troubleshooting | View issues on all devices all at once, pointing to the source of the problem

Track Connectivity with NetQ Trace

Conventional trace only traverses the data path looking for problems, and does so on a node-to-node basis. For paths with a small number of hops that might be fine, but in larger networks, it can become extremely time consuming. With Cumulus NetQ, both the data and control paths are verified, providing additional information. It discovers misconfigurations along all of the hops in one go, speeding the time to resolution.

In summary:

Without NetQ | With NetQ
Trace covers only data path; hard to check control path | Both data and control paths are verified
View portion of entire path | View all paths between devices all at once to find problem paths
Node-to-node check on misconfigurations | View any misconfigurations along all hops from source to destination

Cumulus NetQ Components

Cumulus NetQ contains the following applications and key components:

While these functions apply to both the on-site and in-cloud solutions, where they reside varies, as shown here.

NetQ interfaces with event notification applications and third-party analytics tools.

Each of the NetQ components used to gather, store, and process data about the network state is described here.

NetQ Agents

NetQ Agents are software installed and running on every monitored node in the network - including Cumulus® Linux® switches, Linux bare-metal hosts, and virtual machines. The NetQ Agents push network data regularly and event information immediately to the NetQ Platform.

Switch Agents

The NetQ Agents running on Cumulus Linux switches gather the following network data via Netlink:

for the following protocols:

The NetQ Agent is supported on Cumulus Linux 3.3.2 and later.

Host Agents

The NetQ Agents running on hosts gather the same information as that for switches, plus the following network data:

The NetQ Agent obtains container information by listening to the Kubernetes orchestration tool.

The NetQ Agent is supported on hosts running Ubuntu 16.04, Red Hat® Enterprise Linux 7, and CentOS 7 Operating Systems.
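
For example, a quick way to confirm that the agent is installed and running on a host is to query the package and the service. This is a brief sketch for a systemd-based Ubuntu host; adjust the package query for Red Hat Enterprise Linux or CentOS hosts.

$ dpkg -l | grep netq-agent
$ sudo systemctl status netq-agent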

NetQ Core

The NetQ core performs the data collection, storage, and processing for delivery to various user interfaces. It comprises a collection of scalable components running entirely within a single server. The NetQ software queries this server rather than individual devices, enabling greater scalability of the system. Each of these components is described briefly here.

Data Aggregation

The data aggregation component collects data coming from all of the NetQ Agents. It then filters, compresses, and forwards the data to the streaming component. The server monitors for missing messages and also monitors the NetQ Agents themselves, providing alarms when appropriate. In addition to the telemetry data collected from the NetQ Agents, the aggregation component collects information from the switches and hosts, such as vendor, model, version, and basic operational state.

Data Stores

Two types of data stores are used in the NetQ product. The first stores the raw data, data aggregations, and discrete events needed for quick response to data requests. The second stores data based on correlations, transformations and processing of the raw data.

Real-time Streaming

The streaming component processes the incoming raw data from the aggregation server in real time. It reads the metrics and stores them as a time series, and triggers alarms based on anomaly detection, thresholds, and events.

Network Services

The network services component monitors the operation of protocols and services, both individually and on a network-wide basis, and stores status details.

User Interfaces

NetQ data is available through several user interfaces:

The CLI and UI query the RESTful API for the data they present. Standard integrations can be configured for third-party notification tools.

Data Center Network Deployments

There are three deployment types that are commonly deployed for network management in the data center:

A summary of each type is provided here.

Cumulus NetQ operates over layer 3, and can be used in both layer 2 bridged and layer 3 routed environments. Cumulus Networks recommends layer 3 routed environments whenever possible.

Out-of-Band Management Deployment

Cumulus Networks recommends deploying NetQ on an out-of-band (OOB) management network to separate network management traffic from standard network data traffic, but it is not required. This figure shows a sample CLOS-based network fabric design for a data center using an OOB management network overlaid on top, where NetQ is deployed.

The physical network hardware includes:

The diagram shows physical connections (in the form of grey lines) between Spine 01 and four Leaf devices and two Exit devices, and Spine 02 and the same four Leaf devices and two Exit devices. Leaf 01 and Leaf 02 are connected to each other over a peerlink and act as an MLAG pair for Server 01 and Server 02. Leaf 03 and Leaf 04 are connected to each other over a peerlink and act as an MLAG pair for Server 03 and Server 04. The Edge is connected to both Exit devices, and the Internet node is connected to Exit 01.

Data Center Network Example

The physical management hardware includes:

These switches are connected to each of the physical network devices through a virtual network overlay, shown with purple lines.

In-band Management Deployment

While not the preferred deployment method, you might choose to implement NetQ within your data network. In this scenario, there is no overlay and all traffic to and from the NetQ Agents and the NetQ Platform traverses the data paths along with your regular network traffic. The roles of the switches in the CLOS network are the same, except that the NetQ Platform performs the aggregation function that the OOB management switch performed. If your network goes down, you might not have access to the NetQ Platform for troubleshooting.

High Availability Deployment

NetQ supports a high availability deployment for users who prefer a solution in which the collected data and processing provided by the NetQ Platform remains available through alternate equipment should the platform fail for any reason. In this configuration, three NetQ Platforms are deployed, with one as the master and two as workers (or replicas). Data from the NetQ Agents is sent to all three platforms so that if the master NetQ Platform fails, one of the replicas automatically becomes the master and continues to store and provide the telemetry data. This example is based on an OOB management configuration, modified to support high availability for NetQ.

Cumulus NetQ Operation

In either in-band or out-of-band deployments, NetQ offers network-wide configuration and device management, proactive monitoring capabilities, and performance diagnostics for complete management of your network. Each component of the solution provides a critical element to make this possible.

The NetQ Agent

From a software perspective, a network switch has software associated with the hardware platform, the operating system, and communications. For data centers, the software on a Cumulus Linux network switch would be similar to the diagram shown here.

The NetQ Agent interacts with the various components and software on switches and hosts and provides the gathered information to the NetQ Platform. You can view the data using the NetQ CLI or UI.

The NetQ Agent polls the user space applications for information about the performance of the various routing protocols and services that are running on the switch. Cumulus Networks supports the BGP and OSPF protocols provided by Free Range Routing (FRR), as well as static addressing. Cumulus Linux also supports LLDP and MSTP among other protocols, and a variety of services such as systemd and sensors. For hosts, the NetQ Agent also polls for performance of containers managed with Kubernetes. All of this information is used to provide the current health of the network and verify it is configured and operating correctly.

For example, if the NetQ Agent learns that an interface has gone down, a new BGP neighbor has been configured, or a container has moved, it provides that information to the NetQ Platform. That information can then be used to notify users of the operational state change through various channels. By default, data is logged in the database, but you can use the CLI (netq show events) or configure the Event Service in NetQ to send the information to a third-party notification application as well. NetQ supports PagerDuty and Slack integrations.
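
For example, you might review recent events from the CLI before deciding which of them should be forwarded to a notification channel. This is a minimal sketch; the level filter shown is typical of the netq CLI, but check the CLI reference for the exact options available in your release.

cumulus@switch:~$ netq show events
cumulus@switch:~$ netq show events level error
cumulus@switch:~$ netq show events json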

The NetQ Agent interacts with the Netlink communications between the Linux kernel and the user space, listening for changes to the network state, configurations, routes and MAC addresses. NetQ uses this information to enable notifications about these changes so that network operators and administrators can respond quickly when changes are not expected or favorable.

For example, if a new route is added or a MAC address removed, the NetQ Agent records these changes and sends that information to the NetQ Platform. Based on the configuration of the Event Service, these changes can be sent to a variety of locations for end user response.

The NetQ Agent also interacts with the hardware platform to obtain performance information about various physical components, such as fans and power supplies, on the switch. Operational states and temperatures are measured and reported, along with cabling information to enable management of the hardware and cabling, and proactive maintenance.

For example, as thermal sensors in the switch indicate that it is becoming very warm, various levels of alarms are generated. These are then communicated through notifications according to the Event Service configuration.
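
For example, you can review the sensor data that feeds these alarms directly from the CLI. This is a brief sketch; both commands appear in the validation table later in this topic, and the temp keyword is a typical option whose exact syntax you should verify for your release.

cumulus@switch:~$ netq check sensors
cumulus@switch:~$ netq show sensors temp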

The NetQ Platform

Once the collected data is sent to and stored in the NetQ database, you can:

Validate Configurations

The NetQ CLI enables validation of your network health through two sets of commands: netq check and netq show. They extract the information from the Network Service component and the Event Service. The Network Service component continually validates the connectivity and configuration of the devices and protocols running on the network. The netq check and netq show commands display the status of the various components and services on a network-wide, complete-software-stack basis. For example, you can perform a network-wide check on all BGP sessions with a single netq check bgp command. The command lists any devices that have misconfigurations or other operational errors in seconds. When errors or misconfigurations are present, the netq show bgp command displays the BGP configuration on each device so that you can compare and contrast the devices, looking for potential causes. netq check and netq show commands are available for numerous components and services, as shown in the following table; a brief sampling of these commands follows the table.

Component or Service | Check | Show | Component or Service | Check | Show
Agents | X | X | LLDP |  | X
BGP | X | X | LNV | X | X
CLAG (MLAG) | X | X | MACs |  | X
Events |  | X | MTU | X |
EVPN | X | X | NTP | X | X
Interfaces | X | X | OSPF | X | X
Inventory |  | X | Sensors | X | X
IPv4/v6 |  | X | Services |  | X
Kubernetes |  | X | VLAN | X | X
License |  | X | VXLAN | X | X
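
For example, the same check/show pattern applies to the other components and services in the table; each command returns its own network-wide summary. This is a brief sampling, not an exhaustive list.

cumulus@switch:~$ netq check evpn
cumulus@switch:~$ netq check vlan
cumulus@switch:~$ netq show interfaces
cumulus@switch:~$ netq show macs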

Monitor Communication Paths

The trace engine is used to validate the available communication paths between two network devices. The corresponding netq trace command enables you to view all of the paths between the two devices and whether there are any breaks in the paths. This example shows two successful paths between server12 and leaf11, both with an MTU of 9152. The first command shows the output in path-by-path tabular mode. The second command shows the same output as a tree.

cumulus@switch:~$ netq trace 10.0.0.13 from 10.0.0.21
Number of Paths: 2
Number of Paths with Errors: 0
Number of Paths with Warnings: 0
Path MTU: 9152
Id  Hop Hostname    InPort          InTun, RtrIf    OutRtrIf, Tun   OutPort
--- --- ----------- --------------- --------------- --------------- ---------------
1   1   server12                                                    bond1.1002
    2   leaf12      swp8                            vlan1002        peerlink-1
    3   leaf11      swp6            vlan1002                        vlan1002
--- --- ----------- --------------- --------------- --------------- ---------------
2   1   server12                                                    bond1.1002
    2   leaf11      swp8                                            vlan1002
--- --- ----------- --------------- --------------- --------------- ---------------
 
 
cumulus@switch:~$ netq trace 10.0.0.13 from 10.0.0.21 pretty
Number of Paths: 2
Number of Paths with Errors: 0
Number of Paths with Warnings: 0
Path MTU: 9152
 hostd-12 bond1.1002 -- swp8 leaf12 <vlan1002> peerlink-1 -- swp6 <vlan1002> leaf11 vlan1002
          bond1.1002 -- swp8 leaf11 vlan1002

This output is read as:

If the MTU does not match across the network, or any of the paths or parts of the paths have issues, that data is called out in the summary at the top of the output and shown in red along the paths, giving you a starting point for troubleshooting.

View Historical State and Configuration

All of the check, show, and trace commands can be run for the current status and for a prior point in time. For example, this is useful when you receive messages from the night before but are not seeing any problems now. You can use the netq check command to look for configuration or operational issues around the time the messages are timestamped. Then use the netq show commands to see how the devices in question were configured at that time or whether there were any changes in a given timeframe. Optionally, you can use the netq trace command to see what the connectivity looked like between any problematic nodes at that time. In this example, problems occurred on spine01, leaf04, and server03 last night. The network administrator received notifications and wants to investigate. The diagram is followed by the commands to run to determine the cause of a BGP error on spine01. Note that the commands use the around option to see the results for last night, and that they can be run from any switch in the network.

cumulus@switch:~$ netq check bgp around 30m
Total Nodes: 25, Failed Nodes: 3, Total Sessions: 220 , Failed Sessions: 24,
Hostname          VRF             Peer Name         Peer Hostname     Reason                                        Last Changed
----------------- --------------- ----------------- ----------------- --------------------------------------------- -------------------------
exit-1            DataVrf1080     swp6.2            firewall-1        BGP session with peer firewall-1 swp6.2: AFI/ 1d:2h:6m:21s
                                                                      SAFI evpn not activated on peer              
exit-1            DataVrf1080     swp7.2            firewall-2        BGP session with peer firewall-2 (swp7.2 vrf  1d:1h:59m:43s
                                                                      DataVrf1080) failed,                         
                                                                      reason: Peer not configured                  
exit-1            DataVrf1081     swp6.3            firewall-1        BGP session with peer firewall-1 swp6.3: AFI/ 1d:2h:6m:21s
                                                                      SAFI evpn not activated on peer              
exit-1            DataVrf1081     swp7.3            firewall-2        BGP session with peer firewall-2 (swp7.3 vrf  1d:1h:59m:43s
                                                                      DataVrf1081) failed,                         
                                                                      reason: Peer not configured                  
exit-1            DataVrf1082     swp6.4            firewall-1        BGP session with peer firewall-1 swp6.4: AFI/ 1d:2h:6m:21s
                                                                      SAFI evpn not activated on peer              
exit-1            DataVrf1082     swp7.4            firewall-2        BGP session with peer firewall-2 (swp7.4 vrf  1d:1h:59m:43s
                                                                      DataVrf1082) failed,                         
                                                                      reason: Peer not configured                  
exit-1            default         swp6              firewall-1        BGP session with peer firewall-1 swp6: AFI/SA 1d:2h:6m:21s
                                                                      FI evpn not activated on peer                
exit-1            default         swp7              firewall-2        BGP session with peer firewall-2 (swp7 vrf de 1d:1h:59m:43s
...
 
cumulus@switch:~$ netq exit-1 show bgp
Matching bgp records:
Hostname          Neighbor                     VRF             ASN        Peer ASN   PfxRx        Last Changed
----------------- ---------------------------- --------------- ---------- ---------- ------------ -------------------------
exit-1            swp3(spine-1)                default         655537     655435     27/24/412    Fri Feb 15 17:20:00 2019
exit-1            swp3.2(spine-1)              DataVrf1080     655537     655435     14/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp3.3(spine-1)              DataVrf1081     655537     655435     14/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp3.4(spine-1)              DataVrf1082     655537     655435     14/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp4(spine-2)                default         655537     655435     27/24/412    Fri Feb 15 17:20:00 2019
exit-1            swp4.2(spine-2)              DataVrf1080     655537     655435     14/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp4.3(spine-2)              DataVrf1081     655537     655435     14/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp4.4(spine-2)              DataVrf1082     655537     655435     13/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp5(spine-3)                default         655537     655435     28/24/412    Fri Feb 15 17:20:00 2019
exit-1            swp5.2(spine-3)              DataVrf1080     655537     655435     14/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp5.3(spine-3)              DataVrf1081     655537     655435     14/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp5.4(spine-3)              DataVrf1082     655537     655435     14/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp6(firewall-1)             default         655537     655539     73/69/-      Fri Feb 15 17:22:10 2019
exit-1            swp6.2(firewall-1)           DataVrf1080     655537     655539     73/69/-      Fri Feb 15 17:22:10 2019
exit-1            swp6.3(firewall-1)           DataVrf1081     655537     655539     73/69/-      Fri Feb 15 17:22:10 2019
exit-1            swp6.4(firewall-1)           DataVrf1082     655537     655539     73/69/-      Fri Feb 15 17:22:10 2019
exit-1            swp7                         default         655537     -          NotEstd      Fri Feb 15 17:28:48 2019
exit-1            swp7.2                       DataVrf1080     655537     -          NotEstd      Fri Feb 15 17:28:48 2019
exit-1            swp7.3                       DataVrf1081     655537     -          NotEstd      Fri Feb 15 17:28:48 2019
exit-1            swp7.4                       DataVrf1082     655537     -          NotEstd      Fri Feb 15 17:28:48 2019
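
The around option applies to the trace command as well, so you can also compare the paths from last night with the current paths. This sketch reuses the addresses from the earlier trace example; verify that your release supports the around option for trace.

cumulus@switch:~$ netq trace 10.0.0.13 from 10.0.0.21 around 30m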

Manage Network Events

The NetQ notifier manages the events it receives from the NetQ Agents for devices and components, protocols, and services. The notifier enables you to capture and filter events to manage the behavior of your network. This is especially useful when an interface or routing protocol goes down and you want to get it back up and running as quickly as possible, preferably before anyone notices or complains. You can improve resolution time significantly by creating filters that focus on topics appropriate for a particular group of users. You can easily create filters around events related to BGP, LNV, and MLAG session states, interfaces, links, NTP and other services, fans, power supplies, and physical sensor measurements.

For example, for operators responsible for routing, you can create an integration with a notification application that notifies them of routing issues as they occur. This is an example of a Slack message received on a netq-notifier channel indicating that the BGP session on switch leaf04 interface swp2 has gone down.

Timestamps in NetQ

Every event or entry in the NetQ database is stored with a timestamp of when the event was captured by the NetQ Agent on the switch or server. This timestamp is based on the switch or server time where the NetQ Agent is running, and is pushed in UTC format. It is important to ensure that all devices are NTP synchronized to prevent events from being displayed out of order or not displayed at all when looking for events that occurred at a particular time or within a time window.
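
For example, before relying on time-based queries, you can confirm that all devices report NTP synchronization; the netq check ntp command appears in the validation table earlier in this topic, and the NTP Sync column of netq show agents (shown below) reports the same state per device.

cumulus@switch:~$ netq check ntp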

Interface state, IP addresses, routes, ARP/ND table (IP neighbor) entries, and MAC table entries carry a timestamp that represents the time the event happened (such as when a route is deleted or an interface comes up) - except the first time the NetQ Agent is run. If the network has been running and stable when a NetQ Agent is brought up for the first time, then this time reflects when the agent was started. Subsequent changes to these objects are captured with an accurate time of when the event happened.

Data that is captured and saved based on polling, and just about all other data in the NetQ database, including control plane state (such as BGP or MLAG), has a timestamp of when the information was captured rather than when the event actually happened, though NetQ compensates for this if the data extracted provides additional information to compute a more precise time of the event. For example, BGP uptime can be used to determine when the event actually happened in conjunction with the timestamp.

When retrieving the timestamp, command outputs display the time in three ways:

This example shows the difference between the timestamp displays.

cumulus@switch:~$ netq show bgp
Matching bgp records:
Hostname          Neighbor                     VRF             ASN        Peer ASN   PfxRx        Last Changed
----------------- ---------------------------- --------------- ---------- ---------- ------------ -------------------------
exit-1            swp3(spine-1)                default         655537     655435     27/24/412    Fri Feb 15 17:20:00 2019
exit-1            swp3.2(spine-1)              DataVrf1080     655537     655435     14/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp3.3(spine-1)              DataVrf1081     655537     655435     14/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp3.4(spine-1)              DataVrf1082     655537     655435     14/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp4(spine-2)                default         655537     655435     27/24/412    Fri Feb 15 17:20:00 2019
exit-1            swp4.2(spine-2)              DataVrf1080     655537     655435     14/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp4.3(spine-2)              DataVrf1081     655537     655435     14/12/0      Fri Feb 15 17:20:00 2019
exit-1            swp4.4(spine-2)              DataVrf1082     655537     655435     13/12/0      Fri Feb 15 17:20:00 2019
...
 
cumulus@switch:~$ netq show agents
Matching agents records:
Hostname          Status           NTP Sync Version                              Sys Uptime                Agent Uptime              Reinitialize Time          Last Changed
----------------- ---------------- -------- ------------------------------------ ------------------------- ------------------------- -------------------------- -------------------------
leaf01            Fresh            yes      2.0.0-cl3u11~1549993210.e902a94      2h:32m:33s                2h:26m:19s                2h:26m:19s                 Tue Feb 12 18:13:28 2019
leaf02            Fresh            yes      2.0.0-cl3u11~1549993210.e902a94      2h:32m:33s                2h:26m:14s                2h:26m:14s                 Tue Feb 12 18:13:33 2019
leaf11            Fresh            yes      2.0.0-ub16.04u11~1549993314.e902a94  2h:32m:28s                2h:25m:49s                2h:25m:49s                 Tue Feb 12 18:17:32 2019
leaf12            Fresh            yes      2.0.0-rh7u11~1549992132.c42c08f      2h:32m:0s                 2h:25m:44s                2h:25m:44s                 Tue Feb 12 18:17:36 2019
leaf21            Fresh            yes      2.0.0-ub16.04u11~1549993314.e902a94  2h:32m:28s                2h:25m:39s                2h:25m:39s                 Tue Feb 12 18:17:42 2019
leaf22            Fresh            yes      2.0.0-rh7u11~1549992132.c42c08f      2h:32m:0s                 2h:25m:35s                2h:25m:35s                 Tue Feb 12 18:17:46 2019
spine01           Fresh            yes      2.0.0-cl3u11~1549993210.e902a94      2h:32m:33s                2h:27m:11s                2h:27m:11s                 Tue Feb 12 18:13:06 2019
spine02           Fresh            yes      2.0.0-cl3u11~1549993210.e902a94      2h:32m:33s                2h:27m:6s                 2h:27m:6s                  Tue Feb 12 18:13:11 2019
...
 
cumulus@switch:~$ netq show agents json
{
    "agents":[
        {
            "status":"Fresh",
            "lastChanged":1549995208.3039999008,
            "reinitializeTime":1549995146.0,
            "hostname":"leaf01",
            "version":"2.0.0-cl3u11~1549993210.e902a94",
            "sysUptime":1549994772.0,
            "ntpSync":"yes",
            "agentUptime":1549995146.0
        },
        {
            "status":"Fresh",
            "lastChanged":1549995213.3399999142,
            "reinitializeTime":1549995151.0,
            "hostname":"leaf02",
            "version":"2.0.0-cl3u11~1549993210.e902a94",
            "sysUptime":1549994772.0,
            "ntpSync":"yes",
            "agentUptime":1549995151.0
        },
        {
            "status":"Fresh",
            "lastChanged":1549995434.3559999466,
            "reinitializeTime":1549995157.0,
            "hostname":"leaf11",
            "version":"2.0.0-ub16.04u11~1549993314.e902a94",
            "sysUptime":1549994772.0,
            "ntpSync":"yes",
            "agentUptime":1549995157.0
        },
        {
            "status":"Fresh",
            "lastChanged":1549995439.3770000935,
            "reinitializeTime":1549995164.0,
            "hostname":"leaf12",
            "version":"2.0.0-rh7u11~1549992132.c42c08f",
            "sysUptime":1549994809.0,
            "ntpSync":"yes",
            "agentUptime":1549995164.0
        },
        {
            "status":"Fresh",
            "lastChanged":1549995452.6830000877,
            "reinitializeTime":1549995176.0,
            "hostname":"leaf21",
            "version":"2.0.0-ub16.04u11~1549993314.e902a94",
            "sysUptime":1549994777.0,
            "ntpSync":"yes",
            "agentUptime":1549995176.0
        },
        {
            "status":"Fresh",
            "lastChanged":1549995456.4500000477,
            "reinitializeTime":1549995181.0,
            "hostname":"leaf22",
            "version":"2.0.0-rh7u11~1549992132.c42c08f",
            "sysUptime":1549994805.0,
            "ntpSync":"yes",
            "agentUptime":1549995181.0
        },
        {
            "status":"Fresh",
            "lastChanged":1549995186.3090000153,
            "reinitializeTime":1549995094.0,
            "hostname":"spine01",
            "version":"2.0.0-cl3u11~1549993210.e902a94",
            "sysUptime":1549994772.0,
            "ntpSync":"yes",
            "agentUptime":1549995094.0
        },
        {
            "status":"Fresh",
            "lastChanged":1549995191.4530000687,
            "reinitializeTime":1549995099.0,
            "hostname":"spine02",
            "version":"2.0.0-cl3u11~1549993210.e902a94",
            "sysUptime":1549994772.0,
            "ntpSync":"yes",
            "agentUptime":1549995099.0
        },
...

If a NetQ Agent is restarted on a device, the timestamps for existing objects are not updated to reflect this new restart time. Their timestamps are preserved relative to the original start time of the Agent. A rare exception is if the device is rebooted between the time the Agent is stopped and when it is restarted; in this case, the time is once again relative to the start time of the Agent.

Exporting NetQ Data

Data from the NetQ Platform can be exported in a couple of ways:

Example Using the CLI

You can check the state of BGP on your network with netq check bgp:

cumulus@leaf01:~$ netq check bgp
Total Nodes: 25, Failed Nodes: 3, Total Sessions: 220 , Failed Sessions: 24,
Hostname          VRF             Peer Name         Peer Hostname     Reason                                        Last Changed
----------------- --------------- ----------------- ----------------- --------------------------------------------- -------------------------
exit01            DataVrf1080     swp6.2            firewall01        BGP session with peer firewall01 swp6.2: AFI/ Tue Feb 12 18:11:16 2019
                                                                      SAFI evpn not activated on peer              
exit01            DataVrf1080     swp7.2            firewall02        BGP session with peer firewall02 (swp7.2 vrf  Tue Feb 12 18:11:27 2019
                                                                      DataVrf1080) failed,                         
                                                                      reason: Peer not configured                  
exit01            DataVrf1081     swp6.3            firewall01        BGP session with peer firewall01 swp6.3: AFI/ Tue Feb 12 18:11:16 2019
                                                                      SAFI evpn not activated on peer              
exit01            DataVrf1081     swp7.3            firewall02        BGP session with peer firewall02 (swp7.3 vrf  Tue Feb 12 18:11:27 2019
                                                                      DataVrf1081) failed,                         
                                                                      reason: Peer not configured                  
...

When you show the output in JSON format, this same command looks like this:

cumulus@leaf01:~$ netq check bgp json
{
    "failedNodes":[
        {
            "peerHostname":"firewall01",
            "lastChanged":1549995080.0,
            "hostname":"exit01",
            "peerName":"swp6.2",
            "reason":"BGP session with peer firewall01 swp6.2: AFI/SAFI evpn not activated on peer",
            "vrf":"DataVrf1080"
        },
        {
            "peerHostname":"firewall02",
            "lastChanged":1549995449.7279999256,
            "hostname":"exit01",
            "peerName":"swp7.2",
            "reason":"BGP session with peer firewall02 (swp7.2 vrf DataVrf1080) failed, reason: Peer not configured",
            "vrf":"DataVrf1080"
        },
        {
            "peerHostname":"firewall01",
            "lastChanged":1549995080.0,
            "hostname":"exit01",
            "peerName":"swp6.3",
            "reason":"BGP session with peer firewall01 swp6.3: AFI/SAFI evpn not activated on peer",
            "vrf":"DataVrf1081"
        },
        {
            "peerHostname":"firewall02",
            "lastChanged":1549995449.7349998951,
            "hostname":"exit01",
            "peerName":"swp7.3",
            "reason":"BGP session with peer firewall02 (swp7.3 vrf DataVrf1081) failed, reason: Peer not configured",
            "vrf":"DataVrf1081"
        },
...
 
    ],
    "summary": {
        "checkedNodeCount": 25,
        "failedSessionCount": 24,
        "failedNodeCount": 3,
        "totalSessionCount": 220
    }
}
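
Because the JSON output is well formed, it can be redirected to a file or piped into standard tooling for further processing. This is a minimal sketch using generic Linux utilities; python3 is assumed to be available on the server where you run the commands.

cumulus@leaf01:~$ netq check bgp json > /tmp/bgp-check.json
cumulus@leaf01:~$ netq show agents json | python3 -m json.tool | head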

Example Using the UI

Open the full screen Switch Inventory card, select the data to export, and click Export.

Important File Locations

The primary configuration file for all Cumulus NetQ tools, netq.yml, resides in /etc/netq by default.

Log files are stored in /var/logs/ by default.
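
For reference, this is a minimal sketch of what a netq.yml might contain for an on-premises deployment. The server address is an example value, the ports correspond to the NetQ Agent and API Gateway ports listed later in this guide, and the exact keys for your release are defined in the configuration reference.

```
netq-agent:
    port: 31980
    server: 192.168.1.222
netq-cli:
    port: 32708
    server: 192.168.1.222
```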

Refer to Investigate NetQ Issues for a complete listing of configuration files and logs for use in issue resolution.

Install NetQ

Overview

The complete Cumulus NetQ solution contains several components that must be installed, including the NetQ applications, the database, and the NetQ Agents. NetQ can be deployed in two arrangements:

The NetQ Agents reside on the switches and hosts being monitored in your network.

For the on-premises solution, the NetQ Agents collect and transmit data from the switches and/or hosts back to the NetQ Platform, which in turn processes and stores the data in its database. This data is then provided for display through several user interfaces.

For the cloud solution, the NetQ Agent function is exactly the same, but the collected data is sent to a NetQ Platform that contains only the aggregation and forwarding application. This platform then transmits the data to the Cumulus Networks cloud-based infrastructure for further processing and storage. The data is then provided for display through the same user interfaces as the on-premises solution. In this solution, the browser interface can be pointed to the local NetQ Cloud Platform/Appliance or directly to netq.cumulusnetworks.com.

Installation Choices

There are several choices that you must make to determine what steps you need to perform to install the NetQ solution. First, you must decide whether you intend to deploy the solution fully on your premises or deploy the cloud solution. Second, you must decide whether you are going to deploy a virtual machine on your own hardware or use one of the Cumulus NetQ appliances. Third, you must determine whether you want to install the software on a single server or as a server cluster. Finally, if you have an existing on-premises solution and want to save your existing NetQ data, you must back up that data before installing the new software.

Choose between On-premises or Cloud Deployment

Both deployments provide secure access to data and features useful for monitoring and troubleshooting your network, and each has its benefits.

It is common to select an on-premises deployment model if you want to host all required hardware and software at your location, and you have the in-house skill set to install, configure, and maintain it, including performing data backups, acquiring and maintaining hardware and software, and handling integration and license management. This model is also a good choice if you want very limited or no access to the Internet from switches and hosts in your network. Some companies simply want complete control of their network, with no outside impact.

If, however, you find that you want to host only a small server on your premises and leave the details up to Cumulus Networks, then a cloud deployment might be the right choice for you. With a cloud deployment, a small local server connects to the NetQ Cloud service over selected ports or through a proxy server. Only data aggregation and forwarding is supported. The majority of the NetQ applications are hosted and data storage is provided in the cloud. Cumulus handles the backups and maintenance of the application and storage. This model is often chosen when it is untenable to support deployment in-house or if you need the flexibility to scale quickly, while also reducing capital expenses.

Choose between a Virtual Machine or Cumulus NetQ Appliance

Both options ultimately provide the same services and features. The difference is in the implementation. When you choose to install NetQ software on your own hardware, you create and maintain a KVM or VMware VM, and the software is run from there. This requires you to scope and order an appropriate hardware server to support the NetQ requirements, but may allow you to reuse an existing server in your stock.

When you choose to purchase and install NetQ software on a Cumulus hardware appliance (either the NetQ Appliance for on-premises deployments or the NetQ Cloud Appliance for cloud deployments), the initial configuration of the server with Ubuntu OS is already done for you, and the NetQ software components are pre-loaded, saving you time during the physical deployment.

Choose between a Single Server or Server Cluster

Again, both options provide the same services and features. The biggest difference is in the number of servers to be deployed and in the continued availability of services running on those servers should hardware failures occur.

A single server is easier to set up, configure, and manage, but can limit your ability to scale your network monitoring quickly. A server cluster is a bit more complicated, but you limit potential downtime and increase availability with the master and two worker nodes supported in the NetQ 2.4.0 release.

Installation Workflow Summary

No matter how you answer the questions above, the installation workflow can be summarized as follows:

  1. Prepare server(s) and collect needed information.
  2. Use Admin UI (preferred) or NetQ CLI to install and configure your deployment and the NetQ software.
  3. Install NetQ Agents on switches and hosts.

Install NetQ Platform

The first step to installing NetQ 2.4.1 is to install the NetQ Platform on a virtual machine or NetQ Appliance. The following sections describe how to prepare for and install the platform on both on-premises and cloud deployments.

Prepare for NetQ On-premises Installation

This topic describes the preparation steps needed before installing the NetQ components on your premises. Refer to Prepare for NetQ Cloud Installation for preparations for cloud deployments.

There are three key steps in the preparation for on-premises installation:

  1. Decide whether you want to install the NetQ Platform on:

    • a virtual machine (VM) on hardware that you provide, or
    • the Cumulus NetQ Appliance.
  2. Review the VM requirements if you have chosen that option.

  3. Obtain the NetQ Platform image and set up the VM or appliance.

Prepare Your KVM VM and Obtain the NetQ Platform

The first preparation step is to verify your VM meets the following minimum hardware and software requirements to ensure the VM can operate correctly.

Virtual Machine Requirements

The NetQ Platform requires a VM with the following system resources allocated:

Resource | Minimum Requirement
Processor | Eight (8) virtual CPUs
Memory | 64 GB RAM
Local disk storage | 256 GB SSD (Note: This must be an SSD; other storage options can lead to system instability and are not supported.)
Network interface speed | 1 Gb NIC
Hypervisor | VMware ESXi™ 6.5 or later (OVA image) for servers running Cumulus Linux, CentOS, Ubuntu and RedHat operating systems; or KVM/QCOW (QEMU Copy on Write) image for servers running CentOS, Ubuntu and RedHat operating systems

Required Open Ports

You must also open the following ports on your NetQ Platform (or platforms if you are planning to deploy a server cluster).

For external connections:

Port | Protocol | Component Access
8443 | TCP | Admin UI
443 | TCP | NetQ UI
31980 | TCP | NetQ Agent communication
32708 | TCP | API Gateway
22 | TCP | SSH

For internal cluster communication:

Port | Protocol | Component Access
8080 | TCP | Admin API
5000 | TCP | Docker registry
8472 | UDP | Flannel port for VXLAN
6443 | TCP | Kubernetes API server
10250 | TCP | kubelet health probe
2379 | TCP | etcd
2380 | TCP | etcd
7072 | TCP | Kafka JMX monitoring
9092 | TCP | Kafka client
7071 | TCP | Cassandra JMX monitoring
7000 | TCP | Cassandra cluster communication
9042 | TCP | Cassandra client
7073 | TCP | Zookeeper JMX
2888 | TCP | Zookeeper cluster communication
3888 | TCP | Zookeeper cluster communication
2181 | TCP | Zookeeper client

Port 32666 is no longer used for the NetQ UI.
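
If a firewall runs on the platform host, these ports must be permitted explicitly, and after installation you can confirm what is actually listening. This is a minimal sketch using standard Ubuntu tools (ufw and ss); adapt it to the firewall you use.

$ sudo ufw allow 8443/tcp
$ sudo ufw allow 443/tcp
$ sudo ufw allow 31980/tcp
$ sudo ufw allow 32708/tcp
$ sudo ufw allow 22/tcp
$ sudo ss -tln | grep -E '8443|31980|32708'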

The second preparation step is to follow the instructions below, based on whether you intend to deploy a single-server platform or a three-server cluster.

KVM Single-Server Deployment

Two steps are needed, one to download the NetQ Platform and one to configure the VM.

Download the KVM NetQ Platform Image

IMPORTANT: Confirm that your server hardware meets the requirements identified in Virtual Machine Requirements.

  1. On the Cumulus Downloads page, select NetQ from the Product list.

  2. Click 2.4 from the Version list, and then select 2.4.1 from the submenu.

  3. Select KVM from the HyperVisor/Platform list.

  4. Scroll down to view the image, and click Download.

Configure the KVM VM

  1. Open your hypervisor and set up your VM.

    You can use this example for reference or use your own hypervisor instructions.

    KVM Example Configuration

    This example shows the VM setup process for a system with Libvirt and KVM/QEMU installed.

    1. Confirm that the SHA256 checksum matches the one posted on the Cumulus Downloads website to ensure the image download has not been corrupted.
    $ sha256sum ./Downloads/cumulus-netq-server-2.4.1-ts-amd64-qemu.qcow2
    6fff5f2ac62930799b4e8cc7811abb6840b247e2c9e76ea9ccba03f991f42424  ./Downloads/cumulus-netq-server-2.4.1-ts-amd64-qemu.qcow2
    
    2. Copy the QCOW2 image to a directory where you want to run it.

      Copy, rather than move, the original QCOW2 image that you downloaded so that you do not need to download it again should you have to repeat this process.

    $ sudo mkdir /vms
    $ sudo cp ./Downloads/cumulus-netq-server-2.4.1-ts-amd64-qemu.qcow2 /vms/ts.qcow2
    
    3. Create the VM.

      For a Direct VM, where the VM uses a MACVLAN interface to sit on the host interface for its connectivity:

    $ virt-install --name=netq_ts --vcpus=8 --memory=65536 --os-type=linux --os-variant=debian7 --disk path=/vms/ts.qcow2,format=qcow2,bus=virtio,cache=none --network=type=direct,source=eth0,model=virtio --import --noautoconsole

    Replace the disk path value with the location where the QCOW2 image is to reside. Replace the network source value (eth0 in the above example) with the name of the host interface where the VM connects to the external network.

    Or, for a Bridged VM, where the VM attaches to a bridge which has already been set up to allow for external access:
    $ virt-install --name=netq_ts --vcpus=8 --memory=65536 --os-type=linux --os-variant=debian7 --disk path=/vms/ts.qcow2,format=qcow2,bus=virtio,cache=none --network=bridge=br0,model=virtio --import --noautoconsole

    Replace network bridge value (br0 in the above example) with the name of the (pre-existing) bridge interface where the VM is connected to the external network.

    4. Watch the boot process in another terminal window.
    $ virsh console netq_ts
    
    5. From the console of the VM, check which IP address eth0 has obtained via DHCP, or alternatively set a static IP address by editing the /etc/netplan/01-ethernet.yaml Netplan configuration file:
    # This file describes the network interfaces available on your system
    # For more information, see netplan(5).
    network:
        version: 2
        renderer: networkd
        ethernets:
            eno0:
                dhcp4: no
                addresses: [192.168.1.222/24]
                gateway4: 192.168.1.1
                nameservers:
                    addresses: [8.8.8.8,8.8.4.4]
    
     This example shows that the IP address is a static address. If this is what you want, exit the file without changes. If you want the IP address to be determined by DHCP, edit the file as follows:
    
     ```
     network:
         version: 2
         renderer: networkd
         ethernets:
             eno0:
                 dhcp4: yes
     ```
    
     Apply the settings.
    
     ```
     $ sudo netplan apply
     ```
    
  2. Verify the platform is ready for installation. Fix any errors indicated before installing the NetQ software.

    cumulus@<hostname>:~$ sudo opta-check
    
  3. Run the Bootstrap CLI on the platform for the interface you defined above (eth0 or eth1 for example). This example uses the eth0 interface.

    cumulus@<hostname>:~$ netq bootstrap master interface eth0 tarball /mnt/installables/netq-bootstrap-2.4.1.tgz
    

    Allow about five minutes for this to complete, and only then continue to the next step.

    If this step fails for any reason, you can run netq bootstrap reset and then try again.

You are now ready to install the Cumulus NetQ software. Refer to Install NetQ Using the Admin UI (recommended) or Install NetQ Using the CLI.

KVM Three-Server Cluster

Preparing a three-server cluster is similar to preparing a single-server configuration. For the master server, follow the single-server instructions, then continue here:

  1. Copy the file you downloaded for the single server to the other two servers.

  2. On each worker node, open your hypervisor and set up the VM in the same manner as for the single server.

    Make a note of the private IP addresses you assign to the master and two worker nodes. They are needed for the installation steps.

  3. Verify the server is ready for installation. Fix any errors indicated before installing the NetQ software.

    cumulus@<hostname>:~$ sudo opta-check
    
  4. Run the Bootstrap CLI on each worker node for the interface you defined above (eth0 or eth1 for example). This example uses the eth0 interface.

    cumulus@<hostname>:~$ netq bootstrap worker interface eth0 tarball /mnt/installables/netq-bootstrap-2.4.1.tgz
    

    Allow about five minutes for this to complete, and only then continue to the next step.

    If this step fails for any reason, run netq bootstrap reset and then try again.

You are now ready to install the Cumulus NetQ software. Refer to Install NetQ Using the Admin UI (recommended) or Install NetQ Using the CLI.

Prepare Your VMware VM and Obtain the NetQ Platform

The first preparation step is to verify your VM meets the following minimum hardware and software requirements to ensure the VM can operate correctly.

Virtual Machine Requirements

The NetQ Platform requires a VM with the following system resources allocated:

Resource | Minimum Requirement
Processor | Eight (8) virtual CPUs
Memory | 64 GB RAM
Local disk storage | 256 GB SSD (Note: This must be an SSD; other storage options can lead to system instability and are not supported.)
Network interface speed | 1 Gb NIC
Hypervisor | VMware ESXi™ 6.5 or later (OVA image) for servers running Cumulus Linux, CentOS, Ubuntu and RedHat operating systems; or KVM/QCOW (QEMU Copy on Write) image for servers running CentOS, Ubuntu and RedHat operating systems

Required Open Ports

You must also open the following ports on your NetQ Platform (or platforms if you are planning to deploy a server cluster).

For external connections:

Port | Protocol | Component Access
8443 | TCP | Admin UI
443 | TCP | NetQ UI
31980 | TCP | NetQ Agent communication
32708 | TCP | API Gateway
22 | TCP | SSH

For internal cluster communication:

Port | Protocol | Component Access
8080 | TCP | Admin API
5000 | TCP | Docker registry
8472 | UDP | Flannel port for VXLAN
6443 | TCP | Kubernetes API server
10250 | TCP | kubelet health probe
2379 | TCP | etcd
2380 | TCP | etcd
7072 | TCP | Kafka JMX monitoring
9092 | TCP | Kafka client
7071 | TCP | Cassandra JMX monitoring
7000 | TCP | Cassandra cluster communication
9042 | TCP | Cassandra client
7073 | TCP | Zookeeper JMX
2888 | TCP | Zookeeper cluster communication
3888 | TCP | Zookeeper cluster communication
2181 | TCP | Zookeeper client

Port 32666 is no longer used for the NetQ UI.

The second preparation step is to follow the instructions below, based on whether you intend to deploy a single-server platform or a three-server cluster.

VMware Single-Server Deployment

Two steps are needed, one to download the NetQ Platform and one to configure the VM.

Download the VMware NetQ Platform Image

IMPORTANT: Confirm that your server hardware meets the requirements identified in Virtual Machine Requirements.

  1. On the Cumulus Downloads page, select NetQ from the Product list.

  2. Click 2.4 from the Version list, and then select 2.4.1 from the submenu.

  3. Select VMware from the HyperVisor/Platform list.

  4. Scroll down to view the image, and click Download.

Configure the VMware VM

  1. Open your hypervisor and set up your VM.

    You can use this example for reference or use your own hypervisor instructions.

    VMware Example Configuration

    This example shows the VM setup process using an OVA file with VMware ESXi.

    1. Enter the address of the hardware in your browser.

    2. Log in to VMware using credentials with root access.

    3. Click Storage in the Navigator to verify you have an SSD installed.

    4. Click Create/Register VM at the top of the right pane.

    5. Select Deploy a virtual machine from an OVF or OVA file, and click Next.

    6. Provide a name for the VM, for example Cumulus NetQ.

    7. Drag and drop the NetQ Platform image file you downloaded earlier.

    8. Click Next.

    9. Select the storage type and data store for the image to use, then click Next. In this example, only one is available.

    10. Accept the default deployment options or modify them according to your network needs. Click Next when you are finished.

    11. Review the configuration summary. Click Back to change any of the settings, or click Finish to continue with the creation of the VM.

      The progress of the request is shown in the Recent Tasks window at the bottom of the application. This may take some time, so continue with your other work until the upload finishes.

    12. Once completed, view the full details of the VM and hardware.

  2. Verify the platform is ready for installation. Fix any errors indicated before installing the NetQ software.

    cumulus@<hostname>:~$ sudo opta-check
    
  3. Run the Bootstrap CLI on the platform for the interface you defined above (eth0 or eth1 for example). This example uses the eth0 interface.

    cumulus@<hostname>:~$ netq bootstrap master interface eth0 tarball /mnt/installables/netq-bootstrap-2.4.1.tgz
    

    Allow about five minutes for this to complete, and only then continue to the next step.

    If this step fails for any reason, you can run netq bootstrap reset and then try again.

You are now ready to install the Cumulus NetQ software. Refer to Install NetQ Using the Admin UI (recommended) or Install NetQ Using the CLI.

VMware Three-Server Cluster

Preparing a three-server cluster is similar to preparing a single-server configuration. For the master server, follow the single-server instructions, then continue here:

  1. Copy the file you downloaded for the single server to the other two servers.

  2. On each worker node, open your hypervisor and set up the VM in the same manner as for the single server.

    Make a note of the private IP addresses you assign to the master and two worker nodes. They are needed for the installation steps.

  3. Verify the platform is ready for installation. Fix any errors indicated before installing the NetQ software.

    cumulus@<hostname>:~$ sudo opta-check
    
  4. Run the Bootstrap CLI on each worker node for the interface you defined above (eth0 or eth1 for example). This example uses the eth0 interface.

    cumulus@<hostname>:~$ netq bootstrap worker interface eth0 tarball /mnt/installables/netq-bootstrap-2.4.1.tgz
    

    Allow about five minutes for this to complete, and only then continue to the next step.

    If this step fails for any reason, you can run netq bootstrap reset and then try again.

You are now ready to install the Cumulus NetQ software. Refer to Install NetQ Using the Admin UI (recommended) or Install NetQ Using the CLI.

Prepare Your Cumulus NetQ Appliance

Follow the preparation instructions below, based on whether you intend to deploy a single NetQ Appliance or three NetQ Appliances as a cluster.

Single NetQ Appliance

To prepare your single NetQ Appliance:

Inside the box that was shipped to you, you’ll find:

For more detail about hardware specifications (including LED layouts and FRUs like the power supply or fans, and accessories like included cables) or safety and environmental information, refer to the user manual and quick reference guide.

Install the Appliance

After you unbox the appliance:

  1. Mount the appliance in the rack.

  2. Connect it to power following the procedures described in your appliance’s user manual.

  3. Connect the Ethernet cable to the 1G management port (eth0).

  4. Power on the appliance.

    NetQ Appliance connections

If your network runs DHCP, you can configure Cumulus NetQ over the network. If DHCP is not enabled, then you configure the appliance using the console cable provided.

Configure the Password, Hostname and IP Address

Change the password using the passwd command:

$ passwd 
Changing password for <user>.
(current) UNIX password: 
Enter new UNIX password: 
Retype new UNIX password: 
passwd: password updated successfully

By default, DHCP is used to acquire the hostname and IP address. However, you can manually specify the hostname with the following command:

sudo hostnamectl set-hostname <newHostNameHere>

You can also configure these items using the Ubuntu Netplan configuration tool. For example, to set your network interface eth0 to a static IP address of 192.168.1.222 with gateway 192.168.1.1 and DNS server as 8.8.8.8 and 8.8.4.4:

Edit the /etc/netplan/01-ethernet.yaml Netplan configuration file:

```
# This file describes the network interfaces available on your system
# For more information, see netplan(5).
network:
    version: 2
    renderer: networkd
    ethernets:
        eno0:
            dhcp4: no
            addresses: [192.168.1.222/24]
            gateway4: 192.168.1.1
            nameservers:
                addresses: [8.8.8.8,8.8.4.4]
```

Apply the settings.

$ sudo netplan apply

If you have changed the IP address or hostname of the NetQ Appliance, you need to re-register this address with the Kubernetes containers before you can continue.

  1. Reset all Kubernetes administrative settings. Run the command twice to make sure all directories and files have been reset.

    cumulus@netq-platform:~$ sudo kubeadm reset -f
    
  2. Remove the Kubernetes configuration.

    cumulus@netq-platform:~$ sudo rm /home/cumulus/.kube/config
    
  3. Reset the NetQ Platform install daemon.

    cumulus@netq-platform:~$ sudo systemctl reset-failed
    
  4. Reset the Kubernetes service.

    cumulus@netq-platform:~$ sudo systemctl restart cts-kubectl-config
    

    Note: Allow 15 minutes for the prompt to return.

Verify NetQ Software and Appliance Readiness

Now that the appliance is up and running, verify that the software is available and the appliance is ready for installation.

  1. Verify that the needed packages are present and of the correct release, version 2.4.1 and update 26 or later.

    cumulus@<hostname>:~$ dpkg -l | grep netq
    

    For Ubuntu 18.04, you should see:

    ii  netq-agent   2.4.1-ub18.04u26~1581351889.c5ec3e5 amd64   Cumulus NetQ Telemetry Agent for Ubuntu
    ii  netq-apps    2.4.1-ub18.04u26~1581351889.c5ec3e5 amd64   Cumulus NetQ Fabric Validation Application for Ubuntu
    

    For Ubuntu 16.04, you should see:

    ii  netq-agent   2.4.1-ub16.04u26~1581350451.c5ec3e5 amd64   Cumulus NetQ Telemetry Agent for Ubuntu
    ii  netq-apps    2.4.1-ub16.04u26~1581350451.c5ec3e5 amd64   Cumulus NetQ Fabric Validation Application for Ubuntu
    
  2. Verify the installation images are present and of the correct release, version 2.4.1.

    cumulus@<hostname>:~$ cd /mnt/installables/
    cumulus@<hostname>:/mnt/installables$ ls
    NetQ-2.4.1.tgz  netq-bootstrap-2.4.1.tgz
    
  3. Run the following commands.

    cumulus@<hostname>:~$ sudo systemctl disable apt-{daily,daily-upgrade}.{service,timer}
    cumulus@<hostname>:~$ sudo systemctl stop apt-{daily,daily-upgrade}.{service,timer}
    cumulus@<hostname>:~$ sudo systemctl disable motd-news.{service,timer}
    cumulus@<hostname>:~$ sudo systemctl stop motd-news.{service,timer}
  4. Verify the appliance is ready for installation. Fix any errors indicated before installing the NetQ software.

    cumulus@<hostname>:~$ sudo opta-check
    
  5. Run the Bootstrap CLI on the appliance for the interface you defined above (eth0 or eth1 for example). This example uses the eth0 interface.

    cumulus@<hostname>:~$ netq bootstrap master interface eth0 tarball /mnt/installables/netq-bootstrap-2.4.1.tgz
    

    Allow about five minutes for this to complete, and only then continue to the next step.

    If this step fails for any reason, you can run netq bootstrap reset and then try again.

You are now ready to install the Cumulus NetQ software. Refer to Install NetQ Using the Admin UI (recommended) or Install NetQ Using the CLI.

Three-Appliance Cluster

Preparing a three-appliance cluster is similar to preparing a single appliance. For the master appliance, follow the instructions for a single appliance, then return here to configure the worker appliances.

  1. Install the second NetQ Appliance using the same steps as a single NetQ Appliance.

  2. Configure the IP address, hostname, and password using the same steps as a single NetQ Appliance.

    Make a note of the private IP addresses you assign to the master and two worker nodes. They are needed for the installation steps.

  3. Copy the netq-bootstrap-2.4.1.tgz and NetQ-2.4.1.tgz files, downloaded for the single NetQ Appliance, to the /mnt/installables/ directory on the second NetQ Appliance and run the systemctl commands (see the example copy command after this list).

  4. Verify that the needed files are present and of the correct release.

  5. Verify the platform is ready for installation. Fix any errors indicated before installing the NetQ software.

    cumulus@<hostname>:~$ sudo opta-check
    
  6. Run the Bootstrap CLI on the appliance for the interface you defined above (eth0 or eth1 for example). This example uses the eth0 interface.

    cumulus@<hostname>:~$ netq bootstrap worker interface eth0 tarball /mnt/installables/netq-bootstrap-2.4.1.tgz
    

    Allow about five minutes for this to complete, and only then continue to the next step.

    If this step fails for any reason, you can run netq bootstrap reset and then try again.

  7. Repeat these steps for the third NetQ Appliance.
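
As referenced in step 3, one way to copy the installation files from the first appliance to a worker appliance is with scp; this is a sketch only, and the worker appliance address is a placeholder you must replace with your own value:

```
cumulus@<hostname>:~$ scp /mnt/installables/netq-bootstrap-2.4.1.tgz /mnt/installables/NetQ-2.4.1.tgz cumulus@<worker-appliance-ip>:/mnt/installables/
```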

You are now ready to install the Cumulus NetQ software. Refer to Install NetQ Using the Admin UI (recommended) or Install NetQ Using the CLI.

Prepare for NetQ Cloud Installation

This topic describes the preparation steps needed before installing NetQ in a cloud deployment. Refer to Prepare for NetQ On-premises Installation for preparations for on-premises deployments.

There are three key steps in the preparation for cloud installation:

  1. Decide whether you want to install the NetQ Platform on:

    • a virtual machine (VM) on hardware that you provide, or
    • the Cumulus NetQ Cloud Appliance.
  2. Review the VM requirements if you have chosen that option.

  3. Obtain the NetQ Platform image and set up the VM or appliance.

Prepare Your KVM VM and Obtain the NetQ Platform

The first preparation step is to verify your VM meets the following minimum hardware and software requirements to ensure the VM can operate correctly.

Virtual Machine Requirements

The NetQ Cloud Platform requires a VM with the following system resources allocated:

| Resource | Minimum Requirement |
| --- | --- |
| Processor | Four (4) virtual CPUs |
| Memory | 8 GB RAM |
| Local disk storage | 32 GB |
| Network interface speed | 1 Gb NIC |
| Hypervisor | VMware ESXi™ 6.5 or later (OVA image) for servers running Cumulus Linux, CentOS, Ubuntu, and RedHat operating systems; KVM/QCOW (QEMU Copy on Write) image for servers running CentOS, Ubuntu, and RedHat operating systems |

Required Open Ports

You must also open the following ports on your NetQ Cloud Platform (or platforms if you are planning to deploy a server cluster).

For external connections:

| Port | Protocol | Component Access |
| --- | --- | --- |
| 8443 | TCP | Admin UI |
| 443 | TCP | NetQ UI |
| 31980 | TCP | NetQ Agent communication |
| 32708 | TCP | API Gateway |
| 22 | TCP | SSH |

For internal cluster communication:

| Port | Protocol | Component Access |
| --- | --- | --- |
| 8080 | TCP | Admin API |
| 5000 | TCP | Docker registry |
| 8472 | UDP | Flannel port for VXLAN |
| 6443 | TCP | Kubernetes API server |
| 10250 | TCP | Kubelet health probe |
| 2379 | TCP | etcd |
| 2380 | TCP | etcd |
| 7072 | TCP | Kafka JMX monitoring |
| 9092 | TCP | Kafka client |
| 7071 | TCP | Cassandra JMX monitoring |
| 7000 | TCP | Cassandra cluster communication |
| 9042 | TCP | Cassandra client |
| 7073 | TCP | Zookeeper JMX monitoring |
| 2888 | TCP | Zookeeper cluster communication |
| 3888 | TCP | Zookeeper cluster communication |
| 2181 | TCP | Zookeeper client |

Port 32666 is no longer used for the NetQ UI.
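
If a host firewall is running on the platform, the same ports must be allowed through it. As a sketch only, assuming an Ubuntu host that uses ufw (your environment may use a different firewall tool), the external ports could be opened like this:

```
$ sudo ufw allow 8443/tcp    # Admin UI
$ sudo ufw allow 443/tcp     # NetQ UI
$ sudo ufw allow 31980/tcp   # NetQ Agent communication
$ sudo ufw allow 32708/tcp   # API Gateway
$ sudo ufw allow 22/tcp      # SSH
```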

The second preparation step is to follow the instructions below, based on whether you intend to deploy a single-server platform or a three-server cluster.

KVM Single-Server Arrangement

Two steps are needed, one to download the NetQ Platform and one to configure the VM.

Download the KVM NetQ Platform Image

IMPORTANT: Confirm that your server hardware meets the requirements identified in Virtual Machine Requirements.

  1. On the Cumulus Downloads page, select NetQ from the Product list.

  2. Click 2.4 from the Version list, and then select 2.4.1 from the submenu.

  3. Select KVM (Cloud) from the Hypervisor/Platform list.

  4. Scroll down to view the image, and click Download.

Configure the KVM VM

  1. Open your hypervisor and set up your VM.

    You can use this example for reference or use your own hypervisor instructions.

    KVM Example Configuration

    This example shows the VM setup process for a system with Libvirt and KVM/QEMU installed.

    1. Confirm that the SHA256 checksum matches the one posted on the Cumulus Downloads website to ensure the image download has not been corrupted.

      $ sha256sum ./Downloads/cumulus-netq-server-2.4.0-ts-amd64-qemu.qcow2
      6fff5f2ac62930799b4e8cc7811abb6840b247e2c9e76ea9ccba03f991f42424  ./Downloads/cumulus-netq-server-2.4.0-ts-amd64-qemu.qcow2
    
    2. Copy the QCOW2 image to a directory where you want to run it.

      Copy (rather than move) the downloaded QCOW2 image so that you do not need to download it again if you have to repeat this process.

    $ sudo mkdir /vms
    $ sudo cp ./Downloads/cumulus-netq-server-2.4.0-ts-amd64-qemu.qcow2 /vms/ts.qcow2
    
    3. Create the VM.

      For a Direct VM, where the VM uses a MACVLAN interface to sit on the host interface for its connectivity:

      $ virt-install --name=netq_ts --vcpus=8 --memory=65536 --os-type=linux --os-variant=debian7 --disk path=/vms/ts.qcow2,format=qcow2,bus=virtio,cache=none --network=type=direct,source=eth0,model=virtio --import --noautoconsole
      

      Replace the disk path value with the location where the QCOW2 image resides. Replace the network source value (eth0 in the example above) with the name of the host interface that connects the VM to the external network.

      Or, for a Bridged VM, where the VM attaches to a bridge which has already been setup to allow for external access:

      $ virt-install --name=netq_ts --vcpus=8 --memory=65536 --os-type=linux --os-variant=debian7 --disk path=/vms/ts.qcow2,format=qcow2,bus=virtio,cache=none --network=bridge=br0,model=virtio --import --noautoconsole
      

      Replace the network bridge value (br0 in the example above) with the name of the pre-existing bridge interface that connects the VM to the external network.

    4. Watch the boot process in another terminal window.

    $ virsh console netq_ts
    
    5. From the console of the VM, check which IP address eth0 has obtained via DHCP, or set a static IP address by editing the /etc/netplan/01-ethernet.yaml Netplan configuration file:
    # This file describes the network interfaces available on your system
    # For more information, see netplan(5).
    network:
        version: 2
        renderer: networkd
        ethernets:
            eno0:
                dhcp4: no
                addresses: [192.168.1.222/24]
                gateway4: 192.168.1.1
                nameservers:
                    addresses: [8.8.8.8,8.8.4.4]
    
     This example shows a static IP address. If that is what you want, exit the file without changes. If you want the IP address to be assigned by DHCP, edit the file as follows:
    
     ```
     network:
         version: 2
         renderer: networkd
         ethernets:
             eno0:
                 dhcp4: yes
     ```
    
     Apply the settings.
    
     ```
     $ sudo netplan apply
     ```
    
  2. Verify the platform is ready for installation. Fix any errors indicated before installing the NetQ software.

    cumulus@<hostname>:~$ sudo opta-check-cloud
    
  3. Run the Bootstrap CLI on the platform for the interface you defined above (eth0 or eth1 for example). This example uses the eth0 interface.

    cumulus@<hostname>:~$ netq bootstrap master interface eth0 tarball /mnt/installables/netq-bootstrap-2.4.1.tgz
    

    Allow about five minutes for this to complete, and only then continue to the next step.

    If this step fails for any reason, you can run netq bootstrap reset and then try again.

You are now ready to install the Cumulus NetQ software. Refer to Install NetQ Using the Admin UI (recommended) or Install NetQ Using the CLI.

KVM Three-Server Cluster

Preparing a three-server cluster is similar to preparing a single-server configuration. For the master server, follow the single-server instructions, then continue here:

  1. Copy the file you downloaded for the single server to the other two servers.

  2. On each worker node, open your hypervisor and set up the VM in the same manner as for the single server.

    Make a note of the private IP addresses you assign to the master and two worker nodes. They are needed for the installation steps.

  3. Verify the platform is ready for installation. Fix any errors indicated before installing the NetQ software.

    cumulus@<hostname>:~$ sudo opta-check-cloud
    
  4. Run the Bootstrap CLI on each worker node for the interface you defined above (eth0 or eth1 for example). This example uses the eth0 interface.

    cumulus@<hostname>:~$ netq bootstrap worker interface eth0 tarball /mnt/installables/netq-bootstrap-2.4.1.tgz
    

    Allow about five minutes for this to complete, and only then continue to the next step.

    If this step fails for any reason, you can run netq bootstrap reset and then try again.

You are now ready to install the Cumulus NetQ software. Refer to Install NetQ Using the Admin UI (recommended) or Install NetQ Using the CLI.

Prepare Your VMware VM and Obtain NetQ Platform

The first preparation step is to verify your VM meets the following minimum hardware and software requirements to ensure the VM can operate correctly.

Virtual Machine Requirements

The NetQ Cloud Platform requires a VM with the following system resources allocated:

| Resource | Minimum Requirement |
| --- | --- |
| Processor | Four (4) virtual CPUs |
| Memory | 8 GB RAM |
| Local disk storage | 32 GB |
| Network interface speed | 1 Gb NIC |
| Hypervisor | VMware ESXi™ 6.5 or later (OVA image) for servers running Cumulus Linux, CentOS, Ubuntu, and RedHat operating systems; KVM/QCOW (QEMU Copy on Write) image for servers running CentOS, Ubuntu, and RedHat operating systems |

Required Open Ports

You must also open the following ports on your NetQ Cloud Platform (or platforms if you are planning to deploy a server cluster).

For external connections:

| Port | Protocol | Component Access |
| --- | --- | --- |
| 8443 | TCP | Admin UI |
| 443 | TCP | NetQ UI |
| 31980 | TCP | NetQ Agent communication |
| 32708 | TCP | API Gateway |
| 22 | TCP | SSH |

For internal cluster communication:

| Port | Protocol | Component Access |
| --- | --- | --- |
| 8080 | TCP | Admin API |
| 5000 | TCP | Docker registry |
| 8472 | UDP | Flannel port for VXLAN |
| 6443 | TCP | Kubernetes API server |
| 10250 | TCP | Kubelet health probe |
| 2379 | TCP | etcd |
| 2380 | TCP | etcd |
| 7072 | TCP | Kafka JMX monitoring |
| 9092 | TCP | Kafka client |
| 7071 | TCP | Cassandra JMX monitoring |
| 7000 | TCP | Cassandra cluster communication |
| 9042 | TCP | Cassandra client |
| 7073 | TCP | Zookeeper JMX monitoring |
| 2888 | TCP | Zookeeper cluster communication |
| 3888 | TCP | Zookeeper cluster communication |
| 2181 | TCP | Zookeeper client |

Port 32666 is no longer used for the NetQ UI.

The second preparation step is to follow the instructions below, based on whether you intend to deploy a single-server platform or a three-server cluster.

VMware Single-Server Arrangement

Two steps are needed, one to download the NetQ Platform and one to configure the VM.

Download the VMware NetQ Platform Image

IMPORTANT: Confirm that your server hardware meets the requirements identified in Virtual Machine Requirements.

  1. On the Cumulus Downloads page, select NetQ from the Product list.

  2. Click 2.4 from the Version list, and then select 2.4.1 from the submenu.

  3. Select VMware (Cloud) from the Hypervisor/Platform list.

  4. Scroll down to view the image, and click Download.

Configure the VMware VM

  1. Open your hypervisor and set up your VM.

    You can use this example for reference or use your own hypervisor instructions.

    VMware Example Configuration

    This example shows the VM setup process using an OVA file with VMware ESXi.

    1. Enter the address of the hardware in your browser.

    2. Log in to VMware using credentials with root access.

    3. Click Storage in the Navigator to verify you have an SSD installed.

    4. Click Create/Register VM at the top of the right pane.

    5. Select Deploy a virtual machine from an OVF or OVA file, and click Next.

    6. Provide a name for the VM, for example Cumulus NetQ.

    7. Drag and drop the NetQ Platform image file you downloaded in Step 2 above.

    8. Click Next.

    9. Select the storage type and data store for the image to use, then click Next. In this example, only one is available.

    10. Accept the default deployment options or modify them according to your network needs. Click Next when you are finished.

    11. Review the configuration summary. Click Back to change any of the settings, or click Finish to continue with the creation of the VM.

      The progress of the request is shown in the Recent Tasks window at the bottom of the application. This may take some time, so continue with your other work until the upload finishes.

    12. Once completed, view the full details of the VM and hardware.

  2. Verify the platform is ready for installation. Fix any errors indicated before installing the NetQ software.

    cumulus@<hostname>:~$ sudo opta-check-cloud
    
  3. Run the Bootstrap CLI on the platform for the interface you defined above (eth0 or eth1 for example). This example uses the eth0 interface.

    cumulus@<hostname>:~$ netq bootstrap master interface eth0 tarball /mnt/installables/netq-bootstrap-2.4.1.tgz
    

    Allow about five minutes for this to complete, and only then continue to the next step.

    If this step fails for any reason, you can run netq bootstrap reset and then try again.

You are now ready to install the Cumulus NetQ software. Refer to Install NetQ Using the Admin UI (recommended) or Install NetQ Using the CLI.

VMware Three-Server Cluster

Preparing a three-server cluster is similar to preparing a single-server configuration. For the master server, follow the single-server instructions, then continue here:

  1. Copy the file you downloaded for the single server to the other two servers.

  2. On each worker node, open your hypervisor and set up the VM in the same manner as for the single server.

    Make a note of the private IP addresses you assign to the master and two worker nodes. They are needed for the installation steps.

  3. Verify the platform is ready for installation. Fix any errors indicated before installing the NetQ software.

    cumulus@<hostname>:~$ sudo opta-check-cloud
    
  4. Run the Bootstrap CLI on each worker node for the interface you defined above (eth0 or eth1 for example). This example uses the eth0 interface.

    cumulus@<hostname>:~$ netq bootstrap worker interface eth0 tarball /mnt/installables/netq-bootstrap-2.4.1.tgz
    

    Allow about five minutes for this to complete, and only then continue to the next step.

    If this step fails for any reason, you can run netq bootstrap reset and then try again.

You are now ready to install the Cumulus NetQ software. Refer to Install NetQ Using the Admin UI (recommended) or Install NetQ Using the CLI.

Prepare Your Cumulus NetQ Cloud Appliance

Follow the preparation instructions below, based on whether you intend to deploy a single NetQ Cloud Appliance or three NetQ Cloud Appliances as a cluster.

Single NetQ Cloud Appliance

To prepare your single NetQ Cloud Appliance:

Inside the box that was shipped to you, you’ll find:

If you’re looking for hardware specifications (including LED layouts and FRUs like the power supply or fans and accessories like included cables) or safety and environmental information, check out the appliance’s user manual.

Install the Appliance

After you unbox the appliance:

  1. Mount the appliance in the rack.

  2. Connect it to power following the procedures described in your appliance’s user manual.

  3. Connect the Ethernet cable to the 1G management port (eth0).

  4. Power on the appliance.

    NetQ Appliance connections

If your network runs DHCP, you can configure Cumulus NetQ over the network. If DHCP is not enabled, then you configure the appliance using the console cable provided.

Configure the Password, Hostname and IP Address

Change the password using the passwd command:

$ passwd 
Changing password for <user>.
(current) UNIX password: 
Enter new UNIX password: 
Retype new UNIX password: 
passwd: password updated successfully

By default, DHCP is used to acquire the hostname and IP address. However, you can manually specify the hostname with the following command:

cumulus@<hostname>:~$ sudo hostnamectl set-hostname <newHostNameHere>

You can also configure these items using the Ubuntu Netplan configuration tool. For example, to set your network interface eth0 to a static IP address of 192.168.1.222 with gateway 192.168.1.1 and DNS server as 8.8.8.8 and 8.8.4.4:

Edit the /etc/netplan/01-ethernet.yaml Netplan configuration file:

# This file describes the network interfaces available on your system
# For more information, see netplan(5).
network:
    version: 2
    renderer: networkd
    ethernets:
        eno0:
            dhcp4: no
            addresses: [192.168.1.222/24]
            gateway4: 192.168.1.1
            nameservers:
                addresses: [8.8.8.8,8.8.4.4]

Apply the settings.

cumulus@<hostname>:~$ sudo netplan apply

If you changed the IP address or interface of the appliance to something other than what it was assigned previously, you must inform NetQ of the change.

If you changed the IP address, but kept the interface the same (for example, eth0), re-run the netq install opta interface command using your config-key:

cumulus@netq-appliance:~$ netq install opta interface eth0 tarball NetQ-2.3.x-opta.tgz config-key "CNKaDBIjZ3buZhV2Mi5uZXRxZGV2LmN1bXVsdXNuZXw3b3Jrcy5jb20YuwM="

If you changed the interface (for example, eth0 to eth1), run the netq install opta interface command with the new interface and your config-key:

cumulus@netq-appliance:~$ netq install opta interface eth1 tarball NetQ-2.3.x-opta.tgz config-key "CNKaDBIjZ3buZhV2Mi5uZXRxZGV2LmN1bXVsdXNuZXw3b3Jrcy5jb20YuwM="

Verify NetQ Software and Appliance Readiness

Now that the appliance is up and running, verify that the software is available and the appliance is ready for installation.

  1. Verify that the needed packages are present and of the correct release, version 2.4.1 and update 26 or later.

    cumulus@<hostname>:~$ dpkg -l | grep netq
    

    For Ubuntu 18.04, you should see:

    ii  netq-agent   2.4.1-ub18.04u26~1581351889.c5ec3e5 amd64   Cumulus NetQ Telemetry Agent for Ubuntu
    ii  netq-apps    2.4.1-ub18.04u26~1581351889.c5ec3e5 amd64   Cumulus NetQ Fabric Validation Application for Ubuntu
    

    For Ubuntu 16.04, you should see:

    ii  netq-agent   2.4.1-ub16.04u26~1581350451.c5ec3e5 amd64   Cumulus NetQ Telemetry Agent for Ubuntu
    ii  netq-apps    2.4.1-ub16.04u26~1581350451.c5ec3e5 amd64   Cumulus NetQ Fabric Validation Application for Ubuntu
    
  2. Verify the installation images are present and of the correct release, version 2.4.1.

    cumulus@<hostname>:~$ cd /mnt/installables/
    cumulus@<hostname>:/mnt/installables$ ls
    NetQ-2.4.1-opta.tgz  netq-bootstrap-2.4.1.tgz
    
  3. Run the following commands.

    cumulus@<hostname>:~$ sudo systemctl disable apt-{daily,daily-upgrade}.{service,timer}
    cumulus@<hostname>:~$ sudo systemctl stop apt-{daily,daily-upgrade}.{service,timer}
    cumulus@<hostname>:~$ sudo systemctl disable motd-news.{service,timer}
    cumulus@<hostname>:~$ sudo systemctl stop motd-news.{service,timer}
    
  4. Verify the appliance is ready for installation. Fix any errors indicated before installing the NetQ software.

    cumulus@<hostname>:~$ sudo opta-check-cloud
    
  5. Run the Bootstrap CLI on the appliance for the interface you defined above (eth0 or eth1 for example). This example uses the eth0 interface.

    cumulus@<hostname>:~$ netq bootstrap master interface eth0 tarball /mnt/installables/netq-bootstrap-2.4.1.tgz
    

    Allow about five minutes for this to complete, and only then continue to the next step.

    If this step fails for any reason, you can run netq bootstrap reset and then try again.

You are now ready to install the Cumulus NetQ software. Refer to Install NetQ Using the Admin UI (recommended) or Install NetQ Using the CLI.

Three-Appliance Cluster

Preparing a three-appliance cluster is similar to preparing a single appliance. For the master appliance, follow the instructions for a single appliance, then return here to configure the worker appliances.

  1. Install the second NetQ Cloud Appliance using the same steps as a single NetQ Appliance.

  2. Configure the IP address, hostname, and password using the same steps as a single NetQ Appliance.

    Make a note of the private IP addresses you assign to the master and two worker nodes. They are needed for the installation steps.

  3. Copy the netq-bootstrap-2.4.1.tgz and NetQ-2.4.1-opta.tgz files downloaded for the single NetQ Appliance to this second NetQ Appliance and verify the correct files are present.

  4. Run the systemctl commands.

  5. Verify the platform is ready for installation. Fix any errors indicated before installing the NetQ software.

    cumulus@<hostname>:~$ sudo opta-check-cloud
    
  6. Run the Bootstrap CLI on the worker appliance for the interface you defined above (eth0 or eth1, for example). This example uses the eth0 interface.

    cumulus@<hostname>:~$ netq bootstrap worker interface eth0 tarball /mnt/installables/netq-bootstrap-2.4.1.tgz
    

    Allow about 2-3 minutes for this to complete, and only then continue to the next step.

    If this step fails for any reason, you can run netq bootstrap reset and then try again.

  7. Repeat these steps for the third NetQ Appliance.

You are now ready to install the Cumulus NetQ software. Refer to Install NetQ Using the Admin UI (recommended) or Install NetQ Using the CLI.

Prepare Your Existing NetQ Appliances for a NetQ 2.4 Deployment

This topic describes how to prepare a NetQ 2.3.x or earlier NetQ Appliance before installing NetQ 2.4.x. The steps are the same for both the on-premises and cloud appliances. The only difference is the software you download for each platform. On completion of the steps included here, you will be ready to perform a fresh installation of NetQ 2.4.x.

To prepare your appliance:

Log in to your appliance.

Verify that your appliance is a supported hardware model.
For on-premises solutions using the NetQ Appliance, optionally back up your NetQ data.
  1. Run the backup script to create a backup file in /opt/<backup-directory>.

    Be sure to replace the backup-directory option with the name of the directory you want to use for the backup file. This location must be off the appliance so the backup is not overwritten during these preparation steps.

cumulus@<netq-appliance>:~$ ./backuprestore.sh --backup --localdir /opt/<backup-directory>
  2. Verify the backup file has been created.
cumulus@<netq-appliance>:~$ cd /opt/<backup-directory>
cumulus@<netq-appliance>:/opt/<backup-directory>$ ls
netq_master_snapshot_2020-01-09_07_24_50_UTC.tar.gz
Install Ubuntu 18.04 LTS.

Use the instructions here.

Note these tips:

  • Ignore the instructions for MAAS.

  • Install the Ubuntu OS on the SSD disk. Select the Micron SSD with ~900 GB at step #9 in the aforementioned instructions.

  • Set the default username to cumulus and password to CumulusLinux! while installing Ubuntu 18.04.

  • When prompted, select Install SSH server.

Configure networking.

Ubuntu uses Netplan for network configuration. You can give your appliance an IP address using DHCP or a static address.

Configure an IP address allocation using DHCP

  • Create and/or edit the /etc/netplan/01-ethernet.yaml Netplan configuration file.
# This file describes the network interfaces available on your system
# For more information, see netplan(5).
network:
    version: 2
    renderer: networkd
    ethernets:
        eno1:
            dhcp4: yes
  • Apply the settings.
$ sudo netplan apply

Configure a static IP address

  • Create and/or edit the  /etc/netplan/01-ethernet.yaml Netplan configuration file.

    In this example the interface, eno1, is given a static IP address of 192.168.1.222 with a gateway at 192.168.1.1 and DNS server at 8.8.8.8 and 8.8.4.4.

# This file describes the network interfaces available on your system
# For more information, see netplan(5).
network:
    version: 2
    renderer: networkd
    ethernets:
        eno1:
            dhcp4: no
            addresses: [192.168.1.222/24]
            gateway4: 192.168.1.1
            nameservers:
                addresses: [8.8.8.8,8.8.4.4]
  • Apply the settings.
$ sudo netplan apply
Update the Ubuntu repository.
  1. Reference and update the local apt repository.
root@ubuntu:~# wget -O- https://apps3.cumulusnetworks.com/setup/cumulus-apps-deb.pubkey | apt-key add -
  2. Add the Ubuntu repository.
Ubuntu 16.04

Create the file /etc/apt/sources.list.d/cumulus-apps-deb-xenial.list and add the following line:

root@ubuntu:~# vi /etc/apt/sources.list.d/cumulus-apps-deb-xenial.list
...
deb [arch=amd64] https://apps3.cumulusnetworks.com/repos/deb xenial netq-latest
...
Ubuntu 18.04

Create the file /etc/apt/sources.list.d/cumulus-apps-deb-bionic.list and add the following line:
root@ubuntu:~# vi /etc/apt/sources.list.d/cumulus-apps-deb-bionic.list
...
deb [arch=amd64] https://apps3.cumulusnetworks.com/repos/deb bionic netq-latest
...

The use of netq-latest in this example means that a fetch from the repository always retrieves the latest version of NetQ, even when a major version update has been made. If you want to pin the repository to a specific version, such as netq-2.2, use that instead.
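
For example, to pin the Ubuntu 18.04 (bionic) repository to the netq-2.2 component mentioned above rather than netq-latest, the repository line would read:

```
deb [arch=amd64] https://apps3.cumulusnetworks.com/repos/deb bionic netq-2.2
```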

Install Python.

```
root@ubuntu:~# apt-get update
root@ubuntu:~# apt-get install python python2.7 python-apt
```
Obtain the latest NetQ Agent and CLI package.

```
root@ubuntu:~# apt-get update
root@ubuntu:~# apt-get install netq-agent netq-apps
```
Download the bootstrap and NetQ installation tarballs.

Download the software from the Cumulus Downloads page.

  1. Select NetQ from the Product list.

  2. Select 2.4 from the Version list, and then select 2.4.1 from the submenu.

  3. Select Bootstrap from the Hypervisor/Platform list. Note that the bootstrap file is the same for both appliances.

  4. Scroll down and click Download.

  5. Select Appliance for the NetQ Appliance or Appliance (Cloud) for the NetQ Cloud Appliance from the Hypervisor/Platform list.

    Make sure you select the right install choice based on whether you are preparing the on-premises or cloud version of the appliance.

  6. Scroll down and click Download.

  7. Copy these two files, netq-bootstrap-2.4.1.tgz and NetQ-2.4.1.tgz (on-premises) or NetQ-2.4.1-opta.tgz (cloud), to the /mnt/installables/ directory on the appliance.

  8. Verify that the needed files are present and of the correct release. This example shows on-premises files. The only difference for cloud files is that it should list NetQ-2.4.1-opta.tgz instead of NetQ-2.4.1.tgz.

    cumulus@<hostname>:~$ dpkg -l | grep netq
    ii  netq-agent   2.4.1-ub18.04u26~1581351889.c5ec3e5 amd64   Cumulus NetQ Telemetry Agent for Ubuntu
    ii  netq-apps    2.4.1-ub18.04u26~1581351889.c5ec3e5 amd64   Cumulus NetQ Fabric Validation Application for Ubuntu
    
    cumulus@<hostname>:~$ cd /mnt/installables/
    cumulus@<hostname>:/mnt/installables$ ls
    NetQ-2.4.1.tgz  netq-bootstrap-2.4.1.tgz
    
  9. Run the following commands.

    sudo systemctl disable apt-{daily,daily-upgrade}.{service,timer}
    sudo systemctl stop apt-{daily,daily-upgrade}.{service,timer}
    sudo systemctl disable motd-news.{service,timer}
    sudo systemctl stop motd-news.{service,timer}
    
Run the Bootstrap CLI.

Run the bootstrap CLI on your appliance for the interface you defined above (eth0 or eth1 for example). This example uses the eth0 interface.

cumulus@<hostname>:~$ netq bootstrap master interface eth0 tarball /mnt/installables/netq-bootstrap-2.4.1.tgz

Allow about five minutes for this to complete.

If you are creating a server cluster, you need to prepare each of those appliances as well. Repeat these steps if you are using a previously deployed appliance or refer to Prepare for NetQ On-premises Installation or Prepare for NetQ Cloud Installation for a new appliance.

You are now ready to install the NetQ Software. Refer to Install NetQ Using the Admin UI (recommended) or Install NetQ Using the CLI.

Install NetQ Using the Admin UI

After you have validated the prerequisites and performed the preparation steps, you can now install the NetQ software using the Admin UI.

You must perform the preparation steps before installing the NetQ software. Go to Prepare for NetQ On-premises Installation or Prepare for NetQ Cloud Installation if you have not yet completed these preparation steps.

To install NetQ:

  1. Log in to your NetQ platform server, NetQ Appliance, NetQ Cloud Appliance or the master node of your cluster.

    In your browser address field, enter https://<hostname-or-ipaddr>:8443

    This opens the Admin UI.

  2. Step through the UI:

    1. Select your deployment type.

      The first step to install Cumulus NetQ is to choose which type of deployment model you want to use. If you are performing an upgrade, then select the deployment type you already have set up. Both options provide secure access to data and features useful for monitoring and troubleshooting your network.

      Select the on-premises deployment model if you want to host all required hardware and software at your location(s), and you have the in-house skill set to install, configure, and maintain it, including performing data backups, acquiring and maintaining hardware and software, and handling integration and license management. This model is commonly chosen when you do not want to provide any access to the Internet, or when you require full control of the entire network.

      Select the cloud deployment model if you want to host only a small server on your premises and leave the details up to Cumulus Networks. In this deployment, the server connects to the NetQ Cloud service over selected ports. The NetQ application is hosted and data storage is provided in the cloud. Cumulus handles the backups and maintenance of the application and storage.

    2. Select your install method.

      Choose between restoring data from a previous version of NetQ or performing a fresh installation.

      > Restore NetQ data (on-premises only)

      If you have created a backup of your NetQ data in the preparation steps, you can restore your data when you reach this screen.

      If you are moving from a standalone to a server cluster arrangement, you can only restore your data one time. After the data has been converted to the cluster schema, it cannot be returned to the standalone server format.

      > Fresh Install

      Continue with the next step, Select your server arrangement.

    3. Select your server arrangement.

      Select whether you want to deploy your infrastructure as a single stand-alone server or as a cluster of servers. Choosing the Stand-alone configuration is simpler, but for on-premises deployments you need to anticipate the size and capabilities of this server to support your final deployment. Choosing the multiple server configuration is more complex, but it offers more scalability, high availability, and failover capabilities depending on the configuration. Both options support the installation of NetQ as a VM or disk image.

      Select the standalone single-server arrangement for smaller, simpler deployments. Be sure to consider the capabilities and resources needed on this server to support the size of your final deployment.

      Select the three-server cluster arrangement to obtain scalability, high availability, and/or failover for your network. With the NetQ 2.4.0 release, you must have one master and two worker nodes. With the NetQ 2.4.1 release and later, you can configure up to seven additional worker nodes, for a total of nine.

      Select arrangement

      Add worker nodes to a server cluster

    4. Install NetQ software.

      After the hardware has been configured, you can install the NetQ software using the installation files (NetQ-2.4.1.tgz for on-premises deployments or NetQ-2.4.1-opta.tgz for cloud deployments) that you downloaded during the preparation steps.

    5. Activate NetQ.

      This final step activates the software and enables you to view the health of your NetQ system. For cloud deployments, you must enter your configuration key.

      On-premises activation

      Cloud activation

    6. View the system health.

      When installation and activation are complete, the NetQ System Health dashboard is visible for tracking the status of key components in the system. Standalone server deployments display two cards, one for the server and one for Kubernetes pods. Server cluster deployments display additional cards, including one each for the Cassandra database, Kafka, and Zookeeper services.

Install NetQ Using the CLI

After you have validated the prerequisites and performed the preparation steps, you can then install the NetQ software using the CLI.

You must perform the preparation steps before installing the NetQ software. Go to Prepare for NetQ On-premises Installation or Prepare for NetQ Cloud Installation if you have not yet completed these preparation steps.

To install NetQ:

  1. Log in to your NetQ platform server, NetQ Appliance, NetQ Cloud Appliance or the master node of your cluster.

  2. Install the software.

    • For On-premises Solution, Single Server

      Run the following command on your NetQ platform server or NetQ Appliance:

      cumulus@<hostname>:~$ netq install standalone full interface eth0 bundle /mnt/installables/NetQ-2.4.1.tgz
      

      You can specify the IP address instead of the interface name here: use ip-addr <IP address> in place of interface eth0 above.

    • For On-premises Solution, Server Cluster

      Run the following commands on your master node, using the IP addresses of your worker nodes:

      cumulus@<hostname>:~$ netq install cluster full interface eth0 bundle /mnt/installables/NetQ-2.4.1.tgz workers <worker-1-ip> <worker-2-ip>
      

      You can specify the IP address instead of the interface name here: use ip-addr <IP address> in place of interface eth0 above.

    • For Cloud Solution, Single Server

      Run the following command on your NetQ Cloud Appliance with the config-key sent by Cumulus Networks in an email titled “A new site has been added to your Cumulus NetQ account.”

      cumulus@<hostname>:~$ netq install opta standalone full interface eth0 bundle /mnt/installables/NetQ-2.4.1-opta.tgz config-key <your-config-key-from-email> proxy-host <proxy-hostname> proxy-port <proxy-port>
      

      You can specify the IP address instead of the interface name here: use ip-addr <IP address> in place of interface eth0 above.

    • For Cloud Solution, Server Cluster

      Run the following commands on your master NetQ Cloud Appliance with the config-key sent by Cumulus Networks in an email titled “A new site has been added to your Cumulus NetQ account.”

      cumulus@<hostname>:~$ netq install opta cluster full interface eth0 bundle /mnt/installables/NetQ-2.4.1-opta.tgz config-key <your-config-key-from-email> workers <worker-1-ip> <worker-2-ip> proxy-host <proxy-hostname> proxy-port <proxy-port>
      

      You can specify the IP address instead of the interface name here: use ip-addr <IP address> in place of interface eth0 above.
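
For example, a single-server on-premises installation that identifies the server by IP address rather than by interface name would look like the following; the address 192.168.1.222 is a placeholder for your own server address:

```
cumulus@<hostname>:~$ netq install standalone full ip-addr 192.168.1.222 bundle /mnt/installables/NetQ-2.4.1.tgz
```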

Install NetQ Agents

After installing your Cumulus NetQ 2.4.x software, you should install the corresponding NetQ 2.4.0 or NetQ 2.4.1 Agents on each switch and server you want to monitor. There are important fixes in the NetQ Agent with each release.

Use the instructions in the following sections based on the OS installed on the switch or server.

Install and Configure the NetQ Agent on Cumulus Linux Switches

After installing your Cumulus NetQ software, you should install the NetQ 2.4.1 Agents on each switch you want to monitor. NetQ 2.4 Agents can be installed on switches running:

Prepare for NetQ Agent Installation on a Cumulus Linux Switch

For switches running Cumulus Linux, you need to:

If your network uses a proxy server for external connections, you should first configure a global proxy so apt-get can access the software package in the Cumulus Networks repository.
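
As a sketch only, one common way to point apt at a proxy is with a file under /etc/apt/apt.conf.d/; the proxy host and port below are placeholders, and your site may use a different mechanism (such as environment variables) for its global proxy:

```
cumulus@switch:~$ echo 'Acquire::http::Proxy "http://<proxy-host>:<proxy-port>/";' | sudo tee /etc/apt/apt.conf.d/95proxy
cumulus@switch:~$ echo 'Acquire::https::Proxy "http://<proxy-host>:<proxy-port>/";' | sudo tee -a /etc/apt/apt.conf.d/95proxy
```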

Verify NTP is Installed and Configured

Verify that NTP is running on the switch. The switch must be in time synchronization with the NetQ Platform or NetQ Appliance to enable useful statistical analysis.

cumulus@switch:~$ sudo systemctl status ntp
[sudo] password for cumulus:
● ntp.service - LSB: Start NTP daemon
        Loaded: loaded (/etc/init.d/ntp; bad; vendor preset: enabled)
        Active: active (running) since Fri 2018-06-01 13:49:11 EDT; 2 weeks 6 days ago
          Docs: man:systemd-sysv-generator(8)
        CGroup: /system.slice/ntp.service
                └─2873 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -c /var/lib/ntp/ntp.conf.dhcp -u 109:114

If NTP is not installed, install and configure it before continuing.

If NTP is not running:

If you are running NTP in your out-of-band management network with VRF, specify the VRF (ntp@<vrf-name> versus just ntp) in the above commands.
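
For example, assuming the management VRF is named mgmt, the status check above becomes:

```
cumulus@switch:~$ sudo systemctl status ntp@mgmt
```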

Obtain NetQ Agent Software Package

To install the NetQ Agent you need to install netq-agent on each switch or host. This is available from the Cumulus Networks repository.

To obtain the NetQ Agent package:

Edit the /etc/apt/sources.list file to add the repository for Cumulus NetQ.

Note that NetQ has a separate repository from Cumulus Linux.

Cumulus Linux 3.x
cumulus@switch:~$ sudo nano /etc/apt/sources.list
...
deb http://apps3.cumulusnetworks.com/repos/deb CumulusLinux-3 netq-2.4
...

The repository deb http://apps3.cumulusnetworks.com/repos/deb CumulusLinux-3 netq-latest can be used if you want to always retrieve the latest posted version of NetQ.

Cumulus Linux 4.x
cumulus@switch:~$ sudo nano /etc/apt/sources.list
...
deb http://apps3.cumulusnetworks.com/repos/deb CumulusLinux-4 netq-2.4
...

The repository deb http://apps3.cumulusnetworks.com/repos/deb CumulusLinux-4 netq-latest can be used if you want to always retrieve the latest posted version of NetQ.

Add the Apt Repository Key (Cumulus Linux 4.0 Only)

Add the apps3.cumulusnetworks.com authentication key to Cumulus Linux.

cumulus@switch:~$ wget -qO - https://apps3.cumulusnetworks.com/setup/cumulus-apps-deb.pubkey | sudo apt-key add -

Install the NetQ Agent on Cumulus Linux Switch

After completing the preparation steps, you can successfully install the agent onto your switch.

To install the NetQ Agent:

  1. Update the local apt repository, then install the NetQ software on the switch.
cumulus@switch:~$ sudo apt-get update
cumulus@switch:~$ sudo apt-get install netq-agent
  2. Verify you have the correct version of the Agent.
cumulus@switch:~$ dpkg-query -W -f '${Package}\t${Version}\n' netq-agent
You should see version 2.4.1 and update 26 or later in the results. For example:

- Cumulus Linux 3.3.2-3.7.x
  - netq-agent_**2.4.1**-cl3u**26**~1581350572.c5ec3e5_armel.deb
  - netq-agent_**2.4.1**-cl3u**26**~1581350238.c5ec3e5a_amd64.deb

- Cumulus Linux 4.0.0
  - netq-agent_**2.4.1**-cl4u**26**~1581350349.c5ec3e5a_armel.deb
  - netq-agent_**2.4.1**-cl4u**26**~1581350537.c5ec3e5_amd64.deb
  3. Restart rsyslog so log files are sent to the correct destination.
cumulus@switch:~$ sudo systemctl restart rsyslog.service
  4. Continue with NetQ Agent configuration in the next section.

Configure the NetQ Agent on a Cumulus Linux Switch

After the NetQ Agents have been installed on the switches you want to monitor, the NetQ Agents must be configured to obtain useful and relevant data. Two methods are available for configuring a NetQ Agent:

Configure NetQ Agents Using a Configuration File

You can configure the NetQ Agent in the netq.yml configuration file contained in the /etc/netq/ directory.

  1. Open the netq.yml file using your text editor of choice. For example:
cumulus@switch:~$ sudo nano /etc/netq/netq.yml
  2. Locate the netq-agent section, or add it.

  3. Set the parameters for the agent as follows:

    • port: 31980 (default configuration)
    • server: IP address of the NetQ Platform or NetQ Appliance where the agent should send its collected data
    • vrf: default (default) or one that you specify

Your configuration should be similar to this:

netq-agent:
  port: 31980
  server: 127.0.0.1
  vrf: default

Configure NetQ Agents Using the NetQ CLI

If the CLI is configured, you can use it to configure the NetQ Agent to send telemetry data to the NetQ Platform or NetQ Appliance. To configure the NetQ CLI, refer to Install and Configure the NetQ CLI on Cumulus Linux Switches.

If you intend to use VRF, refer to Configure the Agent to Use VRF. If you intend to specify a port for communication, refer to Configure the Agent to Communicate over a Specific Port.

Use the following command to configure the NetQ Agent:

netq config add agent server <text-opta-ip> [port <text-opta-port>] [vrf <text-vrf-name>]

This example uses an IP address of 192.168.1.254 and the default port and VRF for the NetQ hardware.

cumulus@switch:~$ sudo netq config add agent server 192.168.1.254
Updated agent server 192.168.1.254 vrf default. Please restart netq-agent (netq config restart agent).
cumulus@switch:~$ sudo netq config restart agent

Configure Advanced NetQ Agent Settings on a Cumulus Linux Switch

A couple of additional options are available for configuring the NetQ Agent. If you are using VRF, you can configure the agent to communicate over a specific VRF. You can also configure the agent to use a particular port.

Configure the Agent to Use a VRF

While optional, Cumulus strongly recommends that you configure NetQ Agents to communicate with the NetQ Platform only via a VRF, including a management VRF. To do so, you need to specify the VRF name when configuring the NetQ Agent. For example, if the management VRF is configured and you want the agent to communicate with the NetQ Platform over it, configure the agent like this:

cumulus@leaf01:~$ sudo netq config add agent server 192.168.1.254 vrf mgmt
cumulus@leaf01:~$ sudo netq config restart agent

Configure the Agent to Communicate over a Specific Port

By default, NetQ uses port 31980 for communication between the NetQ Platform and NetQ Agents. If you want the NetQ Agent to communicate with the NetQ Platform via a different port, you need to specify the port number when configuring the NetQ Agent like this:

cumulus@leaf01:~$ sudo netq config add agent server 192.168.1.254 port 7379
cumulus@leaf01:~$ sudo netq config restart agent

Install and Configure the NetQ Agent on Ubuntu Servers

After installing your Cumulus NetQ software, you should install the NetQ 2.4.1 Agents on each server you want to monitor. NetQ 2.4 Agents can be installed on servers running:

Prepare for NetQ Agent Installation on an Ubuntu Server

For servers running Ubuntu OS, you need to:

If your network uses a proxy server for external connections, you should first configure a global proxy so apt-get can access the agent package on the Cumulus Networks repository.

Verify Service Package Versions

Before you install the NetQ Agent on an Ubuntu server, make sure the following packages are installed and running these minimum versions:
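
One quick way to check whether a package is installed and which version it is running uses dpkg-query; the package names here (lldpd and ntp) are examples only, so substitute each package you need to verify:

```
root@ubuntu:~# dpkg-query -W -f '${Package}\t${Version}\n' lldpd ntp
```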

Verify the Server is Running lldpd

Make sure you are running lldpd, not lldpad. Ubuntu does not include lldpd by default, which is required for the installation.

To install this package, run the following commands:

root@ubuntu:~# sudo apt-get update
root@ubuntu:~# sudo apt-get install lldpd
root@ubuntu:~# sudo systemctl enable lldpd.service
root@ubuntu:~# sudo systemctl start lldpd.service

Install and Configure Network Time Server

If NTP is not already installed and configured, follow these steps:

  1. Install NTP on the server, if not already installed. Servers must be in time synchronization with the NetQ Platform or NetQ Appliance to enable useful statistical analysis.
root@ubuntu:~# sudo apt-get install ntp
  2. Configure the network time server.

    Use NTP Configuration File
    1. Open the /etc/ntp.conf file in your text editor of choice.

    2. Under the Server section, specify the NTP server IP address or hostname.

    3. Enable and start the NTP service.

      root@ubuntu:~# sudo systemctl enable ntp
      root@ubuntu:~# sudo systemctl start ntp
      

    If you are running NTP in your out-of-band management network with VRF, specify the VRF (ntp@<vrf-name> versus just ntp) in the above commands.

    4. Verify NTP is operating correctly. Look for an asterisk (*) or a plus sign (+) that indicates the clock is synchronized.

      root@ubuntu:~# ntpq -pn
      remote           refid            st t when poll reach   delay   offset  jitter
      ==============================================================================
      +173.255.206.154 132.163.96.3     2 u   86  128  377   41.354    2.834   0.602
      +12.167.151.2    198.148.79.209   3 u  103  128  377   13.395   -4.025   0.198
      2a00:7600::41    .STEP.          16 u    - 1024    0    0.000    0.000   0.000
      *129.250.35.250 249.224.99.213   2 u  101  128  377   14.588   -0.299   0.243
      
    Use Chrony (Ubuntu 18.04 only)
    1. Install chrony if needed.

      root@ubuntu:~# sudo apt install chrony
      
    2. Start the chrony service.

      root@ubuntu:~# sudo /usr/local/sbin/chronyd
      
    3. Verify it installed successfully.

      root@ubuntu:~# chronyc activity
      200 OK
      8 sources online
      0 sources offline
      0 sources doing burst (return to online)
      0 sources doing burst (return to offline)
      0 sources with unknown address
      
    4. View the time servers chrony is using.

      root@ubuntu:~# chronyc sources
      210 Number of sources = 8
      
      MS Name/IP address         Stratum Poll Reach LastRx Last sample
      ===============================================================================
      ^+ golem.canonical.com           2   6   377    39  -1135us[-1135us] +/-   98ms
      ^* clock.xmission.com            2   6   377    41  -4641ns[ +144us] +/-   41ms
      ^+ ntp.ubuntu.net              2   7   377   106   -746us[ -573us] +/-   41ms
      ...
      

      Open the chrony.conf configuration file (by default at /etc/chrony/) and edit if needed.

      Example with individual servers specified:

      server golem.canonical.com iburst
      server clock.xmission.com iburst
      server ntp.ubuntu.com iburst
      driftfile /var/lib/chrony/drift
      makestep 1.0 3
      rtcsync
      

      Example when using a pool of servers:

      pool pool.ntp.org iburst
      driftfile /var/lib/chrony/drift
      makestep 1.0 3
      rtcsync
      
    5. View the server chrony is currently tracking.

      root@ubuntu:~# chronyc tracking
      Reference ID    : 5BBD59C7 (golem.canonical.com)
      Stratum         : 3
      Ref time (UTC)  : Mon Feb 10 14:35:18 2020
      System time     : 0.0000046340 seconds slow of NTP time
      Last offset     : -0.000123459 seconds
      RMS offset      : 0.007654410 seconds
      Frequency       : 8.342 ppm slow
      Residual freq   : -0.000 ppm
      Skew            : 26.846 ppm
      Root delay      : 0.031207654 seconds
      Root dispersion : 0.001234590 seconds
      Update interval : 115.2 seconds
      Leap status     : Normal
      

Obtain NetQ Agent Software Package

To install the NetQ Agent you need to install netq-agent on each server. This is available from the Cumulus Networks repository.

To obtain the NetQ Agent package:

  1. Reference and update the local apt repository.
root@ubuntu:~# sudo wget -O- https://apps3.cumulusnetworks.com/setup/cumulus-apps-deb.pubkey | apt-key add -
  2. Add the Ubuntu repository:

    Ubuntu 16.04

    Create the file /etc/apt/sources.list.d/cumulus-apps-deb-xenial.list and add the following line:

    root@ubuntu:~# vi /etc/apt/sources.list.d/cumulus-apps-deb-xenial.list
    ...
    deb [arch=amd64] https://apps3.cumulusnetworks.com/repos/deb xenial netq-latest
    ...
    
    Ubuntu 18.04

    Create the file /etc/apt/sources.list.d/cumulus-apps-deb-bionic.list and add the following line:

     root@ubuntu:~# vi /etc/apt/sources.list.d/cumulus-apps-deb-bionic.list
     ...
     deb [arch=amd64] https://apps3.cumulusnetworks.com/repos/deb bionic netq-latest
     ...
    

    The use of netq-latest in these examples means that a fetch from the repository always retrieves the latest version of NetQ, even when a major version update has been made. If you want to pin the repository to a specific version, such as netq-2.3, use that instead.

Install NetQ Agent on an Ubuntu Server

After completing the preparation steps, you can successfully install the agent software onto your server.

To install the NetQ Agent:

  1. Install the software packages on the server.
root@ubuntu:~# sudo apt-get update
root@ubuntu:~# sudo apt-get install netq-agent
  2. Verify you have the correct version of the Agent.
root@ubuntu:~# dpkg-query -W -f '${Package}\t${Version}\n' netq-agent
You should see version 2.4.1 and update 26 in the results. For example:

- netq-agent_**2.4.1**-ub18.04u**26**~1581351889.c5ec3e5_amd64.deb, or
- netq-agent_**2.4.1**-ub16.04u**26**~1581350451.c5ec3e5_amd64.deb
  3. Restart rsyslog so log files are sent to the correct destination.
root@ubuntu:~# sudo systemctl restart rsyslog.service
  4. Continue with NetQ Agent Configuration in the next section.

Configure the NetQ Agent on an Ubuntu Server

After the NetQ Agents have been installed on the servers you want to monitor, the NetQ Agents must be configured to obtain useful and relevant data. Two methods are available for configuring a NetQ Agent:

Configure the NetQ Agents Using a Configuration File

You can configure the NetQ Agent in the netq.yml configuration file contained in the /etc/netq/ directory.

  1. Open the netq.yml file using your text editor of choice. For example:
root@ubuntu:~# sudo nano /etc/netq/netq.yml
  2. Locate the netq-agent section, or add it.

  3. Set the parameters for the agent as follows:

    • port: 31980 (default configuration)
    • server: IP address of the NetQ Platform or NetQ Appliance where the agent should send its collected data
    • vrf: default (default) or one that you specify

Your configuration should be similar to this:

netq-agent:
    port: 31980
    server: 127.0.0.1
    vrf: default

Configure NetQ Agents Using the NetQ CLI

If the CLI is configured, you can use it to configure the NetQ Agent to send telemetry data to the NetQ Server or Appliance. If it is not configured, refer to Configure the NetQ CLI on an Ubuntu Server and then return here.

If you intend to use VRF, skip to Configure the Agent to Use VRF. If you intend to specify a port for communication, skip to Configure the Agent to Communicate over a Specific Port.

Use the following command to configure the NetQ Agent:

netq config add agent server <text-opta-ip> [port <text-opta-port>] [vrf <text-vrf-name>]

This example uses an IP address of 192.168.1.254 and the default port and VRF for the NetQ hardware.

root@ubuntu:~# sudo netq config add agent server 192.168.1.254
Updated agent server 192.168.1.254 vrf default. Please restart netq-agent (netq config restart agent).
root@ubuntu:~# sudo netq config restart agent

Configure Advanced NetQ Agent Settings

A couple of additional options are available for configuring the NetQ Agent. If you are using VRF, you can configure the agent to communicate over a specific VRF. You can also configure the agent to use a particular port.

Configure the NetQ Agent to Use a VRF

While optional, Cumulus strongly recommends that you configure NetQ Agents to communicate with the NetQ Platform via a VRF, such as the management VRF. To do so, you need to specify the VRF name when configuring the NetQ Agent. For example, if the management VRF is configured and you want the agent to communicate with the NetQ Platform over it, configure the agent like this:

root@ubuntu:~# sudo netq config add agent server 192.168.1.254 vrf mgmt
root@ubuntu:~# sudo netq config restart agent

Configure the NetQ Agent to Communicate over a Specific Port

By default, NetQ uses port 31980 for communication between the NetQ Platform and NetQ Agents. If you want the NetQ Agent to communicate with the NetQ Platform via a different port, you need to specify the port number when configuring the NetQ Agent like this:

root@ubuntu:~# sudo netq config add agent server 192.168.1.254 port 7379
root@ubuntu:~# sudo netq config restart agent

Install and Configure the NetQ Agent on RHEL and CentOS Servers

After installing your Cumulus NetQ software, you should install the NetQ 2.4.1 Agents on each server you want to monitor. NetQ 2.4 Agents can be installed on servers running:

Prepare for NetQ Agent Installation on a RHEL or CentOS Server

For servers running RHEL or CentOS, you need to:

If your network uses a proxy server for external connections, you should first configure a global proxy so yum can access the software package in the Cumulus Networks repository.
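As a reference, a global proxy for yum is typically defined in /etc/yum.conf; the proxy URL below is a placeholder assumption, not a value supplied with NetQ:

root@rhel7:~# sudo vi /etc/yum.conf
...
[main]
proxy=http://proxy.example.com:3128
...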

Verify Service Package Versions

Before you install the NetQ Agent on a Red Hat or CentOS server, make sure the following packages are installed and running these minimum versions:

Verify the Server is Running lldpd and wget

Make sure you are running lldpd, not lldpad. CentOS does not include lldpd or wget by default; both are required for the installation.

To install these packages, run the following commands:

root@rhel7:~# sudo yum -y install epel-release
root@rhel7:~# sudo yum -y install lldpd
root@rhel7:~# sudo systemctl enable lldpd.service
root@rhel7:~# sudo systemctl start lldpd.service
root@rhel7:~# sudo yum install wget

Install and Configure NTP

If NTP is not already installed and configured, follow these steps:

  1. Install NTP on the server. Servers must be in time synchronization with the NetQ Platform or NetQ Appliance to enable useful statistical analysis.
root@rhel7:~# sudo yum install ntp
  1. Configure the NTP server.

    1. Open the /etc/ntp.conf file in your text editor of choice.

    2. Under the Server section, specify the NTP server IP address or hostname (see the example after these steps).

  2. Enable and start the NTP service.

root@rhel7:~# sudo systemctl enable ntp
root@rhel7:~# sudo systemctl start ntp

If you are running NTP in your out-of-band management network with VRF, specify the VRF (ntp@<vrf-name> versus just ntp) in the above commands.

  1. Verify NTP is operating correctly. Look for an asterisk (*) or a plus sign (+) that indicates the clock is synchronized.
root@rhel7:~# ntpq -pn
remote           refid            st t when poll reach   delay   offset  jitter
==============================================================================
+173.255.206.154 132.163.96.3     2 u   86  128  377   41.354    2.834   0.602
+12.167.151.2    198.148.79.209   3 u  103  128  377   13.395   -4.025   0.198
2a00:7600::41    .STEP.          16 u    - 1024    0    0.000    0.000   0.000
*129.250.35.250 249.224.99.213   2 u  101  128  377   14.588   -0.299   0.243
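The server entry referenced in the NTP configuration step above is a single line in /etc/ntp.conf. This is a minimal sketch in which the address 192.168.0.254 is a placeholder for your NTP source:

server 192.168.0.254 iburst

If NTP runs in your out-of-band management VRF, the VRF note above applies to the service commands as well; assuming the VRF is named mgmt:

root@rhel7:~# sudo systemctl enable ntp@mgmt
root@rhel7:~# sudo systemctl start ntp@mgmt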

Obtain NetQ Agent Software Package

To install the NetQ Agent you need to install netq-agent on each switch or host. This is available from the Cumulus Networks repository.

To obtain the NetQ Agent package:

  1. Reference and update the local yum repository.
root@rhel7:~# sudo rpm --import https://apps3.cumulusnetworks.com/setup/cumulus-apps-rpm.pubkey
root@rhel7:~# sudo wget -O- https://apps3.cumulusnetworks.com/setup/cumulus-apps-rpm-el7.repo > /etc/yum.repos.d/cumulus-host-el.repo
  1. Edit /etc/yum.repos.d/cumulus-host-el.repo to set the enabled=1 flag for the two NetQ repositories.
root@rhel7:~# vi /etc/yum.repos.d/cumulus-host-el.repo
...
[cumulus-arch-netq-2.4]
name=Cumulus netq packages
baseurl=https://apps3.cumulusnetworks.com/repos/rpm/el/7/netq-2.4/$basearch
gpgcheck=1
enabled=1
[cumulus-noarch-netq-2.4]
name=Cumulus netq architecture-independent packages
baseurl=https://apps3.cumulusnetworks.com/repos/rpm/el/7/netq-2.4/noarch
gpgcheck=1
enabled=1
...

Install NetQ Agent on a RHEL or CentOS Server

After completing the preparation steps, you can successfully install the agent software onto your server.

To install the NetQ Agent:

  1. Install the Bash completion and NetQ packages on the server.
root@rhel7:~# sudo yum -y install bash-completion
root@rhel7:~# sudo yum install netq-agent
  1. Verify you have the correct version of the Agent.
root@rhel7:~# rpm -q netq-agent
You should see version 2.4.1 and update 26 or later in the results. For example: 

netq-agent-**2.4.1**-rh7u**26**~1581350236.c5ec3e5.x86_64.rpm
  1. Restart rsyslog so log files are sent to the correct destination.
root@rhel7:~# sudo systemctl restart rsyslog
  1. Continue with NetQ Agent Configuration in the next section.

Configure the NetQ Agent on a RHEL or CentOS Server

After the NetQ Agents have been installed on the servers you want to monitor, the NetQ Agents must be configured to obtain useful and relevant data. Two methods are available for configuring a NetQ Agent:

Configure the NetQ Agents Using a Configuration File

You can configure the NetQ Agent in the netq.yml configuration file contained in the /etc/netq/ directory.

  1. Open the netq.yml file using your text editor of choice. For example:
root@rhel7:~# sudo nano /etc/netq/netq.yml
  1. Locate the netq-agent section, or add it.

  2. Set the parameters for the agent as follows:

Your configuration should be similar to this:

netq-agent:
  port: 31980
  server: 127.0.0.1
  vrf: default

Configure NetQ Agents Using the NetQ CLI

If the CLI is configured, you can use it to configure the NetQ Agent to send telemetry data to the NetQ Server or Appliance. If it is not configured, refer to Configure the NetQ CLI on a RHEL or CentOS Server and then return here.

If you intend to use VRF, skip to Configure the Agent to Use VRF. If you intend to specify a port for communication, skip to Configure the Agent to Communicate over a Specific Port.

Use the following command to configure the NetQ Agent:

netq config add agent server <text-opta-ip> [port <text-opta-port>] [vrf <text-vrf-name>]

This example uses an IP address of 192.168.1.254 and the default port and VRF for the NetQ hardware.

root@rhel7:~# sudo netq config add agent server 192.168.1.254
Updated agent server 192.168.1.254 vrf default. Please restart netq-agent (netq config restart agent).
root@rhel7:~# sudo netq config restart agent

Configure Advanced NetQ Agent Settings

A couple of additional options are available for configuring the NetQ Agent. If you are using VRF, you can configure the agent to communicate over a specific VRF. You can also configure the agent to use a particular port.

Configure the NetQ Agent to Use a VRF

While optional, Cumulus strongly recommends that you configure NetQ Agents to communicate with the NetQ Platform via a VRF, such as the management VRF. To do so, you need to specify the VRF name when configuring the NetQ Agent. For example, if the management VRF is configured and you want the agent to communicate with the NetQ Platform over it, configure the agent like this:

root@rhel7:~# sudo netq config add agent server 192.168.1.254 vrf mgmt
root@rhel7:~# sudo netq config restart agent

Configure the NetQ Agent to Communicate over a Specific Port

By default, NetQ uses port 31980 for communication between the NetQ Platform and NetQ Agents. If you want the NetQ Agent to communicate with the NetQ Platform via a different port, you need to specify the port number when configuring the NetQ Agent like this:

root@rhel7:~# sudo netq config add agent server 192.168.1.254 port 7379
root@rhel7:~# sudo netq config restart agent
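To confirm the settings took effect after the restart, you can display the current agent configuration. This assumes the netq config show agent command is available in your version of the NetQ CLI:

root@rhel7:~# sudo netq config show agent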

Install NetQ CLI

When installing NetQ 2.4.x, you are not required to install the NetQ CLI on your NetQ Platform, NetQ Appliance, or monitored switches and hosts, but doing so provides new features, important bug fixes, and the ability to manage your network from multiple points in the network.

Use the instructions in the following sections based on the OS installed on the switch or server.

Install and Configure the NetQ CLI on Cumulus Linux Switches

After installing your Cumulus NetQ software and the NetQ 2.4.1 Agent on each switch you want to monitor, you can also install the NetQ CLI on switches running:

Install the NetQ CLI on a Cumulus Linux Switch

A simple process installs the NetQ CLI on a Cumulus Linux switch.

To install the NetQ CLI you need to install netq-apps on each switch. This is available from the Cumulus Networks repository.

If your network uses a proxy server for external connections, you should first configure a global proxy so apt-get can access the software package in the Cumulus Networks repository.
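As a reference, a global proxy for apt is typically defined in a file under /etc/apt/apt.conf.d/; the file name 80proxy and the proxy URL below are placeholder assumptions, not values supplied with NetQ:

cumulus@switch:~$ sudo nano /etc/apt/apt.conf.d/80proxy
...
Acquire::http::Proxy "http://proxy.example.com:3128";
Acquire::https::Proxy "http://proxy.example.com:3128";
...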

To obtain the NetQ CLI package:

Edit the /etc/apt/sources.list file to add the repository for Cumulus NetQ.

Note that NetQ has a separate repository from Cumulus Linux.

Cumulus Linux 3.x
cumulus@switch:~$ sudo nano /etc/apt/sources.list
...
deb http://apps3.cumulusnetworks.com/repos/deb CumulusLinux-3 netq-2.4
...

The repository deb http://apps3.cumulusnetworks.com/repos/deb CumulusLinux-3 netq-latest can be used if you want to always retrieve the latest posted version of NetQ.

Cumulus Linux 4.x
cumulus@switch:~$ sudo nano /etc/apt/sources.list
...
deb http://apps3.cumulusnetworks.com/repos/deb CumulusLinux-4 netq-2.4
...

The repository deb http://apps3.cumulusnetworks.com/repos/deb CumulusLinux-4 netq-latest can be used if you want to always retrieve the latest posted version of NetQ.

  1. Update the local apt repository and install the software on the switch.
cumulus@switch:~$ sudo apt-get update
cumulus@switch:~$ sudo apt-get install netq-apps
  1. Verify you have the correct version of the CLI.
cumulus@switch:~$ dpkg-query -W -f '${Package}\t${Version}\n' netq-apps
You should see version 2.4.1 and update 26 or later in the results. For example:

- For Cumulus Linux 3.3.2-3.7.x:  
  - netq-apps_**2.4.1**-cl3u**26**~1581350572.c5ec3e5_armel.deb
  - netq-apps_**2.4.1**-cl3u**26**~1581350537.c5ec3e5_amd64.deb
- For Cumulus Linux 4.0.0:
  - netq-apps_**2.4.1**-cl4u**26**~1581350349.c5ec3e5a_armel.deb
  - netq-apps_**2.4.1**-cl4u**26**~1581350238.c5ec3e5a_amd64.deb 
  1. Continue with NetQ CLI configuration in the next section.

Configure the NetQ CLI on a Cumulus Linux Switch

Two methods are available for configuring the NetQ CLI on a switch:

Configure NetQ CLI Using the CLI

The steps to configure the CLI are different depending on whether the NetQ software has been installed for an on-premises or cloud deployment. Follow the instructions for your deployment type.

Configuring the CLI for On-premises Deployments

Use the following command to configure the CLI:

netq config add cli server <text-gateway-dest> [vrf <text-vrf-name>] [port <text-gateway-port>]

Restart the CLI afterward to activate the configuration.

This example uses an IP address of 192.168.1.0 and the default port and VRF.

cumulus@switch:~$ sudo netq config add cli server 192.168.1.0
cumulus@switch:~$ sudo netq config restart cli

If you have a server cluster deployed, use the IP address of the master server.

Configuring the CLI for Cloud Deployments

To access and configure the CLI on your NetQ Platform or NetQ Cloud Appliance, you need your NetQ UI username and password in order to generate AuthKeys. These keys provide authorized access (access key) and user authentication (secret key). Your credentials and NetQ Cloud addresses were provided by Cumulus Networks via an email titled Welcome to Cumulus NetQ!

To generate AuthKeys:

  1. In your Internet browser, enter netq.cumulusnetworks.com into the address field to open the NetQ UI login page.

  2. Enter your username and password.

  3. From the Main Menu, select Management in the Admin column.

  4. Click Manage on the User Accounts card.

  5. Select your user and click above the table.

  6. Copy these keys to a safe place.

    The secret key is only shown once. If you don’t copy these, you will need to regenerate them and reconfigure CLI access.

    You can also save these keys to a YAML file for easy reference, and to avoid having to type or copy the key values. You can:

    • store the file wherever you like, for example in /home/cumulus/ or /etc/netq
    • name the file whatever you like, for example credentials.yml, creds.yml, or keys.yml

    However, the file must have the following format:

    access-key: <user-access-key-value-here>
    secret-key: <user-secret-key-value-here>
    

Now that you have your AuthKeys, use the following command to configure the CLI:

netq config add cli server <text-gateway-dest> [access-key <text-access-key> secret-key <text-secret-key> premises <text-premises-name> | cli-keys-file <text-key-file> premises <text-premises-name>] [vrf <text-vrf-name>] [port <text-gateway-port>]

Restart the CLI afterward to activate the configuration.

This example uses the individual access key, a premises of datacenterwest, and the default Cloud address, port and VRF. Be sure to replace the key values with your generated keys if you are using this example on your server.

cumulus@switch:~$ sudo netq config add cli server api.netq.cumulusnetworks.com access-key 123452d9bc2850a1726f55534279dd3c8b3ec55e8b25144d4739dfddabe8149e secret-key /vAGywae2E4xVZg8F+HtS6h6yHliZbBP6HXU3J98765= premises datacenterwest
Successfully logged into NetQ cloud at api.netq.cumulusnetworks.com:443
Updated cli server api.netq.cumulusnetworks.com vrf default port 443. Please restart netqd (netq config restart cli)

cumulus@switch:~$ sudo netq config restart cli
Restarting NetQ CLI... Success!

This example uses an optional keys file. Be sure to replace the keys filename and path with the full path and name of your keys file, and the datacenterwest premises name with your premises name if you are using this example on your server.

cumulus@switch:~$ sudo netq config add cli server api.netq.cumulusnetworks.com cli-keys-file /home/netq/nq-cld-creds.yml premises datacenterwest
Successfully logged into NetQ cloud at api.netq.cumulusnetworks.com:443
Updated cli server api.netq.cumulusnetworks.com vrf default port 443. Please restart netqd (netq config restart cli)

cumulus@switch:~$ netq config restart cli
Restarting NetQ CLI... Success!

If you have multiple premises and want to query data from a different premises than you originally configured, rerun the netq config add cli server command with the desired premises name. You can only view the data for one premises at a time with the CLI.
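For example, assuming a hypothetical second premises named datacentereast and the same keys file shown above, switching the CLI to query that premises looks like this:

cumulus@switch:~$ sudo netq config add cli server api.netq.cumulusnetworks.com cli-keys-file /home/netq/nq-cld-creds.yml premises datacentereast
cumulus@switch:~$ sudo netq config restart cli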

Configure NetQ CLI Using a Configuration File

You can configure the NetQ CLI in the netq.yml configuration file contained in the /etc/netq/ directory.

  1. Open the netq.yml file using your text editor of choice. For example:
root@rhel7:~# sudo nano /etc/netq/netq.yml
  1. Locate the netq-cli section, or add it.

  2. Set the parameters for the CLI as follows:

Parameter      | On-premises                                     | Cloud
netq-user      | User who can access the CLI                     | User who can access the CLI
server         | IP address of the NetQ server or NetQ Appliance | api.netq.cumulusnetworks.com
port (default) | 32708                                           | 443
premises       | NA                                              | Name of premises you want to query

An on-premises configuration should be similar to this:

netq-cli:
  netq-user: admin@company.com
  port: 32708
  server: 192.168.0.254

A cloud configuration should be similar to this:

netq-cli:
  netq-user: admin@company.com
  port: 443
  premises: datacenterwest
  server: api.netq.cumulusnetworks.com

Install and Configure the NetQ CLI on Ubuntu Servers

After installing your Cumulus NetQ software and the NetQ 2.4.1 Agents on each server you want to monitor, you can also install the NetQ CLI on servers running:

Prepare for NetQ CLI Installation on an Ubuntu Server

For servers running Ubuntu OS, you need to:

If your network uses a proxy server for external connections, you should first configure a global proxy so apt-get can access the agent package on the Cumulus Networks repository.

Verify Service Package Versions

Before you install the NetQ Agent on an Ubuntu server, make sure the following packages are installed and running these minimum versions:

Verify the Server is Running lldpd

Make sure you are running lldpd, not lldpad. The lldpd package is required for the installation, but Ubuntu does not include it by default.

To install this package, run the following commands:

root@ubuntu:~# sudo apt-get update
root@ubuntu:~# sudo apt-get install lldpd
root@ubuntu:~# sudo systemctl enable lldpd.service
root@ubuntu:~# sudo systemctl start lldpd.service

Install and Configure Network Time Server

If NTP is not already installed and configured, follow these steps:

  1. Install NTP on the server, if not already installed. Servers must be in time synchronization with the NetQ Platform or NetQ Appliance to enable useful statistical analysis.
root@ubuntu:~# sudo apt-get install ntp
  1. Configure the network time server.

    Use NTP Configuration File
    1. Open the /etc/ntp.conf file in your text editor of choice.

    2. Under the Server section, specify the NTP server IP address or hostname.

    3. Enable and start the NTP service.

      root@ubuntu:~# sudo systemctl enable ntp
      root@ubuntu:~# sudo systemctl start ntp
      

    If you are running NTP in your out-of-band management network with VRF, specify the VRF (ntp@<vrf-name> versus just ntp) in the above commands.

    1. Verify NTP is operating correctly. Look for an asterisk (*) or a plus sign (+) that indicates the clock is synchronized.

      root@ubuntu:~# ntpq -pn
      remote           refid            st t when poll reach   delay   offset  jitter
      ==============================================================================
      +173.255.206.154 132.163.96.3     2 u   86  128  377   41.354    2.834   0.602
      +12.167.151.2    198.148.79.209   3 u  103  128  377   13.395   -4.025   0.198
      2a00:7600::41    .STEP.          16 u    - 1024    0    0.000    0.000   0.000
      *129.250.35.250 249.224.99.213   2 u  101  128  377   14.588   -0.299   0.243
      
    Use Chrony (Ubuntu 18.04 only)
    1. Install chrony if needed.

      root@ubuntu:~# sudo apt install chrony
      
    2. Start the chrony service.

      root@ubuntu:~# sudo /usr/local/sbin/chronyd
      
    3. Verify it installed successfully.

      root@ubuntu:~# chronyc activity
      200 OK
      8 sources online
      0 sources offline
      0 sources doing burst (return to online)
      0 sources doing burst (return to offline)
      0 sources with unknown address
      
    4. View the time servers chrony is using.

      root@ubuntu:~# chronyc sources
      210 Number of sources = 8
      
      MS Name/IP address         Stratum Poll Reach LastRx Last sample
      ===============================================================================
      ^+ golem.canonical.com           2   6   377    39  -1135us[-1135us] +/-   98ms
      ^* clock.xmission.com            2   6   377    41  -4641ns[ +144us] +/-   41ms
      ^+ ntp.ubuntu.net              2   7   377   106   -746us[ -573us] +/-   41ms
      ...
      

      Open the chrony.conf configuration file (by default at /etc/chrony/) and edit if needed.

      Example with individual servers specified:

      server golem.canonical.com iburst
      server clock.xmission.com iburst
      server ntp.ubuntu.com iburst
      driftfile /var/lib/chrony/drift
      makestep 1.0 3
      rtcsync
      

      Example when using a pool of servers:

      pool pool.ntp.org iburst
      driftfile /var/lib/chrony/drift
      makestep 1.0 3
      rtcsync
      
    5. View the server chrony is currently tracking.

      root@ubuntu:~# chronyc tracking
      Reference ID    : 5BBD59C7 (golem.canonical.com)
      Stratum         : 3
      Ref time (UTC)  : Mon Feb 10 14:35:18 2020
      System time     : 0.0000046340 seconds slow of NTP time
      Last offset     : -0.000123459 seconds
      RMS offset      : 0.007654410 seconds
      Frequency       : 8.342 ppm slow
      Residual freq   : -0.000 ppm
      Skew            : 26.846 ppm
      Root delay      : 0.031207654 seconds
      Root dispersion : 0.001234590 seconds
      Update interval : 115.2 seconds
      Leap status     : Normal
      

Obtain NetQ CLI Software Package

To install the NetQ CLI, you need to install netq-apps on each server. This is available from the Cumulus Networks repository.

To obtain the NetQ CLI package:

  1. Reference and update the local apt repository.
root@ubuntu:~# sudo wget -O- https://apps3.cumulusnetworks.com/setup/cumulus-apps-deb.pubkey | apt-key add -
  1. Add the Ubuntu repository:

    Ubuntu 16.04

    Create the file /etc/apt/sources.list.d/cumulus-host-ubuntu-xenial.list and add the following line:

    root@ubuntu:~# vi /etc/apt/sources.list.d/cumulus-apps-deb-xenial.list
    ...
    deb [arch=amd64] https://apps3.cumulusnetworks.com/repos/deb xenial netq-latest
    ...
    
    Ubuntu 18.04

    Create the file /etc/apt/sources.list.d/cumulus-host-ubuntu-bionic.list and add the following line:

     root@ubuntu:~# vi /etc/apt/sources.list.d/cumulus-apps-deb-bionic.list
     ...
     deb [arch=amd64] https://apps3.cumulusnetworks.com/repos/deb bionic netq-latest
     ...
    

    The use of netq-latest in these examples means that a fetch from the repository always retrieves the latest version of NetQ, even after a major version update. If you want to pin the repository to a specific version - such as netq-2.3 - use that version string instead.

Install NetQ CLI on an Ubuntu Server

A simple process installs the NetQ CLI on an Ubuntu server.

  1. Install the CLI software on the server.
root@ubuntu:~# sudo apt-get update
root@ubuntu:~# sudo apt-get install netq-apps
  1. Verify you have the correct version of the CLI.
root@ubuntu:~# dpkg-query -W -f '${Package}\t${Version}\n' netq-apps
You should see version 2.4.1 and update 26 or later in the results. For example:

- netq-apps_**2.4.1**-ub18.04u**26**~1581351889.c5ec3e5_amd64.deb, or
- netq-apps_**2.4.1**-ub16.04u**26**~1581350451.c5ec3e5_amd64.deb. 
  1. Continue with NetQ CLI configuration in the next section.

Configure the NetQ CLI on an Ubuntu Server

Two methods are available for configuring the NetQ CLI on a server:

Configure NetQ CLI Using the CLI

The steps to configure the CLI are different depending on whether the NetQ software has been installed for an on-premises or cloud deployment. Follow the instructions for your deployment type.

Configure the CLI for On-premises Deployments

Use the following command to configure the CLI:

netq config add cli server <text-gateway-dest> [vrf <text-vrf-name>] [port <text-gateway-port>]

Restart the CLI afterward to activate the configuration.

This example uses an IP address of 192.168.1.0 and the default port and VRF.

root@ubuntu:~# sudo netq config add cli server 192.168.1.0
root@ubuntu:~# sudo netq config restart cli

If you have a server cluster deployed, use the IP address of the master server.

Configure the CLI for Cloud Deployments

To access and configure the CLI on your NetQ Platform or NetQ Cloud Appliance, you need your NetQ UI username and password in order to generate AuthKeys. These keys provide authorized access (access key) and user authentication (secret key). Your credentials and NetQ Cloud addresses were provided by Cumulus Networks via an email titled Welcome to Cumulus NetQ!

To generate AuthKeys:

  1. In your Internet browser, enter netq.cumulusnetworks.com into the address field to open the NetQ UI login page.

  2. Enter your username and password.

  3. From the Main Menu, select Management in the Admin column.

  4. Click Manage on the User Accounts card.

  5. Select your user and click above the table.

  6. Copy these keys to a safe place.

    The secret key is only shown once. If you don’t copy these, you will need to regenerate them and reconfigure CLI access.

    You can also save these keys to a YAML file for easy reference, and to avoid having to type or copy the key values. You can:

    • store the file wherever you like, for example in /home/cumulus/ or /etc/netq
    • name the file whatever you like, for example credentials.yml, creds.yml, or keys.yml

    However, the file must have the following format:

    access-key: <user-access-key-value-here>
    secret-key: <user-secret-key-value-here>
    

Now that you have your AuthKeys, use the following command to configure the CLI:

netq config add cli server <text-gateway-dest> [access-key <text-access-key> secret-key <text-secret-key> premises <text-premises-name> | cli-keys-file <text-key-file> premises <text-premises-name>] [vrf <text-vrf-name>] [port <text-gateway-port>]

Restart the CLI afterward to activate the configuration.

This example uses the individual access key, a premises of datacenterwest, and the default Cloud address, port and VRF. Be sure to replace the key values with your generated keys if you are using this example on your server.

root@ubuntu:~# sudo netq config add cli server api.netq.cumulusnetworks.com access-key 123452d9bc2850a1726f55534279dd3c8b3ec55e8b25144d4739dfddabe8149e secret-key /vAGywae2E4xVZg8F+HtS6h6yHliZbBP6HXU3J98765= premises datacenterwest
Successfully logged into NetQ cloud at api.netq.cumulusnetworks.com:443
Updated cli server api.netq.cumulusnetworks.com vrf default port 443. Please restart netqd (netq config restart cli)

root@ubuntu:~# sudo netq config restart cli
Restarting NetQ CLI... Success!

This example uses an optional keys file. Be sure to replace the keys filename and path with the full path and name of your keys file, and the datacenterwest premises name with your premises name if you are using this example on your server.

root@ubuntu:~# sudo netq config add cli server api.netq.cumulusnetworks.com cli-keys-file /home/netq/nq-cld-creds.yml premises datacenterwest
Successfully logged into NetQ cloud at api.netq.cumulusnetworks.com:443
Updated cli server api.netq.cumulusnetworks.com vrf default port 443. Please restart netqd (netq config restart cli)

root@ubuntu:~# sudo netq config restart cli
Restarting NetQ CLI... Success!

Rerun this command if you have multiple premises and want to query a different premises.

Configure NetQ CLI Using a Configuration File

You can configure the NetQ CLI in the netq.yml configuration file contained in the /etc/netq/ directory.

  1. Open the netq.yml file using your text editor of choice. For example:
root@ubuntu:~# sudo nano /etc/netq/netq.yml
  1. Locate the netq-cli section, or add it.

  2. Set the parameters for the CLI as follows:

Parameter      | On-premises                                     | Cloud
netq-user      | User who can access the CLI                     | User who can access the CLI
server         | IP address of the NetQ server or NetQ Appliance | api.netq.cumulusnetworks.com
port (default) | 32708                                           | 443
premises       | NA                                              | Name of premises you want to query

An on-premises configuration should be similar to this:

netq-cli:
  netq-user: admin@company.com
  port: 32708
  server: 192.168.0.254

A cloud configuration should be similar to this:

netq-cli:
  netq-user: admin@company.com
  port: 443
  premises: datacenterwest
  server: api.netq.cumulusnetworks.com

Install and Configure the NetQ CLI on RHEL and CentOS Servers

After installing your Cumulus NetQ software and the NetQ 2.4.1 Agents on each switch you want to monitor, you can also install the NetQ CLI on servers running:

Prepare for NetQ CLI Installation on a RHEL or CentOS Server

For servers running RHEL or CentOS, you need to:

If your network uses a proxy server for external connections, you should first configure a global proxy so yum can access the software package in the Cumulus Networks repository.

Verify Service Package Versions

Before you install the NetQ CLI on a Red Hat or CentOS server, make sure the following packages are installed and running these minimum versions:

Verify the Server is Running lldpd and wget

Make sure you are running lldpd, not lldpad. CentOS does not include lldpd or wget by default; both are required for the installation.

To install these packages, run the following commands:

root@rhel7:~# sudo yum -y install epel-release
root@rhel7:~# sudo yum -y install lldpd
root@rhel7:~# sudo systemctl enable lldpd.service
root@rhel7:~# sudo systemctl start lldpd.service
root@rhel7:~# sudo yum install wget

Install and Configure NTP

If NTP is not already installed and configured, follow these steps:

  1. Install NTP on the server. Servers must be in time synchronization with the NetQ Platform or NetQ Appliance to enable useful statistical analysis.
root@rhel7:~# sudo yum install ntp
  1. Configure the NTP server.

    1. Open the /etc/ntp.conf file in your text editor of choice.

    2. Under the Server section, specify the NTP server IP address or hostname.

  2. Enable and start the NTP service.

root@rhel7:~# sudo systemctl enable ntp
root@rhel7:~# sudo systemctl start ntp

If you are running NTP in your out-of-band management network with VRF, specify the VRF (ntp@<vrf-name> versus just ntp) in the above commands.

  1. Verify NTP is operating correctly. Look for an asterisk (*) or a plus sign (+) that indicates the clock is synchronized.
root@rhel7:~# ntpq -pn
remote           refid            st t when poll reach   delay   offset  jitter
==============================================================================
+173.255.206.154 132.163.96.3     2 u   86  128  377   41.354    2.834   0.602
+12.167.151.2    198.148.79.209   3 u  103  128  377   13.395   -4.025   0.198
2a00:7600::41    .STEP.          16 u    - 1024    0    0.000    0.000   0.000
*129.250.35.250 249.224.99.213   2 u  101  128  377   14.588   -0.299   0.243

Install NetQ CLI on a RHEL or CentOS Server

A simple process installs the NetQ CLI on a RHEL or CentOS server.

  1. Reference and update the local yum repository and key.
root@rhel7:~# rpm --import https://apps3.cumulusnetworks.com/setup/cumulus-apps-rpm.pubkey
root@rhel7:~# wget -O- https://apps3.cumulusnetworks.com/setup/cumulus-apps-rpm-el7.repo > /etc/yum.repos.d/cumulus-host-el.repo
  1. Edit /etc/yum.repos.d/cumulus-host-el.repo to set the enabled=1 flag for the two NetQ repositories.
root@rhel7:~# vi /etc/yum.repos.d/cumulus-host-el.repo
...
[cumulus-arch-netq-2.4]
name=Cumulus netq packages
baseurl=https://apps3.cumulusnetworks.com/repos/rpm/el/7/netq-2.4/$basearch
gpgcheck=1
enabled=1
[cumulus-noarch-netq-2.4]
name=Cumulus netq architecture-independent packages
baseurl=https://apps3.cumulusnetworks.com/repos/rpm/el/7/netq-2.4/noarch
gpgcheck=1
enabled=1
...
  1. Install the Bash completion and CLI software on the server.
root@rhel7:~# sudo yum -y install bash-completion
root@rhel7:~# sudo yum install netq-apps
  1. Verify you have the correct version of the CLI.
root@rhel7:~# rpm -q netq-apps

You should see version 2.4.1 and update 26 or later in the results. For example:

netq-apps-**2.4.1**-rh7u**26**~1581350236.c5ec3e5.x86_64.rpm 
  1. Continue with the next section.

Configure the NetQ CLI on a RHEL or CentOS Server

Two methods are available for configuring the NetQ CLI on a server:

Configure NetQ CLI Using the CLI

The steps to configure the CLI are different depending on whether the NetQ software has been installed for an on-premises or cloud deployment. Follow the instructions for your deployment type.

Configure the CLI for On-premises Deployments

Use the following command to configure the CLI:

netq config add cli server <text-gateway-dest> [vrf <text-vrf-name>] [port <text-gateway-port>]

Restart the CLI afterward to activate the configuration.

This example uses an IP address of 192.168.1.0 and the default port and VRF.

root@rhel7:~# sudo netq config add cli server 192.168.1.0
root@rhel7:~# sudo netq config restart cli

If you have a server cluster deployed, use the IP address of the master server.

Configure the CLI for Cloud Deployments

To access and configure the CLI on your NetQ Platform or NetQ Cloud Appliance, you need your NetQ UI username and password in order to generate AuthKeys. These keys provide authorized access (access key) and user authentication (secret key). Your credentials and NetQ Cloud addresses were provided by Cumulus Networks via an email titled Welcome to Cumulus NetQ!

To generate AuthKeys:

  1. In your Internet browser, enter netq.cumulusnetworks.com into the address field to open the NetQ UI login page.

  2. Enter your username and password.

  3. From the Main Menu, select Management in the Admin column.

  4. Click Manage on the User Accounts card.

  5. Select your user and click above the table.

  6. Copy these keys to a safe place.

    The secret key is only shown once. If you don’t copy these, you will need to regenerate them and reconfigure CLI access.

    You can also save these keys to a YAML file for easy reference, and to avoid having to type or copy the key values. You can:

    • store the file wherever you like, for example in /home/cumulus/ or /etc/netq
    • name the file whatever you like, for example credentials.yml, creds.yml, or keys.yml

    However, the file must have the following format:

    access-key: <user-access-key-value-here>
    secret-key: <user-secret-key-value-here>
    

Now that you have your AuthKeys, use the following command to configure the CLI:

netq config add cli server <text-gateway-dest> [access-key <text-access-key> secret-key <text-secret-key> premises <text-premises-name> | cli-keys-file <text-key-file> premises <text-premises-name>] [vrf <text-vrf-name>] [port <text-gateway-port>]

Restart the CLI afterward to activate the configuration.

This example uses the individual access key, a premises of datacenterwest, and the default Cloud address, port and VRF. Be sure to replace the key values with your generated keys if you are using this example on your server.

root@rhel7:~# sudo netq config add cli server api.netq.cumulusnetworks.com access-key 123452d9bc2850a1726f55534279dd3c8b3ec55e8b25144d4739dfddabe8149e secret-key /vAGywae2E4xVZg8F+HtS6h6yHliZbBP6HXU3J98765= premises datacenterwest
Successfully logged into NetQ cloud at api.netq.cumulusnetworks.com:443
Updated cli server api.netq.cumulusnetworks.com vrf default port 443. Please restart netqd (netq config restart cli)

root@rhel7:~# sudo netq config restart cli
Restarting NetQ CLI... Success!

This example uses an optional keys file. Be sure to replace the keys filename and path with the full path and name of your keys file, and the datacenterwest premises name with your premises name if you are using this example on your server.

root@rhel7:~# sudo netq config add cli server api.netq.cumulusnetworks.com cli-keys-file /home/netq/nq-cld-creds.yml premises datacenterwest
Successfully logged into NetQ cloud at api.netq.cumulusnetworks.com:443
Updated cli server api.netq.cumulusnetworks.com vrf default port 443. Please restart netqd (netq config restart cli)

root@rhel7:~# sudo netq config restart cli
Restarting NetQ CLI... Success!

Rerun this command if you have multiple premises and want to query a different premises.

Configure NetQ CLI Using a Configuration File

You can configure the NetQ CLI in the netq.yml configuration file contained in the /etc/netq/ directory.

  1. Open the netq.yml file using your text editor of choice. For example:
root@rhel7:~# sudo nano /etc/netq/netq.yml
  1. Locate the netq-cli section, or add it.

  2. Set the parameters for the CLI as follows:

Parameter      | On-premises                                     | Cloud
netq-user      | User who can access the CLI                     | User who can access the CLI
server         | IP address of the NetQ server or NetQ Appliance | api.netq.cumulusnetworks.com
port (default) | 32708                                           | 443
premises       | NA                                              | Name of premises you want to query

An on-premises configuration should be similar to this:

netq-cli:
  netq-user: admin@company.com
  port: 32708
  server: 192.168.0.254

A cloud configuration should be similar to this:

netq-cli:
  netq-user: admin@company.com
  port: 443
  premises: datacenterwest
  server: api.netq.cumulusnetworks.com

Upgrade NetQ

This topic describes how to upgrade from your current NetQ 2.4.0 installation to the NetQ 2.4.1 release to take advantage of new capabilities and bug fixes (refer to the release notes).

You must upgrade your NetQ Platform(s) or NetQ/NetQ Cloud Appliance(s) and the NetQ Agents on your monitored switches and hosts. If you want access to new and updated commands, you can upgrade the CLI on the NetQ Platform(s), NetQ/NetQ Cloud Appliance(s), and monitored switches and hosts as well.

To complete the upgrade for either an on-premises or a cloud deployment:

If you are currently running NetQ 2.4.0 and you installed the NetQ 2.4.0 Agent, you may need to update the agent. Refer to Update NetQ 2.4.0 Agents.

Upgrade the NetQ Platform

The first step in upgrading your NetQ 2.4.0 installation to NetQ 2.4.1 is to upgrade your NetQ Platform. This topic describes how to upgrade this for both on-premises and cloud deployments.

Prepare for Upgrade

Two important steps are required to prepare for upgrade of your NetQ Platform:

Optionally, you can choose to back up your NetQ Data before performing the upgrade.

To complete the preparation:

  1. Optionally back up your NetQ 2.4.0 data. Refer to Back Up Your NetQ Data.

  2. Download the relevant software.

    1. Go to the Cumulus Downloads page, and select NetQ from the Product list.

    2. Select 2.4 from the Version list, and then click 2.4.1 from the submenu.

    3. Select the relevant software from the HyperVisor/Platform list:

      Your Deployment Type                     | Hypervisor/Platform Selection | Downloaded Filename
      NetQ On-premises Platform running KVM    | KVM                           | NetQ-2.4.1.tgz
      NetQ Cloud Platform running KVM          | KVM (Cloud)                   | NetQ-2.4.1-opta.tgz
      NetQ On-premises Platform running VMware | VMware                        | NetQ-2.4.1.tgz
      NetQ Cloud Platform running VMware       | VMware (Cloud)                | NetQ-2.4.1-opta.tgz
      NetQ Appliance (on-premises)             | Appliance                     | NetQ-2.4.1.tgz
      NetQ Cloud Appliance                     | Appliance (Cloud)             | NetQ-2.4.1-opta.tgz
    4. Scroll down and click Download.

  3. Copy the file to the /mnt/installables/ directory on your hardware.

  4. Update the NetQ debian packages using the following three commands.

    cumulus@<hostname>:~$ sudo dpkg --remove --force-remove-reinstreq netq-apps netq-agent 2>/dev/null
    [sudo] password for cumulus:
    (Reading database ... 71621 files and directories currently installed.)
    Removing netq-apps (2.4.0-ub18.04u24~1577405296.fcf3c28) ...
    Removing netq-agent (2.4.0-ub18.04u24~1577405296.fcf3c28) ...
    Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
    
    cumulus@<hostname>:~$ sudo apt-get update
    Get:1 http://apps3.cumulusnetworks.com/repos/deb bionic InRelease [13.8 kB]
    Get:2 http://apps3.cumulusnetworks.com/repos/deb bionic/netq-2.4 amd64 Packages [758 B]
    Hit:3 http://archive.ubuntu.com/ubuntu bionic InRelease
    Get:4 http://security.ubuntu.com/ubuntu bionic-security InRelease [88.7 kB]
    Get:5 http://archive.ubuntu.com/ubuntu bionic-updates InRelease [88.7 kB]
    ...
    Get:24 http://archive.ubuntu.com/ubuntu bionic-backports/universe Translation-en [1900 B]
    Fetched 4651 kB in 3s (1605 kB/s)
    Reading package lists... Done
    
    cumulus@<hostname>:~$ sudo apt-get install -y netq-agent netq-apps
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    ...
    The following NEW packages will be installed:
    netq-agent netq-apps
    ...
    Fetched 39.8 MB in 3s (13.5 MB/s)
    ...
    Unpacking netq-agent (2.4.1-ub18.04u26~1581351889.c5ec3e5) ...
    ...
    Unpacking netq-apps (2.4.1-ub18.04u26~1581351889.c5ec3e5) ...
    Setting up netq-apps (2.4.1-ub18.04u26~1581351889.c5ec3e5) ...
    Setting up netq-agent (2.4.1-ub18.04u26~1581351889.c5ec3e5) ...
    Processing triggers for rsyslog (8.32.0-1ubuntu4) ...
    Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
    

You can now upgrade your platform using the NetQ Admin UI, as described in the next section. Alternatively, you can upgrade using the NetQ CLI; refer to Upgrade Your Platform Using the NetQ CLI.

Upgrade Your Platform Using the NetQ Admin UI

After completing the preparation steps, upgrading your NetQ Platform(s) or NetQ Appliance(s) is simple using the Admin UI.

To upgrade your NetQ software:

  1. Run the bootstrap CLI to upgrade the Admin UI itself.

    On-premises Deployments
    cumulus@<hostname>:~$ netq bootstrap master upgrade /mnt/installables/NetQ-2.4.1.tgz
    2020-02-28 15:39:37.016710: master-node-installer: Extracting tarball /mnt/installables/NetQ-2.4.1.tgz
    2020-02-28 15:44:48.188658: master-node-installer: Upgrading NetQ Admin container
    2020-02-28 15:47:35.667579: master-node-installer: Removing old images
    -----------------------------------------------
    Successfully bootstrap-upgraded the master node
    
    Cloud Deployments
    netq bootstrap master upgrade /mnt/installables/NetQ-2.4.1-opta.tgz
    
  2. Open the Admin UI by entering https://<hostname-or-ipaddress>:8443 in your browser address field.

  3. Click Upgrade.

  4. Enter NetQ-2.4.1.tgz or NetQ-2.4.1-opta.tgz and click .

    The is only visible after you enter your tar file information.

  5. Monitor the progress. Click to monitor each step in the jobs.

    The following example is for an on-premises upgrade. The jobs for a cloud upgrade are slightly different.

  6. When it completes, click to be returned to the Health dashboard.

Upgrade Your Platform Using the NetQ CLI

After completing the preparation steps, upgrading your NetQ Platform(s) or NetQ Appliance(s) is simple using the NetQ CLI.

To upgrade your hardware:

  1. Run the appropriate netq upgrade command.

    On-premises Deployments
    netq upgrade bundle /mnt/installables/NetQ-2.4.1.tgz
    
    Cloud Deployments
    netq upgrade bundle /mnt/installables/NetQ-2.4.1-opta.tgz
    
  2. After the upgrade is completed, confirm the upgrade was successful.

    cat /etc/app-release
    

    The output should look like this:

    On-premises deployments:

    NetQ Platform (KVM):
      APPLIANCE_VERSION=2.4.1
      APPLIANCE_MANIFEST_HASH=E9361…12BE7
      APPLIANCE_NAME="<NetQ Platform Name>"

    NetQ Platform (VMware):
      APPLIANCE_VERSION=2.4.1
      APPLIANCE_MANIFEST_HASH=7916C…6D0EF
      APPLIANCE_NAME="<NetQ Platform Name>"

    NetQ Appliance:
      APPLIANCE_VERSION=2.4.1
      APPLIANCE_MANIFEST_HASH=ADB58…E6732
      APPLIANCE_NAME="NetQ Appliance"

    Cloud deployments:

    NetQ Cloud Platform (KVM):
      APPLIANCE_VERSION=2.4.1
      APPLIANCE_MANIFEST_HASH=383E9…F4371
      APPLIANCE_NAME="<NetQ Cloud Platform Name>"

    NetQ Cloud Platform (VMware):
      APPLIANCE_VERSION=2.4.1
      APPLIANCE_MANIFEST_HASH=E6176…A3EA1
      APPLIANCE_NAME="<NetQ Cloud Platform Name>"

    NetQ Cloud Appliance:
      APPLIANCE_VERSION=2.4.1
      APPLIANCE_MANIFEST_HASH=4F50D…57FE1
      APPLIANCE_NAME="NetQ Cloud Appliance"

Upgrade NetQ Agents

With NetQ 2.4, there are a couple of instances when you should upgrade your NetQ Agents:

Upgrade NetQ Agents on Cumulus Linux Switches

The following instructions are applicable to both Cumulus Linux 3.x and 4.x, and for both on-premises and cloud deployments.

To upgrade the NetQ Agent:

  1. Log in to your switch or host.

  2. Update and install the new NetQ debian package.

    For Switches and Hosts Running Cumulus Linux or Ubuntu
    sudo apt-get update
    sudo apt-get install -y netq-agent
    
    For Hosts Running RHEL or CentOS
    sudo yum update
    sudo yum install netq-agent
    
  3. Restart the NetQ Agent.

 netq config restart agent

Refer to Install and Configure the NetQ Agent on Cumulus Linux Switches to complete the upgrade.

Upgrade NetQ Agents on Ubuntu Servers

The following instructions are applicable to both NetQ Platform and NetQ Appliances running Ubuntu 16.04 or 18.04 in on-premises and cloud deployments.

To upgrade the NetQ Agent:

  1. Log in to your NetQ Platform or Appliance.

  2. Update your NetQ repository.

root@ubuntu:~# sudo apt-get update
  1. Install the agent software.
root@ubuntu:~# sudo apt-get install -y netq-agent
  1. Restart the NetQ Agent.
root@ubuntu:~# netq config restart agent

Refer to Install and Configure the NetQ Agent on Ubuntu Servers to complete the upgrade.

Upgrade NetQ Agents on RHEL or CentOS Servers

The following instructions are applicable to both on-premises and cloud deployments.

To upgrade the NetQ Agent:

  1. Log in to your NetQ Platform.

  2. Update your NetQ repository.

root@rhel7:~# sudo yum update
  1. Install the agent software.
root@rhel7:~# sudo yum install netq-agent
  1. Restart the NetQ Agent.
root@rhel7:~# netq config restart agent

Refer to Install and Configure the NetQ Agent on RHEL and CentOS Servers to complete the upgrade.

Update NetQ 2.4.0 Agents

NetQ 2.4.0 requires a fresh installation. If you have already upgraded to NetQ 2.4.0, you may need to update the NetQ Agent. Verify that you have the latest version of the agent software: version 2.4.0 with update 25 or later.

For Switches Running Cumulus Linux 3.x or 4.x

Run the following command to view the NetQ Agent version.

cumulus@switch:~$ dpkg-query -W -f '${Package}\t${Version}\n' netq-agent

You should see:

If you do not see one of these, refer to Upgrade NetQ Agents on Cumulus Linux Switches.

For Servers Running Ubuntu 16.04 or 18.04

Run the following command to view the NetQ Agent version.

root@ubuntu:~# dpkg-query -W -f '${Package}\t${Version}\n' netq-agent

You should see:

If you do not see one of these, refer to Upgrade NetQ Agents on Ubuntu Servers.

For Servers Running RHEL7 or CentOS

Run the following command to view the NetQ Agent version.

root@rhel7:~# rpm -q netq-agent

You should see:

If you do not see one of these, refer to Upgrade NetQ Agents on RHEL or CentOS Servers.

Upgrade NetQ CLI

While it is not required to upgrade the NetQ CLI on your monitored switches and hosts when you upgrade to NetQ 2.4.1, doing so gives you access to new features and important bug fixes. Refer to the release notes for details.

To upgrade the NetQ CLI:

  1. Log in to your switch or host.

  2. Update and install the new NetQ debian package.

    For Switches and Hosts Running Cumulus Linux or Ubuntu
    sudo apt-get update
    sudo apt-get install -y netq-apps
    
    For Hosts Running RHEL or CentOS
    sudo yum update
    sudo yum install netq-apps
    
  3. Restart the CLI.

netq config restart cli

To complete the upgrade, refer to the relevant configuration topic:

Back Up and Restore NetQ

It is recommended that you back up your NetQ data according to your company policy; typically this means after key configuration changes and on a regular schedule.

These topics describe how to back up and restore your NetQ data for on-premises NetQ Platforms (running on your hardware) or for the on-premises NetQ Appliance.

These procedures do not apply to your NetQ Cloud server or the NetQ Cloud Appliance. Data backup is handled automatically with the NetQ Cloud service.

Back Up Your NetQ Data

NetQ 2.x data is stored in a Cassandra database. A backup is performed by running scripts provided with the software and located in the /usr/sbin directory. When a backup is performed, a single tar file is created. The file is stored on a local drive that you specify and is named netq_master_snapshot_<timestamp>.tar.gz. Currently, only one backup file is supported; it includes the entire set of data tables and is replaced each time a new backup is created.

To create a backup:

  1. Run the backup script to create a backup file in /opt/<backup-directory> being sure to replace the backup-directory option with the name of the directory you want to use for the backup file.

    cumulus@<netq-platform/netq-appliance>:~$ ./backuprestore.sh --backup --localdir /opt/<backup-directory>
    

    You can abbreviate the backup and localdir options of this command to -b and -l to reduce typing. If the backup directory identified does not already exist, the script creates the directory during the backup process.

    This is a sample of what you see as the script is running:

    [Fri 26 Jul 2019 02:35:35 PM UTC] - Received Inputs for backup ...
    [Fri 26 Jul 2019 02:35:36 PM UTC] - Able to find cassandra pod: cassandra-0
    [Fri 26 Jul 2019 02:35:36 PM UTC] - Continuing with the procedure ...
    [Fri 26 Jul 2019 02:35:36 PM UTC] - Removing the stale backup directory from cassandra pod...
    [Fri 26 Jul 2019 02:35:36 PM UTC] - Able to successfully cleanup up /opt/backuprestore from cassandra pod ...
    [Fri 26 Jul 2019 02:35:36 PM UTC] - Copying the backup script to cassandra pod ....
    /opt/backuprestore/createbackup.sh: line 1: cript: command not found
    [Fri 26 Jul 2019 02:35:48 PM UTC] - Able to exeute /opt/backuprestore/createbackup.sh script on cassandra pod
    [Fri 26 Jul 2019 02:35:48 PM UTC] - Creating local directory:/tmp/backuprestore/ ...  
    Directory /tmp/backuprestore/ already exists..cleaning up
    [Fri 26 Jul 2019 02:35:48 PM UTC] - Able to copy backup from cassandra pod  to local directory:/tmp/backuprestore/ ...
    [Fri 26 Jul 2019 02:35:48 PM UTC] - Validate the presence of backup file in directory:/tmp/backuprestore/
    [Fri 26 Jul 2019 02:35:48 PM UTC] - Able to find backup file:netq_master_snapshot_2019-07-26_14_35_37_UTC.tar.gz
    [Fri 26 Jul 2019 02:35:48 PM UTC] - Backup finished successfully!
    
  2. Verify the backup file has been created.

    cumulus@<netq-platform/netq-appliance>:~$ cd /opt/<backup-directory>
    cumulus@<netq-platform/netq-appliance>:~/opt/<backup-directory># ls
    netq_master_snapshot_2019-06-04_07_24_50_UTC.tar.gz
    

To create a scheduled backup, add ./backuprestore.sh --backup --localdir /opt/<backup-directory> to an existing cron job, or create a new one.
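As a sketch of a scheduled backup, the following crontab entry (added with crontab -e) runs the script from /usr/sbin every Sunday at 02:00; the schedule and the backup directory are assumptions to adapt to your policy:

0 2 * * 0 /usr/sbin/backuprestore.sh --backup --localdir /opt/<backup-directory>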

Restore Your NetQ Data

You can restore NetQ data using the backup file you created in Back Up Your NetQ Data. You can restore your instance to the same NetQ Platform or NetQ Appliance or to a new platform or appliance. You do not need to stop the server where the backup file resides to perform the restoration, but logins to the NetQ UI will fail during the restoration process. The restore option of the backup script copies the data from the backup file to the database, decompresses it, verifies the restoration, and starts all necessary services. You should not see any data loss as a result of a restore operation.

To restore NetQ on the same hardware where the backup file resides:

  1. Log in to the NetQ server.

  2. Run the restore script being sure to replace the backup-directory option with the name of the directory where the backup file resides.

    cumulus@<netq-platform/netq-appliance>:~$ ./backuprestore.sh --restore --localdir /opt/<backup-directory>
    

    You can abbreviate the restore and localdir options of this command to -r and -l to reduce typing.

    This is a sample of what you see while the script is running:

    [Fri 26 Jul 2019 02:37:49 PM UTC] - Received Inputs for restore ...
    
    WARNING: Restore procedure wipes out the existing contents of Database.
      Once the Database is restored you loose the old data and cannot be recovered.
    "Do you like to continue with Database restore:[Y(yes)/N(no)]. (Default:N)"
    

    You must answer the above question to continue the restoration. After entering Y or yes, the output continues as follows:

    [Fri 26 Jul 2019 02:37:50 PM UTC] - Able to find cassandra pod: cassandra-0
    [Fri 26 Jul 2019 02:37:50 PM UTC] - Continuing with the procedure ...
    [Fri 26 Jul 2019 02:37:50 PM UTC] - Backup local directory:/tmp/backuprestore/ exists....
    [Fri 26 Jul 2019 02:37:50 PM UTC] - Removing any stale restore directories ...
    Copying the file for restore to cassandra pod ....
    [Fri 26 Jul 2019 02:37:50 PM UTC] - Able to copy the local directory contents to cassandra pod in /tmp/backuprestore/.
    [Fri 26 Jul 2019 02:37:50 PM UTC] - copying the script to cassandra pod in dir:/tmp/backuprestore/....
    Executing the Script for restoring the backup ...
    /tmp/backuprestore//createbackup.sh: line 1: cript: command not found
    [Fri 26 Jul 2019 02:40:12 PM UTC] - Able to exeute /tmp/backuprestore//createbackup.sh script on cassandra pod
    [Fri 26 Jul 2019 02:40:12 PM UTC] - Restore finished successfully!
    

To restore NetQ on new hardware:

  1. Copy the backup file from /opt/<backup-directory> on the older hardware to the backup directory on the new hardware.

  2. Run the restore script on the new hardware, being sure to replace the backup-directory option with the name of the directory where the backup file resides.

    cumulus@<netq-platform/netq-appliance>:~$ ./backuprestore.sh --restore --localdir /opt/<backup-directory>
    

Configuration Updates

After installation or upgrade of NetQ is complete, there are a few additional configuration tasks that might be required.

Add More Nodes to Your Server Cluster

Installation of NetQ with a server cluster sets up the master and two worker nodes. To expand your cluster to include up to a total of nine worker nodes, use the Admin UI.

To add more worker nodes:

  1. Prepare the nodes. Refer to the relevant server cluster instructions in Prepare for NetQ Cloud Installation.

  2. Open the Admin UI by entering https://<master-hostname-or-ipaddress>:8443 in your browser address field.

    This opens the Health dashboard for NetQ.

  3. Click Cluster to view your current configuration.

    This opens the Cluster dashboard, with the details about each node in the cluster.

  4. Click Add Worker Node.

  5. Enter the private IP address of the node you want to add.

  6. Click Add.

    Monitor the progress of the three jobs by clicking next to the jobs.

    On completion, a card for the new node is added to the Cluster dashboard.

    If the addition fails for any reason, download the log file by clicking , run netq bootstrap reset, and then try again.

  7. Repeat this process to add more worker nodes as needed.

Update Your Cloud Activation Key

The cloud activation key is the one used to access the Cloud services, not the authorization keys used for configuring the CLI. It is provided by Cumulus Networks when your premises is set up. It is called the config-key.

There are occasions where you might want to update your cloud service activation key. For example, if you mistyped the key during installation and now your existing key does not work, or you received a new key for your premises from Cumulus Networks.

To update the activation key, run the following command on your NetQ Platform replacing text-opta-key with your new key.

cumulus@switch:~$ netq install opta activate-job config-key <text-opta-key>

Cumulus NetQ Integration Guide

After you have completed the installation of Cumulus NetQ, you may want to configure some of the additional capabilities that NetQ offers or integrate it with third-party software or hardware.

This topic describes how to:

Integrate NetQ with Notification Applications

After you have installed the NetQ applications package and the NetQ Agents, you may want to configure some of the additional capabilities that NetQ offers. This topic describes how to integrate NetQ with an event notification application.

Integrate NetQ with an Event Notification Application

To take advantage of the numerous event messages generated and processed by NetQ, you must integrate with third-party event notification applications. You can integrate NetQ with Syslog, PagerDuty and Slack tools. You may integrate with one or more of these applications simultaneously.

Each network protocol and service in the NetQ Platform receives the raw data stream from the NetQ Agents, processes the data and delivers events to the Notification function. Notification then stores, filters and sends messages to any configured notification applications. Filters are based on rules you create. You must have at least one rule per filter. A select set of events can be triggered by a user-configured threshold.

You may choose to implement a proxy server (that sits between the NetQ Platform and the integration channels) that receives, processes and distributes the notifications rather than having them sent directly to the integration channel. If you use such a proxy, you must configure NetQ with the proxy information.

In either case, notifications are generated for the following types of events:

Network Protocols
  • BGP status and session state
  • CLAG (MLAG) status and session state
  • EVPN status and session state
  • LLDP status
  • LNV status and session state **
  • OSPF status and session state
  • VLAN status and session state *
  • VXLAN status and session state *
Interfaces
  • Link status
  • Ports and cables status
  • MTU status
Services
  • NetQ Agent status
  • PTM
  • SSH *
  • NTP status *
Traces
  • On-demand trace status
  • Scheduled trace status
Sensors
  • Fan status
  • PSU (power supply unit) status
  • Temperature status
System Software
  • Configuration File changes
  • Running Configuration File changes
  • Cumulus Linux License status
  • Cumulus Linux Support status
  • Software Package status
  • Operating System version
System Hardware
  • Physical resources status
  • BTRFS status
  • SSD utilization status
  • Threshold Crossing Alerts (TCAs)

* This type of event can only be viewed in the CLI with this release.

** This type of event is only visible when enabled in the CLI.

Refer to the Events Reference for descriptions and examples of these events.

Event Message Format

Messages have the following structure: <message-type><timestamp><opid><hostname><severity><message>

| Element | Description |
| --- | --- |
| message type | Category of event; agent, bgp, clag, clsupport, configdiff, evpn, license, link, lldp, lnv, mtu, node, ntp, ospf, packageinfo, ptm, resource, runningconfigdiff, sensor, services, ssdutil, tca, trace, version, vlan or vxlan |
| timestamp | Date and time event occurred |
| opid | Identifier of the service or process that generated the event |
| hostname | Hostname of network device where event occurred |
| severity | Severity level in which the given event is classified; debug, error, info, warning, or critical |
| message | Text description of event |

For example:
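An illustrative message, following the field order above (all values here are hypothetical; the exact rendering depends on your deployment and the notification channel):

agent 2019-05-10 14:31:26 hostd-12345 leaf01 error NetQ Agent is not communicating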

To set up the integrations, you must configure NetQ with at least one channel, one rule, and one filter. To refine what messages you want to view and where to send them, you can add additional rules and filters and set thresholds on supported event types. You can also configure a proxy server to receive, process, and forward the messages. This is accomplished using the NetQ CLI in the following order:

  1. Configure a proxy server (optional; if used, configure it before the channels)
  2. Create one or more channels
  3. Create rules
  4. Create filters
  5. Set thresholds on supported event types (optional)

Notification Commands Overview

The NetQ Command Line Interface (CLI) is used to filter and send notifications to third-party tools based on severity, service, event-type, and device. You can use TAB completion or the help option to assist when needed.

The command syntax for standard events is:

##Channels
netq add notification channel slack <text-channel-name> webhook <text-webhook-url> [severity info|severity warning|severity error|severity debug] [tag <text-slack-tag>]
netq add notification channel pagerduty <text-channel-name> integration-key <text-integration-key> [severity info|severity warning|severity error|severity debug]
 
##Rules and Filters
netq add notification rule <text-rule-name> key <text-rule-key> value <text-rule-value>
netq add notification filter <text-filter-name> [severity info|severity warning|severity error|severity debug] [rule <text-rule-name-anchor>] [channel <text-channel-name-anchor>] [before <text-filter-name-anchor>|after <text-filter-name-anchor>]
 
##Management
netq del notification channel <text-channel-name-anchor>
netq del notification filter <text-filter-name-anchor>
netq del notification rule <text-rule-name-anchor>
netq show notification [channel|filter|rule] [json]

The command syntax for events with user-configurable thresholds is:

##Rules
netq add tca event_id <event-name> scope <regex-filter> [severity <critical|info>] threshold <value>

##Management
netq add tca tca_id <tca-rule-name> is_active <true|false>
netq add tca tca_id <tca-rule-name> channel drop <channel-name>
netq del tca tca_id <tca-rule-name>
netq show tca [tca_id <tca-rule-name>]

The command syntax for a server proxy is:

##Proxy
netq add notification proxy <text-proxy-hostname> [port <text-proxy-port>]
netq show notification proxy
netq del notification proxy

The various command options are described in the following sections where they are used.

Configure Basic NetQ Event Notification

The simplest configuration you can create is one that sends all events generated by all interfaces to a single notification application. This is described here. For more granular configurations and examples, refer to Configure Advanced NetQ Event Notifications.

A notification configuration must contain one channel, one rule, and one filter. Creation of the configuration follows this same path:

  1. Add a channel (slack, pagerduty, syslog)
  2. Add a rule that accepts all interface events
  3. Add a filter that associates this rule with the newly created channel

Create Your Channel

For PagerDuty–

Configure a channel using the integration key for your PagerDuty setup. Verify the configuration.

```
cumulus@switch:~$ netq add notification channel pagerduty pd-netq-events integration-key c6d666e210a8425298ef7abde0d1998
Successfully added/updated channel pd-netq-events

cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name            Type             Severity         Channel Info
--------------- ---------------- ---------------- ------------------------
pd-netq-events  pagerduty        info             integration-key: c6d666e
                                                210a8425298ef7abde0d1998      
```

For Slack–

Create an incoming webhook as described in the documentation for your version of Slack, then configure a channel using the webhook URL. Verify the configuration.

```
cumulus@switch:~$ netq add notification channel slack slk-netq-events webhook https://hooks.slack.com/services/text/moretext/evenmoretext
Successfully added/updated channel slk-netq-events

cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name            Type             Severity Channel Info
--------------- ---------------- -------- ----------------------
slk-netq-events slack            info     webhook:https://hooks.s
                                          lack.com/services/text/
                                          moretext/evenmoretext
```

For Syslog–

Create the channel using the syslog server hostname (or IP address) and port. Verify the configuration.

```
cumulus@switch:~$ netq add notification channel syslog syslog-netq-events hostname syslog-server port 514
Successfully added/updated channel syslog-netq-events

cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name            Type             Severity Channel Info
--------------- ---------------- -------- ----------------------
syslog-netq-eve syslog           info     host:syslog-server
nts                              port: 514
```

Create a Rule

Create a rule that accepts all interface events, then verify the configuration.

```
cumulus@switch:~$ netq add notification rule all-ifs key ifname value ALL
Successfully added/updated rule all-ifs

cumulus@switch:~$ netq show notification rule
Matching config_notify records:
Name            Rule Key         Rule Value
--------------- ---------------- --------------------
all-ifs         ifname           ALL
```

Create a Filter

Create a filter to tie the rule to the channel. Verify the configuration.

For PagerDuty–

```
cumulus@switch:~$ netq add notification filter notify-all-ifs rule all-ifs channel pd-netq-events
Successfully added/updated filter notify-all-ifs

cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name            Order      Severity         Channels         Rules
--------------- ---------- ---------------- ---------------- ----------
notify-all-ifs  1          info             pd-netq-events   all-ifs
```

For Slack–

```
cumulus@switch:~$ netq add notification filter notify-all-ifs rule all-ifs channel slk-netq-events
Successfully added/updated filter notify-all-ifs

cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name            Order      Severity         Channels         Rules
--------------- ---------- ---------------- ---------------- ----------
notify-all-ifs  1          info             slk-netq-events   all-ifs
```

For Syslog–

```
cumulus@switch:~$ netq add notification filter notify-all-ifs rule all-ifs channel syslog-netq-events
Successfully added/updated filter notify-all-ifs

cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name            Order      Severity         Channels         Rules
--------------- ---------- ---------------- ---------------- ----------
notify-all-ifs  1          info             syslog-netq-events all-ifs
```

NetQ is now configured to send all interface events to your selected channel.

Configure Advanced NetQ Event Notifications

If you want to create more granular notifications based on such items as selected devices, characteristics of devices, or protocols, or you want to use a proxy server, you need more than the basic notification configuration. Details for creating these more complex notification configurations are included here.

Configure a Proxy Server

To send notification messages through a proxy server instead of directly to a notification channel, configure NetQ with the hostname and, optionally, the port of the proxy server. If no port is specified, NetQ defaults to port 80. Only one proxy server is currently supported. To simplify deployment, configure your proxy server before configuring channels, rules, or filters. To configure the proxy server:

cumulus@switch:~$ netq add notification proxy <text-proxy-hostname> [port <text-proxy-port>]
cumulus@switch:~$ netq add notification proxy proxy4
Successfully configured notifier proxy proxy4:80

You can view the proxy server settings by running the netq show notification proxy command.

cumulus@switch:~$ netq show notification proxy
Matching config_notify records:
Proxy URL          Slack Enabled              PagerDuty Enabled
------------------ -------------------------- ----------------------------------
proxy4:80          yes                        yes

You can remove the proxy server by running the netq del notification proxy command. This changes the NetQ behavior to send events directly to the notification channels.

cumulus@switch:~$ netq del notification proxy
Successfully overwrote notifier proxy to null

Create Channels

Create one or more PagerDuty, Slack, or syslog channels to present the notifications.

Configure a PagerDuty Channel

NetQ sends notifications to PagerDuty as PagerDuty events.


To configure the NetQ notifier to send notifications to PagerDuty:

  1. Configure the following options using the netq add notification channel command:

    • CHANNEL_TYPE <text-channel-name>: The third-party notification channel and name; use pagerduty in this case.
    • integration-key <text-integration-key>: The integration key is also called the service_key or routing_key. The default is an empty string ("").
    • severity: (Optional) The log level to set; one of info, warning, error, critical, or debug. The default is info.
    cumulus@switch:~$ netq add notification channel pagerduty pd-netq-events integration-key c6d666e210a8425298ef7abde0d1998
    Successfully added/updated channel pd-netq-events
    
  2. Verify that the channel is configured properly.

    cumulus@switch:~$ netq show notification channel
    Matching config_notify records:
    Name            Type             Severity         Channel Info
    --------------- ---------------- ---------------- ------------------------
    pd-netq-events  pagerduty        info             integration-key: c6d666e
                                                      210a8425298ef7abde0d1998      
    

Configure a Slack Channel

NetQ Notifier sends notifications to Slack as incoming webhooks for a Slack channel you configure.

To configure NetQ to send notifications to Slack:

  1. If needed, create one or more Slack channels on which to receive the notifications.

    1. Click + next to Channels.
    2. Enter a name for the channel, and click Create Channel.
    3. Navigate to the new channel.
    4. Click + Add an app link below the channel name to open the application directory.
    5. In the search box, start typing incoming and select Incoming WebHooks when it appears.
    6. Click Add Configuration and enter the name of the channel you created (where you want to post notifications).
    7. Click Add Incoming WebHooks integration.
    8. Save the WebHook URL in a text file for use in the next step.
  2. Configure the following options using the netq add notification channel command:

    • CHANNEL_TYPE <text-channel-name>: The third-party notification channel name; use slack in this case.
    • WEBHOOK: Copy the WebHook URL from the text file, or in the desired channel locate the initial message indicating the addition of the webhook, click the incoming-webhook link, and then click Settings to find the URL. Example URL: https://hooks.slack.com/services/text/moretext/evenmoretext
    • severity: (Optional) The log level to set; one of error, warning, info, or debug. The default is info.
    • tag: (Optional) Tag appended to the Slack notification to highlight particular channels or people. The tag value must be preceded by the @ sign. For example, @netq-info.

    cumulus@switch:~$ netq add notification channel slack slk-netq-events webhook https://hooks.slack.com/services/text/moretext/evenmoretext
    Successfully added/updated channel slk-netq-events
    
  3. Verify the channel is configured correctly.
    From the CLI:

    cumulus@switch:~$ netq show notification channel
    Matching config_notify records:
    Name            Type             Severity Channel Info
    --------------- ---------------- -------- ----------------------
    slk-netq-events slack            info     webhook:https://hooks.s
                                              lack.com/services/text/
                                              moretext/evenmoretext
    

    From the Slack channel: confirm that notification messages appear in the channel you associated with the incoming webhook.

Create Rules

Each rule comprises a single key-value pair. The key-value pair indicates what messages to include in or drop from the event information sent to a notification channel. You can create more than one rule for a single filter; using multiple rules makes a filter more specific. For example, you can specify rules around hostnames or interface names, enabling you to filter messages specific to those hosts or interfaces. You should have already defined the PagerDuty or Slack channels (as described earlier).

There is a fixed set of valid rule keys. Values are entered as regular expressions and vary according to your deployment.

| Service | Rule Key | Description | Example Rule Values |
| --- | --- | --- | --- |
| BGP | message_type | Network protocol or service identifier | bgp |
| | hostname | User-defined, text-based name for a switch or host | server02, leaf11, exit01, spine-4 |
| | peer | User-defined, text-based name for a peer switch or host | server4, leaf-3, exit02, spine06 |
| | desc | Text description | |
| | vrf | Name of VRF interface | mgmt, default |
| | old_state | Previous state of the BGP service | Established, Failed |
| | new_state | Current state of the BGP service | Established, Failed |
| | old_last_reset_time | Previous time that BGP service was reset | Apr3, 2019, 4:17 pm |
| | new_last_reset_time | Most recent time that BGP service was reset | Apr8, 2019, 11:38 am |
| MLAG (CLAG) | message_type | Network protocol or service identifier | clag |
| | hostname | User-defined, text-based name for a switch or host | server02, leaf-9, exit01, spine04 |
| | old_conflicted_bonds | Previous pair of interfaces in a conflicted bond | swp7 swp8, swp3 swp4 |
| | new_conflicted_bonds | Current pair of interfaces in a conflicted bond | swp11 swp12, swp23 swp24 |
| | old_state_protodownbond | Previous state of the bond | protodown, up |
| | new_state_protodownbond | Current state of the bond | protodown, up |
| ConfigDiff | message_type | Network protocol or service identifier | configdiff |
| | hostname | User-defined, text-based name for a switch or host | server02, leaf11, exit01, spine-4 |
| | vni | Virtual Network Instance identifier | 12, 23 |
| | old_state | Previous state of the configuration file | created, modified |
| | new_state | Current state of the configuration file | created, modified |
| EVPN | message_type | Network protocol or service identifier | evpn |
| | hostname | User-defined, text-based name for a switch or host | server02, leaf-9, exit01, spine04 |
| | vni | Virtual Network Instance identifier | 12, 23 |
| | old_in_kernel_state | Previous VNI state, in kernel or not | true, false |
| | new_in_kernel_state | Current VNI state, in kernel or not | true, false |
| | old_adv_all_vni_state | Previous VNI advertising state, advertising all or not | true, false |
| | new_adv_all_vni_state | Current VNI advertising state, advertising all or not | true, false |
| Link | message_type | Network protocol or service identifier | link |
| | hostname | User-defined, text-based name for a switch or host | server02, leaf-6, exit01, spine7 |
| | ifname | Software interface name | eth0, swp53 |
| LLDP | message_type | Network protocol or service identifier | lldp |
| | hostname | User-defined, text-based name for a switch or host | server02, leaf41, exit01, spine-5, tor-36 |
| | ifname | Software interface name | eth1, swp12 |
| | old_peer_ifname | Previous software interface name | eth1, swp12, swp27 |
| | new_peer_ifname | Current software interface name | eth1, swp12, swp27 |
| | old_peer_hostname | Previous user-defined, text-based name for a peer switch or host | server02, leaf41, exit01, spine-5, tor-36 |
| | new_peer_hostname | Current user-defined, text-based name for a peer switch or host | server02, leaf41, exit01, spine-5, tor-36 |
| Node | message_type | Network protocol or service identifier | node |
| | hostname | User-defined, text-based name for a switch or host | server02, leaf41, exit01, spine-5, tor-36 |
| | ntp_state | Current state of NTP service | in sync, not sync |
| | db_state | Current state of DB | Add, Update, Del, Dead |
| NTP | message_type | Network protocol or service identifier | ntp |
| | hostname | User-defined, text-based name for a switch or host | server02, leaf-9, exit01, spine04 |
| | old_state | Previous state of service | in sync, not sync |
| | new_state | Current state of service | in sync, not sync |
| Port | message_type | Network protocol or service identifier | port |
| | hostname | User-defined, text-based name for a switch or host | server02, leaf13, exit01, spine-8, tor-36 |
| | ifname | Interface name | eth0, swp14 |
| | old_speed | Previous speed rating of port | 10 G, 25 G, 40 G, unknown |
| | old_transreceiver | Previous transceiver | 40G Base-CR4, 25G Base-CR |
| | old_vendor_name | Previous vendor name of installed port module | Amphenol, OEM, Mellanox, Fiberstore, Finisar |
| | old_serial_number | Previous serial number of installed port module | MT1507VS05177, AVE1823402U, PTN1VH2 |
| | old_supported_fec | Previous forward error correction (FEC) support status | none, Base R, RS |
| | old_advertised_fec | Previous FEC advertising state | true, false, not reported |
| | old_fec | Previous FEC capability | none |
| | old_autoneg | Previous activation state of auto-negotiation | on, off |
| | new_speed | Current speed rating of port | 10 G, 25 G, 40 G |
| | new_transreceiver | Current transceiver | 40G Base-CR4, 25G Base-CR |
| | new_vendor_name | Current vendor name of installed port module | Amphenol, OEM, Mellanox, Fiberstore, Finisar |
| | new_part_number | Current part number of installed port module | SFP-H10GB-CU1M, MC3309130-001, 603020003 |
| | new_serial_number | Current serial number of installed port module | MT1507VS05177, AVE1823402U, PTN1VH2 |
| | new_supported_fec | Current FEC support status | none, Base R, RS |
| | new_advertised_fec | Current FEC advertising state | true, false |
| | new_fec | Current FEC capability | none |
| | new_autoneg | Current activation state of auto-negotiation | on, off |
| Sensors | sensor | Network protocol or service identifier | Fan: fan1, fan-2; Power Supply Unit: psu1, psu2; Temperature: psu1temp1, temp2 |
| | hostname | User-defined, text-based name for a switch or host | server02, leaf-26, exit01, spine2-4 |
| | old_state | Previous state of a fan, power supply unit, or thermal sensor | Fan: ok, absent, bad; PSU: ok, absent, bad; Temp: ok, busted, bad, critical |
| | new_state | Current state of a fan, power supply unit, or thermal sensor | Fan: ok, absent, bad; PSU: ok, absent, bad; Temp: ok, busted, bad, critical |
| | old_s_state | Previous state of a fan or power supply unit | Fan: up, down; PSU: up, down |
| | new_s_state | Current state of a fan or power supply unit | Fan: up, down; PSU: up, down |
| | new_s_max | Current maximum temperature threshold value | Temp: 110 |
| | new_s_crit | Current critical high temperature threshold value | Temp: 85 |
| | new_s_lcrit | Current critical low temperature threshold value | Temp: -25 |
| | new_s_min | Current minimum temperature threshold value | Temp: -50 |
| Services | message_type | Network protocol or service identifier | services |
| | hostname | User-defined, text-based name for a switch or host | server02, leaf03, exit01, spine-8 |
| | name | Name of service | clagd, lldpd, ssh, ntp, netqd, net-agent |
| | old_pid | Previous process or service identifier | 12323, 52941 |
| | new_pid | Current process or service identifier | 12323, 52941 |
| | old_status | Previous status of service | up, down |
| | new_status | Current status of service | up, down |

Rule names are case sensitive, and no wildcards are permitted. Rule names may contain spaces, but if they do, they must be enclosed in single quotes in commands. For readability, it is easier to use dashes or mixed case instead of spaces; for example, use bgpSessionChanges, BGP-session-changes, or BGPsessions instead of ‘BGP Session Changes’.
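
For illustration, here is how the quoting requirement plays out with the documented rule syntax (the key and value are placeholders reused from the examples below):

cumulus@switch:~$ netq add notification rule 'BGP Session Changes' key hostname value spine-01
cumulus@switch:~$ netq add notification rule BGP-session-changes key hostname value spine-01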

Use Tab completion to view the command options syntax.

Example Rules

Create a BGP Rule Based on Hostname:

cumulus@switch:~$ netq add notification rule bgpHostname key hostname value spine-01
Successfully added/updated rule bgpHostname 

Create a Rule Based on a Configuration File State Change:

cumulus@switch:~$ netq add notification rule sysconf key configdiff value updated
Successfully added/updated rule sysconf

Create an EVPN Rule Based on a VNI:

cumulus@switch:~$ netq add notification rule evpnVni key vni value 42
Successfully added/updated rule evpnVni

Create an Interface Rule Based on FEC Support:

cumulus@switch:~$ netq add notification rule fecSupport key new_supported_fec value supported
Successfully added/updated rule fecSupport

Create a Service Rule Based on a Status Change:

cumulus@switch:~$ netq add notification rule svcStatus key new_status value down
Successfully added/updated rule svcStatus

Create a Sensor Rule Based on a Threshold:

cumulus@switch:~$ netq add notification rule overTemp key new_s_crit value 24
Successfully added/updated rule overTemp

Create an Interface Rule Based on Port:

cumulus@switch:~$ netq add notification rule swp52 key port value swp52
Successfully added/updated rule swp52 

View the Rule Configurations

Use the netq show notification command to view the rules on your platform.

cumulus@switch:~$ netq show notification rule
 
Matching config_notify records:
Name            Rule Key         Rule Value
--------------- ---------------- --------------------
bgpHostname     hostname         spine-01
evpnVni         vni              42
fecSupport      new_supported_fe supported
                c
overTemp        new_s_crit       24
svcStatus       new_status       down
swp52           port             swp52
sysconf         configdiff       updated

Create Filters

You can limit or direct event messages using filters. Filters are created based on rules you define, like those in the previous section. Each filter contains one or more rules. When a message matches a rule, it is sent to the indicated destination. Before you can create filters, you need to have already defined the rules and configured the PagerDuty or Slack channels (as described earlier).

As filters are created, they are added to the bottom of a filter list. By default, filters are processed in the order they appear in this list (from top to bottom) until a match is found. This means that each event message is first evaluated by the first filter listed, and if it matches then it is processed, ignoring all other filters, and the system moves on to the next event message received. If the event does not match the first filter, it is tested against the second filter, and if it matches then it is processed and the system moves on to the next event received. And so forth. Events that do not match any filter are ignored.

You may need to change the order of filters in the list to ensure you capture the events you want and drop the events you do not want. This is possible using the before or after keywords to ensure one rule is processed before or after another.

This diagram shows an example with four defined filters with sample output results.

Filter names are case sensitive and may contain spaces, but if they do, they must be enclosed in single quotes in commands. For readability, it is easier to use dashes or mixed case instead of spaces; for example, use bgpSessionChanges, BGP-session-changes, or BGPsessions instead of ‘BGP Session Changes’.

Example Filters

Create a filter for BGP Events on a Particular Device:

cumulus@switch:~$ netq add notification filter bgpSpine rule bgpHostname channel pd-netq-events
Successfully added/updated filter bgpSpine

Create a Filter for a Given VNI in Your EVPN Overlay:

cumulus@switch:~$ netq add notification filter vni42 severity warning rule evpnVni channel pd-netq-events
Successfully added/updated filter vni42

Create a Filter for when a Configuration File has been Updated:

cumulus@switch:~$ netq add notification filter configChange severity info rule sysconf channel slk-netq-events
Successfully added/updated filter configChange

Create a Filter to Monitor Ports with FEC Support:

cumulus@switch:~$ netq add notification filter newFEC rule fecSupport channel slk-netq-events
Successfully added/updated filter newFEC

Create a Filter to Monitor for Services that Change to a Down State:

cumulus@switch:~$ netq add notification filter svcDown severity error rule svcStatus channel slk-netq-events
Successfully added/updated filter svcDown

Create a Filter to Monitor Overheating Platforms:

cumulus@switch:~$ netq add notification filter critTemp severity error rule overTemp channel pd-netq-events
Successfully added/updated filter critTemp

Create a Filter to Drop Messages from a Given Interface, and match against this filter before any other filters. To create a drop-style filter, do not specify a channel. To put the filter first, use the before option.

cumulus@switch:~$ netq add notification filter swp52Drop severity error rule swp52 before bgpSpine
Successfully added/updated filter swp52Drop

View the Filter Configurations

Use the netq show notification command to view the filters on your platform.

cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name            Order      Severity         Channels         Rules
--------------- ---------- ---------------- ---------------- ----------
swp52Drop       1          error            NetqDefaultChann swp52
                                            el
bgpSpine        2          info             pd-netq-events   bgpHostnam
                                                             e
vni42           3          warning          pd-netq-events   evpnVni
configChange    4          info             slk-netq-events  sysconf
newFEC          5          info             slk-netq-events  fecSupport
svcDown         6          critical         slk-netq-events  svcStatus
critTemp        7          critical         pd-netq-events   overTemp

Reorder Filters

When you look at the results of the netq show notification filter command above, you might notice that although the drop-based filter is processed first (there is no point in evaluating an event you are going to drop anyway, so that is good), the critical severity events are processed last under the current definitions. If you want to process those before lower severity events, you can reorder the list using the before and after options.

For example, to put the two critical severity event filters just below the drop filter:

cumulus@switch:~$ netq add notification filter critTemp after swp52Drop
Successfully added/updated filter critTemp
cumulus@switch:~$ netq add notification filter svcDown before bgpSpine
Successfully added/updated filter svcDown

You do not need to re-enter all the severity, channel, and rule information for existing filters if you only want to change their processing order.

Run the netq show notification command again to verify the changes:

cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name            Order      Severity         Channels         Rules
--------------- ---------- ---------------- ---------------- ----------
swp52Drop       1          error            NetqDefaultChann swp52
                                            el
critTemp        2          critical         pd-netq-events   overTemp
svcDown         3          critical         slk-netq-events  svcStatus
bgpSpine        4          info             pd-netq-events   bgpHostnam
                                                             e
vni42           5          warning          pd-netq-events   evpnVni
configChange    6          info             slk-netq-events  sysconf
newFEC          7          info             slk-netq-events  fecSupport

Examples of Advanced Notification Configurations

Putting all of these channel, rule, and filter definitions together, you can create a complete notification configuration. The following example notification configurations were created using the three-step process outlined above. Refer to Integrate NetQ with an Event Notification Application for details and instructions for creating channels, rules, and filters.

Create a Notification for BGP Events from a Selected Switch

In this example, we created a notification integration with a PagerDuty channel called pd-netq-events. We then created a rule bgpHostname and a filter called bgpSpine for any notifications from spine-01. The result is that any info severity event messages from spine-01 are filtered to the pd-netq-events channel.

cumulus@switch:~$ netq add notification channel pagerduty pd-netq-events integration-key 1234567890
Successfully added/updated channel pd-netq-events
cumulus@switch:~$ netq add notification rule bgpHostname key hostname value spine-01
Successfully added/updated rule bgpHostname
 
cumulus@switch:~$ netq add notification filter bgpSpine rule bgpHostname channel pd-netq-events
Successfully added/updated filter bgpSpine
cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name            Type             Severity         Channel Info
--------------- ---------------- ---------------- ------------------------
pd-netq-events  pagerduty        info             integration-key: 1234567
                                                  890   

cumulus@switch:~$ netq show notification rule
Matching config_notify records:
Name            Rule Key         Rule Value
--------------- ---------------- --------------------
bgpHostname     hostname         spine-01
 
cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name            Order      Severity         Channels         Rules
--------------- ---------- ---------------- ---------------- ----------
bgpSpine        1          info             pd-netq-events   bgpHostnam
                                                             e

Create a Notification for Warnings on a Given EVPN VNI

In this example, we created a notification integration with a PagerDuty channel called pd-netq-events. We then created a rule evpnVni and a filter called vni42 for any warning messages from VNI 42 on the EVPN overlay network. The result is that any warning severity event messages from VNI 42 are filtered to the pd-netq-events channel.

cumulus@switch:~$ netq add notification channel pagerduty pd-netq-events integration-key 1234567890
Successfully added/updated channel pd-netq-events
 
cumulus@switch:~$ netq add notification rule evpnVni key vni value 42
Successfully added/updated rule evpnVni
 
cumulus@switch:~$ netq add notification filter vni42 rule evpnVni channel pd-netq-events
Successfully added/updated filter vni42
 
cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name            Type             Severity         Channel Info
--------------- ---------------- ---------------- ------------------------
pd-netq-events  pagerduty        info             integration-key: 1234567
                                                  890   

cumulus@switch:~$ netq show notification rule
Matching config_notify records:
Name            Rule Key         Rule Value
--------------- ---------------- --------------------
bgpHostname     hostname         spine-01
evpnVni         vni              42
 
cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name            Order      Severity         Channels         Rules
--------------- ---------- ---------------- ---------------- ----------
bgpSpine        1          info             pd-netq-events   bgpHostnam
                                                             e
vni42           2          warning          pd-netq-events   evpnVni

Create a Notification for Configuration File Changes

In this example, we created a notification integration with a Slack channel called slk-netq-events. We then created a rule sysconf and a filter called configChange for any configuration file update messages. The result is that any configuration update messages are filtered to the slk-netq-events channel.

cumulus@switch:~$ netq add notification channel slack slk-netq-events webhook https://hooks.slack.com/services/text/moretext/evenmoretext
Successfully added/updated channel slk-netq-events
 
cumulus@switch:~$ netq add notification rule sysconf key configdiff value updated
Successfully added/updated rule sysconf
 
cumulus@switch:~$ netq add notification filter configChange severity info rule sysconf channel slk-netq-events
Successfully added/updated filter configChange
 
cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name            Type             Severity Channel Info
--------------- ---------------- -------- ----------------------
slk-netq-events slack            info     webhook:https://hooks.s
                                          lack.com/services/text/
                                          moretext/evenmoretext     
 
cumulus@switch:~$ netq show notification rule
Matching config_notify records:
Name            Rule Key         Rule Value
--------------- ---------------- --------------------
bgpHostname     hostname         spine-01
evpnVni         vni              42
sysconf         configdiff       updated

cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name            Order      Severity         Channels         Rules
--------------- ---------- ---------------- ---------------- ----------
bgpSpine        1          info             pd-netq-events   bgpHostnam
                                                             e
vni42           2          warning          pd-netq-events   evpnVni
configChange    3          info             slk-netq-events  sysconf

Create a Notification for When a Service Goes Down

In this example, we created a notification integration with a Slack channel called slk-netq-events. We then created a rule svcStatus and a filter called svcDown for any services state messages indicating a service is no longer operational. The result is that any service down messages are filtered to the slk-netq-events channel.

cumulus@switch:~$ netq add notification channel slack slk-netq-events webhook https://hooks.slack.com/services/text/moretext/evenmoretext
Successfully added/updated channel slk-netq-events
 
cumulus@switch:~$ netq add notification rule svcStatus key new_status value down
Successfully added/updated rule svcStatus
 
cumulus@switch:~$ netq add notification filter svcDown severity error rule svcStatus channel slk-netq-events
Successfully added/updated filter svcDown
 
cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name            Type             Severity Channel Info
--------------- ---------------- -------- ----------------------
slk-netq-events slack            info     webhook:https://hooks.s
                                          lack.com/services/text/
                                          moretext/evenmoretext     
 
cumulus@switch:~$ netq show notification rule
Matching config_notify records:
Name            Rule Key         Rule Value
--------------- ---------------- --------------------
bgpHostname     hostname         spine-01
evpnVni         vni              42
svcStatus       new_status       down
sysconf         configdiff       updated

cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name            Order      Severity         Channels         Rules
--------------- ---------- ---------------- ---------------- ----------
bgpSpine        1          info             pd-netq-events   bgpHostnam
                                                             e
vni42           2          warning          pd-netq-events   evpnVni
configChange    3          info             slk-netq-events  sysconf
svcDown         4          critical         slk-netq-events  svcStatus

Create a Filter to Drop Notifications from a Given Interface

In this example, we created a notification integration with a Slack channel called slk-netq-events. We then created a rule swp52 and a filter called swp52Drop that drops all notifications for events from interface swp52.

cumulus@switch:~$ netq add notification channel slack slk-netq-events webhook https://hooks.slack.com/services/text/moretext/evenmoretext
Successfully added/updated channel slk-netq-events
 
cumulus@switch:~$ netq add notification rule swp52 key port value swp52
Successfully added/updated rule swp52
 
cumulus@switch:~$ netq add notification filter swp52Drop severity error rule swp52 before bgpSpine
Successfully added/updated filter swp52Drop
 
cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name            Type             Severity Channel Info
--------------- ---------------- -------- ----------------------
slk-netq-events slack            info     webhook:https://hooks.s
                                          lack.com/services/text/
                                          moretext/evenmoretext     
 
cumulus@switch:~$ netq show notification rule
Matching config_notify records:
Name            Rule Key         Rule Value
--------------- ---------------- --------------------
bgpHostname     hostname         spine-01
evpnVni         vni              42
svcStatus       new_status       down
swp52           port             swp52
sysconf         configdiff       updated

cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name            Order      Severity         Channels         Rules
--------------- ---------- ---------------- ---------------- ----------
swp52Drop       1          error            NetqDefaultChann swp52
                                            el
bgpSpine        2          info             pd-netq-events   bgpHostnam
                                                             e
vni42           3          warning          pd-netq-events   evpnVni
configChange    4          info             slk-netq-events  sysconf
svcDown         5          critical         slk-netq-events  svcStatus

Create a Notification for a Given Device that has a Tendency to Overheat (using multiple rules)

In this example, we created a notification when switch leaf04 has passed over the high temperature threshold. Two rules were needed to create this notification, one to identify the specific device and one to identify the temperature trigger. We sent the message to the pd-netq-events channel.

cumulus@switch:~$ netq add notification channel pagerduty pd-netq-events integration-key 1234567890
Successfully added/updated channel pd-netq-events
 
cumulus@switch:~$ netq add notification rule switchLeaf04 key hostname value leaf04
Successfully added/updated rule switchLeaf04
cumulus@switch:~$ netq add notification rule overTemp key new_s_crit value 24
Successfully added/updated rule overTemp
 
cumulus@switch:~$ netq add notification filter critTemp rule switchLeaf04 channel pd-netq-events
Successfully added/updated filter critTemp
cumulus@switch:~$ netq add notification filter critTemp severity critical rule overTemp channel pd-netq-events
Successfully added/updated filter critTemp
 
cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name            Type             Severity         Channel Info
--------------- ---------------- ---------------- ------------------------
pd-netq-events  pagerduty        info             integration-key: 1234567
                                                  890

cumulus@switch:~$ netq show notification rule
Matching config_notify records:
Name            Rule Key         Rule Value
--------------- ---------------- --------------------
bgpHostname     hostname         spine-01
evpnVni         vni              42
overTemp        new_s_crit       24
svcStatus       new_status       down
switchLeaf04    hostname         leaf04
swp52           port             swp52
sysconf         configdiff       updated
cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name            Order      Severity         Channels         Rules
--------------- ---------- ---------------- ---------------- ----------
swp52Drop       1          error            NetqDefaultChann swp52
                                            el
bgpSpine        2          info             pd-netq-events   bgpHostnam
                                                             e
vni42           3          warning          pd-netq-events   evpnVni
configChange    4          info             slk-netq-events  sysconf
svcDown         5          critical         slk-netq-events  svcStatus
critTemp        6          critical         pd-netq-events   switchLeaf
                                                             04
                                                             overTemp                                                

View Notification Configurations in JSON Format

You can view configured integrations using the netq show notification commands. To view the channels, filters, and rules, run the three flavors of the command. Include the json option to display JSON-formatted output.

For example:

cumulus@switch:~$ netq show notification channel json
{
    "config_notify":[
        {
            "type":"slack",
            "name":"slk-netq-events",
            "channelInfo":"webhook:https://hooks.slack.com/services/text/moretext/evenmoretext",
            "severity":"info"
        },
        {
            "type":"pagerduty",
            "name":"pd-netq-events",
            "channelInfo":"integration-key: 1234567890",
            "severity":"info"
    }
    ],
    "truncatedResult":false
}
 
cumulus@switch:~$ netq show notification rule json
{
    "config_notify":[
        {
            "ruleKey":"hostname",
            "ruleValue":"spine-01",
            "name":"bgpHostname"
        },
        {
            "ruleKey":"vni",
            "ruleValue":42,
            "name":"evpnVni"
        },
        {
            "ruleKey":"new_supported_fec",
            "ruleValue":"supported",
            "name":"fecSupport"
        },
        {
            "ruleKey":"new_s_crit",
            "ruleValue":24,
            "name":"overTemp"
        },
        {
            "ruleKey":"new_status",
            "ruleValue":"down",
            "name":"svcStatus"
        },
        {
            "ruleKey":"configdiff",
            "ruleValue":"updated",
            "name":"sysconf"
    }
    ],
    "truncatedResult":false
}
 
cumulus@switch:~$ netq show notification filter json
{
    "config_notify":[
        {
            "channels":"pd-netq-events",
            "rules":"overTemp",
            "name":"1critTemp",
            "severity":"critical"
        },
        {
            "channels":"pd-netq-events",
            "rules":"evpnVni",
            "name":"3vni42",
            "severity":"warning"
        },
        {
            "channels":"pd-netq-events",
            "rules":"bgpHostname",
            "name":"4bgpSpine",
            "severity":"info"
        },
        {
            "channels":"slk-netq-events",
            "rules":"sysconf",
            "name":"configChange",
            "severity":"info"
        },
        {
            "channels":"slk-netq-events",
            "rules":"fecSupport",
            "name":"newFEC",
            "severity":"info"
        },
        {
            "channels":"slk-netq-events",
            "rules":"svcStatus",
            "name":"svcDown",
            "severity":"critical"
    }
    ],
    "truncatedResult":false
}

Manage NetQ Event Notification Integrations

You might need to modify event notification configurations at some point in the lifecycle of your deployment.

Remove an Event Notification Channel

You can delete an event notification integration using the netq del notification channel command. You can verify it has been removed using the related show command.

For example, to remove a Slack integration and verify it is no longer in the configuration:

cumulus@switch:~$ netq del notification channel slk-netq-events
cumulus@switch:~$ netq show notification channel
Matching config_notify records:
Name            Type             Severity         Channel Info
--------------- ---------------- ---------------- ------------------------
pd-netq-events  pagerduty        info             integration-key: 1234567
                                                  890

Delete an Event Notification Rule

To delete a rule, use the following command, then verify it has been removed:

cumulus@switch:~$ netq del notification rule swp52
cumulus@switch:~$ netq show notification rule
Matching config_notify records:
Name            Rule Key         Rule Value
--------------- ---------------- --------------------
bgpHostname     hostname         spine-01
evpnVni         vni              42
overTemp        new_s_crit       24
svcStatus       new_status       down
switchLeaf04    hostname         leaf04
sysconf         configdiff       updated

Delete an Event Notification Filter

To delete a filter, use the following command, then verify it has been removed:

cumulus@switch:~$ netq del notification filter bgpSpine
cumulus@switch:~$ netq show notification filter
Matching config_notify records:
Name            Order      Severity         Channels         Rules
--------------- ---------- ---------------- ---------------- ----------
swp52Drop       1          error            NetqDefaultChann swp52
                                            el
vni42           2          warning          pd-netq-events   evpnVni
configChange    3          info             slk-netq-events  sysconf
svcDown         4          critical         slk-netq-events  svcStatus
critTemp        5          critical         pd-netq-events   switchLeaf
                                                             04
                                                             overTemp

Configure Threshold-based Event Notifications

NetQ supports a set of events that are triggered by crossing a user-defined threshold, called TCA events. These events allow detection and prevention of network failures for selected interface, utilization, sensor, forwarding, and ACL events.

The simplest configuration you can create is one that sends a TCA event generated by all devices and all interfaces to a single notification application. Use the netq add tca command to configure the event. Its syntax is:

netq add tca [event_id <text-event-id-anchor>]  [scope <text-scope-anchor>] [tca_id <text-tca-id-anchor>]  [severity info | severity critical] [is_active true | is_active false] [suppress_until <text-suppress-ts>] [threshold <text-threshold-value> ] [channel <text-channel-name-anchor> | channel drop <text-drop-channel-name>]

A notification configuration must contain one rule. Each rule must contain a scope and a threshold. Optionally, you can specify an associated channel. Note: If a rule is not associated with a channel, the event information is only reachable from the database. If you want to deliver events to one or more notification channels (syslog, Slack, or PagerDuty), create them by following the instructions in Create Your Channel, and then return here to define your rule.

Supported Events

The following events are supported:

| Category | Event ID | Description |
| --- | --- | --- |
| Interface Statistics | TCA_RXBROADCAST_UPPER | rx_broadcast bytes per second on a given switch or host is greater than maximum threshold |
| Interface Statistics | TCA_RXBYTES_UPPER | rx_bytes per second on a given switch or host is greater than maximum threshold |
| Interface Statistics | TCA_RXMULTICAST_UPPER | rx_multicast per second on a given switch or host is greater than maximum threshold |
| Interface Statistics | TCA_TXBROADCAST_UPPER | tx_broadcast bytes per second on a given switch or host is greater than maximum threshold |
| Interface Statistics | TCA_TXBYTES_UPPER | tx_bytes per second on a given switch or host is greater than maximum threshold |
| Interface Statistics | TCA_TXMULTICAST_UPPER | tx_multicast bytes per second on a given switch or host is greater than maximum threshold |
| Resource Utilization | TCA_CPU_UTILIZATION_UPPER | CPU utilization (%) on a given switch or host is greater than maximum threshold |
| Resource Utilization | TCA_DISK_UTILIZATION_UPPER | Disk utilization (%) on a given switch or host is greater than maximum threshold |
| Resource Utilization | TCA_MEMORY_UTILIZATION_UPPER | Memory utilization (%) on a given switch or host is greater than maximum threshold |
| Sensors | TCA_SENSOR_FAN_UPPER | Switch sensor reported fan speed on a given switch or host is greater than maximum threshold |
| Sensors | TCA_SENSOR_POWER_UPPER | Switch sensor reported power (Watts) on a given switch or host is greater than maximum threshold |
| Sensors | TCA_SENSOR_TEMPERATURE_UPPER | Switch sensor reported temperature (°C) on a given switch or host is greater than maximum threshold |
| Sensors | TCA_SENSOR_VOLTAGE_UPPER | Switch sensor reported voltage (Volts) on a given switch or host is greater than maximum threshold |
| Forwarding Resources | TCA_TCAM_TOTAL_ROUTE_ENTRIES_UPPER | Number of routes on a given switch or host is greater than maximum threshold |
| Forwarding Resources | TCA_TCAM_TOTAL_MCAST_ROUTES_UPPER | Number of multicast routes on a given switch or host is greater than maximum threshold |
| Forwarding Resources | TCA_TCAM_MAC_ENTRIES_UPPER | Number of MAC addresses on a given switch or host is greater than maximum threshold |
| Forwarding Resources | TCA_TCAM_IPV4_ROUTE_UPPER | Number of IPv4 routes on a given switch or host is greater than maximum threshold |
| Forwarding Resources | TCA_TCAM_IPV4_HOST_UPPER | Number of IPv4 hosts on a given switch or host is greater than maximum threshold |
| Forwarding Resources | TCA_TCAM_IPV6_ROUTE_UPPER | Number of IPv6 routes on a given switch or host is greater than maximum threshold |
| Forwarding Resources | TCA_TCAM_IPV6_HOST_UPPER | Number of IPv6 hosts on a given switch or host is greater than maximum threshold |
| Forwarding Resources | TCA_TCAM_ECMP_NEXTHOPS_UPPER | Number of equal cost multi-path (ECMP) next hop entries on a given switch or host is greater than maximum threshold |
| ACL Resources | TCA_TCAM_IN_ACL_V4_FILTER_UPPER | Number of ingress ACL filters for IPv4 addresses on a given switch or host is greater than maximum threshold |
| ACL Resources | TCA_TCAM_EG_ACL_V4_FILTER_UPPER | Number of egress ACL filters for IPv4 addresses on a given switch or host is greater than maximum threshold |
| ACL Resources | TCA_TCAM_IN_ACL_V4_MANGLE_UPPER | Number of ingress ACL mangles for IPv4 addresses on a given switch or host is greater than maximum threshold |
| ACL Resources | TCA_TCAM_EG_ACL_V4_MANGLE_UPPER | Number of egress ACL mangles for IPv4 addresses on a given switch or host is greater than maximum threshold |
| ACL Resources | TCA_TCAM_IN_ACL_V6_FILTER_UPPER | Number of ingress ACL filters for IPv6 addresses on a given switch or host is greater than maximum threshold |
| ACL Resources | TCA_TCAM_EG_ACL_V6_FILTER_UPPER | Number of egress ACL filters for IPv6 addresses on a given switch or host is greater than maximum threshold |
| ACL Resources | TCA_TCAM_IN_ACL_V6_MANGLE_UPPER | Number of ingress ACL mangles for IPv6 addresses on a given switch or host is greater than maximum threshold |
| ACL Resources | TCA_TCAM_EG_ACL_V6_MANGLE_UPPER | Number of egress ACL mangles for IPv6 addresses on a given switch or host is greater than maximum threshold |
| ACL Resources | TCA_TCAM_IN_ACL_8021x_FILTER_UPPER | Number of ingress ACL 802.1 filters on a given switch or host is greater than maximum threshold |
| ACL Resources | TCA_TCAM_ACL_L4_PORT_CHECKERS_UPPER | Number of ACL port range checkers on a given switch or host is greater than maximum threshold |
| ACL Resources | TCA_TCAM_ACL_REGIONS_UPPER | Number of ACL regions on a given switch or host is greater than maximum threshold |
| ACL Resources | TCA_TCAM_IN_ACL_MIRROR_UPPER | Number of ingress ACL mirrors on a given switch or host is greater than maximum threshold |
| ACL Resources | TCA_TCAM_ACL_18B_RULES_UPPER | Number of ACL 18B rules on a given switch or host is greater than maximum threshold |
| ACL Resources | TCA_TCAM_ACL_32B_RULES_UPPER | Number of ACL 32B rules on a given switch or host is greater than maximum threshold |
| ACL Resources | TCA_TCAM_ACL_54B_RULES_UPPER | Number of ACL 54B rules on a given switch or host is greater than maximum threshold |
| ACL Resources | TCA_TCAM_IN_PBR_V4_FILTER_UPPER | Number of ingress policy-based routing (PBR) filters for IPv4 addresses on a given switch or host is greater than maximum threshold |
| ACL Resources | TCA_TCAM_IN_PBR_V6_FILTER_UPPER | Number of ingress policy-based routing (PBR) filters for IPv6 addresses on a given switch or host is greater than maximum threshold |

Define a Scope

A scope is used to filter the events generated by a given rule. Scope values are set on a per TCA rule basis. All rules can be filtered on Hostname. Some rules can also be filtered by other parameters, as shown in this table. Note: Scope parameters must be entered in the order defined.

| Category | Event ID | Scope Parameters |
| --- | --- | --- |
| Interface Statistics | TCA_RXBROADCAST_UPPER | Hostname, Interface |
| Interface Statistics | TCA_RXBYTES_UPPER | Hostname, Interface |
| Interface Statistics | TCA_RXMULTICAST_UPPER | Hostname, Interface |
| Interface Statistics | TCA_TXBROADCAST_UPPER | Hostname, Interface |
| Interface Statistics | TCA_TXBYTES_UPPER | Hostname, Interface |
| Interface Statistics | TCA_TXMULTICAST_UPPER | Hostname, Interface |
| Resource Utilization | TCA_CPU_UTILIZATION_UPPER | Hostname |
| Resource Utilization | TCA_DISK_UTILIZATION_UPPER | Hostname |
| Resource Utilization | TCA_MEMORY_UTILIZATION_UPPER | Hostname |
| Sensors | TCA_SENSOR_FAN_UPPER | Hostname, Sensor Name |
| Sensors | TCA_SENSOR_POWER_UPPER | Hostname, Sensor Name |
| Sensors | TCA_SENSOR_TEMPERATURE_UPPER | Hostname, Sensor Name |
| Sensors | TCA_SENSOR_VOLTAGE_UPPER | Hostname, Sensor Name |
| Forwarding Resources | TCA_TCAM_TOTAL_ROUTE_ENTRIES_UPPER | Hostname |
| Forwarding Resources | TCA_TCAM_TOTAL_MCAST_ROUTES_UPPER | Hostname |
| Forwarding Resources | TCA_TCAM_MAC_ENTRIES_UPPER | Hostname |
| Forwarding Resources | TCA_TCAM_ECMP_NEXTHOPS_UPPER | Hostname |
| Forwarding Resources | TCA_TCAM_IPV4_ROUTE_UPPER | Hostname |
| Forwarding Resources | TCA_TCAM_IPV4_HOST_UPPER | Hostname |
| Forwarding Resources | TCA_TCAM_IPV6_ROUTE_UPPER | Hostname |
| Forwarding Resources | TCA_TCAM_IPV6_HOST_UPPER | Hostname |
| ACL Resources | TCA_TCAM_IN_ACL_V4_FILTER_UPPER | Hostname |
| ACL Resources | TCA_TCAM_EG_ACL_V4_FILTER_UPPER | Hostname |
| ACL Resources | TCA_TCAM_IN_ACL_V4_MANGLE_UPPER | Hostname |
| ACL Resources | TCA_TCAM_EG_ACL_V4_MANGLE_UPPER | Hostname |
| ACL Resources | TCA_TCAM_IN_ACL_V6_FILTER_UPPER | Hostname |
| ACL Resources | TCA_TCAM_EG_ACL_V6_FILTER_UPPER | Hostname |
| ACL Resources | TCA_TCAM_IN_ACL_V6_MANGLE_UPPER | Hostname |
| ACL Resources | TCA_TCAM_EG_ACL_V6_MANGLE_UPPER | Hostname |
| ACL Resources | TCA_TCAM_IN_ACL_8021x_FILTER_UPPER | Hostname |
| ACL Resources | TCA_TCAM_ACL_L4_PORT_CHECKERS_UPPER | Hostname |
| ACL Resources | TCA_TCAM_ACL_REGIONS_UPPER | Hostname |
| ACL Resources | TCA_TCAM_IN_ACL_MIRROR_UPPER | Hostname |
| ACL Resources | TCA_TCAM_ACL_18B_RULES_UPPER | Hostname |
| ACL Resources | TCA_TCAM_ACL_32B_RULES_UPPER | Hostname |
| ACL Resources | TCA_TCAM_ACL_54B_RULES_UPPER | Hostname |
| ACL Resources | TCA_TCAM_IN_PBR_V4_FILTER_UPPER | Hostname |
| ACL Resources | TCA_TCAM_IN_PBR_V6_FILTER_UPPER | Hostname |

Scopes are defined with regular expressions, as follows. When two parameters are used, they are separated by a comma, but no space.

ParametersScope ValueExampleResult
Hostname<hostname>leaf01Deliver events for the specified device
Hostname<partial-hostname>*leaf*Deliver events for devices with hostnames starting with specified text (leaf)
Hostname**Deliver events for all devices
Hostname, Interface<hostname>,<interface>leaf01,swp9Deliver events for the specified interface (swp9) on the specified device (leaf01)
Hostname, Interface<hostname>,*leaf01,*Deliver events for all interfaces on the specified device (leaf01)
Hostname, Interface*,<interface>*,swp9Deliver events for the specified interface (swp9) on all devices
Hostname, Interface*,**,*Deliver events for all devices and all interfaces
Hostname, Interface<partial-hostname>*,<interface>leaf*,swp9Deliver events for the specified interface (swp9) on all devices with hostnames starting with the specified text (leaf)
Hostname, Interface<hostname>,<partial-interface>*leaf01,swp*Deliver events for all interfaces with names starting with the specified text (swp) on the specified device (leaf01)
Hostname, Sensor Name<hostname>,<sensorname>leaf01,fan1Deliver events for the specified sensor (fan1) on the specified device (leaf01)
Hostname, Sensor Name*,<sensorname>*,fan1Deliver events for the specified sensor (fan1) for all devices
Hostname, Sensor Name<hostname>,*leaf01,*Deliver events for all sensors on the specified device (leaf01)
Hostname, Sensor Name<partial-hostname>*,<sensorname>leaf*,fan1Deliver events for the specified sensor (fan1) on all devices with hostnames starting with the specified text (leaf)
Hostname, Sensor Name<hostname>,<partial-sensorname>*leaf01,fan*Deliver events for all sensors with names starting with the specified text (fan) on the specified device (leaf01)
Hostname, Sensor Name*,**,*Deliver events for all sensors on all devices

Create a TCA Rule

Now that you know which events are supported and how to set the scope, you can create a basic rule to deliver one of the TCA events to a notification channel using the netq add tca command. Note that the event ID is case sensitive and must be in all caps.

For example, this rule tells NetQ to deliver an event notification to the tca_slack_ifstats pre-configured Slack channel when the CPU utilization exceeds 95% of its capacity on any monitored switch:

netq add tca event_id TCA_CPU_UTILIZATION_UPPER scope * channel tca_slack_ifstats threshold 95

This rule tells NetQ to deliver an event notification to the tca_pd_ifstats PagerDuty channel when the number of transmit bytes per second (Bps) on the leaf12 switch exceeds 20,000 Bps on any interface:

netq add tca event_id TCA_TXBYTES_UPPER scope leaf12,* channel tca_pd_ifstats threshold 20000

This rule tells NetQ to deliver an event notification to the syslog-netq syslog channel when the temperature on sensor temp1 on the leaf12 switch exceeds 32 degrees Celsius:

netq add tca event_id TCA_SENSOR_TEMPERATURE_UPPER scope leaf12,temp1 channel syslog-netq threshold 32

For a Slack channel, the event messages should be similar to this:

Set the Severity of a Threshold-based Event

In addition to defining a scope for a TCA rule, you can set a severity of either info or critical. To add a severity to a rule, use the severity option.

For example, to add a critical severity to the CPU utilization rule you created earlier:

netq add tca event_id TCA_CPU_UTILIZATION_UPPER scope * severity critical channel tca_slack_resources threshold 95

Or, if an event is important but not critical, set the severity to info:

netq add tca event_id TCA_TXBYTES_UPPER scope leaf12,* severity info channel tca_pd_ifstats threshold 20000

Create Multiple Rules for a TCA Event

You are likely to want more than one rule for a particular event. For example, the three rules below are all based on the TCA_SENSOR_TEMPERATURE_UPPER event, but vary the scope (all leaf switches, all devices, or a single switch), the channels that receive the notification, and the threshold that triggers it:

netq add tca event_id TCA_SENSOR_TEMPERATURE_UPPER scope leaf*,temp1 channel syslog-netq threshold 32

netq add tca event_id TCA_SENSOR_TEMPERATURE_UPPER scope *,temp1 channel tca_sensors,tca_pd_sensors threshold 32

netq add tca event_id TCA_SENSOR_TEMPERATURE_UPPER scope leaf03,temp1 channel syslog-netq threshold 29

Now you have four rules created (the original one, plus these three new ones) all based on the TCA_SENSOR_TEMPERATURE_UPPER event. To identify the various rules, NetQ automatically generates a TCA name for each rule. As each rule is created, an _# is added to the event name. The TCA Name for the first rule created is then TCA_SENSOR_TEMPERATURE_UPPER_1, the second rule created for this event is TCA_SENSOR_TEMPERATURE_UPPER_2, and so forth.
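
For example, to confirm which rule a given auto-generated name refers to, you can display that rule by its TCA name using the netq show tca command described later in this section (the rule number here is illustrative and depends on the order in which you created the rules):

cumulus@switch:~$ netq show tca tca_id TCA_SENSOR_TEMPERATURE_UPPER_2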

Suppress a Rule

During troubleshooting or maintenance of switches, you may want to suppress a rule to prevent erroneous event messages. The suppress_until option prevents the rule from being applied for a designated amount of time (in seconds). When this time has passed, the rule is automatically re-enabled.

For example, to suppress the disk utilization event for an hour:

cumulus@switch:~$ netq add tca tca_id TCA_DISK_UTILIZATION_UPPER_1 suppress_until 3600
Successfully added/updated tca TCA_DISK_UTILIZATION_UPPER_1

Remove a Channel from a Rule

You can stop sending events to a particular channel using the drop option:

cumulus@switch:~$ netq add tca tca_id TCA_DISK_UTILIZATION_UPPER_1 channel drop tca_slack_resources
Successfully added/updated tca TCA_DISK_UTILIZATION_UPPER_1

Manage Threshold-based Event Notifications

Once you have created a number of rules, you might need to manage them: view a list of the rules, disable a rule, delete a rule, and so forth.

Show Threshold-based Event Rules

You can view all TCA rules or a particular rule using the netq show tca command:

Example 1: Display All TCA Rules

cumulus@switch:~$ netq show tca
Matching config_tca records:
TCA Name                     Event Name           Scope                      Severity         Channel/s          Active Threshold          Suppress Until
---------------------------- -------------------- -------------------------- ---------------- ------------------ ------ ------------------ ----------------------------
TCA_CPU_UTILIZATION_UPPER_1  TCA_CPU_UTILIZATION_ {"hostname":"leaf01"}      critical         tca_slack_resource True   1                  Sun Dec  8 14:17:18 2019
                             UPPER                                                            s
TCA_DISK_UTILIZATION_UPPER_1 TCA_DISK_UTILIZATION {"hostname":"leaf01"}      info                                False  80                 Mon Dec  9 05:03:46 2019
                             _UPPER
TCA_MEMORY_UTILIZATION_UPPER TCA_MEMORY_UTILIZATI {"hostname":"leaf01"}      info             tca_slack_resource True   1                  Sun Dec  8 11:53:15 2019
_1                           ON_UPPER                                                         s
TCA_RXBYTES_UPPER_1          TCA_RXBYTES_UPPER    {"ifname":"swp3","hostname info             tca-tx-bytes-slack True   100                Sun Dec  8 17:22:52 2019
                                                  ":"leaf01"}
TCA_RXMULTICAST_UPPER_1      TCA_RXMULTICAST_UPPE {"ifname":"swp3","hostname info             tca-tx-bytes-slack True   0                  Sun Dec  8 10:43:57 2019
                             R                    ":"leaf01"}
TCA_SENSOR_FAN_UPPER_1       TCA_SENSOR_FAN_UPPER {"hostname":"leaf01","s_na info             tca_slack_sensors  True   0                  Sun Dec  8 12:30:14 2019
                                                  me":"*"}
TCA_SENSOR_TEMPERATURE_UPPER TCA_SENSOR_TEMPERATU {"hostname":"leaf01","s_na critical         tca_slack_sensors  True   10                 Sun Dec  8 14:05:24 2019
_1                           RE_UPPER             me":"*"}
TCA_TXBYTES_UPPER_1          TCA_TXBYTES_UPPER    {"ifname":"swp3","hostname critical         tca-tx-bytes-slack True   100                Sun Dec  8 14:19:46 2019
                                                  ":"leaf01"}
TCA_TXMULTICAST_UPPER_1      TCA_TXMULTICAST_UPPE {"ifname":"swp3","hostname info             tca-tx-bytes-slack True   0                  Sun Dec  8 16:40:14 2269
                             R                    ":"leaf01"}

Example 2: Display a Specific TCA Rule

cumulus@switch:~$ netq show tca tca_id TCA_TXMULTICAST_UPPER_1
Matching config_tca records:
TCA Name                     Event Name           Scope                      Severity         Channel/s          Active Threshold          Suppress Until
---------------------------- -------------------- -------------------------- ---------------- ------------------ ------ ------------------ ----------------------------
TCA_TXMULTICAST_UPPER_1      TCA_TXMULTICAST_UPPE {"ifname":"swp3","hostname info             tca-tx-bytes-slack True   0                  Sun Dec  8 16:40:14 2269
                             R                    ":"leaf01"}

Disable a TCA Rule

Whereas the suppress_until option temporarily disables a TCA rule, the is_active option can be used to disable a rule indefinitely. To disable a rule, set the option to false. To re-enable it, set the option to true.

cumulus@switch:~$ netq add tca tca_id TCA_DISK_UTILIZATION_UPPER_1 is_active false
Successfully added/updated tca TCA_DISK_UTILIZATION_UPPER_1
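
When troubleshooting or maintenance is complete, re-enable the rule by setting the option back to true:

cumulus@switch:~$ netq add tca tca_id TCA_DISK_UTILIZATION_UPPER_1 is_active true
Successfully added/updated tca TCA_DISK_UTILIZATION_UPPER_1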

Delete a TCA Rule

If disabling a rule is not sufficient, and you want to remove a rule altogether, you can do so using the netq del tca command.

cumulus@switch:~$ netq del tca tca_id TCA_RXBYTES_UPPER_1
Successfully deleted TCA TCA_RXBYTES_UPPER_1

Resolve Scope Conflicts

There may be occasions where the scopes defined by multiple rules for a given TCA event overlap. In such cases, the TCA rule with the most specific scope that still matches is used to generate the event.

To clarify this, consider an example in which three events have occurred: one on leaf01 swp1, one on leaf01 swp3, and one on spine01 swp1. NetQ attempts to match each event against hostname and interface name using three TCA rules with different scopes: Scope 1 is *,* (all devices, all interfaces), Scope 2 is leaf*,* (all interfaces on devices with hostnames starting with leaf), and Scope 3 is leaf01,swp1 (a single interface on a single device). For each event, the most specific scope that matches is applied.

In summary:

Input EventScope ParametersTCA Scope 1TCA Scope 2TCA Scope 3Scope Applied
leaf01,swp1Hostname, Interface*,*leaf*,*leaf01,swp1Scope 3
leaf01,swp3Hostname, Interface*,*leaf*,*leaf01,swp1Scope 2
spine01,swp1Hostname, Interface*,*leaf*,*leaf01,swp1Scope 1
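
As a sketch of how such overlapping rules might be created (the event, channel, and threshold values are illustrative assumptions), the three scopes above could correspond to rules such as:

netq add tca event_id TCA_RXBYTES_UPPER scope *,* channel tca_slack_ifstats threshold 100
netq add tca event_id TCA_RXBYTES_UPPER scope leaf*,* channel tca_slack_ifstats threshold 100
netq add tca event_id TCA_RXBYTES_UPPER scope leaf01,swp1 channel tca_slack_ifstats threshold 100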

Integrate NetQ with Your LDAP Server

With this release and an administrator role, you are able to integrate the NetQ role-based access control (RBAC) with your lightweight directory access protocol (LDAP) server in on-premises deployments. NetQ maintains control over role-based permissions for the NetQ application. Currently there are two roles, admin and user. With the integration, user authentication is handled through LDAP and your directory service, such as Microsoft Active Directory, Kerberos, OpenLDAP, and Red Hat Directory Service. A copy of each user from LDAP is stored in the local NetQ database.

Integrating with an LDAP server does not prevent you from configuring local users (stored and managed in the NetQ database) as well.

Read the Overview to become familiar with LDAP configuration parameters, or skip to Create an LDAP Configuration if you are already an LDAP expert.

Overview

LDAP integration requires information about how to connect to your LDAP server, the type of authentication you plan to use, bind credentials, and, optionally, search attributes.

Provide Your LDAP Server Information

To connect to your LDAP server, you need the URI and bind credentials. The URI identifies the location of the LDAP server. It consists of a fully qualified domain name (FQDN) or IP address and the port on which the LDAP server accepts client connections; for example, myldap.mycompany.com or 192.168.10.2. Typically port 389 is used for connections over TCP or UDP. In production environments, a secure connection with SSL can be deployed, in which case the port used is typically 636. Setting the Enable SSL toggle automatically sets the server port to 636.

Specify Your Authentication Method

Two methods of user authentication are available: anonymous, which does not require bind credentials, and basic, which binds to the directory with an administrator bind DN and password (as shown in the example scenarios later in this section).

If you are unfamiliar with the configuration of your LDAP server, contact your administrator to ensure you select the appropriate authentication method and credentials.

Define User Attributes

Two attributes are required to define a user entry in a directory: the Base DN, which specifies where in the directory to look for users, and the User ID, the attribute that uniquely identifies each user (for example, an email address, UID, or sAMAccountName, as shown in the example scenarios later in this section).

Optionally, you can specify the first name, last name, and email address of the user.

Set Search Attributes

While optional, specifying search scope indicates where to start and how deep a given user can search within the directory. The data to search for is specified in the search query.

Search scope options include Base (search only the base DN entry), One Level (search entries one level below the base DN), and Subtree (search the base DN and all entries below it); these are the scopes used in the example scenarios later in this section.

A typical search query for users would be {userIdAttribute}={userId}.
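
For example, if the User ID attribute in your directory is uid and a user logs in as jsmith (both illustrative values), the search query resolves to uid=jsmith.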

Now that you are familiar with the various LDAP configuration parameters, you can configure the integration of your LDAP server with NetQ using the instructions in the next section.

Create an LDAP Configuration

One LDAP server can be configured per bind DN (distinguished name). Once LDAP is configured, you can validate the connectivity (and configuration) and save the configuration.

To create an LDAP configuration:

  1. Click , then select Management under Admin.

  2. Locate the LDAP Server Info card, and click Configure LDAP.

  3. Fill out the LDAP Server Configuration form according to your particular configuration. Refer to Overview for details about the various parameters.

    Note: Items with an asterisk (*) are required. All others are optional.

  4. Click Save to complete the configuration, or click Cancel to discard the configuration.

The LDAP configuration cannot be changed once it is saved. If you need to change it, you must delete the current LDAP configuration and create a new one. Note that if you change the LDAP server configuration, all users created against the previous LDAP server remain in the NetQ database and continue to be visible, but are no longer viable. You must manually delete those users if you do not want to see them.

Example LDAP Configurations

A variety of example configurations are provided here. Scenarios 1-3 are based on using an OpenLDAP or similar authentication service. Scenario 4 is based on using the Active Directory service for authentication.

Scenario 1: Base Configuration

In this scenario, we are configuring the LDAP server with anonymous authentication, a User ID based on an email address, and a search scope of base.

ParameterValue
Host Server URLldap1.mycompany.com
Host Server Port389
AuthenticationAnonymous
Base DNdc=mycompany,dc=com
User IDemail
Search ScopeBase
Search Query{userIdAttribute}={userId}

Scenario 2: Basic Authentication and Subset of Users

In this scenario, we are configuring the LDAP server with basic authentication, for access only by the persons in the network operators group, and a limited search scope.

ParameterValue
Host Server URLldap1.mycompany.com
Host Server Port389
AuthenticationBasic
Admin Bind DNuid=admin,ou=netops,dc=mycompany,dc=com
Admin Bind Passwordnqldap!
Base DNdc=mycompany,dc=com
User IDUID
Search ScopeOne Level
Search Query{userIdAttribute}={userId}
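
As a rough way to verify these settings from any host with the OpenLDAP client tools installed, an equivalent query might look like the following sketch (the user jsmith is an illustrative assumption):

ldapsearch -x -H ldap://ldap1.mycompany.com:389 -D "uid=admin,ou=netops,dc=mycompany,dc=com" -w 'nqldap!' -b "dc=mycompany,dc=com" -s one "(uid=jsmith)"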

Scenario 3: Scenario 2 with Widest Search Capability

In this scenario, we are configuring the LDAP server with basic authentication, for access only by the persons in the network administrators group, and an unlimited search scope.

ParameterValue
Host Server URL192.168.10.2
Host Server Port389
AuthenticationBasic
Admin Bind DNuid=admin,ou=netadmin,dc=mycompany,dc=com
Admin Bind Password1dap*netq
Base DNdc=mycompany, dc=net
User IDUID
Search ScopeSubtree
Search Query{userIdAttribute}={userId}

Scenario 4: Scenario 3 with Active Directory Service

In this scenario, we are configuring the LDAP server with basic authentication, for access only by the persons in the given Active Directory group, and an unlimited search scope.

ParameterValue
Host Server URL192.168.10.2
Host Server Port389
AuthenticationBasic
Admin Bind DNcn=netq,ou=45,dc=mycompany,dc=com
Admin Bind Passwordnq&4mAd!
Base DNdc=mycompany, dc=net
User IDsAMAccountName
Search ScopeSubtree
Search Query{userIdAttribute}={userId}

Add LDAP Users to NetQ

  1. Click , then select Management under Admin.

  2. Locate the User Accounts card, and click Manage.

  3. On the User Accounts tab, click Add User.

  4. Select LDAP User.

  5. Enter the user’s ID.

  6. Enter your administrator password.

  7. Click Search.

  8. If the user is found, the email address, first name, and last name fields are automatically filled in on the Add New User form. If searching is not enabled on the LDAP server, you must enter the information manually.

    If the fields are not automatically filled in, and searching is enabled on the LDAP server, you might require changes to the mapping file.

  9. Select the NetQ user role for this user, admin or user, in the User Type dropdown.

  10. Enter your admin password, and click Save, or click Cancel to discard the user account.

    LDAP user passwords are not stored in the NetQ database and are always authenticated against LDAP.

  11. Repeat these steps to add additional LDAP users.

Remove LDAP Users from NetQ

You can remove LDAP users in the same manner as local users.

  1. Click , then select Management under Admin.

  2. Locate the User Accounts card, and click Manage.

  3. Select the user or users you want to remove.

  4. Click in the Edit menu.

If an LDAP user is deleted in LDAP, it is not automatically deleted from NetQ; however, the login credentials for these LDAP users stop working immediately.

Integrate NetQ with Grafana

Switches collect statistics about the performance of their interfaces. The NetQ Agent on each switch collects these statistics every 15 seconds and then sends them to your NetQ Server or Appliance.

NetQ only collects statistics for physical interfaces; it does not collect statistics for virtual (non-physical) interfaces, such as bonds, bridges, and VXLANs. Specifically, the NetQ Agent collects the following interface statistics:

You can use Grafana, an open source analytics and monitoring tool, to view the interface statistics collected by the NetQ Agents. The fastest way to achieve this is by installing Grafana on an application server or locally per user, and then importing the prepared NetQ dashboard.
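
If you do not already have a Grafana instance available, one common way to run it locally is with Docker (a sketch, assuming Docker is installed; refer to the Grafana documentation for other installation methods):

docker run -d -p 3000:3000 --name=grafana grafana/grafana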

Install NetQ Plug-in for Grafana

The first step is to install the NetQ plug-in on your NetQ server or appliance. There are three ways to install the plug-in:

Set Up a Dashboard

The quickest way to view the interface statistics for your Cumulus Linux network is to make use of the pre-configured dashboard installed with the plug-in. Once you are familiar with that dashboard, you can create new dashboards or add new panels to the NetQ dashboard.

  1. Open the Grafana user interface:

    • Remote access: Enter <NetQ-Server-or-Appliance-IPaddr>:3000 in a web browser address field.
    • Local access: Enter localhost:3000 in a web browser address field.
  2. Log in using your application credentials.

    The Home Dashboard appears.

  3. Click Add data source or > Data Sources.

  4. Enter Net-Q in the search box or scroll down to the Other category, and select Net-Q from there.

  5. Enter Net-Q into the Name field.

  6. Enter the URL used to access the NetQ cloud service; for example, api.netq.cumulusnetworks.com.

  7. Enter your credentials (the ones used to log in).

  8. For cloud deployments only, if you have more than one premises configured, you can select the premises you want to view, as follows:

    • If you leave the Premises field blank, the first premises name is selected by default

    • If you enter a premises name, that premises is selected for viewing

      Note: If multiple premises are configured with the same name, then the first premises of that name is selected for viewing

  9. Click Save & Test

Create a Dashboard

You can either use the dashboard provided with the plug-in, NetQ Interface Statistics, or create your own.

To use the Cumulus-provided dashboard, select the NetQ Interface Statistics from the left panel of the Home Page.

If you choose this option, you can skip directly to analyzing your data.

To create your own dashboard:

  1. Click to open a blank dashboard.

  2. Click (Dashboard Settings) at the top of the dashboard.

  3. Click Variables.

  4. Enter hostname into the Name field.

  5. Enter Hostname into the Label field.

  6. Select Net-Q from the Data source list.

  7. Enter hostname into the Query field.

  8. Click Add.

    You should see a preview of the hostname values at the bottom.

  9. Click to return to the new dashboard.

  10. Click Add Query.

  11. Select Net-Q from the Query source list.

  12. Select the interface statistic you want to view from the Metric list.

  13. Click the General icon.

  14. Select hostname from the Repeat list.

  15. Set any other parameters around how to display the data.

  16. Return to the dashboard.

  17. Add additional panels with other metrics to complete your dashboard.

Analyze the Data

Once you have your dashboard configured, you can start analyzing the data:

  1. Select the hostname from the variable list at the top left of the charts to see the statistics for that switch or host.

  2. Review the statistics, looking for peaks and valleys, unusual patterns, and so forth.

  3. Explore the data more by modifying the data view in one of several ways using the dashboard tool set:

    • Select a different time period for the data by clicking the forward or back arrows. The default time range is dependent on the width of your browser window.
    • Zoom in on the dashboard by clicking the magnifying glass.
    • Manually refresh the dashboard data, or set an automatic refresh rate for the dashboard from the down arrow.
    • Add a new variable by clicking the cog wheel, then selecting Variables
    • Add additional panels
    • Click any chart title to edit or remove it from the dashboard
    • Rename the dashboard by clicking the cog wheel and entering the new name

Cumulus NetQ API User Guide

The NetQ API provides access to key telemetry and system monitoring data gathered about the performance and operation of your data center network and devices so that you can view that data in your internal or third-party analytic tools. The API gives you access to the health of individual switches, network protocols and services, and views of network-wide inventory and events.

This guide provides an overview of the API framework and some examples of how to use the API to extract the data you need. Descriptions of each endpoint and model parameter are contained in the API .json files.

For information regarding new features, improvements, bug fixes, and known issues present in this release, refer to the release notes.

API Organization

The Cumulus NetQ API provides endpoints for:

Each endpoint has its own API. You can make requests for all data and all devices or you can filter the request by a given hostname.

Each API returns a predetermined set of data as defined in the API models.
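
For example, the BGP endpoint described later in this guide can be queried network-wide or for a single device (a sketch; substitute your own NetQ server address, authorization token, and hostname):

curl -X GET "https://<netq.domain>:32708/netq/telemetry/v1/object/bgp" -H "Content-Type: application/json" -H "Authorization: <auth-token>"
curl -X GET "https://<netq.domain>:32708/netq/telemetry/v1/object/bgp/hostname/leaf01" -H "Content-Type: application/json" -H "Authorization: <auth-token>"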

Get Started

You can access the API gateway and execute requests from a terminal interface against your NetQ Platform or NetQ Appliance through port 32708.

Log In and Authentication

Use your login credentials that were provided as part of the installation process. For this release, the default is username admin and password admin.

To log in and obtain authorization:

  1. Open a terminal window.

  2. Enter the following curl command.

    <computer-name>:~ <username>$ curl --insecure -X POST "https://<netq.domain>:32708/netq/auth/v1/login" -H "Content-Type: application/json" -d '{"username":"admin","password":"admin"}'
    {"premises":[{"opid":0,"name":"OPID0"}],"access_token":"eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJ1c2VyIjoiYWRtaW4iLCJvcGlkIjowLCJyb2xlIjoiYWRtaW4iLCJleHBpcmVzQXQiOjE1NTYxMjUzNzgyODB9.\_D2Ibhmo_BWSfAMnF2FzddjndTn8LP8CAFFGIj5tn0A","customer_id":0,"id":"admin","expires_at":1556125378280,"terms_of_use_accepted":true}
    
  3. Copy the access token for use in making data requests.
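
    To avoid copying the token by hand, you can capture it in a shell variable. This sketch assumes the jq utility is installed; any JSON parser works equally well:

    TOKEN=$(curl --insecure -s -X POST "https://<netq.domain>:32708/netq/auth/v1/login" -H "Content-Type: application/json" -d '{"username":"admin","password":"admin"}' | jq -r .access_token)
    echo $TOKEN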

API Requests

We will use curl to execute our requests. Each request contains an API method (GET, POST, etc.), the address and API object to query, a variety of headers, and sometimes a body. In the login step you used above, the method was POST, the object was the authentication endpoint (netq/auth/v1/login), the Content-Type header specified JSON, and the body carried the username and password.

We have used the --insecure option to work around certificate issues in our development configuration. You would likely not use this option in a production environment.
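
Putting these pieces together, a typical data request takes the following general form (a sketch; <netq.domain>, <endpoint>, and <auth-token> are placeholders for your own server, endpoint, and token):

curl --insecure -X GET "https://<netq.domain>:32708/netq/telemetry/v1/object/<endpoint>" -H "Content-Type: application/json" -H "Authorization: <auth-token>" | python -m json.tool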

API Responses

A NetQ API response is comprised of a status code, any relevant error codes (if unsuccessful), and the collected data (if successful).

The following HTTP status codes might be presented in the API responses:

CodeNameDescriptionAction
200SuccessRequest was successfully processed.Review response
400Bad RequestInvalid input was detected in request.Check the syntax of your request and make sure it matches the schema
401UnauthorizedAuthentication has failed or credentials were not provided.Provide or verify your credentials, or request access from your administrator
403ForbiddenRequest was valid, but user may not have needed permissions.Verify your credentials or request an account from your administrator
404Not FoundRequested resource could not be found.Try the request again after a period of time or verify status of resource
409ConflictRequest cannot be processed due to conflict in current state of the resource.Verify status of resource and remove conflict
500Internal Server ErrorUnexpected condition has occurred.Perform general troubleshooting and try the request again
503Service UnavailableThe service being requested is currently unavailable.Verify the status of the NetQ Platform or Appliance, and the associated service
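
If you are scripting against the API, you may want to check the status code before parsing the body. A minimal sketch using curl's built-in status reporting:

curl --insecure -s -o /dev/null -w "%{http_code}\n" -X GET "https://<netq.domain>:32708/netq/telemetry/v1/object/bgp" -H "Authorization: <auth-token>"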

Example Requests and Responses

Some command requests and their responses are shown here, but feel free to run your own requests. To run a request, you will need your authorization token. We have piped our responses through a Python tool to make them more readable; you may choose to do the same.

To view all of the endpoints and their associated requests and responses, refer to View the API below.

Get Network-wide Status of the BGP Service

Make your request to the bgp endpoint to obtain status information from all nodes running the BGP service, as follows:

curl --insecure -X GET "https://<netq.domain>:32708/netq/telemetry/v1/object/bgp" -H "Content-Type: application/json" -H "Authorization: <auth-token>" | python -m json.tool
 
[
  {
    "ipv6_pfx_rcvd": 0,
    "peer_router_id": "0.0.0.0",
    "objid": "",
    "upd8_tx": 0,
    "hostname": "exit-1",
    "timestamp": 1556037420723,
    "peer_asn": 0,
    "state": "NotEstd",
    "vrf": "DataVrf1082",
    "rx_families": [],
    "ipv4_pfx_rcvd": 0,
    "conn_dropped": 0,
    "db_state": "Update",
    "up_time": 0,
    "last_reset_time": 0,
    "tx_families": [],
    "reason": "N/A",
    "vrfid": 13,
    "asn": 655536,
    "opid": 0,
    "peer_hostname": "",
    "upd8_rx": 0,
    "peer_name": "swp7.4",
    "evpn_pfx_rcvd": 0,
    "conn_estd": 0
  },
  {
    "ipv6_pfx_rcvd": 0,
    "peer_router_id": "0.0.0.0",
    "objid": "",
    "upd8_tx": 0,
    "hostname": "exit-1",
    "timestamp": 1556037420674,
    "peer_asn": 0,
    "state": "NotEstd",
    "vrf": "default",
    "rx_families": [],
    "ipv4_pfx_rcvd": 0,
    "conn_dropped": 0,
    "db_state": "Update",
    "up_time": 0,
    "last_reset_time": 0,
    "tx_families": [],
    "reason": "N/A",
    "vrfid": 0,
    "asn": 655536,
    "opid": 0,
    "peer_hostname": "",
    "upd8_rx": 0,
    "peer_name": "swp7",
    "evpn_pfx_rcvd": 0,
    "conn_estd": 0
  },
  {
    "ipv6_pfx_rcvd": 24,
    "peer_router_id": "27.0.0.19",
    "objid": "",
    "upd8_tx": 314,
    "hostname": "exit-1",
    "timestamp": 1556037420665,
    "peer_asn": 655435,
    "state": "Established",
    "vrf": "default",
    "rx_families": [
      "ipv4",
      "ipv6",
      "evpn"
    ],
    "ipv4_pfx_rcvd": 26,
    "conn_dropped": 0,
    "db_state": "Update",
    "up_time": 1556036850000,
    "last_reset_time": 0,
    "tx_families": [
      "ipv4",
      "ipv6",
      "evpn"
    ],
    "reason": "N/A",
    "vrfid": 0,
    "asn": 655536,
    "opid": 0,
    "peer_hostname": "spine-1",
    "upd8_rx": 321,
    "peer_name": "swp3",
    "evpn_pfx_rcvd": 354,
    "conn_estd": 1
  },
...

Get Status of EVPN on a Specific Switch

Make your request to the evpn/hostname endpoint to view the status of all EVPN sessions running on that node. This example uses the server01 node.

curl -X GET "https://<netq.domain>:32708/netq/telemetry/v1/object/evpn/hostname/server01" -H "Content-Type: application/json" -H "Authorization: <auth-token>" | python -m json.tool
 
[
  {
    "import_rt": "[\"197:42\"]",
    "vni": 42,
    "rd": "27.0.0.22:2",
    "hostname": "server01",
    "timestamp": 1556037403853,
    "adv_all_vni": true,
    "export_rt": "[\"197:42\"]",
    "db_state": "Update",
    "in_kernel": true,
    "adv_gw_ip": "Disabled",
    "origin_ip": "27.0.0.22",
    "opid": 0,
    "is_l3": false
  },
  {
    "import_rt": "[\"197:37\"]",
    "vni": 37,
    "rd": "27.0.0.22:8",
    "hostname": "server01",
    "timestamp": 1556037403811,
    "adv_all_vni": true,
    "export_rt": "[\"197:37\"]",
    "db_state": "Update",
    "in_kernel": true,
    "adv_gw_ip": "Disabled",
    "origin_ip": "27.0.0.22",
    "opid": 0,
    "is_l3": false
  },
  {
    "import_rt": "[\"197:4001\"]",
    "vni": 4001,
    "rd": "6.0.0.194:5",
    "hostname": "server01",
    "timestamp": 1556036360169,
    "adv_all_vni": true,
    "export_rt": "[\"197:4001\"]",
    "db_state": "Refresh",
    "in_kernel": true,
    "adv_gw_ip": "Disabled",
    "origin_ip": "27.0.0.22",
    "opid": 0,
    "is_l3": true
  },
...

Get Status on All Interfaces at a Given Time

Make your request to the interfaces endpoint to view the status of all interfaces. By specifying the eq_timestamp option and entering a date and time in epoch format, you indicate that you want the data for that time (rather than for the last hour, the default), as follows:

curl -X GET "https://<netq.domain>:32708/netq/telemetry/v1/object/interface?eq_timestamp=1556046250" -H "Content-Type: application/json" -H "Authorization: <auth-token>" | python -m json.tool
 
[
  {
    "hostname": "exit-1",
    "timestamp": 1556046270494,
    "state": "up",
    "vrf": "DataVrf1082",
    "last_changed": 1556037405259,
    "ifname": "swp3.4",
    "opid": 0,
    "details": "MTU: 9202",
    "type": "vlan"
  },
  {
    "hostname": "exit-1",
    "timestamp": 1556046270496,
    "state": "up",
    "vrf": "DataVrf1081",
    "last_changed": 1556037405320,
    "ifname": "swp7.3",
    "opid": 0,
    "details": "MTU: 9202",
    "type": "vlan"
  },
  {
    "hostname": "exit-1",
    "timestamp": 1556046270497,
    "state": "up",
    "vrf": "DataVrf1080",
    "last_changed": 1556037405310,
    "ifname": "swp7.2",
    "opid": 0,
    "details": "MTU: 9202",
    "type": "vlan"
  },
  {
    "hostname": "exit-1",
    "timestamp": 1556046270499,
    "state": "up",
    "vrf": "",
    "last_changed": 1556037405315,
    "ifname": "DataVrf1081",
    "opid": 0,
    "details": "table: 1081, MTU: 65536, Members:  swp7.3,  DataVrf1081,  swp4.3,  swp6.3,  swp5.3,  swp3.3, ",
    "type": "vrf"
  },
...
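
The epoch value used with eq_timestamp can be generated with any time converter; for example, on a Linux host with GNU date (the date and time shown are illustrative):

date -d "2019-04-23 13:04:10" +%s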

Get a List of All Devices Being Monitored

Make your request to the inventory endpoint to get a listing of all monitored nodes and their configuration information, as follows:

curl -X GET "https://<netq.domain>:32708/netq/telemetry/v1/object/inventory" -H "Content-Type: application/json" -H "Authorization: <auth-token>" | python -m json.tool
 
[
  {
    "hostname": "exit-1",
    "timestamp": 1556037425658,
    "asic_model": "A-Z",
    "agent_version": "2.1.1-cl3u16~1556035513.afedb69",
    "os_version": "A.2.0",
    "license_state": "ok",
    "disk_total_size": "10 GB",
    "os_version_id": "A.2.0",
    "platform_model": "A_VX",
    "memory_size": "2048.00 MB",
    "asic_vendor": "AA Inc",
    "cpu_model": "A-SUBLEQ",
    "asic_model_id": "N/A",
    "platform_vendor": "A Systems",
    "asic_ports": "N/A",
    "cpu_arch": "x86_64",
    "cpu_nos": "2",
    "platform_mfg_date": "N/A",
    "platform_label_revision": "N/A",
    "agent_state": "fresh",
    "cpu_max_freq": "N/A",
    "platform_part_number": "3.7.6",
    "asic_core_bw": "N/A",
    "os_vendor": "CL",
    "platform_base_mac": "00:01:00:00:01:00",
    "platform_serial_number": "00:01:00:00:01:00"
  },
  {
    "hostname": "exit-2",
    "timestamp": 1556037432361,
    "asic_model": "C-Z",
    "agent_version": "2.1.1-cl3u16~1556035513.afedb69",
    "os_version": "C.2.0",
    "license_state": "N/A",
    "disk_total_size": "30 GB",
    "os_version_id": "C.2.0",
    "platform_model": "C_VX",
    "memory_size": "2048.00 MB",
    "asic_vendor": "CC Inc",
    "cpu_model": "C-CRAY",
    "asic_model_id": "N/A",
    "platform_vendor": "C Systems",
    "asic_ports": "N/A",
    "cpu_arch": "x86_64",
    "cpu_nos": "2",
    "platform_mfg_date": "N/A",
    "platform_label_revision": "N/A",
    "agent_state": "fresh",
    "cpu_max_freq": "N/A",
    "platform_part_number": "3.7.6",
    "asic_core_bw": "N/A",
    "os_vendor": "CL",
    "platform_base_mac": "00:01:00:00:02:00",
    "platform_serial_number": "00:01:00:00:02:00"
  },
  {
    "hostname": "firewall-1",
    "timestamp": 1556037438002,
    "asic_model": "N/A",
    "agent_version": "2.1.0-ub16.04u15~1555608012.1d98892",
    "os_version": "16.04.1 LTS (Xenial Xerus)",
    "license_state": "N/A",
    "disk_total_size": "3.20 GB",
    "os_version_id": "(hydra-poc-01 /tmp/purna/Kleen-Gui1/)\"16.04",
    "platform_model": "N/A",
    "memory_size": "4096.00 MB",
    "asic_vendor": "N/A",
    "cpu_model": "QEMU Virtual  version 2.2.0",
    "asic_model_id": "N/A",
    "platform_vendor": "N/A",
    "asic_ports": "N/A",
    "cpu_arch": "x86_64",
    "cpu_nos": "2",
    "platform_mfg_date": "N/A",
    "platform_label_revision": "N/A",
    "agent_state": "fresh",
    "cpu_max_freq": "N/A",
    "platform_part_number": "N/A",
    "asic_core_bw": "N/A",
    "os_vendor": "Ubuntu",
    "platform_base_mac": "N/A",
    "platform_serial_number": "N/A"
  },
...

View the API

For simplicity, all of the endpoint APIs are combined into a single json-formatted file. There have been no changes to the file in the NetQ 2.3.0 release.

netq-231.json
{
  "swagger": "2.0",
  "info": {
    "description": "This API is used to gain access to data collected by the Cumulus NetQ Platform and Agents for integration with third-party monitoring and analytics  software. Integrators can pull data for daily monitoring of network protocols and services performance, inventory status, and system-wide events.",
    "version": "1.1",
    "title": "Cumulus NetQ 2.3.1 API",
    "termsOfService": "https://cumulusnetworks.com/legal/"
  },
  "host": "<netq-platform-or-appliance-ipaddress>:32708",
  "basePath": "/netq/telemetry/v1",
  "externalDocs": {
    "description": "API Documentation",
    "url": "https://docs.cumulusnetworks.com/cumulus-netq/Cumulus-NetQ-Integration-Guide/API-User-Guide/"
  },
  "schemes": [
    "https"
  ],
  "paths": {
    "/object/address": {
      "get": {
        "tags": [
          "address"
        ],
        "summary": "Get all addresses for all network devices",
        "description": "Retrieves all IPv4, IPv6 and MAC addresses deployed on switches and hosts in your network running NetQ Agents.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/Address"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/address/hostname/{hostname}": {
      "get": {
        "tags": [
          "address"
        ],
        "summary": "Get all addresses for a given network device by hostname",
        "description": "Retrieves IPv4, IPv6, and MAC addresses of a network device (switch or host) specified by its hostname.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "hostname",
            "in": "path",
            "description": "User-specified name for a network switch or host. For example, leaf01, spine04, host-6, engr-1, tor-22.",
            "required": true,
            "type": "string"
          },
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/Address"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/login": {
      "post": {
        "tags": [
          "auth"
        ],
        "summary": "Perform authenticated user login to NetQ",
        "description": "Sends user-provided login credentials (username and password) to the NetQ Authorization service for validation. Grants access to the NetQ platform and software if user credentials are valid.",
        "operationId": "login",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "in": "body",
            "name": "body",
            "description": "User credentials provided for login request; username and password.",
            "required": true,
            "schema": {
              "$ref": "#/definitions/LoginRequest"
            }
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "$ref": "#/definitions/LoginResponse"
            }
          },
          "401": {
            "description": "Invalid credentials",
            "schema": {
              "$ref": "#/definitions/ErrorResponse"
            }
          }
        }
      }
    },
    "/object/bgp": {
      "get": {
        "tags": [
          "bgp"
        ],
        "summary": "Get all BGP session information for all network devices",
        "description": "For every Border Gateway Protocol (BGP) session running on the network, retrieves local node hostname, remote peer hostname, interface, router ID, and ASN, timestamp, VRF, connection state, IP and EVPN prefixes, and so forth. Refer to the BGPSession model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/BgpSession"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/bgp/hostname/{hostname}": {
      "get": {
        "tags": [
          "bgp"
        ],
        "summary": "Get all BGP session information for a given network device by hostname",
        "description": "For every BGP session running on the network device, retrieves local node hostname, remote peer hostname, interface, router ID, and ASN, timestamp, VRF, connection state, IP and EVPN prefixes, and so forth. Refer to the BGPSession model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "hostname",
            "in": "path",
            "description": "User-specified name for a network switch or host. For example, leaf01, spine04, host-6, engr-1, tor-22.",
            "required": true,
            "type": "string"
          },
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/BgpSession"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/clag": {
      "get": {
        "tags": [
          "clag"
        ],
        "summary": "Get all CLAG session information for all network devices",
        "description": "For every Cumulus multiple Link Aggregation (CLAG) session running on the network, retrieves local node hostname, CLAG sysmac, remote peer role, state, and interface, backup IP address, bond status, and so forth. Refer to the ClagSessionInfo model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/ClagSessionInfo"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/clag/hostname/{hostname}": {
      "get": {
        "tags": [
          "clag"
        ],
        "summary": "Get all CLAG session information for a given network device by hostname",
        "description": "For every CLAG session running on the network device, retrieves local node hostname, CLAG sysmac, remote peer role, state, and interface, backup IP address, bond status, and so forth. Refer to the ClagSessionInfo model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "hostname",
            "in": "path",
            "description": "User-specified name for a network switch or host. For example, leaf01, spine04, host-6, engr-1, tor-22.",
            "required": true,
            "type": "string"
          },
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/ClagSessionInfo"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/events": {
      "get": {
        "tags": [
          "events"
        ],
        "summary": "Get all events from across the entire network",
        "description": "Retrieves all alarm (critical severity) and informational (warning, info and debug severity) events from all network devices and services.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "gt_timestamp",
            "in": "query",
            "description": "Used in combination with lt_timestamp, sets the lower limit of the time range to display. Uses Epoch format. Cannot be used with eq_timestamp. For example, to display events between Monday February 11, 2019 at 1:00am and Tuesday February 12, 2019 at 1:00am, lt_timestamp would be entered as 1549864800 and gt_timestamp would be entered as 1549951200.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "lt_timestamp",
            "in": "query",
            "description": "Used in combination with gt_timestamp, sets the upper limit of the time range to display. Uses Epoch format. Cannot be used with eq_timestamp. For example, to display events between Monday February 11, 2019 at 1:00am and Tuesday February 12, 2019 at 1:00am, lt_timestamp would be entered as 1549864800 and gt_timestamp would be entered as 1549951200.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "type": "string"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/evpn": {
      "get": {
        "tags": [
          "evpn"
        ],
        "summary": "Get all EVPN session information from across the entire network",
        "description": "For every Ethernet Virtual Private Network (EVPN) session running on the network, retrieves hostname, VNI status, origin IP address, timestamp, export and import routes, and so forth. Refer to the Evpn model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/Evpn"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/evpn/hostname/{hostname}": {
      "get": {
        "tags": [
          "evpn"
        ],
        "summary": "Get all EVPN session information from a given network device by hostname",
        "description": "For every EVPN session running on the network device, retrieves hostname, VNI status, origin IP address, timestamp, export and import routes, and so forth. Refer to the Evpn model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "hostname",
            "in": "path",
            "description": "User-specified name for a network switch or host. For example, leaf01, spine04, host-6, engr-1, tor-22.",
            "required": true,
            "type": "string"
          },
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/Evpn"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/interface": {
      "get": {
        "tags": [
          "interface"
        ],
        "summary": "Get software interface information for all network devices",
        "description": "Retrieves information about all software interfaces, including type and name of the interfaces, the hostnames of the device where they reside, state, VRF, and so forth. Refer to the Interface model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/Interface"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/interface/hostname/{hostname}": {
      "get": {
        "tags": [
          "interface"
        ],
        "summary": "Get software interface information for a given network device by hostname",
        "description": "Retrieves information about all software interfaces on a network device, including type and name of the interfaces, state, VRF, and so forth. Refer to the Interface model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "hostname",
            "in": "path",
            "description": "User-specified name for a network switch or host. For example, leaf01, spine04, host-6, engr-1, tor-22.",
            "required": true,
            "type": "string"
          },
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/Interface"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/inventory": {
      "get": {
        "tags": [
          "inventory"
        ],
        "summary": "Get component inventory information from all network devices",
        "description": "Retrieves the hardware and software component information, such as ASIC, platform, and OS vendor and version information, for all switches and hosts in your network. Refer to the InventoryOutput model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "$ref": "#/definitions/InventoryOutput"
            }
          },
          "400": {
            "description": "Invalid Input"
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/inventory/hostname/{hostname}": {
      "get": {
        "tags": [
          "inventory"
        ],
        "summary": "Get component inventory information from a given network device by hostname",
        "description": "Retrieves the hardware and software component information, such as ASIC, platform, and OS vendor and version information, for the given switch or host in your network. Refer to the InventoryOutput model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "hostname",
            "in": "path",
            "description": "User-specified name for a network switch or host. For example, leaf01, spine04, host-6, engr-1, tor-22.",
            "required": true,
            "type": "string"
          },
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "$ref": "#/definitions/InventoryOutput"
            }
          },
          "400": {
            "description": "Invalid Input"
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/lldp": {
      "get": {
        "tags": [
          "lldp"
        ],
        "summary": "Get LLDP information for all network devices",
        "description": "Retrieves Link Layer Discovery Protocol (LLDP) information, such as hostname, interface name, peer hostname, interface name, bridge, router, OS, timestamp, for all switches and hosts in the network. Refer to the LLDP model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/LLDP"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/lldp/hostname/{hostname}": {
      "get": {
        "tags": [
          "lldp"
        ],
        "summary": "Get LLDP information for a given network device by hostname",
        "description": "Retrieves Link Layer Discovery Protocol (LLDP) information, such as hostname, interface name, peer hostname, interface name, bridge, router, OS, timestamp, for the given switch or host. Refer to the LLDP model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "hostname",
            "in": "path",
            "description": "User-specified name for a network switch or host. For example, leaf01, spine04, host-6, engr-1, tor-22.",
            "required": true,
            "type": "string"
          },
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/LLDP"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/macfdb": {
      "get": {
        "tags": [
          "macfdb"
        ],
        "summary": "Get all MAC FDB information for all network devices",
        "description": "Retrieves all MAC address forwarding database (MACFDB) information for all switches and hosts in the network, such as MAC address, timestamp, next hop, destination, port, and VLAN. Refer to MacFdb model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/MacFdb"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/macfdb/hostname/{hostname}": {
      "get": {
        "tags": [
          "macfdb"
        ],
        "summary": "Get all MAC FDB information for a given network device by hostname",
        "description": "Retrieves all MAC address forwarding database (MACFDB) information for a given switch or host in the network, such as MAC address, timestamp, next hop, destination, port, and VLAN. Refer to MacFdb model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "hostname",
            "in": "path",
            "description": "User-specified name for a network switch or host. For example, leaf01, spine04, host-6, engr-1, tor-22.",
            "required": true,
            "type": "string"
          },
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/MacFdb"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/mstp": {
      "get": {
        "tags": [
          "mstp"
        ],
        "summary": "Get all MSTP information from all network devices",
        "description": "Retrieves all Multiple Spanning Tree Protocol (MSTP) information, including bridge and port information, changes made to topology, and so forth for all switches and hosts in the network. Refer to MstpInfo model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/MstpInfo"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/mstp/hostname/{hostname}": {
      "get": {
        "tags": [
          "mstp"
        ],
        "summary": "Get all MSTP information from a given network device by hostname",
        "description": "Retrieves all MSTP information, including bridge and port information, changes made to topology, and so forth for a given switch or host in the network.  Refer to MstpInfo model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "hostname",
            "in": "path",
            "description": "User-specified name for a network switch or host. For example, leaf01, spine04, host-6, engr-1, tor-22.",
            "required": true,
            "type": "string"
          },
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/MstpInfo"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/neighbor": {
      "get": {
        "tags": [
          "neighbor"
        ],
        "summary": "Get neighbor information for all network devices",
        "description": "Retrieves neighbor information, such as hostname, addresses, VRF, interface name and index, for all switches and hosts in the network.  Refer to Neighbor model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/Neighbor"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/neighbor/hostname/{hostname}": {
      "get": {
        "tags": [
          "neighbor"
        ],
        "summary": "Get neighbor information for a given network device by hostname",
        "description": "Retrieves neighbor information, such as hostname, addresses, VRF, interface name and index, for a given switch or host in the network.  Refer to Neighbor model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "hostname",
            "in": "path",
            "description": "User-specified name for a network switch or host. For example, leaf01, spine04, host-6, engr-1, tor-22.",
            "required": true,
            "type": "string"
          },
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/Neighbor"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/node": {
      "get": {
        "tags": [
          "node"
        ],
        "summary": "Get device status for all network devices",
        "description": "Retrieves hostname, uptime, last update, boot and re-initialization time, version, NTP and DB state, timestamp, and its current state (active or not) for all switches and hosts in the network.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/NODE"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/node/hostname/{hostname}": {
      "get": {
        "tags": [
          "node"
        ],
        "summary": "Get device status for a given network device by hostname",
        "description": "Retrieves hostname, uptime, last update, boot and re-initialization time, version, NTP and DB state, timestamp, and its current state (active or not) for a given switch or host in the network.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "hostname",
            "in": "path",
            "description": "User-specified name for a network switch or host. For example, leaf01, spine04, host-6, engr-1, tor-22.",
            "required": true,
            "type": "string"
          },
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/NODE"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/ntp": {
      "get": {
        "tags": [
          "ntp"
        ],
        "summary": "Get all NTP information for all network devices",
        "description": "Retrieves all Network Time Protocol (NTP) configuration and status information, such as whether the service is running and if it is in time synchronization, for all switches and hosts in the network. Refer to the NTP model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/NTP"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/ntp/hostname/{hostname}": {
      "get": {
        "tags": [
          "ntp"
        ],
        "summary": "Get all NTP information for a given network device by hostname",
        "description": "Retrieves all Network Time Protocol (NTP) configuration and status information, such as whether the service is running and if it is in time synchronization, for a given switch or host in the network. Refer to the NTP model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "hostname",
            "in": "path",
            "description": "User-specified name for a network switch or host. For example, leaf01, spine04, host-6, engr-1, tor-22.",
            "required": true,
            "type": "string"
          },
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/NTP"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/port": {
      "get": {
        "tags": [
          "port"
        ],
        "summary": "Get all information for all physical ports on all network devices",
        "description": "Retrieves all physical port information, such as speed, connector, vendor, part and serial number, and FEC support, for all network devices. Refer to Port model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/Port"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/port/hostname/{hostname}": {
      "get": {
        "tags": [
          "port"
        ],
        "summary": "Get all information for all physical ports on a given network device by hostname",
        "description": "Retrieves all physical port information, such as speed, connector, vendor, part and serial number, and FEC support, for a given switch or host in the network. Refer to Port model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "hostname",
            "in": "path",
            "description": "User-specified name for a network switch or host. For example, leaf01, spine04, host-6, engr-1, tor-22.",
            "required": true,
            "type": "string"
          },
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/Port"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/route": {
      "get": {
        "tags": [
          "route"
        ],
        "summary": "Get all route information for all network devices",
        "description": "Retrieves route information, such as VRF, source, next hops, origin, protocol, and prefix, for all switches and hosts in the network. Refer to Route model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/Route"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/route/hostname/{hostname}": {
      "get": {
        "tags": [
          "route"
        ],
        "summary": "Get all route information for a given network device by hostname",
        "description": "Retrieves route information, such as VRF, source, next hops, origin, protocol, and prefix, for a given switch or host in the network. Refer to Route model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "hostname",
            "in": "path",
            "description": "User-specified name for a network switch or host. For example, leaf01, spine04, host-6, engr-1, tor-22.",
            "required": true,
            "type": "string"
          },
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/Route"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/sensor": {
      "get": {
        "tags": [
          "sensor"
        ],
        "summary": "Get all sensor information for all network devices",
        "description": "Retrieves data from fan, temperature, and power supply unit sensors, such as their name, state, and threshold status, for all switches and hosts in the network. Refer to Sensor model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/Sensor"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/sensor/hostname/{hostname}": {
      "get": {
        "tags": [
          "sensor"
        ],
        "summary": "Get all sensor information for a given network device by hostname",
        "description": "Retrieves data from fan, temperature, and power supply unit sensors, such as their name, state, and threshold status, for a given switch or host in the network. Refer to Sensor model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "hostname",
            "in": "path",
            "description": "User-specified name for a network switch or host. For example, leaf01, spine04, host-6, engr-1, tor-22.",
            "required": true,
            "type": "string"
          },
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/Sensor"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/services": {
      "get": {
        "tags": [
          "services"
        ],
        "summary": "Get all services information for all network devices",
        "description": "Retrieves services information, such as XXX, for all switches and hosts in the network. Refer to Services for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/Services"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/services/hostname/{hostname}": {
      "get": {
        "tags": [
          "services"
        ],
        "summary": "Get all services information for a given network device by hostname",
        "description": "Retrieves services information, such as XXX, for a given switch or host in the network. Refer to Services for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "hostname",
            "in": "path",
            "description": "User-specified name for a network switch or host. For example, leaf01, spine04, host-6, engr-1, tor-22.",
            "required": true,
            "type": "string"
          },
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/Services"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/vlan": {
      "get": {
        "tags": [
          "vlan"
        ],
        "summary": "Get all VLAN information for all network devices",
        "description": "Retrieves VLAN information, such as hostname, interface name, associated VLANs, ports, and time of last change, for all switches and hosts in the network. Refer to Vlan model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/Vlan"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    },
    "/object/vlan/hostname/{hostname}": {
      "get": {
        "tags": [
          "vlan"
        ],
        "summary": "Get all VLAN information for a given network device by hostname",
        "description": "Retrieves VLAN information, such as hostname, interface name, associated VLANs, ports, and time of last change, for a given switch or  host in the network. Refer to Vlan model for all data collected.",
        "produces": [
          "application/json"
        ],
        "parameters": [
          {
            "name": "hostname",
            "in": "path",
            "description": "User-specified name for a network switch or host. For example, leaf01, spine04, host-6, engr-1, tor-22.",
            "required": true,
            "type": "string"
          },
          {
            "name": "eq_timestamp",
            "in": "query",
            "description": "Display results for a given time. Time must be entered in Epoch format. For example, to display the results for Monday February 13, 2019 at 1:25 pm, use a time converter and enter 1550082300.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "count",
            "in": "query",
            "description": "Number of entries to display starting from the offset value. For example, a count of 100 displays 100 entries at a time.",
            "required": false,
            "type": "integer"
          },
          {
            "name": "offset",
            "in": "query",
            "description": "Used in combination with count, offset specifies the starting location within the set of entries returned. For example, an offset of 100 would display results beginning with entry 101.",
            "required": false,
            "type": "integer"
          }
        ],
        "responses": {
          "200": {
            "description": "successful operation",
            "schema": {
              "type": "array",
              "items": {
                "$ref": "#/definitions/Vlan"
              }
            }
          }
        },
        "security": [
          {
            "jwt": []
          }
        ]
      }
    }
  },
  "securityDefinitions": {
    "jwt": {
      "type": "apiKey",
      "name": "Authorization",
      "in": "header"
    }
  },
  "definitions": {
    "Address": {
      "description": "This model contains descriptions of the data collected and returned by the Address endpoint.",
      "type": "object",
      "required": [
        "schema"
      ],
      "properties": {
        "schema": {
          "$ref": "#/definitions/Schema"
        },
        "opid": {
          "type": "integer",
          "format": "int32",
          "description": "Internal use only"
        },
        "hostname": {
          "type": "string",
          "description": "User-defined name for a device"
        },
        "ifname": {
          "type": "string",
          "description": "Name of a software (versus physical) interface"
        },
        "timestamp": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time data was collected"
        },
        "prefix": {
          "type": "string",
          "description": "Address prefix for IPv4, IPv6, or EVPN traffic"
        },
        "mask": {
          "type": "integer",
          "format": "int32",
          "description": "Address mask for IPv4, IPv6, or EVPN traffic"
        },
        "is_ipv6": {
          "type": "boolean",
          "description": "Indicates whether address is an IPv6 address (true) or not (false)"
        },
        "vrf": {
          "type": "string",
          "description": "Virtual Route Forwarding interface name"
        },
        "db_state": {
          "type": "string",
          "description": "Internal use only"
        }
      }
    },
    "BgpSession": {
      "description": "This model contains descriptions of the data collected and returned by the BGP endpoint.",
      "type": "object",
      "required": [
        "schema"
      ],
      "properties": {
        "schema": {
          "$ref": "#/definitions/Schema"
        },
        "opid": {
          "type": "integer",
          "format": "int32",
          "description": "Internal use only"
        },
        "hostname": {
          "type": "string",
          "description": "User-defined name for a device"
        },
        "peer_name": {
          "type": "string",
          "description": "Interface name or hostname for a peer device"
        },
        "timestamp": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time data was collected"
        },
        "db_state": {
          "type": "string",
          "description": "Internal use only"
        },
        "state": {
          "type": "string",
          "description": "Current state of the BGP session. Values include established and not established."
        },
        "peer_router_id": {
          "type": "string",
          "description": "If peer is a router, IP address of router"
        },
        "peer_asn": {
          "type": "integer",
          "format": "int64",
          "description": "Peer autonomous system number (ASN), identifier for a collection of IP networks and routers"
        },
        "peer_hostname": {
          "type": "string",
          "description": "User-defined name for the peer device"
        },
        "asn": {
          "type": "integer",
          "format": "int64",
          "description": "Host autonomous system number (ASN), identifier for a collection of IP networks and routers"
        },
        "reason": {
          "type": "string",
          "description": "Text describing the cause of, or trigger for, an event"
        },
        "ipv4_pfx_rcvd": {
          "type": "integer",
          "format": "int32",
          "description": "Address prefix received for an IPv4 address"
        },
        "ipv6_pfx_rcvd": {
          "type": "integer",
          "format": "int32",
          "description": "Address prefix received for an IPv6 address"
        },
        "evpn_pfx_rcvd": {
          "type": "integer",
          "format": "int32",
          "description": "Address prefix received for an EVPN address"
        },
        "last_reset_time": {
          "type": "number",
          "format": "float",
          "description": "Date and time at which the session was last established or reset"
        },
        "up_time": {
          "type": "number",
          "format": "float",
          "description": "Number of seconds the session has been established, in EPOCH notation"
        },
        "conn_estd": {
          "type": "integer",
          "format": "int32",
          "description": "Number of connections established for a given session"
        },
        "conn_dropped": {
          "type": "integer",
          "format": "int32",
          "description": "Number of dropped connections for a given session"
        },
        "upd8_rx": {
          "type": "integer",
          "format": "int32",
          "description": "Count of protocol messages received"
        },
        "upd8_tx": {
          "type": "integer",
          "format": "int32",
          "description": "Count of protocol messages transmitted"
        },
        "vrfid": {
          "type": "integer",
          "format": "int32",
          "description": "Integer identifier of the VRF interface when used"
        },
        "vrf": {
          "type": "string",
          "description": "Name of the Virtual Route Forwarding interface"
        },
        "tx_families": {
          "type": "string",
          "description": "Address families supported for the transmit session channel. Values include ipv4, ipv6, and evpn."
        },
        "rx_families": {
          "type": "string",
          "description": "Address families supported for the receive session channel. Values include ipv4, ipv6, and evpn."
        }
      }
    },
    "ClagSessionInfo": {
      "description": "This model contains descriptions of the data collected and returned by the CLAG endpoint.",
      "type": "object",
      "required": [
        "schema"
      ],
      "properties": {
        "schema": {
          "$ref": "#/definitions/Schema"
        },
        "opid": {
          "type": "integer",
          "format": "int32",
          "description": "Internal use only"
        },
        "hostname": {
          "type": "string",
          "description": "User-defined name for a device"
        },
        "clag_sysmac": {
          "type": "string",
          "description": "Unique MAC address for each bond interface pair. This must be a value between 44:38:39:ff:00:00 and 44:38:39:ff:ff:ff."
        },
        "timestamp": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time the CLAG session was started, deleted, updated, or marked dead (device went down)"
        },
        "db_state": {
          "type": "string",
          "description": "Internal use only"
        },
        "peer_role": {
          "type": "string",
          "description": "Role of the peer device. Values include primary and secondary."
        },
        "peer_state": {
          "type": "boolean",
          "description": "Indicates if peer device is up (true) or down (false)"
        },
        "peer_if": {
          "type": "string",
          "description": "Name of the peer interface used for the session"
        },
        "backup_ip_active": {
          "type": "boolean",
          "description": "Indicates whether the backup IP address has been specified and is active (true) or not (false)"
        },
        "backup_ip": {
          "type": "string",
          "description": "IP address of the interface to use if the peerlink (or bond) goes down"
        },
        "single_bonds": {
          "type": "string",
          "description": "Identifies a set of interfaces connecting to only one of the two switches in the bond"
        },
        "dual_bonds": {
          "type": "string",
          "description": "Identifies a set of interfaces connecting to both switches in the bond"
        },
        "conflicted_bonds": {
          "type": "string",
          "description": "Identifies the set of interfaces in a bond that do not match on each end of the bond"
        },
        "proto_down_bonds": {
          "type": "string",
          "description": "Interface on the switch brought down by the clagd service. Value is blank if no interfaces are down due to the clagd service."
        },
        "vxlan_anycast": {
          "type": "string",
          "description": "Anycast IP address used for VXLAN termination"
        },
        "role": {
          "type": "string",
          "description": "Role of the host device. Values include primary and secondary."
        }
      }
    },
    "ErrorResponse": {
      "description": "Standard error response",
      "type": "object",
      "properties": {
        "message": {
          "type": "string",
          "description": "One or more errors have been encountered during the processing of the associated request"
        }
      }
    },
    "Evpn": {
      "description": "This model contains descriptions of the data collected and returned by the EVPN endpoint.",
      "type": "object",
      "required": [
        "schema"
      ],
      "properties": {
        "schema": {
          "$ref": "#/definitions/Schema"
        },
        "hostname": {
          "type": "string",
          "description": "User-defined name for a device"
        },
        "opid": {
          "type": "integer",
          "format": "int32",
          "description": "Internal use only"
        },
        "vni": {
          "type": "integer",
          "format": "int32",
          "description": "Name of the virtual network instance (VNI) where session is running"
        },
        "origin_ip": {
          "type": "string",
          "description": "Host device's local VXLAN tunnel IP address for the EVPN instance"
        },
        "timestamp": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time the session was started, deleted, updated or marked as dead (device is down)"
        },
        "rd": {
          "type": "string",
          "description": "Route distinguisher used in the filtering mechanism for BGP route exchange"
        },
        "export_rt": {
          "type": "string",
          "description": "IP address and port of the export route target used in the filtering mechanism for BGP route exchange"
        },
        "import_rt": {
          "type": "string",
          "description": "IP address and port of the import route target used in the filtering mechanism for BGP route exchange"
        },
        "in_kernel": {
          "type": "boolean",
          "description": "Indicates whether the associated VNI is in the kernel (in kernel) or not (not in kernel)"
        },
        "adv_all_vni": {
          "type": "boolean",
          "description": "Indicates whether the VNI state is advertising all VNIs (true) or not (false)"
        },
        "adv_gw_ip": {
          "type": "string",
          "description": "Indicates whether the host device is advertising the gateway IP address (true) or not (false)"
        },
        "is_l3": {
          "type": "boolean",
          "description": "Indicates whether the session is part of a layer 3 configuration (true) or not (false)"
        },
        "db_state": {
          "type": "string",
          "description": "Internal use only"
        }
      }
    },
    "Field": {
      "type": "object",
      "required": [
        "aliases",
        "defaultValue",
        "doc",
        "jsonProps",
        "name",
        "objectProps",
        "order",
        "props",
        "schema"
      ],
      "properties": {
        "props": {
          "type": "object",
          "additionalProperties": {
            "type": "string"
          }
        },
        "name": {
          "type": "string"
        },
        "schema": {
          "$ref": "#/definitions/Schema"
        },
        "doc": {
          "type": "string"
        },
        "defaultValue": {
          "$ref": "#/definitions/JsonNode"
        },
        "order": {
          "type": "string",
          "enum": [
            "ASCENDING",
            "DESCENDING",
            "IGNORE"
          ]
        },
        "aliases": {
          "type": "array",
          "uniqueItems": true,
          "items": {
            "type": "string"
          }
        },
        "jsonProps": {
          "type": "object",
          "additionalProperties": {
            "$ref": "#/definitions/JsonNode"
          }
        },
        "objectProps": {
          "type": "object",
          "additionalProperties": {
            "type": "object",
            "properties": {}
          }
        }
      }
    },
    "Interface": {
      "description": "This model contains descriptions of the data collected and returned by the Interface endpoint.",
      "type": "object",
      "required": [
        "schema"
      ],
      "properties": {
        "schema": {
          "$ref": "#/definitions/Schema"
        },
        "opid": {
          "type": "integer",
          "format": "int32",
          "description": "Internal use only"
        },
        "hostname": {
          "type": "string",
          "description": "User-defined name for a device"
        },
        "type": {
          "type": "string",
          "description": "Identifier of the kind of interface. Values include bond, bridge, eth, loopback, macvlan, swp, vlan, vrf, and vxlan."
        },
        "timestamp": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time the data was collected"
        },
        "last_changed": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time the interface was started, deleted, updated or marked as dead (device is down)"
        },
        "ifname": {
          "type": "string",
          "description": "Name of the interface"
        },
        "state": {
          "type": "string",
          "description": "Indicates whether the interface is up or down"
        },
        "vrf": {
          "type": "string",
          "description": "Name of the virtual route forwarding (VRF) interface, if present"
        },
        "details": {
          "type": "string",
          "description": "???"
        }
      }
    },
    "InventoryModel": {
      "type": "object",
      "required": [
        "label",
        "value"
      ],
      "properties": {
        "label": {
          "type": "string"
        },
        "value": {
          "type": "integer",
          "format": "int32"
        }
      }
    },
    "InventoryOutput": {
      "type": "object",
      "properties": {
        "data": {
          "$ref": "#/definitions/InventorySampleClass"
        }
      }
    },
    "InventorySampleClass": {
      "type": "object",
      "properties": {
        "total": {
          "type": "integer",
          "format": "int32",
          "example": 100,
          "description": "total number of devices"
        },
        "os_version": {
          "$ref": "#/definitions/InventorySuperModel"
        },
        "os_vendor": {
          "$ref": "#/definitions/InventorySuperModel"
        },
        "asic": {
          "$ref": "#/definitions/InventorySuperModel"
        },
        "asic_vendor": {
          "$ref": "#/definitions/InventorySuperModel"
        },
        "asic_model": {
          "$ref": "#/definitions/InventorySuperModel"
        },
        "cl_license": {
          "$ref": "#/definitions/InventorySuperModel"
        },
        "agent_version": {
          "$ref": "#/definitions/InventorySuperModel"
        },
        "agent_state": {
          "$ref": "#/definitions/InventorySuperModel"
        },
        "platform": {
          "$ref": "#/definitions/InventorySuperModel"
        },
        "platform_vendor": {
          "$ref": "#/definitions/InventorySuperModel"
        },
        "disk_size": {
          "$ref": "#/definitions/InventorySuperModel"
        },
        "memory_size": {
          "$ref": "#/definitions/InventorySuperModel"
        },
        "platform_model": {
          "$ref": "#/definitions/InventorySuperModel"
        },
        "interface_speeds": {
          "$ref": "#/definitions/InventorySuperModel"
        }
      }
    },
    "InventorySuperModel": {
      "type": "object",
      "required": [
        "data",
        "label"
      ],
      "properties": {
        "label": {
          "type": "string"
        },
        "data": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/InventoryModel"
          }
        }
      }
    },
    "IteratorEntryStringJsonNode": {
      "type": "object"
    },
    "IteratorJsonNode": {
      "type": "object"
    },
    "IteratorString": {
      "type": "object"
    },
    "JsonNode": {
      "type": "object",
      "required": [
        "array",
        "bigDecimal",
        "bigInteger",
        "bigIntegerValue",
        "binary",
        "binaryValue",
        "boolean",
        "booleanValue",
        "containerNode",
        "decimalValue",
        "double",
        "doubleValue",
        "elements",
        "fieldNames",
        "fields",
        "floatingPointNumber",
        "int",
        "intValue",
        "integralNumber",
        "long",
        "longValue",
        "missingNode",
        "null",
        "number",
        "numberType",
        "numberValue",
        "object",
        "pojo",
        "textValue",
        "textual",
        "valueAsBoolean",
        "valueAsDouble",
        "valueAsInt",
        "valueAsLong",
        "valueAsText",
        "valueNode"
      ],
      "properties": {
        "elements": {
          "$ref": "#/definitions/IteratorJsonNode"
        },
        "fieldNames": {
          "$ref": "#/definitions/IteratorString"
        },
        "binary": {
          "type": "boolean"
        },
        "intValue": {
          "type": "integer",
          "format": "int32"
        },
        "object": {
          "type": "boolean"
        },
        "int": {
          "type": "boolean"
        },
        "long": {
          "type": "boolean"
        },
        "double": {
          "type": "boolean"
        },
        "bigDecimal": {
          "type": "boolean"
        },
        "bigInteger": {
          "type": "boolean"
        },
        "textual": {
          "type": "boolean"
        },
        "boolean": {
          "type": "boolean"
        },
        "valueNode": {
          "type": "boolean"
        },
        "containerNode": {
          "type": "boolean"
        },
        "missingNode": {
          "type": "boolean"
        },
        "pojo": {
          "type": "boolean"
        },
        "number": {
          "type": "boolean"
        },
        "integralNumber": {
          "type": "boolean"
        },
        "floatingPointNumber": {
          "type": "boolean"
        },
        "numberValue": {
          "$ref": "#/definitions/Number"
        },
        "numberType": {
          "type": "string",
          "enum": [
            "INT",
            "LONG",
            "BIG_INTEGER",
            "FLOAT",
            "DOUBLE",
            "BIG_DECIMAL"
          ]
        },
        "longValue": {
          "type": "integer",
          "format": "int64"
        },
        "bigIntegerValue": {
          "type": "integer"
        },
        "doubleValue": {
          "type": "number",
          "format": "double"
        },
        "decimalValue": {
          "type": "number"
        },
        "booleanValue": {
          "type": "boolean"
        },
        "binaryValue": {
          "type": "array",
          "items": {
            "type": "string",
            "format": "byte",
            "pattern": "^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$"
          }
        },
        "valueAsInt": {
          "type": "integer",
          "format": "int32"
        },
        "valueAsLong": {
          "type": "integer",
          "format": "int64"
        },
        "valueAsDouble": {
          "type": "number",
          "format": "double"
        },
        "valueAsBoolean": {
          "type": "boolean"
        },
        "textValue": {
          "type": "string"
        },
        "valueAsText": {
          "type": "string"
        },
        "array": {
          "type": "boolean"
        },
        "fields": {
          "$ref": "#/definitions/IteratorEntryStringJsonNode"
        },
        "null": {
          "type": "boolean"
        }
      }
    },
    "LLDP": {
      "description": "This model contains descriptions of the data collected and returned by the LLDP endpoint.",
      "type": "object",
      "required": [
        "schema"
      ],
      "properties": {
        "schema": {
          "$ref": "#/definitions/Schema"
        },
        "opid": {
          "type": "integer",
          "format": "int32",
          "description": "Internal use only"
        },
        "hostname": {
          "type": "string",
          "description": "User-defined name for the host device"
        },
        "ifname": {
          "type": "string",
          "description": "Name of the host interface where the LLDP service is running"
        },
        "timestamp": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time that the session was started, deleted, updated, or marked dead (device is down)"
        },
        "peer_hostname": {
          "type": "string",
          "description": "User-defined name for the peer device"
        },
        "peer_ifname": {
          "type": "string",
          "description": "Name of the peer interface where the session is running"
        },
        "lldp_peer_bridge": {
          "type": "boolean",
          "description": "Indicates whether the peer device is a bridge (true) or not (false)"
        },
        "lldp_peer_router": {
          "type": "boolean",
          "description": "Indicates whether the peer device is a router (true) or not (false)"
        },
        "lldp_peer_station": {
          "type": "boolean",
          "description": "Indicates whether the peer device is a station (true) or not (false)"
        },
        "lldp_peer_os": {
          "type": "string",
          "description": "Operating system (OS) used by peer device. Values include Cumulus Linux, RedHat, Ubuntu, and CentOS."
        },
        "lldp_peer_osv": {
          "type": "string",
          "description": "Version of the OS used by peer device. Example values include 3.7.3, 2.5.x, 16.04, 7.1."
        },
        "db_state": {
          "type": "string",
          "description": "Internal use only"
        }
      }
    },
    "LogicalType": {
      "type": "object",
      "required": [
        "name"
      ],
      "properties": {
        "name": {
          "type": "string"
        }
      }
    },
    "LoginRequest": {
      "description": "User-entered credentials used to validate if user is allowed to access NetQ",
      "type": "object",
      "required": [
        "password",
        "username"
      ],
      "properties": {
        "username": {
          "type": "string"
        },
        "password": {
          "type": "string"
        }
      }
    },
    "LoginResponse": {
      "description": "Response to user login request",
      "type": "object",
      "required": [
        "id"
      ],
      "properties": {
        "terms_of_use_accepted": {
          "type": "boolean",
          "description": "Indicates whether user has accepted the terms of use"
        },
        "access_token": {
          "type": "string",
          "description": "Grants jason web token (jwt) access token. The access token also contains the NetQ Platform or Appliance (opid) which the user is permitted to access. By default, it is the primary opid given by the user."
        },
        "expires_at": {
          "type": "integer",
          "format": "int64",
          "description": "Number of hours the access token is valid before it automatically expires, epoch miliseconds. By default, tokens are valid for 24 hours."
        },
        "id": {
          "type": "string"
        },
        "premises": {
          "type": "array",
          "description": "List of premises that this user is authorized to view",
          "items": {
            "$ref": "#/definitions/Premises"
          }
        },
        "customer_id": {
          "type": "integer",
          "format": "int32",
          "description": "customer id of this user"
        }
      }
    },
    "MacFdb": {
      "description": "This model contains descriptions of the data collected and returned by the MacFdb endpoint.",
      "type": "object",
      "required": [
        "schema"
      ],
      "properties": {
        "schema": {
          "$ref": "#/definitions/Schema"
        },
        "opid": {
          "type": "integer",
          "format": "int32",
          "description": "Internal use only"
        },
        "hostname": {
          "type": "string",
          "description": "User-defined name for a device"
        },
        "mac_address": {
          "type": "string",
          "description": "Media access control address for a device reachable via the local bridge member port 'nexthop' or via remote VTEP with IP address of 'dst'"
        },
        "timestamp": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time data was collected"
        },
        "dst": {
          "type": "string",
          "description": "IP address of a remote VTEP from which this MAC address is reachable"
        },
        "nexthop": {
          "type": "string",
          "description": "Interface where the MAC address can be reached"
        },
        "is_remote": {
          "type": "boolean",
          "description": "Indicates if the MAC address is reachable locally on 'nexthop' (false) or remotely via a VTEP with address 'dst' (true)"
        },
        "port": {
          "type": "string",
          "description": "Currently unused"
        },
        "vlan": {
          "type": "integer",
          "format": "int32",
          "description": "Name of associated VLAN"
        },
        "is_static": {
          "type": "boolean",
          "description": "Indicates if the MAC address is a static address (true) or dynamic address (false)"
        },
        "origin": {
          "type": "boolean",
          "description": "Indicates whether the MAC address is one of the host's interface addresses (true) or not (false)"
        },
        "active": {
          "type": "boolean",
          "description": "Currently unused"
        },
        "db_state": {
          "type": "string",
          "description": "Internal use only"
        }
      }
    },
    "MstpInfo": {
      "description": "This model contains descriptions of the data collected and returned by the MSTP endpoint.",
      "type": "object",
      "required": [
        "schema"
      ],
      "properties": {
        "schema": {
          "$ref": "#/definitions/Schema"
        },
        "opid": {
          "type": "integer",
          "format": "int32",
          "description": "Internal use only"
        },
        "hostname": {
          "type": "string",
          "description": "User-defined name for a device"
        },
        "bridge_name": {
          "type": "string",
          "description": "User-defined name for a bridge"
        },
        "timestamp": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time data was collected"
        },
        "db_state": {
          "type": "string",
          "description": "Internal use only"
        },
        "state": {
          "type": "boolean",
          "description": "Indicates whether MSTP is enabled (true) or not (false)"
        },
        "root_port_name": {
          "type": "string",
          "description": "Name of the physical interface (port) that provides the minimum cost path from the Bridge to the MSTI Regional Root"
        },
        "root_bridge": {
          "type": "string",
          "description": "Name of the CIST root for the bridged LAN"
        },
        "topo_chg_ports": {
          "type": "string",
          "description": "Names of ports that were part of the last topology change event"
        },
        "time_since_tcn": {
          "type": "integer",
          "format": "int64",
          "description": "Amount of time, in seconds, since the last topology change notification"
        },
        "topo_chg_cntr": {
          "type": "integer",
          "format": "int64",
          "description": "Number of times topology change notifications have been sent"
        },
        "bridge_id": {
          "type": "string",
          "description": "Spanning Tree bridge identifier for current host"
        },
        "edge_ports": {
          "type": "string",
          "description": "List of port names that are Spanning Tree edge ports"
        },
        "network_ports": {
          "type": "string",
          "description": "List of port names that are Spanning Tree network ports"
        },
        "disputed_ports": {
          "type": "string",
          "description": "List of port names that are in Spanning Tree dispute state"
        },
        "bpduguard_ports": {
          "type": "string",
          "description": "List of port names where BPDU Guard is enabled"
        },
        "bpduguard_err_ports": {
          "type": "string",
          "description": "List of port names where BPDU Guard violation occurred"
        },
        "ba_inconsistent_ports": {
          "type": "string",
          "description": "List of port names where Spanning Tree Bridge Assurance is failing"
        },
        "bpdufilter_ports": {
          "type": "string",
          "description": "List of port names where Spanning Tree BPDU Filter is enabled"
        },
        "ports": {
          "type": "string",
          "description": "List of port names in the Spanning Tree instance"
        },
        "is_vlan_filtering": {
          "type": "boolean",
          "description": "Indicates whether the bridge is enabled with VLAN filtering (is VLAN-aware) (true) or not (false)"
        }
      }
    },
    "Neighbor": {
      "description": "This model contains descriptions of the data collected and returned by the Neighbor endpoint.",
      "type": "object",
      "required": [
        "schema"
      ],
      "properties": {
        "schema": {
          "$ref": "#/definitions/Schema"
        },
        "opid": {
          "type": "integer",
          "format": "int32",
          "description": "Internal use only"
        },
        "hostname": {
          "type": "string",
          "description": "User-defined name of a device"
        },
        "ifname": {
          "type": "string",
          "description": "User-defined name of an software interface on a device"
        },
        "timestamp": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time when data was collected"
        },
        "vrf": {
          "type": "string",
          "description": "Name of virtual route forwarding (VRF) interface, when applicable"
        },
        "is_remote": {
          "type": "boolean",
          "description": "Indicates if the neighbor is reachable through a local interface (false) or remotely through a ??? (true)"
        },
        "ifindex": {
          "type": "integer",
          "format": "int32",
          "description": "IP address index for the neighbor device"
        },
        "mac_address": {
          "type": "string",
          "description": "MAC address for the neighbor device"
        },
        "is_ipv6": {
          "type": "boolean",
          "description": "Indicates whether the neighbor's IP address is version six (IPv6) (true) or version four (IPv4) (false)"
        },
        "message_type": {
          "type": "string",
          "description": "Network protocol or service identifier used in neighbor-related events. Value is neighbor."
        },
        "ip_address": {
          "type": "string",
          "description": "IPv4 or IPv6 address for the neighbor device"
        },
        "db_state": {
          "type": "string",
          "description": "Internal use only"
        }
      }
    },
    "NODE": {
      "description": "This model contains descriptions of the data collected and returned by the Node endpoint.",
      "type": "object",
      "required": [
        "schema"
      ],
      "properties": {
        "schema": {
          "$ref": "#/definitions/Schema"
        },
        "opid": {
          "type": "integer",
          "format": "int32",
          "description": "Internal use only"
        },
        "hostname": {
          "type": "string",
          "description": "User-defined name of the device"
        },
        "sys_uptime": {
          "type": "integer",
          "format": "int64",
          "description": "Amount of time, in seconds???, this device has been powered up"
        },
        "lastboot": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time, in EPOCH format???, this device was last booted"
        },
        "last_reinit": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time, in xxx????, this device was last initialized"
        },
        "active": {
          "type": "boolean",
          "description": "Indicates whether this device is active(???) (true) or not (false)"
        },
        "version": {
          "type": "string",
          "description": "????"
        },
        "ntp_state": {
          "type": "string",
          "description": "Status of the NTP service running on this device; in sync, not in sync, or unknown"
        },
        "timestamp": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time data was collected"
        },
        "last_update_time": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time the device was last updated"
        },
        "db_state": {
          "type": "string",
          "description": "Internal use only"
        }
      }
    },
    "NTP": {
      "description": "This model contains descriptions of the data collected and returned by the NTP endpoint.",
      "type": "object",
      "required": [
        "schema"
      ],
      "properties": {
        "schema": {
          "$ref": "#/definitions/Schema"
        },
        "hostname": {
          "type": "string",
          "description": "User-defined name of device running NTP service"
        },
        "opid": {
          "type": "integer",
          "format": "int32",
          "description": "Internal use only"
        },
        "timestamp": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time data was collected"
        },
        "ntp_sync": {
          "type": "string",
          "description": "Status of the NTP service running on this device; in sync, not in sync, or unknown"
        },
        "stratum": {
          "type": "integer",
          "format": "int32",
          "description": "????"
        },
        "ntp_app": {
          "type": "string",
          "description": "Name/release? of the NTP service????"
        },
        "message_type": {
          "type": "string",
          "description": "Network protocol or service identifier used in NTP-related events. Value is ntp."
        },
        "current_server": {
          "type": "string",
          "description": "Name or address of server providing time synchronization"
        },
        "active": {
          "type": "boolean",
          "description": "Indicates whether NTP service is running (true) or not (false)"
        },
        "db_state": {
          "type": "string",
          "description": "Internal use only"
        }
      }
    },
    "Number": {
      "type": "object",
      "description": "????"
    },
    "Port": {
      "description": "This model contains descriptions of the data collected and returned by the Port endpoint.",
      "type": "object",
      "required": [
        "schema"
      ],
      "properties": {
        "schema": {
          "$ref": "#/definitions/Schema"
        },
        "opid": {
          "type": "integer",
          "format": "int32",
          "description": "Internal use only"
        },
        "hostname": {
          "type": "string",
          "description": "User-defined name for the device with this port"
        },
        "ifname": {
          "type": "string",
          "description": "User-defined name for the software interface on this port"
        },
        "timestamp": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time data was collected"
        },
        "speed": {
          "type": "string",
          "description": "Maximum rating for port. Examples include 10G, 25G, 40G, unknown."
        },
        "identifier": {
          "type": "string",
          "description": "Identifies type of port module if installed. Example values include empty, QSFP+, SFP, RJ45"
        },
        "autoneg": {
          "type": "string",
          "description": "Indicates status of the auto-negotiation feature. Values include on and off."
        },
        "db_state": {
          "type": "string",
          "description": "Internal use only"
        },
        "transreceiver": {
          "type": "string",
          "description": "Name of installed transceiver. Example values include 40G Base-CR4, 10Gtek."
        },
        "connector": {
          "type": "string",
          "description": "Name of installed connector. Example values include LC, copper pigtail, RJ-45, n/a."
        },
        "vendor_name": {
          "type": "string",
          "description": "Name of the port vendor. Example values include OEM, Mellanox, Amphenol, Finisar, Fiberstore, n/a."
        },
        "part_number": {
          "type": "string",
          "description": "Manufacturer part number"
        },
        "serial_number": {
          "type": "string",
          "description": "Manufacturer serial number"
        },
        "length": {
          "type": "string",
          "description": "Length of cable connected (or length the transceiver can transmit or ????). Example values include 1m, 2m, n/a."
        },
        "supported_fec": {
          "type": "string",
          "description": "List of forward error correction (FEC) algorithms supported on this port. Example values include BaseR, RS, Not reported, None."
        },
        "advertised_fec": {
          "type": "string",
          "description": "Type of FEC advertised by this port"
        },
        "fec": {
          "type": "string",
          "description": "????"
        },
        "message_type": {
          "type": "string",
          "description": "Network protocol or service identifier used in port-related events. Value is port."
        },
        "state": {
          "type": "string",
          "description": "Status of the port, either up or down."
        }
      }
    },
    "Premises": {
      "type": "object",
      "required": [
        "name",
        "opid"
      ],
      "properties": {
        "opid": {
          "type": "integer",
          "format": "int32"
        },
        "name": {
          "type": "string"
        }
      },
      "description": "Premises"
    },
    "Route": {
      "description": "This module contains descirptions of the data collected and returned by the Route endpoint.",
      "type": "object",
      "required": [
        "schema"
      ],
      "properties": {
        "schema": {
          "$ref": "#/definitions/Schema"
        },
        "opid": {
          "type": "integer",
          "format": "int32",
          "description": "Internal use only"
        },
        "hostname": {
          "type": "string",
          "description": "User-defined name for a device"
        },
        "timestamp": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time data was collected"
        },
        "vrf": {
          "type": "string",
          "description": "Name of associated virtual route forwarding (VRF) interface, if applicable"
        },
        "message_type": {
          "type": "string",
          "description": "Network protocol or service identifier used in route-related events. Value is route."
        },
        "db_state": {
          "type": "string",
          "description": "Internal use only"
        },
        "is_ipv6": {
          "type": "boolean",
          "description": "Indicates whether the IP address for this route is an IPv6 address (true) or an IPv4 address (false)"
        },
        "rt_table_id": {
          "type": "integer",
          "format": "int32",
          "description": "Routing table identifier for this route"
        },
        "src": {
          "type": "string",
          "description": "Hostname?? of device where this route originated"
        },
        "nexthops": {
          "type": "string",
          "description": "List of hostnames/interfaces/ports remaining to reach destination????"
        },
        "route_type": {
          "type": "integer",
          "format": "int32",
          "description": "????"
        },
        "origin": {
          "type": "boolean",
          "description": "Indicates whether the source of this route is on the  device indicated by 'hostname'????"
        },
        "protocol": {
          "type": "string",
          "description": "Protocol used for routing. Example values include BGP, ????"
        },
        "prefix": {
          "type": "string",
          "description": "Address prefix for this route"
        }
      }
    },
    "Schema": {
      "type": "object",
      "required": [
        "aliases",
        "doc",
        "elementType",
        "enumSymbols",
        "error",
        "fields",
        "fixedSize",
        "fullName",
        "hashCode",
        "jsonProps",
        "logicalType",
        "name",
        "namespace",
        "objectProps",
        "props",
        "type",
        "types",
        "valueType"
      ],
      "properties": {
        "props": {
          "type": "object",
          "additionalProperties": {
            "type": "string"
          }
        },
        "type": {
          "type": "string",
          "enum": [
            "RECORD",
            "ENUM",
            "ARRAY",
            "MAP",
            "UNION",
            "FIXED",
            "STRING",
            "BYTES",
            "INT",
            "LONG",
            "FLOAT",
            "DOUBLE",
            "BOOLEAN",
            "NULL"
          ]
        },
        "logicalType": {
          "$ref": "#/definitions/LogicalType"
        },
        "hashCode": {
          "type": "integer",
          "format": "int32"
        },
        "elementType": {
          "$ref": "#/definitions/Schema"
        },
        "aliases": {
          "type": "array",
          "uniqueItems": true,
          "items": {
            "type": "string"
          }
        },
        "namespace": {
          "type": "string"
        },
        "fields": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/Field"
          }
        },
        "types": {
          "type": "array",
          "items": {
            "$ref": "#/definitions/Schema"
          }
        },
        "fullName": {
          "type": "string"
        },
        "enumSymbols": {
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "doc": {
          "type": "string"
        },
        "valueType": {
          "$ref": "#/definitions/Schema"
        },
        "fixedSize": {
          "type": "integer",
          "format": "int32"
        },
        "name": {
          "type": "string"
        },
        "error": {
          "type": "boolean"
        },
        "jsonProps": {
          "type": "object",
          "additionalProperties": {
            "$ref": "#/definitions/JsonNode"
          }
        },
        "objectProps": {
          "type": "object",
          "additionalProperties": {
            "type": "object",
            "properties": {}
          }
        }
      }
    },
    "Sensor": {
      "description": "This model contains descriptions of the data collected and returned from the Sensor endpoint.",
      "type": "object",
      "required": [
        "schema"
      ],
      "properties": {
        "schema": {
          "$ref": "#/definitions/Schema"
        },
        "opid": {
          "type": "integer",
          "format": "int32",
          "description": "Internal use only"
        },
        "hostname": {
          "type": "string",
          "description": "User-defined name of the device where the sensor resides"
        },
        "timestamp": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time data was collected"
        },
        "s_prev_state": {
          "type": "string",
          "description": "Previous state of a fan or power supply unit (PSU) sensor. Values include OK, absent, and bad."
        },
        "s_name": {
          "type": "string",
          "description": "Type of sensor. Values include fan, psu, temp.????"
        },
        "s_state": {
          "type": "string",
          "description": "Current state of a fan or power supply unit (PSU) sensor. Values include OK, absent, and bad."
        },
        "s_input": {
          "type": "number",
          "format": "float",
          "description": "????"
        },
        "message_type": {
          "type": "string",
          "description": "Network protocol or service identifier used in sensor-related events. Value is sensor."
        },
        "s_msg": {
          "type": "string",
          "description": "Sensor message????"
        },
        "s_desc": {
          "type": "string",
          "description": "User-defined name of sensor. Example values include fan1, fan-2, psu1, psu02, psu1temp1, temp2. ????"
        },
        "s_max": {
          "type": "integer",
          "format": "int32",
          "description": "Current maximum temperature threshold value"
        },
        "s_min": {
          "type": "integer",
          "format": "int32",
          "description": "Current minimum temperature threshold value"
        },
        "s_crit": {
          "type": "integer",
          "format": "int32",
          "description": "Current critical high temperature threshold value"
        },
        "s_lcrit": {
          "type": "integer",
          "format": "int32",
          "description": "Current critical low temperature threshold value"
        },
        "db_state": {
          "type": "string",
          "description": "Internal use only"
        },
        "active": {
          "type": "boolean",
          "description": "Indicates whether the identified sensor is operating (true) or not (false)"
        },
        "deleted": {
          "type": "boolean",
          "description": "Indicates whether the sensor ???? has been deleted (true) or not (false)"
        }
      }
    },
    "Services": {
      "description": "This model contains descriptions of the data collected and returned from the Sensor endpoint.",
      "type": "object",
      "required": [
        "schema"
      ],
      "properties": {
        "schema": {
          "$ref": "#/definitions/Schema"
        },
        "opid": {
          "type": "integer",
          "format": "int32",
          "description": "Internal use only"
        },
        "hostname": {
          "type": "string",
          "description": "User-defined name of the device where the network services are running."
        },
        "name": {
          "type": "string",
          "description": "Name of the service; for example, BGP, OSPF, LLDP, NTP, and so forth."
        },
        "vrf": {
          "type": "string",
          "description": "Name of the Virtual Route Forwarding (VRF) interface if employed."
        },
        "timestamp": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time data was collected"
        },
        "is_enabled": {
          "type": "boolean",
          "description": "Indicates whether the network service is enabled."
        },
        "is_active": {
          "type": "boolean",
          "description": "Indicates whether the network service is currently active."
        },
        "is_monitored": {
          "type": "boolean",
          "description": "Indicates whether the network service is currently being monitored."
        },
        "status": {
          "type": "integer",
          "format": "int32",
          "description": "Status of the network service connection; up or down."
        },
        "start_time": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time that the network service was most recently started."
        },
        "db_state": {
          "type": "string",
          "description": "Internal use only"
        }
      }
    },
    "Vlan": {
      "description": "This model contains descriptions of the data collected and returned by the VLAN endpoint.",
      "type": "object",
      "required": [
        "schema"
      ],
      "properties": {
        "schema": {
          "$ref": "#/definitions/Schema"
        },
        "opid": {
          "type": "integer",
          "format": "int32",
          "description": "Internal use only"
        },
        "hostname": {
          "type": "string",
          "description": "User-defined name for a device"
        },
        "ifname": {
          "type": "string",
          "description": "User-defined name for a software interface"
        },
        "timestamp": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time data was collected"
        },
        "last_changed": {
          "type": "integer",
          "format": "int64",
          "description": "Date and time the VLAN configuration was changed (updated, deleted,???)"
        },
        "vlans": {
          "type": "string",
          "description": "List of other VLANs known to this VLAN or on this device????"
        },
        "svi": {
          "type": "string",
          "description": "Switch virtual interface (SVI) associated with this VLAN"
        },
        "db_state": {
          "type": "string",
          "description": "Internal use only"
        },
        "ports": {
          "type": "string",
          "description": "Names of ports on the device associated with this VLAN"
        }
      }
    }
  }
}
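
The following short Python sketch illustrates one way a client might exercise these models: authenticate with a LoginRequest body, read the JWT from the LoginResponse, and then retrieve BGP session records shaped like the Bgp model. The base URL, port, endpoint paths, and authorization header shown here are illustrative assumptions only and are not defined in this guide; substitute the values documented for your NetQ Platform, Appliance, or Cloud deployment.

# Minimal sketch of calling the NetQ API using the models defined above.
# The base URL, endpoint paths, and authorization header are placeholders
# (assumptions), not values taken from this guide.
import requests

BASE_URL = "https://netq.example.com:443/netq"  # placeholder

# LoginRequest: username and password are both required.
login = requests.post(
    f"{BASE_URL}/auth/v1/login",  # placeholder path
    json={"username": "admin", "password": "admin"},
)
login.raise_for_status()

# LoginResponse: carries the JWT access token and the premises list.
token = login.json()["access_token"]

# Retrieve BGP session data; each record is expected to follow the Bgp
# model (hostname, peer_name, state, peer_asn, and so on).
bgp = requests.get(
    f"{BASE_URL}/telemetry/v1/object/bgp",  # placeholder path
    headers={"Authorization": f"Bearer {token}"},  # header format is an assumption
)
bgp.raise_for_status()
for session in bgp.json():
    print(session["hostname"], session["peer_name"], session["state"])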

Cumulus NetQ UI User Guide

This guide is intended for network administrators and operators who are responsible for monitoring and troubleshooting the network in their data center environment. NetQ 2.x offers the ability to easily monitor and manage your data center network infrastructure and operational health. This guide provides instructions and information about monitoring individual components of the network, the network as a whole, and the NetQ software itself using the NetQ graphical user interface (GUI). If you prefer to use a command line interface, refer to the Cumulus NetQ CLI User Guide.

NetQ User Interface Overview

The NetQ 2.x graphical user interface (UI) enables you to access NetQ capabilities through a web browser as opposed to through a terminal window using the Command Line Interface (CLI). Visual representations of the health of the network, inventory, and system events make it easy to both find faults and misconfigurations, and to fix them.

The UI is accessible in both on-site and in-cloud deployments. It is supported on Google Chrome. Other popular browsers may be used, but have not been tested and may have some presentation issues.

Before you get started, you should refer to the release notes for this version.

Access the NetQ UI

Logging in to the NetQ UI is as easy as opening any web page.

To log in to the UI:

  1. Open a new Internet browser window or tab.

  2. Enter the following URL into the Address bar for the on-site NetQ Platform/NetQ Appliance or the NetQ Cloud Appliance:

  3. Enter your username and then your password:

    • NetQ Platform: admin, admin by default
    • NetQ Appliance: cumulus, CumulusLinux! by default
    • NetQ Cloud Appliance: Use credentials provided by Cumulus via email titled Welcome to Cumulus NetQ! and accept the terms of use.

    On your first login, the default Cumulus Workbench opens, with your username shown in the upper right corner of the application. The NetQ Cloud UI has a Premises list in the application header, but is otherwise the same. On future logins, the last workbench that you were viewing is displayed.

To log out of the UI:

  1. Click the user icon at the top right of the application.

  2. Select Log Out.

Application Layout

The NetQ UI contains two main areas: the application header and the workbench.

Found in the application header, clicking the menu icon opens the main menu, which provides navigation to the other areas of the application.

Recent Actions

Found in the header, Recent Actions keeps track of every action you take on your workbench and then saves each action with a timestamp. This enables you to go back to a previous state or repeat an action.

To open Recent Actions, click the Recent Actions icon in the header. Click on any of the actions to perform that action again.

The Global Search field in the UI header enables you to search for devices. It behaves like most searches and can help you quickly find device information. For more detail on creating and running searches, refer to Create and Run Searches.

Clicking on the Cumulus logo takes you to your favorite workbench. For details about specifying your favorite workbench, refer to Set User Preferences.

Quick Network Health View

Found in the header, the graph and performance rating provide a view into the health of your network at a glance.

On initial startup of the application, it may take up to an hour to reach an accurate health indication, as some processes run every 30 minutes.

Workbenches

A workbench is comprised of a given set of cards. A pre-configured default workbench, Cumulus Workbench, is available to get you started. It contains Device Inventory, Switch Inventory, Alarm and Info Events, and Network Health cards. On initial login, this workbench is opened. You can create your own workbenches and add or remove cards to meet your particular needs. For more detail about managing your data using workbenches, refer to Focus Your Monitoring Using Workbenches.

Cards

Cards present information about your network for monitoring and troubleshooting. This is where you can expect to spend most of your time. Each card describes a particular aspect of the network. Cards are available in multiple sizes, from small to full screen. The level of the content on a card varies in accordance with the size of the card, with the highest level of information on the smallest card to the most detailed information on the full-screen view. Cards are collected onto a workbench where you see all of the data relevant to a task or set of tasks. You can add and remove cards from a workbench, move between cards and card sizes, and make copies of cards to show different levels of data at the same time. For details about working with cards, refer to Access Data with Cards.

User Settings

Each user can customize the NetQ application display, change their account password, and manage their workbenches. This is all performed from User Settings > Profile & Preferences. For details, refer to Set User Preferences.

Format Cues

Color is used to indicate links, options, and status within the UI.

Item                                     Color
Hover on item                            Blue
Clickable item                           Black
Selected item                            Green
Highlighted item                         Blue
Link                                     Blue
Good/Successful results                  Green
Result with critical severity event      Pink
Result with high severity event          Red
Result with medium severity event        Orange
Result with low severity event           Yellow

Create and Run Searches

The Global Search field in the UI header enables you to search for devices or cards. You can create new searches or run existing searches.

As with most search fields, simply begin entering the criteria in the search field. As you type, items that match the criteria are shown in the search history dropdown along with the last time the search was viewed. Wildcards are not allowed, but this predictive matching eliminates the need for them. By default, the most recent searches are shown; if more searches have been performed, they can be accessed from the same list. Reusing a recent search can be quicker than retyping the full criteria. Selecting a suggested search from the list provides a preview of the search results to the right.

To create a new search:

  1. Click in the Global Search field.

  2. Enter your search criteria.

  3. Click the device hostname or card workflow in the search list to open the associated information.

    If you have more matches than fit in the window, click the See All # Results link to view all found matches. The count represents the number of devices found. It does not include cards found.

You can re-run a recent search, saving time if you are comparing data from two or more devices.

To re-run a recent search:

  1. Click in the Global Search field.

  2. When the desired search appears in the suggested searches list, select it.

    You may need to click See All # Results to find the desired search. If you do not find it in the list, you may still be able to find it in the Recent Actions list.

Focus Your Monitoring Using Workbenches

Workbenches are an integral structure of the Cumulus NetQ application. They are where you collect and view the data that is important to you.

There are two types of workbenches: default and custom.

Both types of workbenches display a set of cards. Default workbenches are public (available for viewing by all users), whereas custom workbenches are private (viewable only by the user who created them).

Default Workbenches

In this release, only one default workbench is available, the Cumulus Workbench, to get you started. It contains Device Inventory, Switch Inventory, Alarm and Info Events, and Network Health cards, giving you a high-level view of how your network is operating.

On initial login, the Cumulus Workbench is opened. On subsequent logins, the last workbench you had displayed is opened.

Custom Workbenches

Users with either administrative or user roles can create and save as many custom workbenches as suit their needs. For example, a user might create a workbench that contains only the cards relevant to a particular task or portion of the network.

Create a Workbench

To create a workbench:

  1. Click in the workbench header.

  2. Enter a name for the workbench.

  3. Click Create to open a blank new workbench, or Cancel to discard the workbench.

  4. Add cards to the workbench using the add card icons in the header.

Refer to Access Data with Cards for information about interacting with cards on your workbenches.

Remove a Workbench

Once you have created a number of custom workbenches, you might find that you no longer need some of them. As an administrative user, you can remove any workbench, except for the default Cumulus Workbench. Users with a user role can only remove workbenches they have created.

To remove a workbench:

  1. Click the user icon in the application header to open the User Settings options.

  2. Click Profile & Preferences.

  3. Locate the Workbenches card.

  4. Hover over the workbench you want to remove, and click Delete.

Open an Existing Workbench

There are several options for opening workbenches:

Manage Auto-Refresh for Your Workbenches

With NetQ 2.3.1 and later, you can specify how often to update the data displayed on your workbenches. Three refresh rates are available:

By default, auto-refresh is enabled and configured to update every 30 seconds.

Disable/Enable Auto-Refresh

To disable or pause auto-refresh of your workbenches, simply click the Refresh icon. This toggles between the two states, Running and Paused; the icon changes to indicate which state is currently active.

While having the workbenches update regularly is good most of the time, you may find that you want to pause the auto-refresh feature when you are troubleshooting and you do not want the data to change on a given set of cards temporarily. In this case, you can disable the auto-refresh and then enable it again when you are finished.

View Current Settings

To view the current auto-refresh rate and operational status, hover over the Refresh icon in the workbench header; a tooltip opens showing the current settings.

Change Settings

To modify the auto-refresh setting:

  1. Click on the Refresh icon.

  2. Select the refresh rate you want. The refresh rate is applied immediately. A check mark is shown next to the current selection.

Manage Workbenches

To manage your workbenches as a group, either:

Both of these open the Profile & Preferences page. Look for the Workbenches card and refer to Manage Your Workbenches for more information.

Access Data with Cards

Cards present information about your network for monitoring and troubleshooting. This is where you can expect to spend most of your time. Each card describes a particular aspect of the network. Cards are available in multiple sizes, from small to full screen. The level of the content on a card varies in accordance with the size of the card, with the highest level of information on the smallest card to the most detailed information on the full-screen card. Cards are collected onto a workbench where you see all of the data relevant to a task or set of tasks. You can add and remove cards from a workbench, move between cards and card sizes, change the time period of the data shown on a card, and make copies of cards to show different levels of data at the same time.

Card Sizes

The various sizes of cards enable you to view your content at just the right level of detail. For each aspect that you are monitoring there is typically a single card that presents increasing amounts of data over its four sizes. For example, a snapshot of your total inventory may be sufficient, but monitoring the distribution of hardware vendors may require a bit more space.

Small Cards

Small cards are most effective at providing a quick view of the performance or statistical value of a given aspect of your network. They are commonly comprised of an icon to identify the aspect being monitored, summary performance or statistics in the form of a graph and/or counts, and often an indication of any related events. Other content items may be present. Some examples include a Devices Inventory card, a Switch Inventory card, an Alarm Events card, an Info Events card, and a Network Health card, as shown here:

Medium Cards

Medium cards are most effective at providing the key measurements for a given aspect of your network. They are commonly comprised of an icon to identify the aspect being monitored, one or more key measurements that make up the overall performance. Often additional information is also included, such as related events or components. Some examples include a Devices Inventory card, a Switch Inventory card, an Alarm Events card, an Info Events card, and a Network Health card, as shown here. Compare these with their related small- and large-sized cards.

Large Cards

Large cards are most effective at providing the detailed information for monitoring specific components or functions of a given aspect of your network. These can aid in isolating and resolving existing issues or preventing potential issues. They are commonly comprised of detailed statistics and graphics. Some large cards also have tabs for additional detail about a given statistic or other related information. Some examples include a Devices Inventory card, an Alarm Events card, and a Network Health card, as shown here. Compare these with their related small- and medium-sized cards.

Full-Screen Cards

Full-screen cards are most effective for viewing all available data about an aspect of your network all in one place. When you cannot find what you need in the small, medium, or large cards, it is likely on the full-screen card. Most full-screen cards display data in a grid, or table; however, some contain visualizations. Some examples include All Events card and All Switches card, as shown here.

Card Size Summary

Card Size        Primary Purpose

Small
  • Quick view of status, typically at the level of good or bad
  • Enable quick actions, run a validation or trace for example

Medium
  • View key performance parameters or statistics
  • Perform an action
  • Look for potential issues

Large
  • View detailed performance and statistics
  • Perform actions
  • Compare and review related information

Full Screen
  • View all attributes for given network aspect
  • Free-form data analysis and visualization
  • Export data to third-party tools

Card Workflows

The UI provides a number of card workflows. Card workflows focus on a particular aspect of your network and are a linked set of cards of each size: a small card, a medium card, one or more large cards, and one or more full-screen cards. The following card workflows are available:

Access a Card Workflow

You can access a card workflow in multiple ways:

If you have multiple cards open on your workbench already, you might need to scroll down to see the card you have just added.

To open the card workflow through an existing workbench:

  1. Click in the workbench task bar.

  2. Select the relevant workbench.

    The workbench opens, hiding your previous workbench.

To open the card workflow from Recent Actions:

  1. Click in the application header.

  2. Look for an “Add: <card name>” item.

  3. If it is still available, click the item.

    The card appears on the current workbench, at the bottom.

To access the card workflow by adding the card:

  1. Click in the workbench task bar.

  2. Follow the instructions in Add Cards to Your Workbench or Add Switch Cards to Your Workbench.

    The card appears on the current workbench, at the bottom.

To access the card workflow by searching for the card:

  1. Click in the Global Search field.

  2. Begin typing the name of the card.

  3. Select it from the list.

    The card appears on the current workbench, at the bottom.

Card Interactions

Every card contains a standard set of interactions, including the ability to switch between card sizes, and change the time period of the presented data. Most cards also have additional actions that can be taken, in the form of links to other cards, scrolling, and so forth. The four sizes of cards for a particular aspect of the network are connected into a flow; however, you can have duplicate cards displayed at the different sizes. Cards with tabular data provide filtering, sorting, and export of data. The medium and large cards have descriptive text on the back of the cards.

To access the time period, card size, and additional actions, hover over the card. These options appear, covering the card header, enabling you to select the desired option.

Add Cards to Your Workbench

You can add one or more cards to a workbench at any time. To add Devices|Switches cards, refer to Add Switch Cards to Your Workbench. For all other cards, follow the steps in this section.

To add one or more cards:

  1. Click to open the Cards modal.

  2. Scroll down until you find the card you want to add, or select the category of cards to find the card you want to add.

  3. Click on each card you want to add.

    As you select each card, it is grayed out and a check mark appears on top of it. If you have selected one or more cards using the category option, you can select another category without losing your current selection. The total number of cards selected for addition to your workbench is shown at the bottom.

    If you change your mind and do not want to add a particular card you have selected, simply click it again to remove it from the selection. The total number of cards selected decreases with each card you remove.

  4. When you have selected all of the cards you want to add to your workbench, you can confirm which cards have been selected by clicking the Cards Selected link. Modify your selection as needed.

  5. Click Open Cards to add the selected cards, or Cancel to return to your workbench without adding any cards.

The cards are placed at the end of the set of cards currently on the workbench. You might need to scroll down to see them. By default, the medium size of the card is added to your workbench for all except the Validation and Trace cards. These are added in the large size by default. You can rearrange the cards as described in Reposition a Card on Your Workbench.

Add Switch Cards to Your Workbench

You can add switch cards to a workbench at any time. For all other cards, follow the steps in Add Cards to Your Workbench.

To add a switch card:

  1. Click to open the Add Switch Card modal.

  2. Begin entering the hostname of the switch you want to monitor.

  3. Select the device from the suggestions that appear.

    If you attempt to enter a hostname that is unknown to NetQ, a pink border appears around the entry field and you are unable to select Add. Try checking for spelling errors. If you feel your entry is valid, but not an available choice, consult with your network administrator.

  4. Optionally select the small or large size to display instead of the medium size.

  5. Click Add to add the switch card to your workbench, or Cancel to return to your workbench without adding the switch card.

Remove Cards from Your Workbench

Removing cards is handled one card at a time.

To remove a card:

  1. Hover over the card you want to remove.

  2. Click the More Actions menu.

  3. Click Remove.

The card is removed from the workbench, but not from the application.

Change the Time Period for the Card Data

All cards have a default time period for the data shown on the card, typically the last 24 hours. You can change the time period to view the data during a different time range to aid analysis of previous or existing issues.

To change the time period for a card:

  1. Hover over any card.

  2. Click in the header.

  3. Select a time period from the dropdown list.

Changing the time period in this manner only changes the time period for the given card.

Switch to a Different Card Size

You can switch between the different card sizes at any time. Only one size is visible at a time. To view the same card in different sizes, open a second copy of the card.

To change the card size:

  1. Hover over the card.

  2. Hover over the Card Size Picker and move the cursor to the right or left until the desired size option is highlighted.

    Single width opens a small card. Double width opens a medium card. Triple width opens a large card. Full width opens a full-screen card.

  3. Click the Picker.
    The card changes to the selected size, and may move its location on the workbench.

View a Description of the Card Content

When you hover over a medium or large card, the bottom right corner turns up and is highlighted. Clicking the corner flips the card over to reveal a description of the card and any relevant tabs. Hover over the corner and click again to flip the card back to the front.

Reposition a Card on Your Workbench

You can also move cards around on the workbench, using a simple drag and drop method.

To move a card:

  1. Click and drag the card to the left or right of another card, next to where you want to place it.

  2. Release your hold on the card when the other card becomes highlighted with a dotted line. In this example, we are moving the medium Network Health card to the left of the medium Devices Inventory card.

Table Settings

You can manipulate the data in a data grid in a full-screen card in several ways. The available options are displayed above each table. The options vary depending on the card and what is selected in the table.

Action                       Description
Select All                   Selects all items in the list
Clear All                    Clears all existing selections in the list
Edit                         Edits the selected item
Delete                       Removes the selected items
Filter                       Filters the list using available parameters. Refer to Filter Table Data for more detail.
Generate/Delete AuthKeys     Creates or removes NetQ CLI authorization keys
Open Cards                   Opens the corresponding validation or trace card(s)
Export                       Exports selected data into either a .csv or JSON-formatted file. Refer to Export Data for more detail.

When there are numerous items in a table, NetQ loads the first 25 by default and provides the rest in additional table pages. In this case, pagination is shown under the table.

From there, you can:

Change Order of Columns

You can rearrange the columns within a table. Click and hold on a column header, then drag it to the location where you want it.

Filter Table Data

The filter option associated with tables on full-screen cards can be used to filter the data by any parameter (column name). The parameters available vary according to the table you are viewing. Some tables offer the ability to filter on more than one parameter.

Tables that Support a Single Filter

Tables that allow a single filter to be applied let you select the parameter and set the value. You can use partial values.

For example, to set the filter to show only BGP sessions using a particular VRF:

  1. Open the full-screen Network Services | All BGP Sessions card.

  2. Click the All Sessions tab.

  3. Click above the table.

  4. Select VRF from the Field dropdown.

  5. Enter the name of the VRF of interest. In our example, we chose vrf1.

  6. Click Apply.

    The filter icon displays a red dot to indicate filters are applied.

  7. To remove the filter, click the filter icon (now showing a red dot).

  8. Click Clear.

  9. Close the Filters dialog by clicking .

Tables that Support Multiple Filters

For tables that offer filtering by multiple parameters, the Filter dialog is slightly different. For example, to filter the list of IP Addresses in your system by hostname and interface:

  1. Click .

  2. Select IP Addresses under Network.

  3. Click above the table.

  4. Enter a hostname and interface name in the respective fields.

  5. Click Apply.

    The filter icon displays a red dot to indicate filters are applied, and each filter is presented above the table.

  6. To remove a filter, simply click on the filter, or to remove all filters at once, click Clear All Filters.

Export Data

You can export tabular data from a full-screen card to a CSV- or JSON-formatted file.

To export all data:

  1. Click above the table.

  2. Select the export format.

  3. Click Export to save the file to your downloads directory.

To export selected data:

  1. Select the individual items from the list by clicking in the checkbox next to each item.

  2. Click above the table.

  3. Select the export format.

  4. Click Export to save the file to your downloads directory.
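If you plan to feed an exported file into a third-party tool, both formats typically contain the same flat rows, keyed by the column names shown in the table. The following minimal Python sketch is illustrative only (the file names and the Source field are assumptions based on the events table columns); it shows one way to load either format for further analysis:

    import csv
    import json
    from pathlib import Path

    downloads = Path.home() / "Downloads"   # adjust to your environment

    # Load a CSV export: each row becomes a dict keyed by the column headers.
    with open(downloads / "events.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    # A JSON export is typically a list of objects with the same fields.
    with open(downloads / "events.json") as f:
        rows_json = json.load(f)

    # Example: count exported events per source hostname.
    per_host = {}
    for row in rows:
        host = row.get("Source", "unknown")
        per_host[host] = per_host.get(host, 0) + 1
    print(per_host)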

Set User Preferences

Each user can customize the NetQ application display, change their account password, and manage their workbenches.

Configure Display Settings

The Display card contains the options for setting the application theme, language, time zone, and date formats. There are two themes available: a Light theme and a Dark theme (the default). The screen captures in this document all use the Dark theme. English is the only language available for this release. You can choose to view data in the time zone where you or your data center resides. You can also select the date and time format, choosing a date displayed in words or numbers, and a 12- or 24-hour clock. All changes take effect immediately.

To configure the display settings:

  1. Click in the application header to open the User Settings options.

  2. Click Profile & Preferences.

  3. Locate the Display card.

  4. In the Theme field, click to select your choice of theme. This figure shows the light theme. Switch back and forth as desired.

  5. In the Time Zone field, click to change the time zone from the default.
    By default, the time zone is set to your local time zone. If no time zone has been selected, NetQ defaults to the local time zone where NetQ is installed. All time values are based on this setting, which is shown in the application header and expressed relative to Greenwich Mean Time (GMT). (A brief example of converting a GMT-based timestamp for display in another time zone follows these steps.)

    Tip: You can also change the time zone from the header display.

    If your deployment is not local to you (for example, you want to view the data from the perspective of a data center in another time zone) you can change the display to another time zone. The following table presents a sample of time zones:

    Time ZoneDescriptionAbbreviation
    GMT +12New Zealand Standard TimeNST
    GMT +11Solomon Standard TimeSST
    GMT +10Australian Eastern TimeAET
    GMT +9:30Australia Central TimeACT
    GMT +9Japan Standard TimeJST
    GMT +8China Taiwan TimeCTT
    GMT +7Vietnam Standard TimeVST
    GMT +6Bangladesh Standard TimeBST
    GMT +5:30India Standard TimeIST
    GMT+5Pakistan Lahore TimePLT
    GMT +4Near East TimeNET
    GMT +3:30Middle East TimeMET
    GMT +3Eastern African Time/Arab Standard TimeEAT/AST
    GMT +2Eastern European TimeEET
    GMT +1European Central TimeECT
    GMTGreenwich Mean TimeGMT
    GMT -1Central African TimeCAT
    GMT -2Uruguay Summer TimeUYST
    GMT -3Argentina Standard/Brazil Eastern TimeAGT/BET
    GMT -4Atlantic Standard Time/Puerto Rico TimeAST/PRT
    GMT -5Eastern Standard TimeEST
    GMT -6Central Standard TimeCST
    GMT -7Mountain Standard TimeMST
    GMT -8Pacific Standard TimePST
    GMT -9Alaskan Standard TimeAST
    GMT -10Hawaiian Standard TimeHST
    GMT -11Samoa Standard TimeSST
    GMT -12New Zealand Standard TimeNST
  6. In the Date Format field, select the date and time format you want displayed on the cards.

    The four options include the date displayed in words or abbreviated with numbers, and either a 12- or 24-hour time representation. The default is the third option.

  7. Return to your workbench by clicking and selecting a workbench from the NetQ list.
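Because stored time values are GMT-based and only the display changes with your Time Zone preference, converting a timestamp for another zone is a simple offset translation. The following Python sketch, using the standard zoneinfo module and an arbitrary example timestamp, illustrates the idea; it is not part of NetQ:

    from datetime import datetime, timezone
    from zoneinfo import ZoneInfo   # Python 3.9+

    # A GMT-based timestamp, as NetQ stores time values (example value).
    utc_time = datetime(2020, 3, 14, 9, 30, tzinfo=timezone.utc)

    # Display it in a selected time zone, for example India Standard Time (GMT +5:30).
    local_time = utc_time.astimezone(ZoneInfo("Asia/Kolkata"))

    print(utc_time.strftime("%Y-%m-%d %H:%M %Z"))    # 2020-03-14 09:30 UTC
    print(local_time.strftime("%Y-%m-%d %H:%M %Z"))  # 2020-03-14 15:00 IST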

Change Your Password

You can change your account password at any time, for example if you suspect someone has compromised your account or your administrator requests you to do so.

To change your password:

  1. Click in the application header to open the User Settings options.

  2. Click Profile & Preferences.

  3. Locate the Basic Account Info card.

  4. Click Change Password.

  5. Enter your current password.

  6. Enter and confirm a new password.

  7. Click Save to change to the new password, or click Cancel to discard your changes.

  8. Return to your workbench by clicking and selecting a workbench from the NetQ list.

Manage Your Workbenches

You can view all of your workbenches in a list form, making it possible to manage various aspects of them. There are public and private workbenches. Public workbenches are visible to all users. Private workbenches are visible only to the user who created them. From the Workbenches card, you can:

To manage your workbenches:

  1. Click in the application header to open the User Settings options.

  2. Click Profile & Preferences.

  3. Locate the Workbenches card.

  4. To specify a favorite workbench, click to the left of the desired workbench name. An icon is placed there to indicate its status as your favorite workbench.

  5. To search the workbench list by name, access type, and cards present on the workbench, click the relevant header and begin typing your search criteria.

  6. To sort the workbench list, click the relevant header and click .

  7. To delete a workbench, hover over the workbench name to view the Delete button. As an administrator, you can delete both private and public workbenches.

  8. Return to your workbench by clicking and selecting a workbench from the NetQ list.

NetQ Management

As an administrator, you have two major tasks related to managing Cumulus NetQ:

The NetQ UI makes both of these tasks easier than ever.

Application Management

As an administrator, you can manage access to and various application-wide settings for the Cumulus NetQ UI from a single location.

Individual users have the ability to set preferences specific to their workspaces. This information is covered separately. Refer to Set User Preferences.

NetQ Management Workbench

The NetQ Management workbench is accessed from the main menu. For the user(s) responsible for maintaining the application, this is a good place to start each day.

To open the workbench, click , and select Management under the Admin column.

For on-premises deployments, an additional LDAP Server Info card is available. Refer to Integrate NetQ with Your LDAP server for details.

Manage User Accounts

From the NetQ Management workbench, you can view the number of users with accounts in the system. As an administrator, you can also add, modify, and delete user accounts using the User Accounts card.

Add New User Account

For each user that monitors at least one aspect of your data center network, a user account is needed. Adding a local user is described here. Refer to Integrate NetQ with Your LDAP server for instructions for adding LDAP users.

To add a new user account:

  1. Click Manage on the User Accounts card to open the User Accounts tab.

  2. Click Add User.

  3. Enter the user’s email address, along with their first and last name.

    Be especially careful entering the email address as you cannot change it once you save the account. If you save a mistyped email address, you must delete the account and create a new one.

  4. Select the user type: Admin or User.

  5. Enter your password in the Admin Password field (only users with administrative permissions can add users).

  6. Create a password for the user.

    1. Enter a password for the user.
    2. Re-enter the user password. If you do not enter a matching password, it will be underlined in red.
  7. Click Save to create the user account, or Cancel to discard the user account.

    By default the User Accounts table is sorted by Role.

  8. Repeat these steps to add all of your users.

Edit a User Name

If a user’s first or last name was entered incorrectly, you can fix it easily.

To change a user name:

  1. Click Manage on the User Accounts card to open the User Accounts tab.

  2. Click the checkbox next to the account you want to edit.

  3. Click above the account list.

  4. Modify the first and/or last name as needed.

  5. Enter your admin password.

  6. Click Save to commit the changes or Cancel to discard them.

Change a User’s Password

If a user forgets their password, or for security reasons, you can change the password for a particular user account.

To change a password:

  1. Click Manage on the User Accounts card to open the User Accounts tab.

  2. Click the checkbox next to the account you want to edit.

  3. Click above the account list.

  4. Click Reset Password.

  5. Enter your admin password.

  6. Enter a new password for the user.

  7. Re-enter the user password. Tip: If the password you enter does not match, Save is gray (not activated).

  8. Click Save to commit the change, or Cancel to discard the change.

Change a User’s Access Permissions

If a particular user has only standard user permissions and they need administrator permissions to perform their job (or the opposite, they have administrator permissions, but only need user permissions), you can modify their access rights.

To change access permissions:

  1. Click Manage on the User Accounts card to open the User Accounts tab.

  2. Click the checkbox next to the account you want to edit.

  3. Click above the account list.

  4. Select the appropriate user type from the dropdown list.

  5. Enter your admin password.

  6. Click Save to commit the change, or Cancel to discard the change.

Correct a Mistyped User ID (Email Address)

You cannot edit a user’s email address, because this is the identifier the system uses for authentication. If you need to change an email address, you must create a new account for this user; refer to Add New User Account. Then select the incorrect user account and delete it.

Export a List of User Accounts

You can export user account information at any time using the User Accounts tab.

To export information for one or more user accounts:

  1. Click Manage on the User Accounts card to open the User Accounts tab.

  2. Select one or more accounts that you want to export by clicking the checkbox next to them. Alternately select all accounts by clicking .

  3. Click to export the selected user accounts.

Delete a User Account

NetQ application administrators should remove user accounts associated with users that are no longer using the application.

To delete one or more user accounts:

  1. Click Manage on the User Accounts card to open the User Accounts tab.

  2. Select one or more accounts that you want to remove by clicking the checkbox next to them.

  3. Click to remove the accounts.

Manage Scheduled Traces

From the NetQ Management workbench, you can view the number of traces scheduled to run in the system. A set of default traces is provided with the NetQ UI. As an administrator, you can run one or more scheduled traces, add new scheduled traces, and edit or delete existing traces.

Add a Scheduled Trace

You can create a scheduled trace to provide regular status about a particularly important connection between a pair of devices in your network or for temporary troubleshooting.

To add a trace:

  1. Click Manage on the Scheduled Traces card to open the Scheduled Traces tab.

  2. Click Add Trace to open the large New Trace Request card.

  3. Enter source and destination addresses.

    For layer 2 traces, the source must be a hostname and the destination must be a MAC address. For layer 3 traces, the source can be a hostname or IP address, and the destination must be an IP address.

  4. Specify a VLAN for a layer 2 trace or (optionally) a VRF for a layer 3 trace.

  5. Set the schedule for the trace, by selecting how often to run the trace and when to start it the first time.

  6. Click Save As New to add the trace. You are prompted to enter a name for the trace in the Name field.

    If you want to run the new trace right away for a baseline, select the trace you just added from the dropdown list, and click Run Now.

Delete a Scheduled Trace

If you do not want to run a given scheduled trace any longer, you can remove it.

To delete a scheduled trace:

  1. Click Manage on the Scheduled Trace card to open the Scheduled Traces tab.

  2. Select at least one trace by clicking on the checkbox next to the trace.

  3. Click .

Export a Scheduled Trace

You can export a scheduled trace configuration at any time using the Scheduled Traces tab.

To export one or more scheduled trace configurations:

  1. Click Manage on the Scheduled Trace card to open the Scheduled Traces tab.

  2. Select one or more traces by clicking on the checkbox next to the trace. Alternately, click to select all traces.

  3. Click to export the selected traces.

Manage Scheduled Validations

From the NetQ Management workbench, you can view the total number of validations scheduled to run in the system. A set of default scheduled validations is provided and pre-configured with the NetQ UI. These are not included in the total count. As an administrator, you can view and export the configurations for all scheduled validations, or add a new validation.

View Scheduled Validation Configurations

You can view the configuration of a scheduled validation at any time. This can be useful when you are trying to determine if the validation request needs to be modified to produce a slightly different set of results (editing or cloning) or if it would be best to create a new one.

To view the configurations:

  1. Click Manage on the Scheduled Validations card to open the Scheduled Validations tab.

  2. Click in the top right to return to your NetQ Management cards.

Add a Scheduled Validation

You can add a scheduled validation at any time using the Scheduled Validations tab.

To add a scheduled validation:

  1. Click Manage on the Scheduled Validations card to open the Scheduled Validations tab.

  2. Click Add Validation to open the large Validation Request card.

  3. Configure the request. Refer to Validate Network Protocol and Service Operations for details.

Delete Scheduled Validations

You can remove a scheduled validation that you created (one of the 15 allowed) at any time. You cannot remove the default scheduled validations included with NetQ.

To remove a scheduled validation:

  1. Click Manage on the Scheduled Validations card to open the Scheduled Validations tab.

  2. Select one or more validations that you want to delete.

  3. Click above the validations list.

Export Scheduled Validation Configurations

You can export one or more scheduled validation configurations at any time using the Scheduled Validations tab.

To export a scheduled validation:

  1. Click Manage on the Scheduled Validations card to open the Scheduled Validations tab.

  2. Select one or more validations by clicking the checkbox next to the validation. Alternately, click to select all validations.

  3. Click to export selected validations.

Manage Threshold Crossing Rules

NetQ supports a set of events that are triggered by crossing a user-defined threshold. These events allow detection and prevention of network failures for selected interface, utilization, sensor, forwarding, and ACL events.

A notification configuration must contain one rule. Each rule must contain a scope and a threshold.

Supported Events

The following events are supported:

CategoryEvent IDDescription
Interface StatisticsTCA_RXBROADCAST_UPPERrx_broadcast bytes per second on a given switch or host is greater than maximum threshold
Interface StatisticsTCA_RXBYTES_UPPERrx_bytes per second on a given switch or host is greater than maximum threshold
Interface StatisticsTCA_RXMULTICAST_UPPERrx_multicast per second on a given switch or host is greater than maximum threshold
Interface StatisticsTCA_TXBROADCAST_UPPERtx_broadcast bytes per second on a given switch or host is greater than maximum threshold
Interface StatisticsTCA_TXBYTES_UPPERtx_bytes per second on a given switch or host is greater than maximum threshold
Interface StatisticsTCA_TXMULTICAST_UPPERtx_multicast bytes per second on a given switch or host is greater than maximum threshold
Resource UtilizationTCA_CPU_UTILIZATION_UPPERCPU utilization (%) on a given switch or host is greater than maximum threshold
Resource UtilizationTCA_DISK_UTILIZATION_UPPERDisk utilization (%) on a given switch or host is greater than maximum threshold
Resource UtilizationTCA_MEMORY_UTILIZATION_UPPERMemory utilization (%) on a given switch or host is greater than maximum threshold
SensorsTCA_SENSOR_FAN_UPPERSwitch sensor reported fan speed on a given switch or host is greater than maximum threshold
SensorsTCA_SENSOR_POWER_UPPERSwitch sensor reported power (Watts) on a given switch or host is greater than maximum threshold
SensorsTCA_SENSOR_TEMPERATURE_UPPERSwitch sensor reported temperature (°C) on a given switch or host is greater than maximum threshold
SensorsTCA_SENSOR_VOLTAGE_UPPERSwitch sensor reported voltage (Volts) on a given switch or host is greater than maximum threshold
Forwarding ResourcesTCA_TCAM_TOTAL_ROUTE_ENTRIES_UPPERNumber of routes on a given switch or host is greater than maximum threshold
Forwarding ResourcesTCA_TCAM_TOTAL_MCAST_ROUTES_UPPERNumber of multicast routes on a given switch or host is greater than maximum threshold
Forwarding ResourcesTCA_TCAM_MAC_ENTRIES_UPPERNumber of MAC addresses on a given switch or host is greater than maximum threshold
Forwarding ResourcesTCA_TCAM_IPV4_ROUTE_UPPERNumber of IPv4 routes on a given switch or host is greater than maximum threshold
Forwarding ResourcesTCA_TCAM_IPV4_HOST_UPPERNumber of IPv4 hosts on a given switch or host is greater than maximum threshold
Forwarding ResourcesTCA_TCAM_IPV6_ROUTE_UPPERNumber of IPv6 routes on a given switch or host is greater than maximum threshold
Forwarding ResourcesTCA_TCAM_IPV6_HOST_UPPERNumber of IPv6 hosts on a given switch or host is greater than maximum threshold
Forwarding ResourcesTCA_TCAM_ECMP_NEXTHOPS_UPPERNumber of equal cost multi-path (ECMP) next hop entries on a given switch or host is greater than maximum threshold
ACL ResourcesTCA_TCAM_IN_ACL_V4_FILTER_UPPERNumber of ingress ACL filters for IPv4 addresses on a given switch or host is greater than maximum threshold
ACL ResourcesTCA_TCAM_EG_ACL_V4_FILTER_UPPERNumber of egress ACL filters for IPv4 addresses on a given switch or host is greater than maximum threshold
ACL ResourcesTCA_TCAM_IN_ACL_V4_MANGLE_UPPERNumber of ingress ACL mangles for IPv4 addresses on a given switch or host is greater than maximum threshold
ACL ResourcesTCA_TCAM_EG_ACL_V4_MANGLE_UPPERNumber of egress ACL mangles for IPv4 addresses on a given switch or host is greater than maximum threshold
ACL ResourcesTCA_TCAM_IN_ACL_V6_FILTER_UPPERNumber of ingress ACL filters for IPv6 addresses on a given switch or host is greater than maximum threshold
ACL ResourcesTCA_TCAM_EG_ACL_V6_FILTER_UPPERNumber of egress ACL filters for IPv6 addresses on a given switch or host is greater than maximum threshold
ACL ResourcesTCA_TCAM_IN_ACL_V6_MANGLE_UPPERNumber of ingress ACL mangles for IPv6 addresses on a given switch or host is greater than maximum threshold
ACL ResourcesTCA_TCAM_EG_ACL_V6_MANGLE_UPPERNumber of egress ACL mangles for IPv6 addresses on a given switch or host is greater than maximum threshold
ACL ResourcesTCA_TCAM_IN_ACL_8021x_FILTER_UPPERNumber of ingress ACL 802.1 filters on a given switch or host is greater than maximum threshold
ACL ResourcesTCA_TCAM_ACL_L4_PORT_CHECKERS_UPPERNumber of ACL port range checkers on a given switch or host is greater than maximum threshold
ACL ResourcesTCA_TCAM_ACL_REGIONS_UPPERNumber of ACL regions on a given switch or host is greater than maximum threshold
ACL ResourcesTCA_TCAM_IN_ACL_MIRROR_UPPERNumber of ingress ACL mirrors on a given switch or host is greater than maximum threshold
ACL ResourcesTCA_TCAM_ACL_18B_RULES_UPPERNumber of ACL 18B rules on a given switch or host is greater than maximum threshold
ACL ResourcesTCA_TCAM_ACL_32B_RULES_UPPERNumber of ACL 32B rules on a given switch or host is greater than maximum threshold
ACL ResourcesTCA_TCAM_ACL_54B_RULES_UPPERNumber of ACL 54B rules on a given switch or host is greater than maximum threshold
ACL ResourcesTCA_TCAM_IN_PBR_V4_FILTER_UPPERNumber of ingress policy-based routing (PBR) filters for IPv4 addresses on a given switch or host is greater than maximum threshold
ACL ResourcesTCA_TCAM_IN_PBR_V6_FILTER_UPPERNumber of ingress policy-based routing (PBR) filters for IPv6 addresses on a given switch or host is greater than maximum threshold

Define a Scope

A scope is used to filter the events generated by a given rule. Scope values are set on a per-TCA-rule basis. All rules can be filtered on Hostname. Some rules can also be filtered by other parameters, as shown in the following table.

CategoryEvent IDScope Parameters
Interface StatisticsTCA_RXBROADCAST_UPPERHostname, Interface
Interface StatisticsTCA_RXBYTES_UPPERHostname, Interface
Interface StatisticsTCA_RXMULTICAST_UPPERHostname, Interface
Interface StatisticsTCA_TXBROADCAST_UPPERHostname, Interface
Interface StatisticsTCA_TXBYTES_UPPERHostname, Interface
Interface StatisticsTCA_TXMULTICAST_UPPERHostname, Interface
Resource UtilizationTCA_CPU_UTILIZATION_UPPERHostname
Resource UtilizationTCA_DISK_UTILIZATION_UPPERHostname
Resource UtilizationTCA_MEMORY_UTILIZATION_UPPERHostname
SensorsTCA_SENSOR_FAN_UPPERHostname, Sensor Name
SensorsTCA_SENSOR_POWER_UPPERHostname, Sensor Name
SensorsTCA_SENSOR_TEMPERATURE_UPPERHostname, Sensor Name
SensorsTCA_SENSOR_VOLTAGE_UPPERHostname, Sensor Name
Forwarding ResourcesTCA_TCAM_TOTAL_ROUTE_ENTRIES_UPPERHostname
Forwarding ResourcesTCA_TCAM_TOTAL_MCAST_ROUTES_UPPERHostname
Forwarding ResourcesTCA_TCAM_MAC_ENTRIES_UPPERHostname
Forwarding ResourcesTCA_TCAM_ECMP_NEXTHOPS_UPPERHostname
Forwarding ResourcesTCA_TCAM_IPV4_ROUTE_UPPERHostname
Forwarding ResourcesTCA_TCAM_IPV4_HOST_UPPERHostname
Forwarding ResourcesTCA_TCAM_IPV6_ROUTE_UPPERHostname
Forwarding ResourcesTCA_TCAM_IPV6_HOST_UPPERHostname
ACL ResourcesTCA_TCAM_IN_ACL_V4_FILTER_UPPERHostname
ACL ResourcesTCA_TCAM_EG_ACL_V4_FILTER_UPPERHostname
ACL ResourcesTCA_TCAM_IN_ACL_V4_MANGLE_UPPERHostname
ACL ResourcesTCA_TCAM_EG_ACL_V4_MANGLE_UPPERHostname
ACL ResourcesTCA_TCAM_IN_ACL_V6_FILTER_UPPERHostname
ACL ResourcesTCA_TCAM_EG_ACL_V6_FILTER_UPPERHostname
ACL ResourcesTCA_TCAM_IN_ACL_V6_MANGLE_UPPERHostname
ACL ResourcesTCA_TCAM_EG_ACL_V6_MANGLE_UPPERHostname
ACL ResourcesTCA_TCAM_IN_ACL_8021x_FILTER_UPPERHostname
ACL ResourcesTCA_TCAM_ACL_L4_PORT_CHECKERS_UPPERHostname
ACL ResourcesTCA_TCAM_ACL_REGIONS_UPPERHostname
ACL ResourcesTCA_TCAM_IN_ACL_MIRROR_UPPERHostname
ACL ResourcesTCA_TCAM_ACL_18B_RULES_UPPERHostname
ACL ResourcesTCA_TCAM_ACL_32B_RULES_UPPERHostname
ACL ResourcesTCA_TCAM_ACL_54B_RULES_UPPERHostname
ACL ResourcesTCA_TCAM_IN_PBR_V4_FILTER_UPPERHostname
ACL ResourcesTCA_TCAM_IN_PBR_V6_FILTER_UPPERHostname

Scopes are displayed as regular expressions in the rule card.

Scope                  Display in Card      Result
All devices            hostname = *         Show events for all devices
All interfaces         ifname = *           Show events for all devices and all interfaces
All sensors            s_name = *           Show events for all devices and all sensors
Particular device      hostname = leaf01    Show events for leaf01 switch
Particular interfaces  ifname = swp14       Show events for swp14 interface
Particular sensors     s_name = fan2        Show events for the fan2 fan
Set of devices         hostname ^ leaf      Show events for switches having names starting with leaf
Set of interfaces      ifname ^ swp         Show events for interfaces having names starting with swp
Set of sensors         s_name ^ fan         Show events for sensors having names starting with fan

When a rule is filtered by more than one parameter, each is displayed on the card. Leaving a value blank for a parameter defaults to all; all hostnames, interfaces, or sensors.
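The scope expressions above use two operators: = for an exact value (with * matching everything) and ^ for a starts-with match. As a rough illustration only, and not NetQ's actual implementation, the following Python sketch shows how such expressions could be evaluated against a hostname, interface name, or sensor name:

    def scope_matches(expression, value):
        # Evaluate expressions such as 'hostname = *', 'hostname = leaf01',
        # or 'hostname ^ leaf' against a single value. The field name
        # (hostname, ifname, s_name) is implied by the value you pass in.
        _field, operator, pattern = expression.split(maxsplit=2)
        if operator == "=":
            return pattern == "*" or value == pattern
        if operator == "^":
            return value.startswith(pattern)
        raise ValueError("unknown operator: " + operator)

    print(scope_matches("hostname = *", "spine01"))      # True  (all devices)
    print(scope_matches("hostname = leaf01", "leaf02"))  # False (exact match only)
    print(scope_matches("hostname ^ leaf", "leaf02"))    # True  (starts with 'leaf')
    print(scope_matches("ifname ^ swp", "eth0"))         # False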

Create a TCA Rule

Now that you know which events are supported and how to set the scope, you can create a basic rule to deliver one of the TCA events to a notification channel.

To create a TCA rule:

  1. Click to open the Main Menu.

  2. Click Threshold Crossing Rules under Notifications.

  3. Click to add a rule.

    The Create TCA Rule dialog opens. Creating a rule takes three steps.

    Note that you can move forward and backward until you are satisfied with your rule definition.

  4. On the Enter Details step, enter a name for your rule, choose your TCA event type, and assign a severity.

    Note: The rule name has a maximum of 20 characters (including spaces).

  5. Click Next.

  6. On the Choose Event step, select the attribute to measure against.

    Note: The attributes presented depend on the event type chosen in the Enter Details step. This example shows the attributes available when Resource Utilization was selected.

  7. Click Next.

  8. On the Set Threshold step, enter a threshold value.

    If you stop there and click Finish, the event is triggered for all monitored devices in the network.

    If you want to restrict the rule to a particular device, toggle the scope filter and enter a hostname or other parameter values. Then click Finish.

This example shows two rules. The rule on the left triggers an informational event when switch leaf01 exceeds the maximum CPU utilization of 87%. The rule on the right triggers a critical event when any device exceeds the maximum CPU utilization of 93%. Note that the cards indicate both rules are currently Active.
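To make the relationship between scope, threshold, and severity concrete, the short Python sketch below models the two example rules and evaluates a few hypothetical CPU readings against them. It is purely illustrative; NetQ evaluates rules internally, and the field names used here are assumptions:

    # Illustrative model of the two example rules above (not NetQ's data model).
    rules = [
        {"name": "leaf01 CPU info", "severity": "info",
         "threshold": 87, "scope": {"hostname": "leaf01"}},
        {"name": "network-wide CPU critical", "severity": "critical",
         "threshold": 93, "scope": {"hostname": "*"}},
    ]

    def triggered(rule, hostname, cpu_percent):
        in_scope = rule["scope"]["hostname"] in ("*", hostname)
        return in_scope and cpu_percent > rule["threshold"]

    for hostname, cpu in [("leaf01", 90), ("spine01", 90), ("spine01", 95)]:
        fired = [r["name"] for r in rules if triggered(r, hostname, cpu)]
        print(hostname, cpu, fired or "no event")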

View All TCA Rules

You can view all of the threshold-crossing event rules you have created by clicking and then selecting Threshold Crossing Rules under Notifications.

Modify TCA Rules

You can modify the threshold value and scope of any existing rules.

To edit a rule:

  1. Click to open the Main Menu.

  2. Click Threshold Crossing Rules under Notifications.

  3. Locate the rule you want to modify and hover over the card.

  4. Click .

  5. Modify the rule or the scope.

  6. Click Update Rule.

If you want to modify the rule name or severity after creating the rule, you must delete the rule and recreate it.

Manage TCA Rules

Once you have created a number of rules, you might need to manage them: suppress a rule, disable a rule, or delete a rule.

Rule States

The TCA rules have three possible states:

Suppress a Rule

To suppress a rule for a designated amount of time, you must change the state of the rule.

To suppress a rule:

  1. Click to open the Main Menu.

  2. Click Threshold Crossing Rules under Notifications.

  3. Locate the rule you want to suppress.

  4. Click Disable.

  5. Click in the Date/Time field to set when you want the rule to be automatically re-enabled.

  6. Click Disable.

    Note the changes in the card:

    • The state is now marked as Inactive, but remains green
    • The date and time that the rule will be enabled is noted in the Suppressed field
    • The Disable option has changed to Disable Forever. Refer to Disable a Rule for information about this change.

Disable a Rule

To disable a rule until you want to manually re-enable it, you must change the state of the rule.

To disable a rule that is currently active:

  1. Click to open the Main Menu.

  2. Click Threshold Crossing Rules under Notifications.

  3. Locate the rule you want to disable.

  4. Click Disable.

  5. Leave the Date/Time field blank.

  6. Click Disable.

    Note the changes in the card:

    • The state is now marked as Inactive and is red
    • The rule definition is grayed out
    • The Disable option has changed to Enable to reactivate the rule when you are ready

To disable a rule that is currently suppressed:

  1. Click to open the Main Menu.

  2. Click Threshold Crossing Rules under Notifications.

  3. Locate the rule you want to disable.

  4. Click Disable Forever.

    Note the changes in the card:

    • The state is now marked as Inactive and is red
    • The rule definition is grayed out
    • The Disable option has changed to Enable to reactivate the rule when you are ready

Delete a Rule

You might find that you no longer want to receive event notifications for a particular TCA event. In that case, you can either disable the rule, if you think you might want to receive the notifications again later, or delete it altogether. Refer to Disable a Rule for the first case; follow the instructions here to remove the rule. The rule can be in any of the three states.

To delete a rule:

  1. Click to open the Main Menu.

  2. Click Threshold Crossing Rules under Notifications.

  3. Locate the rule you want to remove and hover over the card.

  4. Click .

Resolve Scope Conflicts

There may be occasions when the scopes defined by multiple rules for a given TCA event overlap. In such cases, the TCA rule with the most specific scope that still matches is used to generate the event.

To clarify this, consider this example. Three events have occurred:

NetQ attempts to match the TCA event against hostname and interface name with three TCA rules with different scopes:

The result is:

In summary:

Input Event    Scope Parameters     Rule 1, Scope 1                Rule 2, Scope 2            Rule 3, Scope 3         Scope Applied
leaf01, swp1   Hostname, Interface  hostname=leaf01, ifname=swp1   hostname ^ leaf, ifname=*  hostname=*, ifname=*    Scope 1
leaf01, swp3   Hostname, Interface  hostname=leaf01, ifname=swp1   hostname ^ leaf, ifname=*  hostname=*, ifname=*    Scope 2
spine01, swp1  Hostname, Interface  hostname=leaf01, ifname=swp1   hostname ^ leaf, ifname=*  hostname=*, ifname=*    Scope 3
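One way to think about this resolution: every rule whose scope matches the event is a candidate, and the candidate with the fewest wildcards, that is, the most specific scope, wins. The Python sketch below is a simplification for illustration only, not NetQ's actual algorithm; it reproduces the three results in the table above:

    def matches(pattern, value):
        # '*' matches anything; a pattern written here with a leading '^'
        # means "starts with"; anything else requires an exact match.
        if pattern == "*":
            return True
        if pattern.startswith("^"):
            return value.startswith(pattern[1:])
        return value == pattern

    def specificity(scope):
        # Exact values count more than prefix matches; '*' adds nothing.
        score = 0
        for pattern in scope.values():
            if pattern != "*":
                score += 1 if pattern.startswith("^") else 2
        return score

    rules = [
        ("Scope 1", {"hostname": "leaf01", "ifname": "swp1"}),
        ("Scope 2", {"hostname": "^leaf", "ifname": "*"}),
        ("Scope 3", {"hostname": "*", "ifname": "*"}),
    ]

    def applied_scope(event):
        candidates = [(name, scope) for name, scope in rules
                      if all(matches(scope[k], event[k]) for k in scope)]
        return max(candidates, key=lambda c: specificity(c[1]))[0]

    print(applied_scope({"hostname": "leaf01", "ifname": "swp1"}))   # Scope 1
    print(applied_scope({"hostname": "leaf01", "ifname": "swp3"}))   # Scope 2
    print(applied_scope({"hostname": "spine01", "ifname": "swp1"}))  # Scope 3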

Lifecycle Management

As an administrator, you want to manage the deployment of Cumulus NetQ software onto your network devices (servers, appliances, switches, and hosts) in the most efficient way and with as much information about the process as possible. With this release, NetQ provides the first of many features to enable you to do just that. It includes the ability to take a snapshot of the live network state and configuration before you make changes to your network, take a snapshot after you make those changes, and then compare them.

Create a Network Snapshot

It is simple to capture the state of your network using the snapshot feature.

To create a snapshot:

  1. From any workbench, click in the workbench header.

  2. Click Create Snapshot.

  3. Enter a name and, optionally, a descriptive note for the snapshot.

  4. Click Finish.

    A medium Snapshot card appears on your desktop. Spinning arrows are visible while it works. When it finishes you can see the number of items that have been captured, and if any failed. This example shows a successful result.

    If you change your mind and do not want to create the snapshot, click Back or Choose Action. Do not click Done until you are ready to close the card. Done saves the snapshot automatically.

Compare Network Snapshots

You can compare the state of your network before and after an upgrade or other configuration change to validate the changes.

To compare network snapshots:

  1. Create a snapshot (as described in previous section) before you make any changes.

  2. Make your changes.

  3. Create a second snapshot.

  4. Compare the results of the two snapshots:

    • If you have the two desired snapshot cards open:

      • Simply put them next to each other to view an overview.
      • Scroll down to see all of the items.
    • If you have only one of the cards open:

      • Click Compare on the open card.
      • Select the snapshot to compare with. Note that only snapshots taken before this snapshot appear in the selection list.
    • If you have closed one or both of the cards (you may have created them some time before):

      • Click .
      • Click Compare Snapshots.
      • Click on the two snapshots you want to compare.
      • Click Finish. Note that two snapshots must be selected before Finish is active.

    In the latter two cases, the large Snapshot card opens. The only difference is in the card title. If you opened the comparison card from a snapshot on your workbench, the title includes the name of that card. If you open the comparison card through the Snapshot menu, the title is generic, indicating a comparison only. Functionally, you have reached the same point.

Interpreting the Comparison Data

For each network element that is compared, count values and changes are shown:

For example, if the first snapshot had a total count of 110 interfaces, and changes made before the second snapshot was taken added 40 interfaces and removed 32, the second snapshot's total count of interfaces would be eight more than in the first snapshot, or 118.
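In other words, the count in the later snapshot equals the earlier count plus additions minus removals. A trivial check of the interface example above (a sketch, not a NetQ calculation):

    # Earlier snapshot count, plus items added, minus items removed.
    before, added, removed = 110, 40, 32
    after = before + added - removed
    print(after)           # 118
    print(after - before)  # net change of +8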

From this card, you can also change which snapshots to compare. Select an alternate snapshot from one of the two snapshot dropdowns and then click Compare.

View Change Details

You can view additional details about the changes that have occurred between the two snapshots by clicking View Details. This opens the full screen Detailed Snapshot Comparison card.

From this card you can:

ElementData Descriptions
BGP
  • Hostname: Name of the host running the BGP session
  • VRF: Virtual route forwarding interface if used
  • BGP Session: Session that was removed or added
  • ASN: Autonomous system number
CLAG
  • Hostname: Name of the host running the CLAG session
  • CLAG Sysmac: MAC address for a bond interface pair that was removed or added
Interface
  • Hostname: Name of the host where the interface resides
  • IF Name: Name of the interface that was removed or added
IP Address
  • Hostname: Name of the host where address was removed or added
  • Prefix: IP address prefix
  • Mask: IP address mask
  • IF Name: Name of the interface that owns the address
Links
  • Hostname: Name of the host where the link was removed or added
  • IF Name: Name of the link
  • Kind: Bond, bridge, eth, loopback, macvlan, swp, vlan, vrf, or vxlan
LLDP
  • Hostname: Name of the discovered host that was removed or added
  • IF Name: Name of the interface
MAC Address
  • Hostname: Name of the host where MAC address resides
  • MAC address: MAC address that was removed or added
  • VLAN: VLAN associated with the MAC address
Neighbor
  • Hostname: Name of the neighbor peer that was removed or added
  • VRF: Virtual route forwarding interface if used
  • IF Name: Name of the neighbor interface
  • IP address: Neighbor IP address
Node
  • Hostname: Name of the network node that was removed or added
OSPF
  • Hostname: Name of the host running the OSPF session
  • IF Name: Name of the associated interface that was removed or added
  • Area: Routing domain for this host device
  • Peer ID: Network subnet address of router with access to the peer device
Route
  • Hostname: Name of the host running the route that was removed or added
  • VRF: Virtual route forwarding interface associated with route
  • Prefix: IP address prefix
Sensors
  • Hostname: Name of the host where sensor resides
  • Kind: Power supply unit, fan, or temperature
  • Name: Name of the sensor that was removed or added
Services
  • Hostname: Name of the host where service is running
  • Name: Name of the service that was removed or added
  • VRF: Virtual route forwarding interface associated with service

Manage Network Snapshots

You can create as many snapshots as you like and view them at any time. When a snapshot becomes old and no longer useful, you can remove it.

To view an existing snapshot:

  1. From any workbench, click in the workbench header.

  2. Click View/Delete Snapshots.

  3. Click View.

  4. Click one or more snapshots you want to view, then click Finish.

    Click Back or Choose Action to cancel viewing of your selected snapshot(s).

To remove an existing snapshot:

  1. From any workbench, click in the workbench header.

  2. Click View/Delete Snapshots.

  3. Click Delete.

  4. Click one or more snapshots you want to remove, then click Finish.

    Click Back or Choose Action to cancel the deletion of your selected snapshot(s).

Monitor Events

Two event workflows, the Alarms card workflow and the Info card workflow, provide a view into the events occurring in the network. The Alarms card workflow tracks critical severity events, whereas the Info card workflow tracks all warning, info, and debug severity events.

To focus on events from a single device perspective, refer to Monitor Switches.

Monitor Critical Events

You can easily monitor critical events occurring across your network using the Alarms card. You can determine the number of events for the various system, interface, and network protocols and services components in the network. The content of the cards in the workflow is described first, and then followed by common tasks you would perform using this card workflow.

Alarms Card Workflow Summary

The small Alarms card displays:

ItemDescription
Indicates data is for all critical severity events in the network
Alarm trendTrend of alarm count, represented by an arrow:
  • Pointing upward and bright pink: alarm count is higher than the last two time periods, an increasing trend
  • Pointing downward and green: alarm count is lower than the last two time periods, a decreasing trend
  • No arrow: alarm count is unchanged over the last two time periods, trend is steady
Alarm scoreCurrent count of alarms during the designated time period
Alarm ratingCount of alarms relative to the average count of alarms during the designated time period:
  • Low: Count of alarms is below the average count; a nominal count
  • Med: Count of alarms is in range of the average count; some room for improvement
  • High: Count of alarms is above the average count; user intervention recommended
ChartDistribution of alarms received during the designated time period and a total count of all alarms present in the system
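The trend and rating indicators are derived from comparing the current alarm count against the two prior time periods and against the period average. The following Python sketch is purely illustrative of that classification logic; the tolerance band used for the Med rating is an assumption, not NetQ's internal threshold:

    def alarm_trend(current, previous_two):
        # Upward arrow if higher than both prior periods, downward if lower,
        # no arrow (steady) otherwise.
        if all(current > count for count in previous_two):
            return "increasing"
        if all(current < count for count in previous_two):
            return "decreasing"
        return "steady"

    def alarm_rating(current, average, band=0.2):
        # Low below the average, High above it, Med within an assumed +/-20% band.
        if current < average * (1 - band):
            return "Low"
        if current > average * (1 + band):
            return "High"
        return "Med"

    print(alarm_trend(2, [5, 4]))   # decreasing
    print(alarm_rating(2, 6))       # Low: below the average count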

The medium Alarms card displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for all critical events in the network
CountTotal number of alarms received during the designated time period
Alarm scoreCurrent count of alarms received from each category (overall, system, interface, and network services) during the designated time period
ChartDistribution of all alarms received from each category during the designated time period

The large Alarms card has one tab.

The Alarm Summary tab displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for all system, trace and interface critical events in the network
Alarm Distribution

Chart: Distribution of all alarms received from each category during the designated time period:

  • NetQ Agent
  • BTRFS Information
  • CL Support
  • Config Diff
  • CL License
  • Installed Packages
  • Link
  • LLDP
  • MTU
  • Node
  • Port
  • Resource
  • Running Config Diff
  • Sensor
  • Services
  • SSD Utilization
  • TCA Interface Stats
  • TCA Resource Utilization
  • TCA Sensors
The category with the largest number of alarms is shown at the top, followed by the next most, down to the chart with the fewest alarms.

Count: Total number of alarms received from each category during the designated time period

TableListing of items that match the filter selection for the selected alarm categories:
  • Events by Most Recent: Most recent events are listed at the top
  • Devices by Event Count: Devices with the most events are listed at the top
Show All EventsOpens full screen Events | Alarms card with a listing of all events

The full screen Alarms card provides tabs for all events.

ItemDescription
TitleEvents | Alarms
Closes full screen card and returns to workbench
Default TimeRange of time in which the displayed data was collected
Displays data refresh status. Click to pause data refresh. Click to resume data refresh. The current refresh rate is visible by hovering over the icon.
ResultsNumber of results found for the selected tab
All EventsDisplays all events (both alarms and info) received in the time period. By default, the events list is sorted by the date and time that the event occurred (Time). This tab provides the following additional data about each event:
  • Source: Hostname of the given event
  • Message: Text describing the alarm or info event that occurred
  • Type: Name of network protocol and/or service that triggered the given event
  • Severity: Importance of the event: critical, warning, info, or debug
Table ActionsSelect, export, or filter the list. Refer to Table Settings.

View Alarm Status Summary

A summary of the critical alarms in the network includes the number of alarms, a trend indicator, a performance indicator, and a distribution of those alarms.

To view the summary, open the small Alarms card.

In this example, there are a small number of alarms (2), the number of alarms is decreasing (down arrow), and there are fewer alarms right now than the average number of alarms during this time period. This would indicate no further investigation is needed. Note that with such a small number of alarms, the rating may be a bit skewed.

View the Distribution of Alarms

It is helpful to know where and when alarms are occurring in your network. The Alarms card workflow enables you to see the distribution of alarms based on their source: network services, interfaces, or other system services. You can also view the trend of alarms in each source category.

To view the alarm distribution, open the medium Alarms card. Scroll down to view all of the charts.

Monitor System and Interface Alarm Details

The Alarms card workflow enables users to easily view and track critical severity system and interface alarms occurring anywhere in your network.

View All System and Interface Alarms

You can view the alarms associated with the system and interfaces using the Alarms card workflow. You can sort alarms based on their occurrence or view devices with the most network services alarms.

To view network services alarms, open the large Alarms card.

From this card, you can view the distribution of alarms for each of the categories over time. The charts are sorted by total alarm count, with the category having the highest number of alarms listed at the top. Scroll down to view any hidden charts. A list of the associated alarms is also displayed. By default, the list of the most recent alarms for the systems and interfaces is displayed when viewing the large cards.

View Devices with the Most Alarms

You can filter instead for the devices that have the most alarms.

To view devices with the most alarms, open the large Alarms card, and then select Devices by event count from the dropdown.

Filter Alarms by Category

You can focus your view to include alarms for one or more selected alarm categories.

To filter for selected categories:

  1. Click the checkbox to the left of one or more charts to remove that set of alarms from the table on the right.

  2. Select the Devices by event count filter to view the devices with the most alarms for the selected categories.

  3. Switch back to most recent events by selecting Events by most recent.

  4. Click the checkbox again to return a category’s data to the table.

In this example, we removed the Services from the event listing.

Compare Alarms with a Prior Time

You can change the time period for the data to compare with a prior time. If the same devices are consistently indicating the most alarms, you might want to look more carefully at those devices using the Switches card workflow.

To compare two time periods:

  1. Open a second Alarm Events card. Remember it goes to the bottom of the workbench.

  2. Switch to the large size view.

  3. Move the card to be next to the original Alarm Events card. Note that moving large cards can take a few extra seconds since they contain a large amount of data.

  4. Hover over the card and click .

  5. Select a different time period.

  6. Compare the two cards with the Devices by event count filter applied.

    In this example, both the total alarm count and the devices with the most alarms in each time period are unchanged. You could go back further in time to see if this changes or investigate the current status of the largest offenders.

View All Events

You can view all events in the network either by clicking the Show All Events link under the table on the large Alarm Events card, or by opening the full screen Alarm Events card.


To return to your workbench, click in the top right corner of the card.

Monitor Informational Events

You can easily monitor warning, info, and debug severity events occurring across your network using the Info card. You can determine the number of events for the various system, interface, and network protocol and service components in the network. The content of the cards in the workflow is described first, followed by common tasks you can perform using this card workflow.

Info Card Workflow Summary

The Info card workflow enables users to easily view and track warning, info, and debug severity events occurring anywhere in your network.

The small Info card displays:

ItemDescription
Indicates data is for all warning, info, and debug severity events in the network
Info countNumber of info events received during the designated time period
Alarm countNumber of alarm events received during the designated time period
ChartDistribution of all info events and alarms received during the designated time period

The medium Info card displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for all warning, info, and debug severity events in the network
Types of InfoChart which displays the services that have triggered events during the designated time period. Hover over chart to view a count for each type.
Distribution of InfoInfo Status
  • Count: Number of info events received during the designated time period
  • Chart: Distribution of all info events received during the designated time period
Alarms Status
  • Count: Number of alarm events received during the designated time period
  • Chart: Distribution of all alarm events received during the designated time period

The large Info card displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for all warning, info, and debug severity events in the network
Types of InfoChart which displays the services that have triggered events during the designated time period. Hover over chart to view a count for each type.
Distribution of InfoInfo Status
  • Count: Current number of info events received during the designated time period
  • Chart: Distribution of all info events received during the designated time period
Alarms Status
  • Count: Current number of alarm events received during the designated time period
  • Chart: Distribution of all alarm events received during the designated time period
TableListing of items that match the filter selection:
  • Events by Most Recent: Most recent events are listed at the top
  • Devices by Event Count: Devices with the most events are listed at the top
Show All EventsOpens full screen Events | Info card with a listing of all events

The full screen Info card provides tabs for all events.

ItemDescription
TitleEvents | Info
Closes full screen card and returns to workbench
Default TimeRange of time in which the displayed data was collected
Displays data refresh status. Click to pause data refresh. Click to resume data refresh. Current refresh rate is visible by hovering over icon.
ResultsNumber of results found for the selected tab
All EventsDisplays all events (both alarms and info) received in the time period. By default, the list is sorted by the date and time that the event occurred (Time). This tab provides the following additional data about each event:
  • Source: Hostname of the given event
  • Message: Text describing the alarm or info event that occurred
  • Type: Name of network protocol and/or service that triggered the given event
  • Severity: Importance of the event (critical, warning, info, or debug)
Table ActionsSelect, export, or filter the list. Refer to Table Settings.

View Info Status Summary

A summary of the informational events occurring in the network can be found on the small, medium, and large Info cards. Additional details are available as you increase the size of the card.

To view the summary with the small Info card, simply open the card. This card gives you a high-level view in a condensed visual, including the number and distribution of the info events along with the alarms that have occurred during the same time period.

To view the summary with the medium Info card, simply open the card. This card gives you the same count and distribution of info and alarm events, but it also provides information about the sources of the info events and enables you to view a small slice of time using the distribution charts.

Use the chart at the top of the card to view the various sources of info events. The four or so types with the most info events are called out separately, with all others collected together into an Other category. Hover over a segment of the chart to view the count for each type.

To view the summary with the large Info card, open the card. The left side of the card provides the same capabilities as the medium Info card.

Compare Timing of Info and Alarm Events

While you can see the relative relationship between info and alarm events on the small Info card, the medium and large cards provide considerably more information. Open either of these to view individual line charts for the events. Generally, alarms have corresponding info events. For example, when a network service becomes unavailable, a critical alarm is often issued, and when the service becomes available again, an info event of severity warning is generated. For this reason, you might see some level of tracking between the info and alarm counts and distributions. Other relationships between the two are also possible.

View All Info Events Sorted by Time of Occurrence

You can view all info events using the large Info card. Open the large card and confirm the Events By Most Recent option is selected in the filter above the table on the right. When this option is selected, all of the info events are listed with the most recently occurring event at the top. Scrolling down shows you the info events that have occurred at an earlier time within the selected time period for the card.
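If you prefer the CLI, a comparable listing can be produced there; this is a sketch, and the option names may differ slightly in your release:

    cumulus@switch:~$ netq show events level info
    cumulus@switch:~$ netq show events level info json

The json keyword, where supported, returns the same data in JSON format for use in scripts.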

View Devices with the Most Info Events

You can filter instead for the devices that have the most info events by selecting the Devices by Event Count option from the filter above the table.

View All Events

You can view all events in the network either by clicking the Show All Events link under the table on the large Info Events card, or by opening the full screen Info Events card.


To return to your workbench, click in the top right corner of the card.

Events Reference

The following table lists all event messages organized by type.

The messages can be viewed through third-party notification applications. For details about configuring notifications using the NetQ CLI, refer to Integrate NetQ with Notification Applications.

For information about configuring threshold-based events (TCAs), refer to Application Management.

TypeTriggerSeverityMessage FormatExample
agentNetQ Agent state changed to Rotten (not heard from in over 15 seconds)CriticalAgent state changed to rottenAgent state changed to rotten
agentNetQ Agent rebootedCriticalNetq-agent rebooted at (@last_boot)Netq-agent rebooted at 1573166417
agentNode running NetQ Agent rebootedCriticalSwitch rebooted at (@sys_uptime)Switch rebooted at 1573166131
agentNetQ Agent state changed to FreshInfoAgent state changed to freshAgent state changed to fresh
agentNetQ Agent state was resetInfoAgent state was paused and resumed at (@last_reinit)Agent state was paused and resumed at 1573166125
agentVersion of NetQ Agent has changedInfoAgent version has been changed old_version:@old_version and new_version:@new_version. Agent reset at @sys_uptimeAgent version has been changed old_version:2.1.2 and new_version:2.3.1. Agent reset at 1573079725
bgpBGP Session state changedCriticalBGP session with peer @peer @neighbor vrf @vrf state changed from @old_state to @new_stateBGP session with peer leaf03 leaf04 vrf mgmt state changed from Established to Failed
bgpBGP Session state changed from Failed to EstablishedInfoBGP session with peer @peer @peerhost @neighbor vrf @vrf session state changed from Failed to EstablishedBGP session with peer swp5 spine02 spine03 vrf default session state changed from Failed to Established
bgpBGP Session state changed from Established to FailedInfoBGP session with peer @peer @neighbor vrf @vrf state changed from established to failedBGP session with peer leaf03 leaf04 vrf mgmt state changed from down to up
bgpThe reset time for a BGP session changedInfoBGP session with peer @peer @neighbor vrf @vrf reset time changed from @old_last_reset_time to @new_last_reset_timeBGP session with peer spine03 swp9 vrf vrf2 reset time changed from 1559427694 to 1559837484
btrfsinfoDisk space available after BTRFS allocation is less than 80% of partition size or only 2 GB remain.Critical@info : @detailshigh btrfs allocation space : greater than 80% of partition size, 61708420
btrfsinfoIndicates if space would be freed by a rebalance operation on the diskCritical@info : @detailsdata storage efficiency : space left after allocation greater than chunk size 6170849.2
cableLink speed is not the same on both ends of the linkCritical@ifname speed @speed, mismatched with peer @peer @peer_if speed @peer_speedswp2 speed 10, mismatched with peer server02 swp8 speed 40
cableThe speed setting for a given port changedInfo@ifname speed changed from @old_speed to @new_speedswp9 speed changed from 10 to 40
cableThe transceiver status for a given port changedInfo@ifname transceiver changed from @old_transceiver to @new_transceiverswp4 transceiver changed from disabled to enabled
cableThe vendor of a given transceiver changedInfo@ifname vendor name changed from @old_vendor_name to @new_vendor_nameswp23 vendor name changed from Broadcom to Mellanox
cableThe part number of a given transceiver changedInfo@ifname part number changed from @old_part_number to @new_part_numberswp7 part number changed from FP1ZZ5654002A to MSN2700-CS2F0
cableThe serial number of a given transceiver changedInfo@ifname serial number changed from @old_serial_number to @new_serial_numberswp4 serial number changed from 571254X1507020 to MT1552X12041
cableThe status of forward error correction (FEC) support for a given port changedInfo@ifname supported fec changed from @old_supported_fec to @new_supported_fecswp12 supported fec changed from supported to unsupported

swp12 supported fec changed from unsupported to supported

cableThe advertised support for FEC for a given port changedInfo@ifname supported fec changed from @old_advertised_fec to @new_advertised_fecswp24 supported FEC changed from advertised to not advertised
cableThe FEC status for a given port changedInfo@ifname fec changed from @old_fec to @new_fecswp15 fec changed from disabled to enabled
clagCLAG remote peer state changed from up to downCriticalPeer state changed to downPeer state changed to down
clagLocal CLAG host MTU does not match its remote peer MTUCriticalSVI @svi1 on vlan @vlan mtu @mtu1 mismatched with peer mtu @mtu2SVI svi7 on vlan 4 mtu 1592 mismatched with peer mtu 1680
clagCLAG SVI on VLAN is missing from remote peer stateWarningSVI on vlan @vlan is missing from peerSVI on vlan vlan4 is missing from peer
clagCLAG peerlink is not operating at full capacity. At least one link is down.WarningClag peerlink not at full redundancy, member link @slave is downClag peerlink not at full redundancy, member link swp40 is down
clagCLAG remote peer state changed from down to upInfoPeer state changed to upPeer state changed to up
clagLocal CLAG host state changed from down to upInfoClag state changed from down to upClag state changed from down to up
clagCLAG bond in Conflicted state was updated with new bondsInfoClag conflicted bond changed from @old_conflicted_bonds to @new_conflicted_bondsClag conflicted bond changed from swp7 swp8 to swp9 swp10
clagCLAG bond changed state from protodown to up stateInfoClag conflicted bond changed from @old_state_protodownbond to @new_state_protodownbondClag conflicted bond changed from protodown to up
clsupportA new CL Support file has been created for the given nodeCriticalHostName @hostname has new CL SUPPORT fileHostName leaf01 has new CL SUPPORT file
configdiffConfiguration file deleted on a deviceCritical@hostname config file @type was deletedspine03 config file /etc/frr/frr.conf was deleted
configdiffConfiguration file has been createdInfo@hostname config file @type was createdleaf12 config file /etc/lldp.d/README.conf was created
configdiffConfiguration file has been modifiedInfo@hostname config file @type was modifiedspine03 config file /etc/frr/frr.conf was modified
evpnA VNI was configured and moved from the up state to the down stateCriticalVNI @vni state changed from up to downVNI 36 state changed from up to down
evpnA VNI was configured and moved from the down state to the up stateInfoVNI @vni state changed from down to upVNI 36 state changed from down to up
evpnThe kernel state changed on a VNIInfoVNI @vni kernel state changed from @old_in_kernel_state to @new_in_kernel_stateVNI 3 kernel state changed from down to up
evpnA VNI state changed from not advertising all VNIs to advertising all VNIsInfoVNI @vni vni state changed from @old_adv_all_vni_state to @new_adv_all_vni_stateVNI 11 vni state changed from false to true
licenseLicense state is missing or invalidCriticalLicense check failed, name @lic_name state @stateLicense check failed, name agent.lic state invalid
licenseLicense state is missing or invalid on a particular deviceCriticalLicense check failed on @hostnameLicense check failed on leaf03
linkLink operational state changed from up to downCriticalHostName @hostname changed state from @old_state to @new_state Interface:@ifnameHostName leaf01 changed state from up to down Interface:swp34
linkLink operational state changed from down to upInfoHostName @hostname changed state from @old_state to @new_state Interface:@ifnameHostName leaf04 changed state from down to up Interface:swp11
lldpLocal LLDP host has new neighbor informationInfoLLDP Session with host @hostname and @ifname modified fields @changed_fieldsLLDP Session with host leaf02 swp6 modified fields leaf06 swp21
lldpLocal LLDP host has new peer interface nameInfoLLDP Session with host @hostname and @ifname @old_peer_ifname changed to @new_peer_ifnameLLDP Session with host spine01 and swp5 swp12 changed to port12
lldpLocal LLDP host has new peer hostnameInfoLLDP Session with host @hostname and @ifname @old_peer_hostname changed to @new_peer_hostnameLLDP Session with host leaf03 and swp2 leaf07 changed to exit01
lnvVXLAN registration daemon, vxrd, is not runningCriticalvxrd service not runningvxrd service not running
mtuVLAN interface link MTU is smaller than that of its parent MTUWarningvlan interface @link mtu @mtu is smaller than parent @parent mtu @parent_mtuvlan interface swp3 mtu 1500 is smaller than parent peerlink-1 mtu 1690
mtuBridge interface MTU is smaller than the member interface with the smallest MTUWarningbridge @link mtu @mtu is smaller than least of member interface mtu @minbridge swp0 mtu 1280 is smaller than least of member interface mtu 1500
ntpNTP sync state changed from in sync to not in syncCriticalSync state changed from @old_state to @new_state for @hostnameSync state changed from in sync to not sync for leaf06
ntpNTP sync state changed from not in sync to in syncInfoSync state changed from @old_state to @new_state for @hostnameSync state changed from not sync to in sync for leaf06
ospfOSPF session state on a given interface changed from Full to a down stateCriticalOSPF session @ifname with @peer_address changed from Full to @down_state

OSPF session swp7 with 27.0.0.18 state changed from Full to Fail

OSPF session swp7 with 27.0.0.18 state changed from Full to ExStart

ospfOSPF session state on a given interface changed from a down state to fullInfoOSPF session @ifname with @peer_address changed from @down_state to Full

OSPF session swp7 with 27.0.0.18 state changed from Down to Full

OSPF session swp7 with 27.0.0.18 state changed from Init to Full

OSPF session swp7 with 27.0.0.18 state changed from Fail to Full

packageinfoPackage version on device does not match the version identified in the existing manifestCritical@package_name manifest version mismatchnetq-apps manifest version mismatch
ptmPhysical interface cabling does not match configuration specified in topology.dot fileCriticalPTM cable status failedPTM cable status failed
ptmPhysical interface cabling matches configuration specified in topology.dot fileCriticalPTM cable status passedPTM cable status passed
resourceA physical resource has been deleted from a deviceCriticalResource Utils deleted for @hostnameResource Utils deleted for spine02
resourceRoot file system access on a device has changed from Read/Write to Read OnlyCritical@hostname root file system access mode set to Read Onlyserver03 root file system access mode set to Read Only
resourceRoot file system access on a device has changed from Read Only to Read/WriteInfo@hostname root file system access mode set to Read/Writeleaf11 root file system access mode set to Read/Write
resourceA physical resource has been added to a deviceInfoResource Utils added for @hostnameResource Utils added for spine04
runningconfigdiffRunning configuration file has been modifiedInfo@commandname config result was modified@commandname config result was modified
sensorA fan or power supply unit sensor has changed stateCriticalSensor @sensor state changed from @old_s_state to @new_s_stateSensor fan state changed from up to down
sensorA temperature sensor has crossed the maximum threshold for that sensorCriticalSensor @sensor max value @new_s_max exceeds threshold @new_s_critSensor temp max value 110 exceeds the threshold 95
sensorA temperature sensor has crossed the minimum threshold for that sensorCriticalSensor @sensor min value @new_s_lcrit fall behind threshold @new_s_minSensor psu min value 10 fell below threshold 25
sensorA temperature, fan, or power supply sensor state changedInfoSensor @sensor state changed from @old_state to @new_state

Sensor temperature state changed from critical to ok

Sensor fan state changed from absent to ok

Sensor psu state changed from bad to ok

sensorA fan or power supply sensor state changedInfoSensor @sensor state changed from @old_s_state to @new_s_state

Sensor fan state changed from down to up

Sensor psu state changed from down to up

servicesA service status changed from down to upCriticalService @name status changed from @old_status to @new_statusService bgp status changed from down to up
servicesA service status changed from up to downCriticalService @name status changed from @old_status to @new_statusService lldp status changed from up to down
servicesA service changed state from inactive to activeInfoService @name changed state from inactive to active

Service bgp changed state from inactive to active

Service lldp changed state from inactive to active

ssdutil3ME3 disk health has dropped below 10%Critical@info: @detailslow health : 5.0%
ssdutilA dip in 3ME3 disk health of more than 2% has occurred within the last 24 hoursCritical@info: @detailssignificant health drop : 3.0%
tcaPercentage of CPU utilization exceeded user-defined maximum threshold on a switchCriticalCPU Utilization for host @hostname exceed configured mark @cpu_utilizationCPU Utilization for host leaf11 exceed configured mark 85
tcaPercentage of disk utilization exceeded user-defined maximum threshold on a switchCriticalDisk Utilization for host @hostname exceed configured mark @disk_utilizationDisk Utilization for host leaf11 exceed configured mark 90
tcaPercentage of memory utilization exceeded user-defined maximum threshold on a switchCriticalMemory Utilization for host @hostname exceed configured mark @mem_utilizationMemory Utilization for host leaf11 exceed configured mark 95
tcaNumber of transmit bytes exceeded user-defined maximum threshold on a switch interfaceCriticalTX bytes upper threshold breached for host @hostname ifname:@ifname value: @tx_bytesTX bytes upper threshold breached for host spine02 ifname:swp4 value: 20000
tcaNumber of broadcast transmit bytes exceeded user-defined maximum threshold on a switch interfaceCriticalTX broadcast upper threshold breached for host @hostname ifname:@ifname value: @rx_broadcastTX broadcast upper threshold breached for host leaf04 ifname:swp45 value: 40200
tcaNumber of multicast transmit bytes exceeded user-defined maximum threshold on a switch interfaceCriticalTX multicast upper threshold breached for host @hostname ifname:@ifname value: @rx_broadcastTX multicast upper threshold breached for host leaf04 ifname:swp45 value: 30000
tcaNumber of receive bytes exceeded user-defined maximum threshold on a switch interfaceCriticalRX bytes upper threshold breached for host @hostname ifname:@ifname value: @tx_bytesRX bytes upper threshold breached for host spine02 ifname:swp4 value: 20000
tcaNumber of broadcast receive bytes exceeded user-defined maximum threshold on a switch interfaceCriticalRX broadcast upper threshold breached for host @hostname ifname:@ifname value: @rx_broadcastRX broadcast upper threshold breached for host leaf04 ifname:swp45 value: 40200
tcaNumber of multicast receive bytes exceeded user-defined maximum threshold on a switch interfaceCriticalRX multicast upper threshold breached for host @hostname ifname:@ifname value: @rx_broadcastRX multicast upper threshold breached for host leaf04 ifname:swp45 value: 30000
tcaFan speed exceeded user-defined maximum threshold on a switchCriticalSensor for @hostname exceeded threshold fan speed @s_input for sensor @s_nameSensor for spine03 exceeded threshold fan speed 700 for sensor fan2
tcaPower supply output exceeded user-defined maximum threshold on a switchCriticalSensor for @hostname exceeded threshold power @s_input watts for sensor @s_nameSensor for leaf14 exceeded threshold power 120 watts for sensor psu1
tcaTemperature (° C) exceeded user-defined maximum threshold on a switchCriticalSensor for @hostname exceeded threshold temperature @s_input for sensor @s_nameSensor for leaf14 exceeded threshold temperature 90 for sensor temp1
tcaPower supply voltage exceeded user-defined maximum threshold on a switchCriticalSensor for @hostname exceeded threshold voltage @s_input volts for sensor @s_nameSensor for leaf14 exceeded threshold voltage 12 volts for sensor psu2
versionAn unknown version of the operating system was detectedCriticalunexpected os version @my_verunexpected os version cl3.2
versionDesired version of the operating system is not availableCriticalos version @veros version cl3.7.9
versionAn unknown version of a software package was detectedCriticalexpected release version @verexpected release version cl3.6.2
versionDesired version of a software package is not availableCriticaldifferent from version @verdifferent from version cl4.0
vxlanReplication list contains an inconsistent set of nodesCriticalVNI @vni replication list inconsistent with @conflicts diff:@diffVNI 14 replication list inconsistent with ["leaf03","leaf04"] diff:+:["leaf03","leaf04"] -:["leaf07","leaf08"]
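To retrieve events of a single type from the CLI, the netq show events command accepts a type filter in most releases. Treat the following as a sketch; the type keywords generally match the Type column above, and the time value and its format are illustrative, so verify both against your release:

    cumulus@switch:~$ netq show events type bgp between now and 24h

This returns the BGP events received during the last 24 hours.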

Monitor Network Performance

The core capabilities of Cumulus NetQ enable you to monitor your network by viewing performance and configuration data about your individual network devices and the entire fabric network-wide. The topics contained in this section describe monitoring tasks that apply across the entire network. For device-specific monitoring refer to Monitor Devices.

Monitor Network Health

As with any network, one of the challenges is keeping track of all of the moving parts. With the NetQ GUI, you can view the overall health of your network at a glance and then delve deeper for periodic checks or as conditions arise that require attention. For a general understanding of how well your network is operating, the Network Health card workflow is the best place to start as it contains the highest view and performance roll-ups.

Network Health Card Workflow Summary

The small Network Health card displays:

ItemDescription
Indicates data is for overall Network Health
Health trendTrend of overall network health, represented by an arrow:
  • Pointing upward and green: Health score in the most recent window is higher than in the last two data collection windows, an increasing trend
  • Pointing downward and bright pink: Health score in the most recent window is lower than in the last two data collection windows, a decreasing trend
  • No arrow: Health score is unchanged over the last two data collection windows, trend is steady

The data collection window varies based on the time period of the card. For a 24-hour time period (default), the window is one hour. This gives you current, hourly updates about your network health.

Health score

Average of health scores for system health, network services health, and interface health during the last data collection window. The health score for each category is calculated as the percentage of items which passed validations versus the number of items checked.

The data collection window varies based on the time period of the card. For a 24-hour time period (default), the window is one hour. This gives you current, hourly updates about your network health.

Health ratingPerformance rating based on the health score during the time window:
  • Low: Health score is less than 40%
  • Med: Health score is between 40% and 70%
  • High: Health score is greater than 70%
ChartDistribution of overall health status during the designated time period

The medium Network Health card displays the distribution, score, and trend of the:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for overall Network Health
Health trendTrend of system, network service, and interface health, represented by an arrow:
  • Pointing upward and green: Health score in the most recent window is higher than in the last two data collection windows, an increasing trend
  • Pointing downward and bright pink: Health score in the most recent window is lower than in the last two data collection windows, a decreasing trend
  • No arrow: Health score is unchanged over the last two data collection windows, trend is steady

The data collection window varies based on the time period of the card. For a 24-hour time period (default), the window is one hour. This gives you current, hourly updates about your network health.

Health scorePercentage of devices which passed validation versus the number of devices checked during the time window for:
  • System health: NetQ Agent health, Cumulus Linux license status, and sensors
  • Network services health: BGP, CLAG, EVPN, LNV, NTP, OSPF, and VXLAN health
  • Interface health: interface, MTU, and VLAN health

The data collection window varies based on the time period of the card. For a 24-hour time period (default), the window is one hour. This gives you current, hourly updates about your network health.

ChartDistribution of overall health status during the designated time period

The large Network Health card contains three tabs.

The System Health tab displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for System Health
Health trendTrend of NetQ Agents, Cumulus Linux licenses, and sensor health, represented by an arrow:
  • Pointing upward and green: Health score in the most recent window is higher than in the last two data collection windows, an increasing trend
  • Pointing downward and bright pink: Health score in the most recent window is lower than in the last two data collection windows, a decreasing trend
  • No arrow: Health score is unchanged over the last two data collection windows, trend is steady

The data collection window varies based on the time period of the card. For a 24-hour time period (default), the window is one hour. This gives you current, hourly updates about your network health.

Health score

Percentage of devices which passed validation versus the number of devices checked during the time window for NetQ Agents, Cumulus Linux license status, and platform sensors.

The data collection window varies based on the time period of the card. For a 24-hour time period (default), the window is one hour. This gives you current, hourly updates about your network health.

ChartsDistribution of health score for NetQ Agents, Cumulus Linux license status, and platform sensors during the designated time period
TableListing of items that match the filter selection:
  • Most Failures: Devices with the most validation failures are listed at the top
  • Recent Failures: Most recent validation failures are listed at the top
Show All ValidationsOpens full screen Network Health card with a listing of validations performed by network service and protocol

The Network Service Health tab displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for Network Protocols and Services Health
Health trendTrend of BGP, CLAG, EVPN, LNV, NTP, OSPF, and VXLAN services health, represented by an arrow:
  • Pointing upward and green: Health score in the most recent window is higher than in the last two data collection windows, an increasing trend
  • Pointing downward and bright pink: Health score in the most recent window is lower than in the last two data collection windows, a decreasing trend
  • No arrow: Health score is unchanged over the last two data collection windows, trend is steady

The data collection window varies based on the time period of the card. For a 24-hour time period (default), the window is one hour. This gives you current, hourly updates about your network health.

Health score

Percentage of devices which passed validation versus the number of devices checked during the time window for BGP, CLAG, EVPN, LNV, NTP, and VXLAN protocols and services.

The data collection window varies based on the time period of the card. For a 24-hour time period (default), the window is one hour. This gives you current, hourly updates about your network health.

ChartsDistribution of passing validations for BGP, CLAG, EVPN, LNV, NTP, and VXLAN services during the designated time period
TableListing of devices that match the filter selection:
  • Most Failures: Devices with the most validation failures are listed at the top
  • Recent Failures: Most recent validation failures are listed at the top
Show All ValidationsOpens full screen Network Health card with a listing of validations performed by network service and protocol

The Interface Health tab displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for Interface Health
Health trendTrend of interfaces, VLAN, and MTU health, represented by an arrow:
  • Pointing upward and green: Health score in the most recent window is higher than in the last two data collection windows, an increasing trend
  • Pointing downward and bright pink: Health score in the most recent window is lower than in the last two data collection windows, a decreasing trend
  • No arrow: Health score is unchanged over the last two data collection windows, trend is steady

The data collection window varies based on the time period of the card. For a 24-hour time period (default), the window is one hour. This gives you current, hourly updates about your network health.

Health score

Percentage of devices which passed validation versus the number of devices checked during the time window for interfaces, VLAN, and MTU protocols and ports.

The data collection window varies based on the time period of the card. For a 24-hour time period (default), the window is one hour. This gives you current, hourly updates about your network health.

ChartsDistribution of passing validations for interfaces, VLAN, and MTU protocols and ports during the designated time period
TableListing of devices that match the filter selection:
  • Most Failures: Devices with the most validation failures are listed at the top
  • Recent Failures: Most recent validation failures are listed at the top
Show All ValidationsOpens full screen Network Health card with a listing of validations performed by network service and protocol

The full screen Network Health card displays all events in the network.

ItemDescription
TitleNetwork Health
Closes full screen card and returns to workbench
Default TimeRange of time in which the displayed data was collected
Displays data refresh status. Click to pause data refresh. Click to resume data refresh. Current refresh rate is visible by hovering over icon.
ResultsNumber of results found for the selected tab
Network protocol or service tabDisplays results of the validations for that network protocol or service that occurred during the designated time period. By default, the list is sorted by the date and time that the validation was completed (Time). This tab provides the following additional data about all protocols and services:
  • Validation Label: User-defined name of a validation or Default validation
  • Total Node Count: Number of nodes running the protocol or service
  • Checked Node Count: Number of nodes running the protocol or service included in the validation
  • Failed Node Count: Number of nodes that failed the validation
  • Rotten Node Count: Number of nodes that were unreachable during the validation run
  • Warning Node Count: Number of nodes that had warnings during the validation run

The following protocols and services have additional data:

  • BGP
    • Total Session Count: Number of sessions running BGP included in the validation
    • Failed Session Count: Number of BGP sessions that failed the validation
  • EVPN
    • Total Session Count: Number of sessions running BGP included in the validation
    • Checked VNIs Count: Number of VNIs included in the validation
    • Failed BGP Session Count: Number of BGP sessions that failed the validation
  • Interfaces
    • Checked Port Count: Number of ports included in the validation
    • Failed Port Count: Number of ports that failed the validation.
    • Unverified Port Count: Number of ports where a peer could not be identified
  • Licenses
    • Checked License Count: Number of licenses included in the validation
    • Failed License Count: Number of licenses that failed the validation
  • MTU
    • Total Link Count: Number of links included in the validation
    • Failed Link Count: Number of links that failed the validation
  • NTP
    • Unknown Node Count: Number of nodes that NetQ sees but that are not in its inventory and thus not included in the validation
  • OSPF
    • Total Adjacent Count: Number of adjacencies included in the validation
    • Failed Adjacent Count: Number of adjacencies that failed the validation
  • Sensors
    • Checked Sensor Count: Number of sensors included in the validation
    • Failed Sensor Count: Number of sensors that failed the validation
  • VLAN
    • Total Link Count: Number of links included in the validation
    • Failed Link Count: Number of links that failed the validation
Table ActionsSelect, export, or filter the list. Refer to Table Settings.

View Network Health Summary

Overall network health is based on successful validation results. The summary includes the percentage of successful results, a trend indicator, and a distribution of the validation results.

To view a summary of your network health, open the small Network Health card.

In this example, the overall health is relatively good and improving compared to recent status. Refer to the next section for viewing the key health metrics.

View Key Metrics of Network Health

Overall network health is a calculated average of several key health metrics: System, Network Services, and Interface health.

To view these key metrics, open the medium Network Health card. Each metric is shown with the percentage of successful validations, a trend indicator, and a distribution of the validation results.

In this example, the health of the system and network services is good, but interface health is on the lower side. While it is improving, you might choose to dig further if it does not continue to improve. Refer to the following section for additional details.

View System Health

The system health is a calculated average of the NetQ Agent, Cumulus Linux license, and sensor health metrics. In all cases, validation is performed on the agents and licenses. If you are monitoring platform sensors, the calculation includes these as well. You can view the overall health of the system from the medium Network Health card and information about each component from the System Health tab on the large Network Health card.

To view information about each system component:

  1. Open the large Network Health card.

  2. Hover over the card and click .

    The health of each system protocol or service is represented on the left side of the card by a distribution of the health score, a trend indicator, and a percentage of successful results. The right side of the card provides a listing of devices running the services.
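The same system components can also be validated on demand from the NetQ CLI. These commands exist in the NetQ CLI, although their output and options vary by release, so consider this a sketch:

    cumulus@switch:~$ netq check agents
    cumulus@switch:~$ netq check license
    cumulus@switch:~$ netq check sensors

Each command reports the nodes checked and any failures for that component.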

View Devices with the Most Issues

It is useful to know which devices are experiencing the most issues with their system services in general, as this can help focus troubleshooting efforts toward selected devices versus the service itself. To view devices with the most issues, select Most Failures from the filter above the table on the right.

Devices with the highest number of issues are listed at the top. Scroll down to view those with fewer issues. To further investigate the critical devices, open the Event cards and filter on the indicated switches.

View Devices with Recent Issues

It is useful to know which devices are experiencing the most issues with their system services right now, as this can help focus troubleshooting efforts toward selected devices versus the service itself. To view devices with recent issues, select Recent Failures from the filter above the table on the right. Devices with the highest number of issues are listed at the top. Scroll down to view those with fewer issues. To further investigate the critical devices, open the Switch card or the Event cards and filter on the indicated switches.

Filter Results by System Service

You can focus the data in the table on the right by deselecting one or more services. Click the checkbox next to the service you want to remove from the data. In this example, we have unchecked Licenses.

This removes the checkbox next to the associated chart and grays out the title of the chart, temporarily removing the data related to that service from the table. Add it back by hovering over the chart and clicking the checkbox that appears.

View Details of a Particular System Service

From the System Health tab on the large Network Health card, you can click a chart to open the full-screen card pre-focused on that service's data.

View Network Services Health

The network services health is a calculated average of the individual network protocol and service health metrics. In all cases, validation is performed on NTP. If you are running the BGP, CLAG, EVPN, LNV, OSPF, or VXLAN protocols, the calculation includes these as well. You can view the overall health of network services from the medium Network Health card and information about individual services from the Network Service Health tab on the large Network Health card.

To view information about each network protocol or service:

  1. Open the large Network Health card.

  2. Hover over the card and click .

The health of each network protocol or service is represented on the left side of the card by a distribution of the health score, a trend indicator, and a percentage of successful results. The right side of the card provides a listing of devices running the services.

If you have more services running than fit naturally into the chart area, a scroll bar appears for you to access their data. Use the scroll bars on the table to view more columns and rows.
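You can also spot-check individual network services from the NetQ CLI. As a sketch (the available options vary by release):

    cumulus@switch:~$ netq check ntp
    cumulus@switch:~$ netq check clag

Each command summarizes the nodes or sessions checked and any failures found for that protocol or service.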

View Devices with the Most Issues

It is useful to know which devices are experiencing the most issues with their system services in general, as this can help focus troubleshooting efforts toward selected devices versus the protocol or service. To view devices with the most issues, open the large Network Health card, then click the Network Services tab. Select Most Failures from the dropdown above the table on the right.

Devices with the highest number of issues are listed at the top. Scroll down to view those with fewer issues. To further investigate the critical devices, open the Event cards and filter on the indicated switches.

View Devices with Recent Issues

It is useful to know which devices are experiencing the most issues with their network services right now, as this can help focus troubleshooting efforts toward selected devices versus the protocol or service. To view devices with recent issues, open the large Network Health card. Select Recent Failures from the dropdown above the table on the right. Devices with the highest number of issues are listed at the top. Scroll down to view those with fewer issues. To further investigate the critical devices, open the Switch card or the Event cards and filter on the indicated switches.

Filter Results by Network Service

You can focus the data in the table on the right by deselecting one or more services. Click the checkbox next to the service you want to remove. In this example, we removed NTP and are in the process of removing OSPF.

This grays out the chart title and removes the associated checkbox, temporarily removing the data related to that service from the table.

View Details of a Particular Network Service

From the Network Service Health tab on the large Network Health card, you can click a chart to open the full-screen card pre-focused on that service's data.

View Interfaces Health

The interface health is a calculated average of the interface, VLAN, and MTU health metrics. You can view the overall health of interfaces from the medium Network Health card and information about each component from the Interface Health tab on the large Network Health card.

To view information about each interface component:

  1. Open the large Network Health card.

  2. Hover over the card and click .

    The health of each interface protocol or service is represented on the left side of the card by a distribution of the health score, a trend indicator, and a percentage of successful results. The right side of the card provides a listing of devices running the services.
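The corresponding on-demand checks are also available from the NetQ CLI; as a sketch (verify the exact options for your release):

    cumulus@switch:~$ netq check interfaces
    cumulus@switch:~$ netq check mtu
    cumulus@switch:~$ netq check vlan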

View Devices with the Most Issues

It is useful to know which devices are experiencing the most issues with their interfaces in general, as this can help focus troubleshooting efforts toward selected devices versus the service itself. To view devices with the most issues, select Most Failures from the filter above the table on the right.

Devices with the highest number of issues are listed at the top. Scroll down to view those with fewer issues. To further investigate the critical devices, open the Event cards and filter on the indicated switches.

View Devices with Recent Issues

It is useful to know which devices are experiencing the most issues with their interfaces right now, as this can help focus troubleshooting efforts toward selected devices versus the service itself. To view devices with recent issues, select Recent Failures from the filter above the table on the right. Devices with the highest number of issues are listed at the top. Scroll down to view those with fewer issues. To further investigate the critical devices, open the Switch card or the Event cards and filter on the indicated switches.

Filter Results by Interface Service

You can focus the data in the table on the right by deselecting one or more services. Click the checkbox next to the interface item you want to remove from the data. In this example, we have unchecked MTU.

This removes the checkbox next to the associated chart and grays out the title of the chart, temporarily removing the data related to that service from the table. Add it back by hovering over the chart and clicking the checkbox that appears.

View Details of a Particular Interface Service

From the Interface Health tab on the large Network Health card, you can click a chart to open the full-screen card pre-focused on that service's data.

View All Network Protocol and Service Validation Results

The Network Health card workflow enables you to view the results of all validations run on the network protocols and services during the designated time period.

To view all the validation results:

  1. Open the full screen Network Health card.

  2. Click the <network protocol or service name> tab in the navigation panel.

  3. Look for patterns in the data. For example, when did nodes, sessions, links, ports, or devices start failing validation? Was it at a specific time? Was it when you started running the service on more nodes? Did sessions fail, but nodes were fine?

Where to go next depends on what data you see, but a few options include investigating the affected devices with the Switches card workflow or reviewing related events with the Events card workflows.

Validate Network Protocol and Service Operations

With the NetQ UI, you can validate the operation of the network protocols and services running in your network either on demand or on a scheduled basis. There are three card workflows to perform this validation: one for creating the validation request (either on-demand or scheduled) and two for viewing validation results (one for on-demand requests and one for scheduled requests).

This release supports validation of the following network protocols and services: Agents, BGP, CLAG, EVPN, Interfaces, License, MTU, NTP, OSPF, Sensors, VLAN, and VXLAN.

For a more general understanding of how well your network is operating, refer to the Monitor Network Health topic.
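Each of the supported protocols and services listed above also has a corresponding on-demand check in the NetQ CLI (netq check <protocol-or-service>). The following is a sketch; consult the Cumulus NetQ CLI User Guide for the full command syntax in your release:

    cumulus@switch:~$ netq check vxlan
    cumulus@switch:~$ netq check ospf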

Create Validation Requests

The Validation Request card workflow is used to create on-demand validation requests to evaluate the health of your network protocols and services.

Validation Request Card Workflow

The small Validation Request card displays:

ItemDescription
Indicates a validation request
Validation

Select a scheduled request to run that request on-demand. A default validation is provided for each supported network protocol and service, which runs a network-wide validation check. These validations run every 60 minutes, but you may run them on-demand at any time.

Note: No new requests can be configured from this size card.

GOStart the validation request. The corresponding On-demand Validation Result cards are opened on your workbench, one per protocol and service.

The medium Validation Request card displays:

ItemDescription
Indicates a validation request
TitleValidation Request
Validation

Select a scheduled request to run that request on-demand. A default validation is provided for each supported network protocol and service, which runs a network-wide validation check. These validations run every 60 minutes, but you may run them on-demand at any time.

Note: No new requests can be configured from this size card.

ProtocolsThe protocols included in a selected validation request are listed here.
ScheduleFor a selected scheduled validation, the schedule and the time of the last run are displayed.
Run NowStart the validation request

The large Validation Request card displays:

ItemDescription
Indicates a validation request
TitleValidation Request
ValidationDepending on user intent, this field is used to:
  • Select a scheduled request to run that request on-demand. A default validation is provided for each supported network protocol and service, which runs a network-wide validation check. These validations run every 60 minutes, but you may run them on-demand at any time.
  • Leave as is to create a new scheduled validation request
  • Select a scheduled request to modify
ProtocolsFor a selected scheduled validation, the protocols included in a validation request are listed here. For new on-demand or scheduled validations, click these to include them in the validation.
Schedule:For a selected scheduled validation, the schedule and the time of the last run are displayed. For new scheduled validations, select the frequency and starting date and time.
  • Run Every: Select how often to run the request. Choose from 30 minutes, 1, 3, 6, or 12 hours, or 1 day.
  • Starting: Select the date and time to start the first request in the series
  • Last Run: Timestamp of when the selected validation was started
Scheduled ValidationsCount of validations that are currently scheduled, compared to the maximum of 15 allowed
Run NowStart the validation request
UpdateWhen changes are made to a selected validation request, Update becomes available so that you can save your changes.

Be aware that if you update a previously saved validation request, the historical data collected will no longer match the data results of future runs of the request. If your intention is to leave this request unchanged and create a new request, click Save As New instead.

Save As NewWhen changes are made to a previously saved validation request, Save As New becomes available so that you can save the modified request as a new request.

The full screen Validation Request card displays all scheduled validation requests.

ItemDescription
TitleValidation Request
Closes full screen card and returns to workbench
Default TimeNo time period is displayed for this card as each validation request has its own time relationship.
Displays data refresh status. Click to pause data refresh. Click to resume data refresh. Current refresh rate is visible by hovering over icon.
ResultsNumber of results found for the selected tab
Validation RequestsDisplays all scheduled validation requests. By default, the requests list is sorted by the date and time that it was originally created (Created At). This tab provides the following additional data about each request:
  • Name: Text identifier of the validation
  • Type: Name of network protocols and/or services included in the validation
  • Start Time: Date and time that the validation request was run
  • Last Modified: Date and time of the most recent change made to the validation request
  • Cadence (Min): How often, in minutes, the validation is scheduled to run. This is empty for new on-demand requests.
  • Is Active: Indicates whether the request is currently running according to its schedule (true) or it is not running (false)
Table ActionsSelect, export, or filter the list. Refer to Table Settings.

Create On-demand and Scheduled Validation Requests

There are several types of validation requests that a user can make. Each has a slightly different flow through the Validation Request card, and each is therefore described separately in the following sections. The types are based on the intent of the request.

Run an Existing Scheduled Validation Request On Demand

You may find that although you have a validation scheduled to run at a later time, you would like to run it now.

To run a scheduled validation now:

  1. Open either the small, medium, or large Validation Request card.

  2. Select the validation from the Validation dropdown list.

  3. Click Go or Run Now.
    The associated Validation Result card is opened on your workbench. Refer to View On-demand Validation Results.

Create a New On-demand Validation Request

When you want to validate the operation of one or more network protocols and services right now, you can create and run an on-demand validation request using the large Validation Request card.

To create and run a request for a single protocol or service:

  1. Open the small, medium or large Validation Request card.

  2. Select the validation from the Validation dropdown list.

  3. Click Go or Run Now.
    The associated Validation Result card is opened on your workbench. Refer to View On-demand Validation Results.

To create and run a request for more than one protocol and/or service:

  1. Open the large Validation Request card.

  2. Click the names of the protocols and services you want to validate. We selected BGP and EVPN in this example.

  3. Click Run Now to start the validation.
    The associated on-demand validation result cards (one per protocol or service selected) are opened on your current workbench. Refer to View On-demand Validation Results.
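The CLI equivalent of this multi-protocol request is simply to run one check per protocol or service. As a sketch matching the example above:

    cumulus@switch:~$ netq check bgp
    cumulus@switch:~$ netq check evpn

Unlike the UI request, these CLI checks run independently and each prints its own summary.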

Create a New Scheduled Validation Request

When you want to see validation results on a regular basis, it is useful to configure a scheduled validation request to avoid re-creating the request each time.

To create and run a new scheduled validation:

  1. Open the large Validation Request card.

  2. Select the protocols and/or services you want to include in the validation. In this example we have chosen the Agents and NTP services.

  3. Select the schedule frequency (30 minutes, 1 hour, 3 hours, 6 hours, 12 hours, or 1 day) from the Run every list. The default is hourly.

  4. Select the time to start the validation runs, by clicking in the Starting field. Select a day and click Next, then select the starting time and click OK.

  5. Verify the selections were made correctly.

  6. Click Save As New.

  7. Enter a name for the validation.

    Spaces and special characters are not allowed in validation request names.

  8. Click Save.

The validation can now be selected from the Validation listing (on the small, medium or large size card) and run immediately using Run Now, or you can wait for it to run the first time according to the schedule you specified. Refer to View Scheduled Validation Results. Note that the number of scheduled validations is now two (2).

Modify an Existing Scheduled Validation Request

At some point you might want to change the schedule or validation types that are specified in a scheduled validation request.

When you update a scheduled request, the results for all future runs of the validation will differ from the results of previous runs of the validation.

To modify a scheduled validation:

  1. Open the large Validation Request card.
  2. Select the validation from the Validation dropdown list.
  3. Edit the schedule or validation types.
  4. Click Update.

The validation can now be selected from the Validation listing (on the small, medium or large size card) and run immediately using Run Now, or you can wait for it to run the first time according to the schedule you specified. Refer to View Scheduled Validation Results.

View On-demand Validation Results

The On-demand Validation Result card workflow enables you to view the results of on-demand validation requests. When a request has started processing, the associated medium Validation Result card is displayed on your workbench. When multiple network protocols or services are included in a validation, a validation result card is opened for each protocol and service.

On-Demand Validation Result Card Workflow

The small Validation Result card displays:

ItemDescription
Indicates an on-demand validation result
TitleOn-demand Result <Network Protocol or Service Name> Validation
TimestampDate and time the validation was completed
, Status of the validation job, where:
  • Good: Job ran successfully. One or more warnings may have occurred during the run.
  • Failed: Job encountered errors which prevented the job from completing, or job ran successfully, but errors occurred during the run.

The medium Validation Result card displays:

ItemDescription
Indicates an on-demand validation result
TitleOn-demand Validation Result | <Network Protocol or Service Name>
TimestampDate and time the validation was completed
, , Status of the validation job, where:
  • Good: Job ran successfully.
  • Warning: Job encountered issues, but it did complete its run.
  • Failed: Job encountered errors which prevented the job from completing.
Devices TestedChart with the total number of devices included in the validation and the distribution of the results.
  • Pass: Number of devices tested that had successful results
  • Warn: Number of devices tested that had successful results, but also had at least one warning event
  • Fail: Number of devices tested that had one or more protocol or service failures

Hover over chart to view the number of devices and the percentage of all tested devices for each result category.

Sessions Tested

For BGP, chart with total number of BGP sessions included in the validation and the distribution of the overall results.

For EVPN, chart with total number of BGP sessions included in the validation and the distribution of the overall results.

For Interfaces, chart with total number of ports included in the validation and the distribution of the overall results.

In each of these charts:

  • Pass: Number of sessions or ports tested that had successful results
  • Warn: Number of sessions or ports tested that had successful results, but also had at least one warning event
  • Fail: Number of sessions or ports tested that had one or more failure events

Hover over chart to view the number of devices, sessions, or ports and the percentage of all tested devices, sessions, or ports for each result category.

This chart does not apply to other Network Protocols and Services, and thus is not displayed for those cards.

Open <Service> CardClick to open the corresponding medium Network Services card, where available. Refer to Monitor Network Performance for details about these cards and workflows.
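
The Pass, Warn, and Fail categories above follow a simple precedence: any failure event places a device in Fail, otherwise any warning event places it in Warn, otherwise it is Pass, and the hover percentages are each category's share of all tested devices. The following Python sketch is not NetQ code; the device names and event counts are hypothetical and are used only to illustrate that classification.

```python
from collections import Counter

# Hypothetical per-device event counts from a single validation run.
devices = {
    "leaf01":  {"failures": 0, "warnings": 0},
    "leaf02":  {"failures": 0, "warnings": 2},
    "spine01": {"failures": 1, "warnings": 0},
    "spine02": {"failures": 0, "warnings": 0},
}

def classify(events):
    if events["failures"] > 0:
        return "Fail"   # one or more protocol or service failures
    if events["warnings"] > 0:
        return "Warn"   # successful results, but at least one warning event
    return "Pass"       # successful results with no warnings

results = Counter(classify(events) for events in devices.values())
total = len(devices)
for category in ("Pass", "Warn", "Fail"):
    count = results[category]
    print(f"{category}: {count} of {total} devices ({100 * count / total:.0f}%)")
```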

The large Validation Result card contains two tabs.

The Summary tab displays:

ItemDescription
Indicates an on-demand validation result
TitleOn-demand Validation Result | Summary | <Network Protocol or Service Name>
DateDay and time when the validation completed
, , Status of the validation job, where:
  • Good: Job ran successfully.
  • Warning: Job encountered issues, but it did complete its run.
  • Failed: Job encountered errors which prevented the job from completing.
Devices TestedChart with the total number of devices included in the validation and the distribution of the results.
  • Pass: Number of devices tested that had successful results
  • Warn: Number of devices tested that had successful results, but also had at least one warning event
  • Fail: Number of devices tested that had one or more protocol or service failures

Hover over chart to view the number of devices and the percentage of all tested devices for each result category.

Sessions Tested

For BGP, chart with total number of BGP sessions included in the validation and the distribution of the overall results.

For EVPN, chart with total number of BGP sessions included in the validation and the distribution of the overall results.

For Interfaces, chart with total number of ports included in the validation and the distribution of the overall results.

For OSPF, chart with total number of OSPF sessions included in the validation and the distribution of the overall results.

In each of these charts:

  • Pass: Number of sessions or ports tested that had successful results
  • Warn: Number of sessions or ports tested that had successful results, but also had at least one warning event
  • Fail: Number of sessions or ports tested that had one or more failure events

Hover over chart to view the number of devices, sessions, or ports and the percentage of all tested devices, sessions, or ports for each result category.

This chart does not apply to other Network Protocols and Services, and thus is not displayed for those cards.

Open <Service> CardClick to open the corresponding medium Network Services card, when available. Refer to Monitor Network Performance for details about these cards and workflows.
Table/Filter options

When the Most Active filter option is selected, the table displays switches and hosts running the given service or protocol in decreasing order of alarm counts; devices with the largest number of warnings and failures are listed first.

When the Most Recent filter option is selected, the table displays switches and hosts running the given service or protocol sorted by timestamp, with the device with the most recent warning or failure listed first. The table provides the following additional information:

  • Hostname: User-defined name for switch or host
  • Message Type: Network protocol or service which triggered the event
  • Message: Short description of the event
  • Severity: Indication of importance of event; values in decreasing severity include critical, warning, error, info, debug
Show All ResultsClick to open the full screen card with all on-demand validation results sorted by timestamp.

The Configuration tab displays:

ItemDescription
Indicates an on-demand validation request configuration
TitleOn-demand Validation Result | Configuration | <Network Protocol or Service Name>
ValidationsList of network protocols or services included in the request that produced these results
ScheduleNot relevant to on-demand validation results. Value is always N/A.

The full screen Validation Result card provides a tab for all on-demand validation results.

ItemDescription
TitleValidation Results | On-demand
Closes full screen card and returns to workbench
Time periodRange of time in which the displayed data was collected; applies to all card sizes; select an alternate time period by clicking
Displays data refresh status. Click to pause data refresh. Click to resume data refresh. Current refresh rate is visible by hovering over icon.
ResultsNumber of results found for the selected tab
On-demand Validation Result | <network protocol or service>Displays all unscheduled validation results. By default, the results list is sorted by Timestamp. This tab provides the following additional data about each result:
  • Job ID: Internal identifier of the validation job that produced the given results
  • Timestamp: Date and time the validation completed
  • Type: Network protocol or service type
  • Total Node Count: Total number of nodes running the given network protocol or service
  • Checked Node Count: Number of nodes on which the validation ran
  • Failed Node Count: Number of checked nodes that had protocol or service failures
  • Rotten Node Count: Number of nodes that could not be reached during the validation
  • Unknown Node Count: Applies only to the Interfaces service. Number of nodes with unknown port states.
  • Failed Adjacent Count: Number of adjacent nodes that had protocol or service failures
  • Total Session Count: Total number of sessions running for the given network protocol or service
  • Failed Session Count: Number of sessions that had session failures
Table ActionsSelect, export, or filter the list. Refer to Table Settings.

View On-demand Validation Results

Once an on-demand validation request has completed, the results are available in the corresponding Validation Result card.

It may take a few minutes for all results to be presented if the load on the NetQ Platform is heavy at the time of the run.

To view the results:

  1. Locate the medium on-demand Validation Result card on your workbench for the protocol or service that was run.

    You can identify it by the on-demand result icon, , protocol or service name, and the date and time that it was run.

    Note: You may have more than one card open for a given protocol or service, so be sure to use the date and time on the card to ensure you are viewing the correct card.

  2. Note the total number and distribution of results for the tested devices and sessions (when appropriate). Are there many failures?

  3. Hover over the charts to view the total number of warnings or failures and what percentage of the total results that represents for both devices and sessions.

  4. Switch to the large on-demand Validation Result card.

  5. If there are a large number of device warnings or failures, view the devices with the most issues in the table on the right. By default, this table displays the Most Active devices.

  6. To view the most recent issues, select Most Recent from the filter above the table.

  7. If there are a large number of devices or sessions with warnings or failures, the protocol or service may be experiencing issues. View the health of the protocol or service as a whole by clicking Open <network service> Card when available.

  8. To view all data available for all on-demand validation results for a given protocol, switch to the full screen card.

  9. Double-click in a given result row to open details about the validation.

    From this view you can:

    • See a summary of the validation results by clicking in the banner under the title. Toggle the arrow to close the summary.

    • See detailed results of each test run to validate the protocol or service. When errors or warnings are present, the nodes and relevant detail is provided.

    • Export the data by clicking Export.

    • Return to the validation jobs list by clicking .

    You may find that comparing various results gives you a clue as to why certain devices are experiencing more warnings or failures. For example, more failures occurred between certain times or on a particular device.

View Scheduled Validation Results

The Scheduled Validation Result card workflow enables you to view the results of scheduled validation requests. When a request has completed processing, you can access the Validation Result card from the full screen Validation Request card. Each protocol and service has its own validation result card, but the content is similar on each.

Scheduled Validation Result Card Workflow Summary

The small Scheduled Validation Result card displays:

ItemDescription
Indicates a scheduled validation result
TitleScheduled Result <Network Protocol or Service Name> Validation
ResultsSummary of validation results:
  • Number of validation runs completed in the designated time period
  • Number of runs with warnings
  • Number of runs with errors
, Status of the validation job, where:
  • Pass: Job ran successfully. One or more warnings may have occurred during the run.
  • Failed: Job encountered errors which prevented the job from completing, or job ran successfully, but errors occurred during the run.

The medium Scheduled Validation Result card displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates a scheduled validation result
TitleScheduled Validation Result | <Network Protocol or Service Name>
SummarySummary of validation results:
  • Name of scheduled validation
  • Status of the validation job, where:
    • Pass: Job ran successfully. One or more warnings may have occurred during the run.
    • Failed: Job encountered errors which prevented the job from completing, or job ran successfully, but errors occurred during the run.
ChartValidation results, where:
  • Time period: Range of time in which the data on the heat map was collected
  • Heat map: A time segmented view of the results. For each time segment, the color represents the percentage of warning, passing, and failed results. Refer to Validate Network Protocol and Service Operations for details on how to interpret the results.
Open <Service> CardClick to open the corresponding medium Network Services card, when available. Refer to Monitor Network Performance for details about these cards and workflows.

The large Scheduled Validation Result card contains two tabs.

The Summary tab displays:

ItemDescription
Indicates a scheduled validation result
TitleValidation Summary (Scheduled Validation Result | <Network Protocol or Service Name>)
SummarySummary of validation results:
  • Name of scheduled validation
  • Status of the validation job, where:
    • Pass: Job ran successfully. One or more warnings may have occurred during the run.
    • Failed: Job encountered errors which prevented the job from completing, or job ran successfully, but errors occurred during the run.
  • Expand/Collapse: Expand the heat map to full width of card, collapse the heat map to the left
ChartValidation results, where:
  • Time period: Range of time in which the data on the heat map was collected
  • Heat map: A time segmented view of the results. For each time segment, the color represents the percentage of warning, passing, and failed results. Refer to Validate Network Protocol and Service Operations for details on how to interpret the results.
Open <Service> CardClick to open the corresponding medium Network Services card, when available. Refer to Monitor Network Performance for details about these cards and workflows.
Table/Filter options

When the Most Active filter option is selected, the table displays switches and hosts running the given service or protocol in decreasing order of alarm counts; devices with the largest number of warnings and failures are listed first.

When the Most Recent filter option is selected, the table displays switches and hosts running the given service or protocol sorted by timestamp, with the device with the most recent warning or failure listed first. The table provides the following additional information:

  • Hostname: User-defined name for switch or host
  • Message Type: Network protocol or service which triggered the event
  • Message: Short description of the event
  • Severity: Indication of importance of event; values in decreasing severity include critical, warning, error, info, debug
Show All ResultsClick to open the full screen card with all scheduled validation results sorted by timestamp.

The Configuration tab displays:

ItemDescription
Indicates a scheduled validation configuration
TitleConfiguration (Scheduled Validation Result | <Network Protocol or Service Name>)
NameUser-defined name for this scheduled validation
ValidationsList of validations included in the validation request that created this result
ScheduleUser-defined schedule for the validation request that created this result
Open Schedule CardOpens the large Validation Request card for editing this configuration

The full screen Scheduled Validation Result card provides tabs for all scheduled validation results for the service.

ItemDescription
TitleScheduled Validation Results | <Network Protocol or Service>
Closes full screen card and returns to workbench
Time periodRange of time in which the displayed data was collected; applies to all card sizes; select an alternate time period by clicking
Displays data refresh status. Click to pause data refresh. Click to resume data refresh. Current refresh rate is visible by hovering over icon.
ResultsNumber of results found for the selected tab
Scheduled Validation Result | <network protocol or service>Displays all scheduled validation results. By default, the results list is sorted by timestamp. This tab provides the following additional data about each result:
  • Job ID: Internal identifier of the validation job that produced the given results
  • Timestamp: Date and time the validation completed
  • Type: Network protocol or service type
  • Total Node Count: Total number of nodes running the given network protocol or service
  • Checked Node Count: Number of nodes on which the validation ran
  • Failed Node Count: Number of checked nodes that had protocol or service failures
  • Rotten Node Count: Number of nodes that could not be reached during the validation
  • Unknown Node Count: Applies only to the Interfaces service. Number of nodes with unknown port states.
  • Failed Adjacent Count: Number of adjacent nodes that had protocol or service failures
  • Total Session Count: Total number of sessions running for the given network protocol or service
  • Failed Session Count: Number of sessions that had session failures
Table ActionsSelect, export, or filter the list. Refer to Table Settings.

Granularity of Data Shown Based on Time Period

On the medium and large Validation Result cards, the status of the runs is represented in heat maps stacked vertically: one for passing runs, one for runs with warnings, and one for runs with failures. Depending on the time period of data on the card, the number of smaller time blocks used to indicate the status varies. A vertical stack of time blocks, one from each map, includes the results from all checks during that time. The results are shown by how saturated the color is for each block. If all validations during that time period pass, then the middle block is 100% saturated (white) and the warning and failure blocks are 0% saturated (gray). As warnings and failures increase in saturation, the passing block is proportionally reduced in saturation. An example heat map for a time period of 24 hours is shown here; the most common time periods are listed in the table below along with the resulting number of time blocks and the amount of time each block represents.

Time Period | Number of Runs | Number of Time Blocks | Amount of Time in Each Block
6 hours | 18 | 6 | 1 hour
12 hours | 36 | 12 | 1 hour
24 hours | 72 | 24 | 1 hour
1 week | 504 | 7 | 1 day
1 month | 2,086 | 30 | 1 day
1 quarter | 7,000 | 13 | 1 week
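
As a concrete illustration of the saturation rule described above, the short Python sketch below takes a hypothetical set of validation run outcomes for a 6-hour period (18 runs grouped into six 1-hour blocks, per the table) and computes the saturation of the passing, warning, and failure blocks for each hour. This is not NetQ code; it only mirrors the arithmetic the heat map represents.

```python
from collections import Counter

# Hypothetical outcomes of 18 validation runs over 6 hours, three runs per 1-hour block.
blocks = [
    ["pass", "pass", "pass"],
    ["pass", "pass", "warn"],
    ["pass", "warn", "fail"],
    ["fail", "fail", "fail"],
    ["pass", "pass", "pass"],
    ["pass", "fail", "pass"],
]

for hour, runs in enumerate(blocks):
    counts = Counter(runs)
    total = len(runs)
    # Each stacked block is drawn at a saturation equal to that outcome's share of the runs.
    shares = {outcome: 100 * counts[outcome] / total for outcome in ("pass", "warn", "fail")}
    print(f"hour {hour}: pass {shares['pass']:.0f}%  "
          f"warn {shares['warn']:.0f}%  fail {shares['fail']:.0f}%")
```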

View Scheduled Validation Results

Once a scheduled validation request has completed, the results are available in the corresponding Validation Result card.

To view the results:

  1. Open the full size Validation Request card to view all scheduled validations.

  2. Select the validation results you want to view by clicking the check box in the first column of each result.

  3. On the Edit Menu that appears at the bottom of the window, click (Open Cards). This opens the medium Scheduled Validation Results card(s) for the selected items.

  4. Note the distribution of results. Are there many failures? Are they concentrated together in time? Has the protocol or service recovered after the failures?

  5. Hover over the heat maps to view the status numbers and what percentage of the total results that represents for a given region. The tooltip also shows the number of devices included in the validation and the number with warnings and/or failures. This is useful when you see the failures occurring on a small set of devices, as it might point to an issue with the devices rather than the network service.

  6. Optionally, click the Open <network service> Card link to open the medium individual Network Services card. Your current card is not closed.

  7. Switch to the large Scheduled Validation card.

  8. Click to expand the chart.

  9. Collapse the heat map by clicking .

  10. If there are a large number of warnings or failures, view the devices with the most issues by clicking Most Active in the filter above the table. This might help narrow the failures down to a particular device or small set of devices that you can investigate further.

  11. Select the Most Recent filter above the table to see the events that have occurred in the near past at the top of the list.

  12. Optionally, view the health of the protocol or service as a whole by clicking Open <network service> Card (when available).

  13. You can view the configuration of the request that produced the results shown on this card workflow, by hovering over the card and clicking . If you want to change the configuration, click Edit Config to open the large Validation Request card, pre-populated with the current configuration. Follow the instructions in Modify an Existing Scheduled Validation Request to make your changes.

  14. To view all data available for all scheduled validation results for the given protocol or service, click Show All Results or switch to the full screen card.

  15. Look for changes and patterns in the results. Scroll to the right. Are there more failed sessions or nodes during one or more validations?

  16. Double-click in a given result row to open details about the validation.

    From this view you can:

    • See a summary of the validation results by clicking in the banner under the title. Toggle the arrow to close the summary.

    • See detailed results of each test run to validate the protocol or service. When errors or warnings are present, the nodes and relevant detail is provided.

    • Export the data by clicking Export.

    • Return to the validation jobs list by clicking .

    You may find that comparing various results gives you a clue as to why certain devices are experiencing more warnings or failures. For example, more failures occurred between certain times or on a particular device.

Monitor Network Inventory

With NetQ, a network administrator can monitor both the switch hardware and its operating system for misconfigurations or misbehaving services. The Devices Inventory card workflow provides a view into the switches and hosts installed in your network and their various hardware and software components. The workflow contains a small card with a count of each device type in your network, a medium card displaying the operating systems running on each set of devices, large cards with component information statistics, and full-screen cards displaying tables with attributes of all switches and all hosts in your network.

The Devices Inventory card workflow helps answer questions such as:

For monitoring inventory and performance on a switch-by-switch basis, refer to Monitor Switches.

Devices Inventory Card Workflow Summary

The small Devices Inventory card displays:

ItemDescription
Indicates data is for device inventory
Total number of switches in inventory during the designated time period
Total number of hosts in inventory during the designated time period

The medium Devices Inventory card displays:

ItemDescription
Indicates data is for device inventory
TitleInventory | Devices
Total number of switches in inventory during the designated time period
Total number of hosts in inventory during the designated time period
ChartsDistribution of operating systems deployed on switches and hosts, respectively

The large Devices Inventory card has one tab.

The Switches tab displays:

ItemDescription
Time periodAlways Now for inventory by default
Indicates data is for device inventory
TitleInventory | Devices
Total number of switches in inventory during the designated time period
Link to full screen listing of all switches
ComponentSwitch components monitored: ASIC, Operating System (OS), Cumulus Linux license, NetQ Agent version, and Platform
Distribution chartsDistribution of switch components across the network
UniqueNumber of unique items of each component type. For example, for License, you might have CL 2.7.2 and CL 2.7.4, giving you a unique count of two.
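
The Unique value is simply the number of distinct entries seen for a component across all monitored switches. A minimal sketch of that count, using a hypothetical inventory list (the field names are illustrative, not NetQ's data model):

```python
# Hypothetical switch inventory records; only the fields needed for the example are shown.
switches = [
    {"hostname": "leaf01",  "license": "CL 2.7.2", "os": "3.7.3"},
    {"hostname": "leaf02",  "license": "CL 2.7.2", "os": "3.7.3"},
    {"hostname": "spine01", "license": "CL 2.7.4", "os": "3.7.2"},
]

for component in ("license", "os"):
    unique_values = sorted({switch[component] for switch in switches})
    print(f"{component}: {len(unique_values)} unique ({', '.join(unique_values)})")
```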

The full screen Devices Inventory card provides tabs for all switches and all hosts.

ItemDescription
TitleInventory | Devices | Switches
Closes full screen card and returns to workbench
Time periodTime period does not apply to the Inventory cards. This is always Default Time.
Displays data refresh status. Click to pause data refresh. Click to resume data refresh. Current refresh rate is visible by hovering over icon.
ResultsNumber of results found for the selected tab
All Switches and All Hosts tabsDisplays all monitored switches and hosts in your network. By default, the device list is sorted by hostname. These tabs provide the following additional data about each device:
  • Agent
    • State: Indicates communication state of the NetQ Agent on a given device. Values include Fresh (heard from recently) and Rotten (not heard from recently).
    • Version: Software version number of the NetQ Agent on a given device. This should match the version number of the NetQ software loaded on your server or appliance; for example, 2.1.0.
  • ASIC
    • Core BW: Maximum sustained/rated bandwidth. Example values include 2.0 T and 720 G.
    • Model: Chip family. Example values include Tomahawk, Trident, and Spectrum.
    • Model Id: Identifier of networking ASIC model. Example values include BCM56960 and BCM56854.
    • Ports: Indicates port configuration of the switch. Example values include 32 x 100G-QSFP28, 48 x 10G-SFP+, and 6 x 40G-QSFP+.
    • Vendor: Manufacturer of the chip. Example values include Broadcom and Mellanox.
  • CPU
    • Arch: Microprocessor architecture type. Values include x86_64 (Intel), ARMv7 (ARM), and PowerPC.
    • Max Freq: Highest rated frequency for CPU. Example values include 2.40 GHz and 1.74 GHz.
    • Model: Chip family. Example values include Intel Atom C2538 and Intel Atom C2338.
    • Nos: Number of cores. Example values include 2, 4, and 8.
  • Disk Total Size: Total amount of storage space in physical disks (not total available). Example values: 10 GB, 20 GB, 30 GB.
  • License State: Indicator of validity. Values include ok and bad.
  • Memory Size: Total amount of local RAM. Example values include 8192 MB and 2048 MB.
  • OS
    • Vendor: Operating System manufacturer. Values include Cumulus Networks, RedHat, Ubuntu, and CentOS.
    • Version: Software version number of the OS. Example values include 3.7.3, 2.5.x, 16.04, 7.1.
    • Version Id: Identifier of the OS version. For Cumulus, this is the same as the Version (3.7.x).
  • Platform
    • Date: Date and time the platform was manufactured. Example values include 7/12/18 and 10/29/2015.
    • MAC: System MAC address. Example value: 17:01:AB:EE:C3:F5.
    • Model: Manufacturer's model name. Example values include AS7712-32X and S4048-ON.
    • Number: Manufacturer part number. Example values include FP3ZZ7632014A, 0J09D3.
    • Revision: Release version of the platform
    • Series: Manufacturer serial number. Example values include D2060B2F044919GD000060, CN046MRJCES0085E0004.
    • Vendor: Manufacturer of the platform. Example values include Cumulus Express, Dell, EdgeCore, Lenovo, Mellanox.
  • Time: Date and time the data was collected from device.

View the Number of Each Device Type in Your Network

You can view the number of switches and hosts deployed in your network. As you grow your network this can be useful for validating that devices have been added as scheduled.

To view the quantity of devices in your network, open the small Devices Inventory card.

Chassis are not monitored in this release, so an N/A (not applicable) value is displayed for these devices, even if you have chassis in your network.

View Which Operating Systems Are Running on Your Network Devices

You can view the distribution of operating systems running on your switches and hosts. This is useful for verifying which versions of the OS are deployed and for upgrade planning. It also provides a view into the relative dependence on a given OS in your network.

To view the OS distribution, open the medium Devices Inventory card if it is not already on your workbench.

View Switch Components

To view switch components, open the large Devices Inventory card. By default the Switches tab is shown displaying the total number of switches, ASIC vendor, OS versions, license status, NetQ Agent versions, and specific platforms deployed on all of your switches.

Highlight a Selected Component Type

You can hover over any of the segments in a component distribution chart to highlight a specific type of the given component. When you hover, a tooltip appears displaying:

Additionally, sympathetic highlighting is used to show the related component types relevant to the highlighted segment and the number of unique component types associated with this type (shown in blue here).

Focus on a Selected Component Type

To dig deeper on a particular component type, you can filter the card data by that type. In this procedure, the result of filtering on the OS is shown.

To view component type data:

  1. Click a segment of the component distribution charts.

  2. Select the first option from the popup, Filter <component name>. The card data is filtered to show only the components associated with the selected component type. A filter tag appears next to the total number of switches indicating the filter criteria.

  3. Hover over the segments to view the related components.

  4. To return to the full complement of components, click the in the filter tag.

While the Devices Inventory cards provide a network-wide view, you may want to see more detail about your switch inventory. This can be found in the Switches Inventory card workflow. To open that workflow, click the Switch Inventory button at the top right of the Switches card.

View All Switches

You can view all stored attributes for all switches in your network. To view all switch details, open the full screen Devices Inventory card and click the All Switches tab in the navigation panel.

To return to your workbench, click in the top right corner of the card.

View All Hosts

You can view all stored attributes for all hosts in your network. To view all hosts details, open the full screen Devices Inventory card and click the All Hosts tab in the navigation panel.

To return to your workbench, click in the top right corner of the card.

Monitor the BGP Service

The Cumulus NetQ UI enables operators to view the health of the BGP service on a network-wide and a per session basis, giving greater insight into all aspects of the service. This is accomplished through two card workflows, one for the service and one for the session. They are described separately here.

Monitor the BGP Service (All Sessions)

With NetQ, you can monitor the number of nodes running the BGP service, view switches with the most established and unestablished BGP sessions, and view alarms triggered by the BGP service. For an overview and how to configure BGP to run in your data center network, refer to Border Gateway Protocol - BGP.

BGP Service Card Workflow

The small BGP Service card displays:

ItemDescription
Indicates data is for all sessions of a Network Service or Protocol
TitleBGP: All BGP Sessions, or the BGP Service
Total number of switches and hosts with the BGP service enabled during the designated time period
Total number of BGP-related alarms received during the designated time period
ChartDistribution of new BGP-related alarms received during the designated time period

The medium BGP Service card displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for all sessions of a Network Service or Protocol
TitleNetwork Services | All BGP Sessions
Total number of switches and hosts with the BGP service enabled during the designated time period
Total number of BGP-related alarms received during the designated time period
Total Nodes Running chart

Distribution of switches and hosts with the BGP service enabled during the designated time period, and a total number of nodes running the service currently.

Note: The node count here may be different than the count in the summary bar. For example, the number of nodes running BGP last week or last month might be more or less than the number of nodes running BGP currently.

Total Open Alarms chart

Distribution of BGP-related alarms received during the designated time period, and the total number of current BGP-related alarms in the network.

Note: The alarm count here may be different than the count in the summary bar. For example, the number of new alarms received in this time period does not take into account alarms that have already been received and are still active. You might have no new alarms, but still have a total number of alarms present on the network of 10.

Total Nodes Not Est. chart

Distribution of switches and hosts with unestablished BGP sessions during the designated time period, and the total number of unestablished sessions in the network currently.

Note: The node count here may be different than the count in the summary bar. For example, the number of unestablished sessions last week or last month might be more or less than the number of nodes with unestablished sessions currently.

The large BGP service card contains two tabs.

The Sessions Summary tab displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for all sessions of a Network Service or Protocol
TitleSessions Summary (visible when you hover over card)
Total number of switches and hosts with the BGP service enabled during the designated time period
Total number of BGP-related alarms received during the designated time period
Total Nodes Running chart

Distribution of switches and hosts with the BGP service enabled during the designated time period, and a total number of nodes running the service currently.

Note: The node count here may be different than the count in the summary bar. For example, the number of nodes running BGP last week or last month might be more or less than the number of nodes running BGP currently.

Total Nodes Not Est. chart

Distribution of switches and hosts with unestablished BGP sessions during the designated time period, and the total number of unestablished sessions in the network currently.

Note: The node count here may be different than the count in the summary bar. For example, the number of unestablished sessions last week or last month might be more or less than the number of nodes with unestablished sessions currently.

Table/Filter options

When the Switches with Most Sessions filter option is selected, the table displays the switches and hosts running BGP sessions in decreasing order of session count; devices with the largest number of sessions are listed first.

When the Switches with Most Unestablished Sessions filter option is selected, the table displays switches and hosts running BGP sessions in decreasing order of unestablished sessions; devices with the largest number of unestablished sessions are listed first.

Show All SessionsLink to view data for all BGP sessions in the full screen card

The Alarms tab displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
(in header)Indicates data is for all alarms for all BGP sessions
TitleAlarms (visible when you hover over card)
Total number of switches and hosts with the BGP service enabled during the designated time period
(in summary bar)Total number of BGP-related alarms received during the designated time period
Total Alarms chart

Distribution of BGP-related alarms received during the designated time period, and the total number of current BGP-related alarms in the network.

Note: The alarm count here may be different than the count in the summary bar. For example, the number of new alarms received in this time period does not take into account alarms that have already been received and are still active. You might have no new alarms, but still have a total number of alarms present on the network of 10.

Table/Filter optionsWhen the selected filter option is Switches with Most Alarms, the table displays switches and hosts running BGP in decreasing order of the count of alarms; devices with the largest number of BGP alarms are listed first.
Show All SessionsLink to view data for all BGP sessions in the full screen card

The full screen BGP Service card provides tabs for all switches, all sessions, and all alarms.

ItemDescription
TitleNetwork Services | BGP
Closes full screen card and returns to workbench
Time periodRange of time in which the displayed data was collected; applies to all card sizes; select an alternate time period by clicking
Displays data refresh status. Click to pause data refresh. Click to resume data refresh. Current refresh rate is visible by hovering over icon.
ResultsNumber of results found for the selected tab
All Switches tabDisplays all switches and hosts running the BGP service. By default, the device list is sorted by hostname. This tab provides the following additional data about each device:
  • Agent
    • State: Indicates communication state of the NetQ Agent on a given device. Values include Fresh (heard from recently) and Rotten (not heard from recently).
    • Version: Software version number of the NetQ Agent on a given device. This should match the version number of the NetQ software loaded on your server or appliance; for example, 2.2.0.
  • ASIC
    • Core BW: Maximum sustained/rated bandwidth. Example values include 2.0 T and 720 G.
    • Model: Chip family. Example values include Tomahawk, Trident, and Spectrum.
    • Model Id: Identifier of networking ASIC model. Example values include BCM56960 and BCM56854.
    • Ports: Indicates port configuration of the switch. Example values include 32 x 100G-QSFP28, 48 x 10G-SFP+, and 6 x 40G-QSFP+.
    • Vendor: Manufacturer of the chip. Example values include Broadcom and Mellanox.
  • CPU
    • Arch: Microprocessor architecture type. Values include x86_64 (Intel), ARMv7 (ARM), and PowerPC.
    • Max Freq: Highest rated frequency for CPU. Example values include 2.40 GHz and 1.74 GHz.
    • Model: Chip family. Example values include Intel Atom C2538 and Intel Atom C2338.
    • Nos: Number of cores. Example values include 2, 4, and 8.
  • Disk Total Size: Total amount of storage space in physical disks (not total available). Example values: 10 GB, 20 GB, 30 GB.
  • License State: Indicator of validity. Values include ok and bad.
  • Memory Size: Total amount of local RAM. Example values include 8192 MB and 2048 MB.
  • OS
    • Vendor: Operating System manufacturer. Values include Cumulus Networks, RedHat, Ubuntu, and CentOS.
    • Version: Software version number of the OS. Example values include 3.7.3, 2.5.x, 16.04, 7.1.
    • Version Id: Identifier of the OS version. For Cumulus, this is the same as the Version (3.7.x).
  • Platform
    • Date: Date and time the platform was manufactured. Example values include 7/12/18 and 10/29/2015.
    • MAC: System MAC address. Example value: 17:01:AB:EE:C3:F5.
    • Model: Manufacturer's model name. Example values include AS7712-32X and S4048-ON.
    • Number: Manufacturer part number. Example values include FP3ZZ7632014A, 0J09D3.
    • Revision: Release version of the platform
    • Series: Manufacturer serial number. Example values include D2060B2F044919GD000060, CN046MRJCES0085E0004.
    • Vendor: Manufacturer of the platform. Example values include Cumulus Express, Dell, EdgeCore, Lenovo, Mellanox.
  • Time: Date and time the data was collected from device.
All Sessions tabDisplays all BGP sessions network-wide. By default, the session list is sorted by hostname. This tab provides the following additional data about each session:
  • ASN: Autonomous System Number, identifier for a collection of IP networks and routers. Example values include 633284, 655435.
  • Conn Dropped: Number of dropped connections for a given session
  • Conn Estd: Number of connections established for a given session
  • DB State: Session state of DB
  • Evpn Pfx Rcvd: Address prefix received for EVPN traffic. Examples include 115, 35.
  • Ipv4, and Ipv6 Pfx Rcvd: Address prefix received for IPv4 or IPv6 traffic. Examples include 31, 14, 12.
  • Last Reset Time: Date and time at which the session was last established or reset
  • Objid: Object identifier for service
  • OPID: Customer identifier. This is always zero.
  • Peer
    • ASN: Autonomous System Number for peer device
    • Hostname: User-defined name for peer device
    • Name: Interface name or hostname of peer device
    • Router Id: IP address of router with access to the peer device
  • Reason: Text describing the cause of, or trigger for, an event
  • Rx and Tx Families: Address families supported for the receive and transmit session channels. Values include ipv4, ipv6, and evpn.
  • State: Current state of the session. Values include Established and NotEstd (not established).
  • Timestamp: Date and time session was started, deleted, updated or marked dead (device is down)
  • Upd8 Rx: Count of protocol messages received
  • Upd8 Tx: Count of protocol messages transmitted
  • Up Time: Number of seconds the session has been established, in EPOCH notation. Example: 1550147910000
  • Vrf: Name of the Virtual Route Forwarding interface. Examples: default, mgmt, DataVrf1081
  • Vrfid: Integer identifier of the VRF interface when used. Examples: 14, 25, 37
All Alarms tabDisplays all BGP events network-wide. By default, the event list is sorted by time, with the most recent events listed first. The tab provides the following additional data about each event:
  • Source: Hostname of network device that generated the event
  • Message: Text description of a BGP-related event. Example: BGP session with peer tor-1 swp7 vrf default state changed from failed to Established
  • Type: Network protocol or service generating the event. This always has a value of bgp in this card workflow.
  • Severity: Importance of the event. Values include critical, warning, info, and debug.
Table ActionsSelect, export, or filter the list. Refer to Table Settings.
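
Several of the session fields above (Last Reset Time, Up Time, Timestamp) are reported as epoch values; the Up Time example shown (1550147910000) appears to be in milliseconds. Purely as a reader's aid, and not part of NetQ, a value like that can be converted to a readable timestamp as follows:

```python
from datetime import datetime, timezone

epoch_ms = 1550147910000  # example Up Time value from the table above
print(datetime.fromtimestamp(epoch_ms / 1000, tz=timezone.utc))
# 2019-02-14 12:38:30+00:00
```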

View Service Status Summary

A summary of the BGP service is available from the Network Services card workflow, including the number of nodes running the service, the number of BGP-related alarms, and a distribution of those alarms.

To view the summary, open the small BGP Service card.

For more detail, select a different size BGP Service card.

View the Distribution of Sessions and Alarms

It is useful to know the number of network nodes running the BGP protocol over a period of time, as it gives you insight into the amount of traffic associated with and breadth of use of the protocol. It is also useful to compare the number of nodes running BGP with unestablished sessions with the alarms present at the same time to determine if there is any correlation between the issues and the ability to establish a BGP session.

To view these distributions, open the medium BGP Service card.

If a visual correlation is apparent, you can dig a little deeper with the large BGP Service card tabs.

View Devices with the Most BGP Sessions

You can view the load from BGP on your switches and hosts using the large Network Services card. This data enables you to see which switches are handling the most BGP traffic currently, validate that is what is expected based on your network design, and compare that with data from an earlier time to look for any differences.

To view switches and hosts with the most BGP sessions:

  1. Open the large BGP Service card.

  2. Select Switches With Most Sessions from the filter above the table.

    The table content is sorted by this characteristic, listing nodes running the most BGP sessions at the top. Scroll down to view those with the fewest sessions.

To compare this data with the same data at a previous time:

  1. Open another large BGP Service card.

  2. Move the new card next to the original card if needed.

  3. Change the time period for the data on the new card by hovering over the card and clicking .

  4. Select the time period that you want to compare with the original time. We chose Past Week for this example.

    You can now see whether there are significant differences between this time and the original time. If the changes are unexpected, you can investigate further by looking at another time frame, determining if more nodes are now running BGP than previously, looking for changes in the topology, and so forth.

View Devices with the Most Unestablished BGP Sessions

You can identify switches and hosts that are experiencing difficulties establishing BGP sessions; both currently and in the past.

To view switches with the most unestablished BGP sessions:

  1. Open the large BGP Service card.

  2. Select Switches with Most Unestablished Sessions from the filter above the table.

    The table content is sorted by this characteristic, listing nodes with the most unestablished BGP sessions at the top. Scroll down to view those with the fewest unestablished sessions.

Where to go next depends on what data you see, but a couple of options include:

View Devices with the Most BGP Alarms

Switches or hosts experiencing a large number of BGP alarms may indicate a configuration or performance issue that needs further investigation. You can view the devices sorted by the number of BGP alarms, and then use the Switches card workflow or the Alarms card workflow to gather more information about possible causes for the alarms.

To view switches with the most BGP alarms:

  1. Open the large BGP Service card.

  2. Hover over the header and click .

  3. Select Switches with Most Alarms from the filter above the table.

    The table content is sorted by this characteristic, listing nodes with the most BGP alarms at the top. Scroll down to view those with the fewest alarms.

Where to go next depends on what data you see, but a few options include:

View All BGP Events

The BGP Network Services card workflow enables you to view all of the BGP events in the designated time period.

To view all BGP events:

  1. Open the full screen BGP Service card.

  2. Click All Alarms tab in the navigation panel.

    By default, events are listed in most recent to least recent order.

Where to go next depends on what data you see, but a couple of options include:

To return to your workbench, click in the top right corner.

View Details for All Devices Running BGP

You can view all stored attributes of all switches and hosts running BGP in your network in the full screen card.

To view all device details, open the full screen BGP Service card and click the All Switches tab.

To return to your workbench, click in the top right corner.

View Details for All BGP Sessions

You can view all stored attributes of all BGP sessions in your network in the full-screen card.

To view all session details, open the full screen BGP Service card and click the All Sessions tab.

To return to your workbench, click in the top right corner.

Use the icons above the table to select/deselect, filter, and export items in the list. Refer to Table Settings for more detail.

To return to original display of results, click the associated tab.

Monitor a Single BGP Session

With NetQ, you can monitor a single session of the BGP service, view session state changes, and compare with alarms occurring at the same time, as well as monitor the running BGP configuration and changes to the configuration file. For an overview and how to configure BGP to run in your data center network, refer to Border Gateway Protocol - BGP.

To access the single session cards, you must open the full screen BGP Service card, click the All Sessions tab, select the desired session, then click (Open Cards).

Granularity of Data Shown Based on Time Period

On the medium and large single BGP session cards, the status of the session is represented in heat maps stacked vertically: one for established sessions and one for unestablished sessions. Depending on the time period of data on the card, the number of smaller time blocks used to indicate the status varies. A vertical stack of time blocks, one from each map, includes the results from all checks during that time. The results are shown by how saturated the color is for each block. If the session was established for the entire time block, then the top block is 100% saturated (white) and the not established block is 0% saturated (gray). As periods where the session was not established increase in saturation, the established block is proportionally reduced in saturation. An example heat map for a time period of 24 hours is shown here; the most common time periods are listed in the table below along with the resulting number of time blocks and the amount of time each block represents.

Time Period | Number of Runs | Number of Time Blocks | Amount of Time in Each Block
6 hours | 18 | 6 | 1 hour
12 hours | 36 | 12 | 1 hour
24 hours | 72 | 24 | 1 hour
1 week | 504 | 7 | 1 day
1 month | 2,086 | 30 | 1 day
1 quarter | 7,000 | 13 | 1 week
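
To make the block-by-block saturation concrete, the sketch below groups hypothetical state samples for one BGP session (72 samples over 24 hours, as in the table) into 24 one-hour blocks and computes the saturation of the established and not-established blocks for each hour. It is an illustration only, not NetQ code; the sample interval and state sequence are assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical session states sampled every 20 minutes over 24 hours (72 samples).
start = datetime(2019, 2, 14)
states = ["Established"] * 60 + ["NotEstd"] * 4 + ["Established"] * 8

per_block = defaultdict(list)
for i, state in enumerate(states):
    sample_time = start + timedelta(minutes=20 * i)
    block = (sample_time - start) // timedelta(hours=1)  # hour index 0..23
    per_block[block].append(state)

for block in sorted(per_block):
    samples = per_block[block]
    established = sum(s == "Established" for s in samples) / len(samples)
    # The established map block is drawn at this saturation; the not-established
    # block gets the remainder.
    print(f"hour {block:2d}: established {established:.0%}, not established {1 - established:.0%}")
```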

BGP Session Card Workflow Summary

The small BGP Session card displays:

ItemDescription
Indicates data is for a single session of a Network Service or Protocol
TitleBGP Session

Hostnames of the two devices in a session. Arrow points from the host to the peer.
, Current status of the session, either established or not established

The medium BGP Session card displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for a single session of a Network Service or Protocol
TitleNetwork Services | BGP Session

Hostnames of the two devices in a session. Arrow points in the direction of the session.
, Current status of the session, either established or not established
Time period for chartTime period for the chart data
Session State Changes ChartHeat map of the state of the given session over the given time period. The status is sampled at a rate consistent with the time period. For example, for a 24 hour period, a status is collected every hour. Refer to Granularity of Data Shown Based on Time Period.
Peer NameInterface name or hostname of peer device
Peer ASNAutonomous System Number for peer device
Peer Router IDIP address of router with access to the peer device
Peer HostnameUser-defined name for peer device

The large BGP Session card contains two tabs.

The Session Summary tab displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for a single session of a Network Service or Protocol
TitleSession Summary (Network Services | BGP Session)
Summary bar

Hostnames of the two devices in a session.

Current status of the session, either established or not established

Session State Changes ChartHeat map of the state of the given session over the given time period. The status is sampled at a rate consistent with the time period. For example, for a 24 hour period, a status is collected every hour. Refer to Granularity of Data Shown Based on Time Period.
Alarm Count ChartDistribution and count of BGP alarm events over the given time period.
Info Count ChartDistribution and count of BGP info events over the given time period.
Connection Drop CountNumber of times the session entered the not established state during the time period
ASNAutonomous System Number for host device
RX/TX FamiliesReceive and Transmit address types supported. Values include IPv4, IPv6, and EVPN.
Peer HostnameUser-defined name for peer device
Peer InterfaceInterface on which the session is connected
Peer ASNAutonomous System Number for peer device
Peer Router IDIP address of router with access to the peer device

The Configuration File Evolution tab displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates configuration file information for a single session of a Network Service or Protocol
Title(Network Services | BGP Session) Configuration File Evolution
Device identifiers (hostname, IP address, or MAC address) for host and peer in session. Click on to open associated device card.
, Indication of host role, primary or secondary
TimestampsWhen changes to the configuration file have occurred, the date and time are indicated. Click the time to see the changed file.
Configuration File

When File is selected, the configuration file as it was at the selected time is shown.

When Diff is selected, the configuration file at the selected time is shown on the left and the configuration file at the previous timestamp is shown on the right. Differences are highlighted.

Note: If no configuration file changes have been made, only the original file date is shown.
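
The Diff view is conceptually a line-by-line comparison of the configuration file at the selected timestamp against the file at the previous timestamp. The sketch below uses Python's difflib on two hypothetical BGP configuration snippets to show the kind of comparison involved; it does not reproduce NetQ's own rendering.

```python
import difflib

previous = """\
router bgp 65101
 bgp router-id 10.0.0.11
 neighbor swp51 interface remote-as external
"""

selected = """\
router bgp 65101
 bgp router-id 10.0.0.11
 neighbor swp51 interface remote-as external
 neighbor swp52 interface remote-as external
"""

# Print a unified diff highlighting the line added between the two versions.
for line in difflib.unified_diff(
    previous.splitlines(), selected.splitlines(),
    fromfile="previous version", tofile="selected version", lineterm="",
):
    print(line)
```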

The full screen BGP Session card provides tabs for all BGP sessions and all events.

ItemDescription
TitleNetwork Services | BGP
Closes full screen card and returns to workbench
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Displays data refresh status. Click to pause data refresh. Click to resume data refresh. Current refresh rate is visible by hovering over icon.
ResultsNumber of results found for the selected tab
All BGP Sessions tabDisplays all BGP sessions running on the host device. This tab provides the following additional data about each session:
  • ASN: Autonomous System Number, identifier for a collection of IP networks and routers. Example values include 633284, 655435.
  • Conn Dropped: Number of dropped connections for a given session
  • Conn Estd: Number of connections established for a given session
  • DB State: Session state of DB
  • Evpn Pfx Rcvd: Address prefix for EVPN traffic. Examples include 115, 35.
  • Ipv4, and Ipv6 Pfx Rcvd: Address prefix for IPv4 or IPv6 traffic. Examples include 31, 14, 12.
  • Last Reset Time: Time at which the session was last established or reset
  • Objid: Object identifier for service
  • OPID: Customer identifier. This is always zero.
  • Peer
    • ASN: Autonomous System Number for peer device
    • Hostname: User-defined name for peer device
    • Name: Interface name or hostname of peer device
    • Router Id: IP address of router with access to the peer device
  • Reason: Event or cause of failure
  • Rx and Tx Families: Address families supported for the receive and transmit session channels. Values include ipv4, ipv6, and evpn.
  • State: Current state of the session. Values include Established and NotEstd (not established).
  • Timestamp: Date and time session was started, deleted, updated or marked dead (device is down)
  • Upd8 Rx: Count of protocol messages received
  • Upd8 Tx: Count of protocol messages transmitted
  • Up Time: Number of seconds the session has been established, in EPOCH notation. Example: 1550147910000
  • Vrf: Name of the Virtual Route Forwarding interface. Examples: default, mgmt, DataVrf1081
  • Vrfid: Integer identifier of the VRF interface when used. Examples: 14, 25, 37
All Events tabDisplays all events network-wide. By default, the event list is sorted by time, with the most recent events listed first. The tab provides the following additional data about each event:
  • Message: Text description of a BGP-related event. Example: BGP session with peer tor-1 swp7 vrf default state changed from failed to Established
  • Source: Hostname of network device that generated the event
  • Severity: Importance of the event. Values include critical, warning, info, and debug.
  • Type: Network protocol or service generating the event. This always has a value of bgp in this card workflow.
Table ActionsSelect, export, or filter the list. Refer to Table Settings.

View Session Status Summary

A summary of the BGP session is available from the BGP Session card workflow, showing the node and its peer and current status.

To view the summary:

  1. Add the Network Services | All BGP Sessions card.

  2. Switch to the full screen card.

  3. Click the All Sessions tab.

  4. Double-click the session of interest. The full screen card closes automatically.

  5. Optionally, switch to the small BGP Session card.

View BGP Session State Changes

You can view the state of a given BGP session from the medium and large BGP Session Network Service cards. For a given time period, you can determine the stability of the BGP session between two devices. If you experienced connectivity issues at a particular time, you can use these cards to help verify the state of the session. If it was not established more than it was established, you can then investigate further into possible causes.

To view the state transitions for a given BGP session, on the medium BGP Session card:

  1. Add the Network Services | All BGP Sessions card.

  2. Switch to the full screen card.

  3. Open the large BGP Service card.

  4. Click the All Sessions tab.

  5. Double-click the session of interest. The full screen card closes automatically.

The heat map indicates the status of the session over the designated time period. In this example, the session has been established for the entire time period.

From this card, you can also view the Peer ASN, name, hostname and router id identifying the session in more detail.

To view the state transitions for a given BGP session on the large BGP Session card, follow the same steps to open the medium BGP Session card and then switch to the large card.

From this card, you can view the alarm and info event counts, Peer ASN, hostname, router ID, VRF, and Tx/Rx families identifying the session in more detail. The Connection Drop Count gives you a sense of the session performance.

View Changes to the BGP Service Configuration File

Each time a change is made to the configuration file for the BGP service, NetQ logs the change and enables you to compare it with the last version. This can be useful when you are troubleshooting potential causes for alarms or sessions losing their connections.

To view the configuration file changes:

  1. Open the large BGP Session card.

  2. Hover over the card and click to open the BGP Configuration File Evolution tab.

  3. Select the time of interest on the left, when a change may have impacted performance. Scroll down if needed.

  4. Choose between the File view and the Diff view (selected option is dark; File by default).

    The File view displays the content of the file for you to review.

    The Diff view displays the changes between this version (on left) and the most recent version (on right) side by side. The changes are highlighted, as seen in this example.

View All BGP Session Details

You can view all stored attributes of all of the BGP sessions associated with the two devices on this card.

To view all session details, open the full screen BGP Session card, and click the All BGP Sessions tab.

To return to your workbench, click in the top right corner.

View All Events

You can view all of the alarm and info events for the two devices on this card.

To view all events, open the full screen BGP Session card, and click the All Events tab.

To return to your workbench, click in the top right corner.

Monitor the EVPN Service

The Cumulus NetQ UI enables operators to view the health of the EVPN service on a network-wide and a per session basis, giving greater insight into all aspects of the service. This is accomplished through two card workflows, one for the service and one for the session. They are described separately here.

Monitor the EVPN Service (All Sessions)

With NetQ, you can monitor the number of nodes running the EVPN service, view the switches hosting those sessions, see the total number of VNIs, and view alarms triggered by the EVPN service. For an overview and how to configure EVPN in your data center network, refer to Ethernet Virtual Private Network - EVPN.

EVPN Service Card Workflow Summary

The small EVPN Service card displays:

ItemDescription
Indicates data is for all sessions of a Network Service or Protocol
TitleEVPN: All EVPN Sessions, or the EVPN Service
Total number of switches and hosts with the EVPN service enabled during the designated time period
Total number of EVPN-related alarms received during the designated time period
ChartDistribution of EVPN-related alarms received during the designated time period

The medium EVPN Service card displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for all sessions of a Network Service or Protocol
TitleNetwork Services | All EVPN Sessions
Total number of switches and hosts with the EVPN service enabled during the designated time period
Total number of EVPN-related alarms received during the designated time period
Total Nodes Running chart

Distribution of switches and hosts with the EVPN service enabled during the designated time period, and a total number of nodes running the service currently.

Note: The node count here may be different than the count in the summary bar. For example, the number of nodes running EVPN last week or last month might be more or less than the number of nodes running EVPN currently.

Total Open Alarms chart

Distribution of EVPN-related alarms received during the designated time period, and the total number of current EVPN-related alarms in the network.

Note: The alarm count here may be different than the count in the summary bar. For example, the number of new alarms received in this time period does not take into account alarms that have already been received and are still active. You might have no new alarms, but still have a total number of alarms present on the network of 10.

Total Sessions chartDistribution of EVPN sessions during the designated time period, and the total number of sessions running on the network currently.

The large EVPN service card contains two tabs.

The Sessions Summary tab which displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for all sessions of a Network Service or Protocol
TitleSessions Summary (visible when you hover over card)
Total number of switches and hosts with the EVPN service enabled during the designated time period
Total number of EVPN-related alarms received during the designated time period
Total Nodes Running chart

Distribution of switches and hosts with the EVPN service enabled during the designated time period, and a total number of nodes running the service currently.

Note: The node count here may be different than the count in the summary bar. For example, the number of nodes running EVPN last week or last month might be more or less than the number of nodes running EVPN currently.

Total Sessions chartDistribution of EVPN sessions during the designated time period, and the total number of sessions running on the network currently.
Total L3 VNIs chartDistribution of layer 3 VXLAN Network Identifiers during this time period, and the total number of VNIs in the network currently.
Table/Filter options

When the Top Switches with Most Sessions filter is selected, the table displays devices running EVPN sessions in decreasing order of session count; devices with the largest number of sessions are listed first.

When the Switches with Most L2 EVPN filter is selected, the table displays devices running layer 2 EVPN sessions in decreasing order of session count; devices with the largest number of sessions are listed first.

When the Switches with Most L3 EVPN filter is selected, the table displays devices running layer 3 EVPN sessions in decreasing order of session count; devices with the largest number of sessions are listed first.

Show All SessionsLink to view data for all EVPN sessions network-wide in the full screen card

The Alarms tab which displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
(in header)Indicates data is for all alarms for all sessions of a Network Service or Protocol
TitleAlarms (visible when you hover over card)
Total number of switches and hosts with the EVPN service enabled during the designated time period
(in summary bar)Total number of EVPN-related alarms received during the designated time period
Total Alarms chart

Distribution of EVPN-related alarms received during the designated time period, and the total number of current EVPN-related alarms in the network.

Note: The alarm count here may be different than the count in the summary bar. For example, the number of new alarms received in this time period does not take into account alarms that have already been received and are still active. You might have no new alarms, but still have a total number of alarms present on the network of 10.

Table/Filter optionsWhen the Events by Most Active Device filter is selected, the table displays devices running EVPN sessions in decreasing order of alarm count; devices with the largest number of alarms are listed first
Show All SessionsLink to view data for all EVPN sessions in the full screen card

The full screen EVPN Service card provides tabs for all switches, all sessions, and all alarms.

ItemDescription
TitleNetwork Services | EVPN
Closes full screen card and returns to workbench
Time periodRange of time in which the displayed data was collected; applies to all card sizes; select an alternate time period by clicking
Displays data refresh status. Click to pause data refresh. Click to resume data refresh. Current refresh rate is visible by hovering over icon.
ResultsNumber of results found for the selected tab
All Switches tabDisplays all switches and hosts running the EVPN service. By default, the device list is sorted by hostname. This tab provides the following additional data about each device:
  • Agent
    • State: Indicates communication state of the NetQ Agent on a given device. Values include Fresh (heard from recently) and Rotten (not heard from recently).
    • Version: Software version number of the NetQ Agent on a given device. This should match the version number of the NetQ software loaded on your server or appliance; for example, 2.1.0.
  • ASIC
    • Core BW: Maximum sustained/rated bandwidth. Example values include 2.0 T and 720 G.
    • Model: Chip family. Example values include Tomahawk, Trident, and Spectrum.
    • Model Id: Identifier of networking ASIC model. Example values include BCM56960 and BCM56854.
    • Ports: Indicates port configuration of the switch. Example values include 32 x 100G-QSFP28, 48 x 10G-SFP+, and 6 x 40G-QSFP+.
    • Vendor: Manufacturer of the chip. Example values include Broadcom and Mellanox.
  • CPU
    • Arch: Microprocessor architecture type. Values include x86_64 (Intel), ARMv7 (ARM), and PowerPC.
    • Max Freq: Highest rated frequency for CPU. Example values include 2.40 GHz and 1.74 GHz.
    • Model: Chip family. Example values include Intel Atom C2538 and Intel Atom C2338.
    • Nos: Number of cores. Example values include 2, 4, and 8.
  • Disk Total Size: Total amount of storage space in physical disks (not total available). Example values: 10 GB, 20 GB, 30 GB.
  • License State: Indicator of validity. Values include ok and bad.
  • Memory Size: Total amount of local RAM. Example values include 8192 MB and 2048 MB.
  • OS
    • Vendor: Operating System manufacturer. Values include Cumulus Networks, RedHat, Ubuntu, and CentOS.
    • Version: Software version number of the OS. Example values include 3.7.3, 2.5.x, 16.04, 7.1.
    • Version Id: Identifier of the OS version. For Cumulus, this is the same as the Version (3.7.x).
  • Platform
    • Date: Date and time the platform was manufactured. Example values include 7/12/18 and 10/29/2015.
    • MAC: System MAC address. Example value: 17:01:AB:EE:C3:F5.
    • Model: Manufacturer's model name. Examples include AS7712-32X and S4048-ON.
    • Number: Manufacturer part number. Example values include FP3ZZ7632014A, 0J09D3.
    • Revision: Release version of the platform
    • Series: Manufacturer serial number. Example values include D2060B2F044919GD000060, CN046MRJCES0085E0004.
    • Vendor: Manufacturer of the platform. Example values include Cumulus Express, Dell, EdgeCore, Lenovo, Mellanox.
  • Time: Date and time the data was collected from device.
All Sessions tabDisplays all EVPN sessions network-wide. By default, the session list is sorted by hostname. This tab provides the following additional data about each session:
  • Adv All Vni: Indicates whether the VNI state is advertising all VNIs (true) or not (false)
  • Adv Gw Ip: Indicates whether the host device is advertising the gateway IP address (true) or not (false)
  • DB State: Session state of the DB
  • Export RT: IP address and port of the export route target used in the filtering mechanism for BGP route exchange
  • Import RT: IP address and port of the import route target used in the filtering mechanism for BGP route exchange
  • In Kernel: Indicates whether the associated VNI is in the kernel (in kernel) or not (not in kernel)
  • Is L3: Indicates whether the session is part of a layer 3 configuration (true) or not (false)
  • Origin Ip: Host device's local VXLAN tunnel IP address for the EVPN instance
  • OPID: LLDP service identifier
  • Rd: Route distinguisher used in the filtering mechanism for BGP route exchange
  • Timestamp: Date and time the session was started, deleted, updated or marked as dead (device is down)
  • Vni: Name of the VNI where session is running
All Alarms tabDisplays all EVPN events network-wide. By default, the event list is sorted by time, with the most recent events listed first. The tab provides the following additional data about each event:
  • Message: Text description of an EVPN-related event. Example: VNI 3 kernel state changed from down to up
  • Source: Hostname of network device that generated the event
  • Severity: Importance of the event. Values include critical, warning, info, and debug.
  • Type: Network protocol or service generating the event. This always has a value of evpn in this card workflow.
Table ActionsSelect, export, or filter the list. Refer to Table Settings.

View Service Status Summary

A summary of the EVPN service is available from the Network Services card workflow, including the number of nodes running the service, the number of EVPN-related alarms, and a distribution of those alarms.

To view the summary, open the small EVPN Network Service card.

For more detail, select a different size EVPN Network Service card.

View the Distribution of Sessions and Alarms

It is useful to know the number of network nodes running the EVPN protocol over a period of time, as it gives you insight into the amount of traffic associated with and breadth of use of the protocol. It is also useful to compare the number of nodes running EVPN with the alarms present at the same time to determine if there is any correlation between the issues and the ability to establish an EVPN session.

To view these distributions, open the medium EVPN Service card.

If a visual correlation is apparent, you can dig a little deeper with the large EVPN Service card tabs.

View the Distribution of Layer 3 VNIs

It is useful to know the number of layer 3 VNIs, as it gives you insight into the complexity of the VXLAN overlay.

To view this distribution, open the large EVPN Service card and view the bottom chart on the left.
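If you also want a text listing of the VNIs NetQ has learned about, the NetQ CLI can show EVPN information per device and per VNI. This is a hedged sketch; it assumes the netq CLI is available, and the VNI value 4001 is only an example.

    # Show EVPN information, including VNIs, for all devices
    netq show evpn

    # Limit the view to a single VNI (4001 is an example value)
    netq show evpn vni 4001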

View Devices with the Most EVPN Sessions

You can view the load from EVPN on your switches and hosts using the large EVPN Service card. This data enables you to see which switches are handling the most EVPN traffic currently, validate that this is what is expected based on your network design, and compare that with data from an earlier time to look for any differences.

To view switches and hosts with the most EVPN sessions:

  1. Open the large EVPN Service card.

  2. Select Top Switches with Most Sessions from the filter above the table.

    The table content is sorted by this characteristic, listing nodes running the most EVPN sessions at the top. Scroll down to view those with the fewest sessions.

To compare this data with the same data at a previous time:

  1. Open another large EVPN Service card.

  2. Move the new card next to the original card if needed.

  3. Change the time period for the data on the new card by hovering over the card and clicking .

  4. Select the time period that you want to compare with the current time.

    You can now see whether there are significant differences between this time period and the previous time period.

If the changes are unexpected, you can investigate further by looking at another time frame, determining if more nodes are now running EVPN than previously, looking for changes in the topology, and so forth.
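To reproduce a similar ranking outside the UI, one rough approach is to export the EVPN session data from the NetQ CLI as JSON and count sessions per device. This sketch assumes the json output option is supported on your NetQ release, that jq is installed, and that the JSON records carry a hostname field (an assumption about the schema).

    # Count EVPN sessions per device, most sessions first
    netq show evpn json | jq -r '.[].hostname' | sort | uniq -c | sort -rn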

View Devices with the Most Layer 2 EVPN Sessions

You can view the number of layer 2 EVPN sessions on your switches and hosts using the large EVPN Service card. This data enables you to see which switches are handling the most EVPN traffic currently, validate that this is what is expected based on your network design, and compare that with data from an earlier time to look for any differences.

To view switches and hosts with the most layer 2 EVPN sessions:

  1. Open the large EVPN Service card.

  2. Select Switches with Most L2 EVPN from the filter above the table.

    The table content is sorted by this characteristic, listing nodes running the most layer 2 EVPN sessions at the top. Scroll down to view those with the fewest sessions.

To compare this data with the same data at a previous time:

  1. Open another large EVPN Service card.

  2. Move the new card next to the original card if needed.

  3. Change the time period for the data on the new card by hovering over the card and clicking .

  4. Select the time period that you want to compare with the current time.

    You can now see whether there are significant differences between this time period and the previous time period.

If the changes are unexpected, you can investigate further by looking at another time frame, determining if more nodes are now running EVPN than previously, looking for changes in the topology, and so forth.

View Devices with the Most Layer 3 EVPN Sessions

You can view the number of layer 3 EVPN sessions on your switches and hosts using the large EVPN Service card. This data enables you to see which switches are handling the most EVPN traffic currently, validate that this is what is expected based on your network design, and compare that with data from an earlier time to look for any differences.

To view switches and hosts with the most layer 3 EVPN sessions:

  1. Open the large EVPN Service card.

  2. Select Switches with Most L3 EVPN from the filter above the table.

    The table content is sorted by this characteristic, listing nodes running the most layer 3 EVPN sessions at the top. Scroll down to view those with the fewest sessions.

To compare this data with the same data at a previous time:

  1. Open another large EVPN Service card.

  2. Move the new card next to the original card if needed.

  3. Change the time period for the data on the new card by hovering over the card and clicking .

  4. Select the time period that you want to compare with the current time.

    You can now see whether there are significant differences between this time period and the previous time period.

If the changes are unexpected, you can investigate further by looking at another time frame, determining if more nodes are now running EVPN than previously, looking for changes in the topology, and so forth.

View Devices with the Most EVPN Alarms

Switches experiencing a large number of EVPN alarms may indicate a configuration or performance issue that needs further investigation. You can view the switches sorted by the number of EVPN alarms and then use the Switches card workflow or the Alarms card workflow to gather more information about possible causes for the alarms.

To view switches with the most EVPN alarms:

  1. Open the large EVPN Service card.

  2. Hover over the header and click .

  3. Select Events by Most Active Device from the filter above the table.

    The table content is sorted by this characteristic, listing nodes with the most EVPN alarms at the top. Scroll down to view those with the fewest alarms.
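If you prefer to review EVPN events from a terminal, the NetQ CLI can list them for a chosen time window. A minimal sketch, assuming the netq CLI is available; the time-window syntax can differ between NetQ releases.

    # Show recent EVPN-related events
    netq show events type evpn

    # Limit the events to the last 24 hours (syntax may vary by release)
    netq show events type evpn between now and 24h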

Where to go next depends on what data you see.

View All EVPN Events

The EVPN Service card workflow enables you to view all of the EVPN events in the designated time period.

To view all EVPN events:

  1. Open the full screen EVPN Service card.

  2. Click the All Alarms tab in the navigation panel. By default, events are sorted by Time, with the most recent events listed first.

Where to go next depends on what data you see.

View Details for All Devices Running EVPN

You can view all stored attributes of all switches running EVPN in your network in the full screen card.

To view all switch and host details, open the full screen EVPN Service card, and click the All Switches tab.

To return to your workbench, click at the top right.

View Details for All EVPN Sessions

You can view all stored attributes of all EVPN sessions in your network in the full screen card.

To view all session details, open the full screen EVPN Service card, and click the All Sessions tab.

To return to your workbench, click at the top right.

Use the icons above the table to select/deselect, filter, and export items in the list. Refer to Table Settings for more detail.

To return to original display of results, click the associated tab.

Monitor a Single EVPN Session

With NetQ, you can monitor the performance of a single EVPN session, including the associated VNI, the number of participating VTEPs, and the session type. For an overview and how to configure EVPN in your data center network, refer to Ethernet Virtual Private Network - EVPN.

To access the single session cards, you must open the full screen EVPN Service card, click the All Sessions tab, select the desired session, then click (Open Cards).

EVPN Session Card Workflow Summary

The small EVPN Session card displays:

ItemDescription
Indicates data is for an EVPN session
TitleEVPN Session
VNI NameName of the VNI (virtual network instance) used for this EVPN session
Current VNI NodesTotal number of VNI nodes participating in the EVPN session currently

The medium EVPN Session card displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for an EVPN session
TitleNetwork Services | EVPN Session
Summary barVTEP (VXLAN Tunnel EndPoint) Count: Total number of VNI nodes participating in the EVPN session currently
VTEP Count Over Time chartDistribution of VTEP counts during the designated time period
VNI NameName of the VNI used for this EVPN session
TypeIndicates whether the session is established as part of a layer 2 or layer 3 overlay network

The large EVPN Session card contains two tabs.

The Session Summary tab displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for an EVPN session
TitleSession Summary (Network Services | EVPN Session)
Summary barVTEP (VXLAN Tunnel EndPoint) Count: Total number of VNI devices participating in the EVPN session currently
VTEP Count Over Time chartDistribution of VTEPs during the designated time period
Alarm Count chartDistribution of alarms during the designated time period
Info Count chartDistribution of info events during the designated time period
TableVRF (for layer 3) or VLAN (for layer 2) identifiers by device

The Configuration File Evolution tab displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates configuration file information for a single session of a Network Service or Protocol
Title(Network Services | EVPN Session) Configuration File Evolution
VTEP count (currently)
TimestampsWhen changes to the configuration file have occurred, the date and time are indicated. Click the time to see the changed file.
Configuration File

When File is selected, the configuration file as it was at the selected time is shown.

When Diff is selected, the configuration file at the selected time is shown on the left and the configuration file at the previous timestamp is shown on the right. Differences are highlighted.

Note: If no configuration file changes have been made, only the original file date is shown.

The full screen EVPN Session card provides tabs for all EVPN sessions and all events.

ItemDescription
TitleNetwork Services | EVPN
Closes full screen card and returns to workbench
Time periodRange of time in which the displayed data was collected; applies to all card sizes; select an alternate time period by clicking
Displays data refresh status. Click to pause data refresh. Click to resume data refresh. Current refresh rate is visible by hovering over icon.
ResultsNumber of results found for the selected tab
All EVPN Sessions tabDisplays all EVPN sessions network-wide. By default, the session list is sorted by hostname. This tab provides the following additional data about each session:
  • Adv All Vni: Indicates whether the VNI state is advertising all VNIs (true) or not (false)
  • Adv Gw Ip: Indicates whether the host device is advertising the gateway IP address (true) or not (false)
  • DB State: Session state of the DB
  • Export RT: IP address and port of the export route target used in the filtering mechanism for BGP route exchange
  • Import RT: IP address and port of the import route target used in the filtering mechanism for BGP route exchange
  • In Kernel: Indicates whether the associated VNI is in the kernel (in kernel) or not (not in kernel)
  • Is L3: Indicates whether the session is part of a layer 3 configuration (true) or not (false)
  • Origin Ip: Host device's local VXLAN tunnel IP address for the EVPN instance
  • OPID: LLDP service identifier
  • Rd: Route distinguisher used in the filtering mechanism for BGP route exchange
  • Timestamp: Date and time the session was started, deleted, updated or marked as dead (device is down)
  • Vni: Name of the VNI where session is running
All Events tabDisplays all events network-wide. By default, the event list is sorted by time, with the most recent events listed first. The tab provides the following additional data about each event:
  • Message: Text description of an EVPN-related event. Example: VNI 3 kernel state changed from down to up
  • Source: Hostname of network device that generated the event
  • Severity: Importance of the event. Values include critical, warning, info, and debug.
  • Type: Network protocol or service generating the event. This always has a value of evpn in this card workflow.
Table ActionsSelect, export, or filter the list. Refer to Table Settings.

View Session Status Summary

A summary of the EVPN session is available from the EVPN Session card workflow, showing the node and its peer and current status.

To view the summary:

  1. Add the Network Services | All EVPN Sessions card.

  2. Switch to the full screen card.

  3. Click the All Sessions tab.

  4. Double-click the session of interest. The full screen card closes automatically.

  5. Optionally, switch to the small EVPN Session card.

For more detail, select a different size EVPN Session card.

View VTEP Count

You can view the count of VTEPs for a given EVPN session from the medium and large EVPN Session cards.

To view the count for a given EVPN session, on the medium EVPN Session card:

  1. Add the Network Services | All EVPN Sessions card.

  2. Switch to the full screen card.

  3. Click the All Sessions tab.

  4. Double-click the session of interest. The full screen card closes automatically.

To view the count for a given EVPN session on the large EVPN Session card, follow the same steps as for the medium card and then switch to the large card.

View All EVPN Session Details

You can view all stored attributes of all of the EVPN sessions running network-wide.

To view all session details, open the full screen EVPN Session card and click the All EVPN Sessions tab.

To return to your workbench, click in the top right of the card.

View All Events

You can view all of the alarm and info events occurring network wide.

To view all events, open the full screen EVPN Session card and click the All Events tab.

Where to go next depends on what data you see.

Monitor the LLDP Service

The Cumulus NetQ UI enables operators to view the health of the LLDP service on a network-wide and a per session basis, giving greater insight into all aspects of the service. This is accomplished through two card workflows, one for the service and one for the session. They are described separately here.

Monitor the LLDP Service (All Sessions)

With NetQ, you can monitor the number of nodes running the LLDP service, view the nodes with the most and the fewest LLDP neighbors, and view alarms triggered by the LLDP service. For an overview and how to configure LLDP in your data center network, refer to Link Layer Discovery Protocol.

LLDP Service Card Workflow Summary

The small LLDP Service card displays:

ItemDescription
Indicates data is for all sessions of a Network Service or Protocol
TitleLLDP: All LLDP Sessions, or the LLDP Service
Total number of switches with the LLDP service enabled during the designated time period
Total number of LLDP-related alarms received during the designated time period
ChartDistribution of LLDP-related alarms received during the designated time period

The medium LLDP Service card displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for all sessions of a Network Service or Protocol
TitleLLDP: All LLDP Sessions, or the LLDP Service
Total number of switches with the LLDP service enabled during the designated time period
Total number of LLDP-related alarms received during the designated time period
Total Nodes Running chart

Distribution of switches and hosts with the LLDP service enabled during the designated time period, and a total number of nodes running the service currently.

Note: The node count here may be different than the count in the summary bar. For example, the number of nodes running LLDP last week or last month might be more or less than the number of nodes running LLDP currently.

Total Open Alarms chart

Distribution of LLDP-related alarms received during the designated time period, and the total number of current LLDP-related alarms in the network.

Note: The alarm count here may be different than the count in the summary bar. For example, the number of new alarms received in this time period does not take into account alarms that have already been received and are still active. You might have no new alarms, but still have a total number of alarms present on the network of 10.

Total Sessions chartDistribution of LLDP sessions running during the designated time period, and the total number of sessions running on the network currently.

The large LLDP service card contains two tabs.

The Sessions Summary tab which displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for all sessions of a Network Service or Protocol
TitleSessions Summary (Network Services | All LLDP Sessions)
Total number of switches with the LLDP service enabled during the designated time period
Total number of LLDP-related alarms received during the designated time period
Total Nodes Running chart

Distribution of switches and hosts with the LLDP service enabled during the designated time period, and a total number of nodes running the service currently.

Note: The node count here may be different than the count in the summary bar. For example, the number of nodes running LLDP last week or last month might be more or less than the number of nodes running LLDP currently.

Total Sessions chartDistribution of LLDP sessions running during the designated time period, and the total number of sessions running on the network currently
Total Sessions with No Nbr chartDistribution of LLDP sessions missing neighbor information during the designated time period, and the total number of sessions missing neighbors in the network currently
Table/Filter options

When the Switches with Most Sessions filter is selected, the table displays switches running LLDP sessions in decreasing order of session count; devices with the largest number of sessions are listed first

When the Switches with Most Unestablished Sessions filter is selected, the table displays switches running LLDP sessions in decreasing order of unestablished session count; devices with the largest number of unestablished sessions are listed first

Show All SessionsLink to view all LLDP sessions in the full screen card

The Alarms tab which displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
(in header)Indicates data is for all alarms for all LLDP sessions
TitleAlarms (visible when you hover over card)
Total number of switches with the LLDP service enabled during the designated time period
(in summary bar)Total number of LLDP-related alarms received during the designated time period
Total Alarms chart

Distribution of LLDP-related alarms received during the designated time period, and the total number of current LLDP-related alarms in the network.

Note: The alarm count here may be different than the count in the summary bar. For example, the number of new alarms received in this time period does not take into account alarms that have already been received and are still active. You might have no new alarms, but still have a total number of alarms present on the network of 10.

Table/Filter optionsWhen the Events by Most Active Device filter is selected, the table displays switches running LLDP sessions in decreasing order of alarm count; devices with the largest number of alarms are listed first
Show All SessionsLink to view all LLDP sessions in the full screen card

The full screen LLDP Service card provides tabs for all switches, all sessions, and all alarms.

ItemDescription
TitleNetwork Services | LLDP
Closes full screen card and returns to workbench
Time periodRange of time in which the displayed data was collected; applies to all card sizes; select an alternate time period by clicking
Displays data refresh status. Click to pause data refresh. Click to resume data refresh. Current refresh rate is visible by hovering over icon.
ResultsNumber of results found for the selected tab
All Switches tabDisplays all switches and hosts running the LLDP service. By default, the device list is sorted by hostname. This tab provides the following additional data about each device:
  • Agent
    • State: Indicates communication state of the NetQ Agent on a given device. Values include Fresh (heard from recently) and Rotten (not heard from recently).
    • Version: Software version number of the NetQ Agent on a given device. This should match the version number of the NetQ software loaded on your server or appliance; for example, 2.1.0.
  • ASIC
    • Core BW: Maximum sustained/rated bandwidth. Example values include 2.0 T and 720 G.
    • Model: Chip family. Example values include Tomahawk, Trident, and Spectrum.
    • Model Id: Identifier of networking ASIC model. Example values include BCM56960 and BCM56854.
    • Ports: Indicates port configuration of the switch. Example values include 32 x 100G-QSFP28, 48 x 10G-SFP+, and 6 x 40G-QSFP+.
    • Vendor: Manufacturer of the chip. Example values include Broadcom and Mellanox.
  • CPU
    • Arch: Microprocessor architecture type. Values include x86_64 (Intel), ARMv7 (ARM), and PowerPC.
    • Max Freq: Highest rated frequency for CPU. Example values include 2.40 GHz and 1.74 GHz.
    • Model: Chip family. Example values include Intel Atom C2538 and Intel Atom C2338.
    • Nos: Number of cores. Example values include 2, 4, and 8.
  • Disk Total Size: Total amount of storage space in physical disks (not total available). Example values: 10 GB, 20 GB, 30 GB.
  • License State: Indicator of validity. Values include ok and bad.
  • Memory Size: Total amount of local RAM. Example values include 8192 MB and 2048 MB.
  • OS
    • Vendor: Operating System manufacturer. Values include Cumulus Networks, RedHat, Ubuntu, and CentOS.
    • Version: Software version number of the OS. Example values include 3.7.3, 2.5.x, 16.04, 7.1.
    • Version Id: Identifier of the OS version. For Cumulus, this is the same as the Version (3.7.x).
  • Platform
    • Date: Date and time the platform was manufactured. Example values include 7/12/18 and 10/29/2015.
    • MAC: System MAC address. Example value: 17:01:AB:EE:C3:F5.
    • Model: Manufacturer's model name. Examples include AS7712-32X and S4048-ON.
    • Number: Manufacturer part number. Example values include FP3ZZ7632014A, 0J09D3.
    • Revision: Release version of the platform
    • Series: Manufacturer serial number. Example values include D2060B2F044919GD000060, CN046MRJCES0085E0004.
    • Vendor: Manufacturer of the platform. Example values include Cumulus Express, Dell, EdgeCore, Lenovo, Mellanox.
  • Time: Date and time the data was collected from device.
All Sessions tabDisplays all LLDP sessions network-wide. By default, the session list is sorted by hostname. This tab provides the following additional data about each session:
  • Ifname: Name of the host interface where LLDP session is running
  • LLDP Peer:
    • Os: Operating system (OS) used by peer device. Values include Cumulus Linux, RedHat, Ubuntu, and CentOS.
    • Osv: Version of the OS used by peer device. Example values include 3.7.3, 2.5.x, 16.04, 7.1.
    • Bridge: Indicates whether the peer device is a bridge (true) or not (false)
    • Router: Indicates whether the peer device is a router (true) or not (false)
    • Station: Indicates whether the peer device is a station (true) or not (false)
  • Peer:
    • Hostname: User-defined name for the peer device
    • Ifname: Name of the peer interface where the session is running
  • Timestamp: Date and time that the session was started, deleted, updated, or marked dead (device is down)
All Alarms tabDisplays all LLDP events network-wide. By default, the event list is sorted by time, with the most recent events listed first. The tab provides the following additional data about each event:
  • Message: Text description of an LLDP-related event. Example: LLDP Session with host leaf02 swp6 modified fields leaf06 swp21
  • Source: Hostname of network device that generated the event
  • Severity: Importance of the event. Values include critical, warning, info, and debug.
  • Type: Network protocol or service generating the event. This always has a value of lldp in this card workflow.
Table ActionsSelect, export, or filter the list. Refer to Table Settings.

View Service Status Summary

A summary of the LLDP service is available from the Network Services card workflow, including the number of nodes running the service, the number of LLDP-related alarms, and a distribution of those alarms.

To view the summary, open the small LLDP Service card.

In this example, there are no LLDP alarms present on the network of 14 devices.

For more detail, select a different size LLDP Network Services card.

View the Distribution of Nodes, Alarms, and Sessions

It is useful to know the number of network nodes running the LLDP protocol over a period of time, as it gives you insight into nodes that might be misconfigured or experiencing communication issues. Additionally, if there are a large number of alarms, it is worth investigating either the service or particular devices.

To view the distribution, open the medium LLDP Service card.

In this example, we see that 13 nodes are running the LLDP protocol, that there are 52 sessions established, and that no LLDP-related alarms have occurred in the last 24 hours.

View the Distribution of Missing Neighbors

You can view the number of missing neighbors in any given time period and how that number has changed over time. This is a good indicator of link communication issues.

To view the distribution, open the large LLDP Service card and view the bottom chart on the left, Total Sessions with No Nbr.

In this example, we see that 16 of the 52 sessions are missing the neighbor (peer) device.
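To see which interfaces are missing a neighbor, you can cross-check with the NetQ CLI or with LLDP directly on a suspect switch. This is a sketch only; hostnames and output depend on your environment.

    # LLDP neighbor information collected by NetQ, network-wide
    netq show lldp

    # On a suspect switch itself, show what LLDP sees locally (run on the switch)
    net show lldp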

View Devices with the Most LLDP Sessions

You can view the load from LLDP on your switches using the large LLDP Service card. This data enables you to see which switches are handling the most LLDP traffic currently, validate that this is what is expected based on your network design, and compare that with data from an earlier time to look for any differences.

To view switches and hosts with the most LLDP sessions:

  1. Open the large LLDP Service card.

  2. Select Switches with Most Sessions from the filter above the table.

    The table content is sorted by this characteristic, listing nodes running the most LLDP sessions at the top. Scroll down to view those with the fewest sessions.

To compare this data with the same data at a previous time:

  1. Open another large LLDP Service card.

  2. Move the new card next to the original card if needed.

  3. Change the time period for the data on the new card by hovering over the card and clicking .

  4. Select the time period that you want to compare with the current time. You can now see whether there are significant differences between this time period and the previous time period.

    In this case, notice that the alarms have reduced significantly in the last week. If the changes are unexpected, you can investigate further by looking at another time frame, determining if more nodes are now running LLDP than previously, looking for changes in the topology, and so forth.

View Devices with the Most Unestablished LLDP Sessions

You can identify switches that are experiencing difficulties establishing LLDP sessions, both currently and in the past.

To view switches with the most unestablished LLDP sessions:

  1. Open the large LLDP Service card.

  2. Select Switches with Most Unestablished Sessions from the filter above the table.

    The table content is sorted by this characteristic, listing nodes with the most unestablished LLDP sessions at the top. Scroll down to view those with the fewest unestablished sessions.

Where to go next depends on what data you see.

View Devices with the Most LLDP Alarms

Switches experiencing a large number of LLDP alarms may indicate a configuration or performance issue that needs further investigation. You can view the switches sorted by the number of LLDP alarms and then use the Switches card workflow or the Alarms card workflow to gather more information about possible causes for the alarms.

To view switches with the most LLDP alarms:

  1. Open the large LLDP Service card.

  2. Hover over the header and click .

  3. Select Events by Most Active Device from the filter above the table.

    The table content is sorted by this characteristic, listing nodes with the most LLDP alarms at the top. Scroll down to view those with the fewest alarms.

Where to go next depends on what data you see.

View All LLDP Events

The LLDP Network Services card workflow enables you to view all of the LLDP events in the designated time period.

To view all LLDP events:

  1. Open the full screen LLDP Service card.

  2. Click the All Alarms tab.

Where to go next depends on what data you see.

View Details About All Switches Running LLDP

You can view all stored attributes of all switches running LLDP in your network in the full screen card.

To view all switch details, open the full screen LLDP Service card, and click the All Switches tab.

Return to your workbench by clicking in the top right corner.

View Detailed Information About All LLDP Sessions

You can view all stored attributes of all LLDP sessions in your network in the full screen card.

To view all session details, open the full screen LLDP Service card, and click the All Sessions tab.

Return to your workbench by clicking in the top right corner.

Use the icons above the table to select/deselect, filter, and export items in the list. Refer to Table Settings for more detail. To return to original display of results, click the associated tab.

Monitor a Single LLDP Session

With NetQ, you can monitor the number of nodes running the LLDP service, view neighbor state changes, and compare with events occurring at the same time, as well as monitor the running LLDP configuration and changes to the configuration file. For an overview and how to configure LLDP in your data center network, refer to Link Layer Discovery Protocol.

To access the single session cards, you must open the full screen LLDP Service card, click the All Sessions tab, select the desired session, then click (Open Cards).

Granularity of Data Shown Based on Time Period

On the medium and large single LLDP session cards, the status of the neighboring peers is represented in heat maps stacked vertically; one for peers that are reachable (neighbor detected), and one for peers that are unreachable (neighbor not detected). Depending on the time period of data on the card, the number of smaller time blocks used to indicate the status varies. A vertical stack of time blocks, one from each map, includes the results from all checks during that time. The results are shown by how saturated the color is for each block. If all peers during that time period were detected for the entire time block, then the top block is 100% saturated (white) and the neighbor not detected block is zero percent saturated (gray). As peers become reachable, the neighbor detected block increases in saturation and the neighbor not detected block is proportionally reduced in saturation. An example heat map for a time period of 24 hours is shown here, with the most common time periods and their resulting time blocks listed in the table below.

Time Period | Number of Runs | Number of Time Blocks | Amount of Time in Each Block
6 hours     | 18             | 6                     | 1 hour
12 hours    | 36             | 12                    | 1 hour
24 hours    | 72             | 24                    | 1 hour
1 week      | 504            | 7                     | 1 day
1 month     | 2,086          | 30                    | 1 day
1 quarter   | 7,000          | 13                    | 1 week
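The saturation of a block is simply the fraction of checks in that block with a given result. For example, a 24-hour card has 72 runs spread over 24 one-hour blocks, or 3 checks per block; the small calculation below is illustrative only, with made-up counts.

    # 72 runs / 24 blocks = 3 checks per block on a 24-hour card (from the table above)
    # If 2 of the 3 checks in a block detected the neighbor:
    awk 'BEGIN { printf "neighbor detected saturation: %.0f%%\n", 2/3*100 }'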

LLDP Session Card Workflow Summary

The small LLDP Session card displays:

ItemDescription
Indicates data is for a single session of a Network Service or Protocol
TitleLLDP Session
Host and peer devices in session. Host is shown on top, with peer below.
, Indicates whether the host sees the peer or not; has a peer, no peer

The medium LLDP Session card displays:

ItemDescription
Time periodRange of time in which the displayed data was collected
Indicates data is for a single session of a Network Service or Protocol
TitleLLDP Session
Host and peer devices in session. Arrow points from host to peer.
, Indicates whether the host sees the peer or not; has a peer, no peer
Time periodRange of time for the distribution chart
Heat mapDistribution of neighbor availability (detected or undetected) during this given time period
HostnameUser-defined name of the host device
Interface NameSoftware interface on the host device where the session is running
Peer HostnameUser-defined name of the peer device
Peer Interface NameSoftware interface on the peer where the session is running

The large LLDP Session card contains two tabs.

The Session Summary tab displays:

ItemDescription
Time periodRange of time in which the displayed data was collected
Indicates data is for a single session of a Network Service or Protocol
TitleSession Summary (Network Services | LLDP Session)
Host and peer devices in session. Arrow points from host to peer.
, Indicates whether the host sees the peer or not; has a peer, no peer
Heat mapDistribution of neighbor state (detected or undetected) during this given time period
Alarm Count chartDistribution and count of LLDP alarm events during the given time period
Info Count chartDistribution and count of LLDP info events during the given time period
Host Interface NameSoftware interface on the host where the session is running
Peer HostnameUser-defined name of the peer device
Peer Interface NameSoftware interface on the peer where the session is running

The Configuration File Evolution tab displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates configuration file information for a single session of a Network Service or Protocol
Title(Network Services | LLDP Session) Configuration File Evolution
Device identifiers (hostname, IP address, or MAC address) for host and peer in session. Click to open associated device card.
, Indicates whether the host sees the peer or not; has a peer, no peer
TimestampsWhen changes to the configuration file have occurred, the date and time are indicated. Click the time to see the changed file.
Configuration File

When File is selected, the configuration file as it was at the selected time is shown. When Diff is selected, the configuration file at the selected time is shown on the left and the configuration file at the previous timestamp is shown on the right. Differences are highlighted.

Note: If no configuration file changes have been made, the card shows no results.

The full screen LLDP Session card provides tabs for all LLDP sessions and all events.

ItemDescription
TitleNetwork Services | LLDP
Closes full screen card and returns to workbench
Time periodRange of time in which the displayed data was collected; applies to all card sizes; select an alternate time period by clicking
Displays data refresh status. Click to pause data refresh. Click to resume data refresh. Current refresh rate is visible by hovering over icon.
ResultsNumber of results found for the selected tab
All LLDP Sessions tabDisplays all LLDP sessions on the host device. By default, the session list is sorted by hostname. This tab provides the following additional data about each session:
  • Ifname: Name of the host interface where LLDP session is running
  • LLDP Peer:
    • Os: Operating system (OS) used by peer device. Values include Cumulus Linux, RedHat, Ubuntu, and CentOS.
    • Osv: Version of the OS used by peer device. Example values include 3.7.3, 2.5.x, 16.04, 7.1.
    • Bridge: Indicates whether the peer device is a bridge (true) or not (false)
    • Router: Indicates whether the peer device is a router (true) or not (false)
    • Station: Indicates whether the peer device is a station (true) or not (false)
  • Peer:
    • Hostname: User-defined name for the peer device
    • Ifname: Name of the peer interface where the session is running
  • Timestamp: Date and time that the session was started, deleted, updated, or marked dead (device is down)
All Events tabDisplays all events network-wide. By default, the event list is sorted by time, with the most recent events listed first. The tab provides the following additional data about each event:
  • Message: Text description of an event. Example: LLDP Session with host leaf02 swp6 modified fields leaf06 swp21
  • Source: Hostname of network device that generated the event
  • Severity: Importance of the event. Values include critical, warning, info, and debug.
  • Type: Network protocol or service generating the event. This always has a value of lldp in this card workflow.
Table ActionsSelect, export, or filter the list. Refer to Table Settings.

View Session Status Summary

A summary of the LLDP session is available from the LLDP Session card workflow, showing the node and its peer and current status.

To view the summary:

  1. Open the full screen LLDP Service card.

  2. Double-click on a session. The full screen card closes automatically.

  3. Locate the medium LLDP Session card.

  4. Optionally, open the small LLDP Session card.

View LLDP Session Neighbor State Changes

You can view the neighbor state for a given LLDP session from the medium and large LLDP Session cards. For a given time period, you can determine the stability of the LLDP session between two devices. If you experienced connectivity issues at a particular time, you can use these cards to help verify the state of the neighbor. If the neighbor was undetected more often than it was detected, you can then investigate further into possible causes.

To view the neighbor availability for a given LLDP session on the medium card:

  1. Open the full screen LLDP Service card.

  2. Double-click on a session. The full screen card closes automatically.

  3. Locate the medium LLDP Session card.

In this example, the heat map tells us that this LLDP session has been able to detect a neighbor for the entire time period.

From this card, you can also view the host name and interface name, and the peer name and interface name.

To view the neighbor availability for a given LLDP session on the large LLDP Session card, open that card.

From this card, you can also view the alarm and info event counts, host interface name, peer hostname, and peer interface identifying the session in more detail.

View Changes to the LLDP Service Configuration File

Each time a change is made to the configuration file for the LLDP service, NetQ logs the change and enables you to compare it with the last version. This can be useful when you are troubleshooting potential causes for alarms or sessions losing their connections.

To view the configuration file changes:

  1. Open the large LLDP Session card.

  2. Hover over the card and click to open the LLDP Configuration File Evolution tab.

  3. Select the time of interest on the left, at a point when a change may have impacted performance. Scroll down if needed.

  4. Choose between the File view and the Diff view (selected option is dark; File by default).

    The File view displays the content of the file for you to review.

    The Diff view displays the changes between this version (on left) and the most recent version (on right) side by side. The changes are highlighted in red and green. In this example, we don’t have any changes to the file, so the same file is shown on both sides, and thus no highlighted lines.

View All LLDP Session Details

You can view all stored attributes of all of the LLDP sessions associated with the two devices on this card.

To view all session details, open the full screen LLDP Session card, and click the All LLDP Sessions tab.

To return to your workbench, click in the top right of the card.

View All Events

You can view all of the alarm and info events in the network.

To view all events, open the full screen LLDP Session card, and click the All Events tab.

Where to go next depends on what data you see.

Monitor the MLAG Service

The Cumulus NetQ UI enables operators to view the health of the MLAG service on a network-wide and a per session basis, giving greater insight into all aspects of the service. This is accomplished through two card workflows, one for the service and one for the session. They are described separately here.

MLAG or CLAG? The Cumulus Linux implementation of MLAG is referred to by other vendors as MLAG, MC-LAG, or VPC; within Cumulus Linux itself the feature has historically been called CLAG. The Cumulus NetQ UI uses the MLAG terminology predominantly.

Monitor the MLAG Service (All Sessions)

With NetQ, you can monitor the number of nodes running the MLAG service, view sessions running, and view alarms triggered by the MLAG service. For an overview and how to configure MLAG in your data center network, refer to Multi-Chassis Link Aggregation - MLAG.
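If you also use the NetQ CLI, note that the corresponding commands use the CLAG keyword rather than MLAG. A minimal sketch, assuming the netq CLI is available:

    # MLAG (CLAG) peering state as collected by NetQ
    netq show clag

    # Validate MLAG consistency across the network
    netq check clag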

MLAG Service Card Workflow Summary

The small MLAG Service card displays:

ItemDescription
Indicates data is for all sessions of a Network Service or Protocol
TitleMLAG: All MLAG Sessions, or the MLAG Service
Total number of switches with the MLAG service enabled during the designated time period
Total number of MLAG-related alarms received during the designated time period
ChartDistribution of MLAG-related alarms received during the designated time period

The medium MLAG Service card displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for all sessions of a Network Service or Protocol
TitleNetwork Services | All MLAG Sessions
Total number of switches with the MLAG service enabled during the designated time period
Total number of MLAG-related alarms received during the designated time period
Total number of sessions with an inactive backup IP address during the designated time period
Total number of bonds with only a single connection during the designated time period
Total Nodes Running chart

Distribution of switches and hosts with the MLAG service enabled during the designated time period, and a total number of nodes running the service currently.

Note: The node count here may be different than the count in the summary bar. For example, the number of nodes running MLAG last week or last month might be more or less than the number of nodes running MLAG currently.

Total Open Alarms chart

Distribution of MLAG-related alarms received during the designated time period, and the total number of current MLAG-related alarms in the network.

Note: The alarm count here may be different than the count in the summary bar. For example, the number of new alarms received in this time period does not take into account alarms that have already been received and are still active. You might have no new alarms, but still have a total number of alarms present on the network of 10.

Total Sessions chartDistribution of MLAG sessions running during the designated time period, and the total number of sessions running on the network currently

The large MLAG service card contains two tabs.

The All MLAG Sessions summary tab which displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for all sessions of a Network Service or Protocol
TitleAll MLAG Sessions Summary
Total number of switches with the MLAG service enabled during the designated time period
Total number of MLAG-related alarms received during the designated time period
Total Nodes Running chart

Distribution of switches and hosts with the MLAG service enabled during the designated time period, and a total number of nodes running the service currently.

Note: The node count here may be different than the count in the summary bar. For example, the number of nodes running MLAG last week or last month might be more or less than the number of nodes running MLAG currently.

Total Sessions chart

Distribution of MLAG sessions running during the designated time period, and the total number of sessions running on the network currently

Total Sessions with Inactive-backup-ip chartDistribution of sessions without an active backup IP defined during the designated time period, and the total number of these sessions running on the network currently
Table/Filter options

When the Switches with Most Sessions filter is selected, the table displays switches running MLAG sessions in decreasing order of session count; devices with the largest number of sessions are listed first

When the Switches with Most Unestablished Sessions filter is selected, the table displays switches running MLAG sessions in decreasing order of unestablished session count; devices with the largest number of unestablished sessions are listed first

Show All SessionsLink to view all MLAG sessions in the full screen card

The All MLAG Alarms tab which displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
(in header)Indicates alarm data for all MLAG sessions
TitleNetwork Services | All MLAG Alarms (visible when you hover over card)
Total number of switches with the MLAG service enabled during the designated time period
(in summary bar)Total number of MLAG-related alarms received during the designated time period
Total Alarms chart

Distribution of MLAG-related alarms received during the designated time period, and the total number of current MLAG-related alarms in the network.

Note: The alarm count here may be different than the count in the summary bar. For example, the number of new alarms received in this time period does not take into account alarms that have already been received and are still active. You might have no new alarms, but still have a total number of alarms present on the network of 10.

Table/Filter optionsWhen the Events by Most Active Device filter is selected, the table displays switches running MLAG sessions in decreasing order of alarm count; devices with the largest number of alarms are listed first
Show All SessionsLink to view all MLAG sessions in the full screen card

The full screen MLAG Service card provides tabs for all switches, all sessions, and all alarms.

ItemDescription
TitleNetwork Services | MLAG
Closes full screen card and returns to workbench
Time periodRange of time in which the displayed data was collected; applies to all card sizes; select an alternate time period by clicking
Displays data refresh status. Click to pause data refresh. Click to resume data refresh. Current refresh rate is visible by hovering over icon.
ResultsNumber of results found for the selected tab
All Switches tabDisplays all switches and hosts running the MLAG service. By default, the device list is sorted by hostname. This tab provides the following additional data about each device:
  • Agent
    • State: Indicates communication state of the NetQ Agent on a given device. Values include Fresh (heard from recently) and Rotten (not heard from recently).
    • Version: Software version number of the NetQ Agent on a given device. This should match the version number of the NetQ software loaded on your server or appliance; for example, 2.1.0.
  • ASIC
    • Core BW: Maximum sustained/rated bandwidth. Example values include 2.0 T and 720 G.
    • Model: Chip family. Example values include Tomahawk, Trident, and Spectrum.
    • Model Id: Identifier of networking ASIC model. Example values include BCM56960 and BCM56854.
    • Ports: Indicates port configuration of the switch. Example values include 32 x 100G-QSFP28, 48 x 10G-SFP+, and 6 x 40G-QSFP+.
    • Vendor: Manufacturer of the chip. Example values include Broadcom and Mellanox.
  • CPU
    • Arch: Microprocessor architecture type. Values include x86_64 (Intel), ARMv7 (ARM), and PowerPC.
    • Max Freq: Highest rated frequency for CPU. Example values include 2.40 GHz and 1.74 GHz.
    • Model: Chip family. Example values include Intel Atom C2538 and Intel Atom C2338.
    • Nos: Number of cores. Example values include 2, 4, and 8.
  • Disk Total Size: Total amount of storage space in physical disks (not total available). Example values: 10 GB, 20 GB, 30 GB.
  • License State: Indicator of validity. Values include ok and bad.
  • Memory Size: Total amount of local RAM. Example values include 8192 MB and 2048 MB.
  • OS
    • Vendor: Operating System manufacturer. Values include Cumulus Networks, RedHat, Ubuntu, and CentOS.
    • Version: Software version number of the OS. Example values include 3.7.3, 2.5.x, 16.04, 7.1.
    • Version Id: Identifier of the OS version. For Cumulus, this is the same as the Version (3.7.x).
  • Platform
    • Date: Date and time the platform was manufactured. Example values include 7/12/18 and 10/29/2015.
    • MAC: System MAC address. Example value: 17:01:AB:EE:C3:F5.
    • Model: Manufacturer's model name. Example values include AS7712-32X and S4048-ON.
    • Number: Manufacturer part number. Example values include FP3ZZ7632014A, 0J09D3.
    • Revision: Release version of the platform
    • Series: Manufacturer serial number. Example values include D2060B2F044919GD000060, CN046MRJCES0085E0004.
    • Vendor: Manufacturer of the platform. Example values include Cumulus Express, Dell, EdgeCore, Lenovo, Mellanox.
  • Time: Date and time the data was collected from device.
All Sessions tabDisplays all MLAG sessions network-wide. By default, the session list is sorted by hostname. This tab provides the following additional data about each session:
  • Backup Ip: IP address of the interface to use if the peerlink (or bond) goes down
  • Backup Ip Active: Indicates whether the backup IP address has been specified and is active (true) or not (false)
  • Bonds
    • Conflicted: Identifies the set of interfaces in a bond that do not match on each end of the bond
    • Single: Identifies a set of interfaces connecting to only one of the two switches
    • Dual: Identifies a set of interfaces connecting to both switches
    • Proto Down: Interface on the switch brought down by the clagd service. Value is blank if no interfaces are down due to clagd service.
  • Clag Sysmac: Unique MAC address for each bond interface pair. Note: Must be a value between 44:38:39:ff:00:00 and 44:38:39:ff:ff:ff (a range-check sketch follows this table).
  • Peer:
    • If: Name of the peer interface
    • Role: Role of the peer device. Values include primary and secondary.
    • State: Indicates if peer device is up (true) or down (false)
  • Role: Role of the host device. Values include primary and secondary.
  • Timestamp: Date and time the MLAG session was started, deleted, updated, or marked dead (device went down)
  • Vxlan Anycast: Anycast IP address used for VXLAN termination
All Alarms tabDisplays all MLAG events network-wide. By default, the event list is sorted by time, with the most recent events listed first. The tab provides the following additional data about each event:
  • Message: Text description of an MLAG-related event. Example: Clag conflicted bond changed from swp7 swp8 to swp9 swp10
  • Source: Hostname of network device that generated the event
  • Severity: Importance of the event. Values include critical, warning, info, and debug.
  • Type: Network protocol or service generating the event. This always has a value of clag in this card workflow.
Table ActionsSelect, export, or filter the list. Refer to Table Settings.
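
The Clag Sysmac range noted in the All Sessions tab above (44:38:39:ff:00:00 through 44:38:39:ff:ff:ff) can be checked with a short sketch. This is purely illustrative and not part of NetQ; the function name and sample addresses are hypothetical.

    # Hypothetical helper (not part of NetQ): check whether a CLAG sysmac falls in
    # the reserved range noted above, 44:38:39:ff:00:00 through 44:38:39:ff:ff:ff.
    def clag_sysmac_in_range(mac: str) -> bool:
        value = int(mac.replace(":", ""), 16)
        low = int("443839ff0000", 16)
        high = int("443839ffffff", 16)
        return low <= value <= high

    print(clag_sysmac_in_range("44:38:39:ff:01:01"))  # True: inside the reserved range
    print(clag_sysmac_in_range("44:38:39:be:ef:01"))  # False: outside the reserved range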

View Service Status Summary

A summary of the MLAG service is available from the MLAG Service card workflow, including the number of nodes running the service, the number of MLAG-related alarms, and a distribution of those alarms.

To view the summary, open the small MLAG Service card.

For more detail, select a different size MLAG Service card.

View the Distribution of Sessions and Alarms

It is useful to know the number of network nodes running the MLAG protocol over a period of time, as it gives you insight into the amount of traffic associated with the protocol and the breadth of its use. It is also useful to compare the number of nodes running MLAG with the alarms present at the same time to determine if there is any correlation between the issues and the ability to establish an MLAG session.

To view these distributions, open the medium MLAG Service card.

If a visual correlation is apparent, you can dig a little deeper with the large MLAG Service card tabs.

View Devices with the Most MLAG Sessions

You can view the load from MLAG on your switches using the large MLAG Service card. This data enables you to see which switches are handling the most MLAG traffic currently, validate that this is what you expect based on your network design, and compare it with data from an earlier time to look for any differences.

To view switches and hosts with the most MLAG sessions:

  1. Open the large MLAG Service card.

  2. Select Switches with Most Sessions from the filter above the table.

    The table content is sorted by this characteristic, listing nodes running the most MLAG sessions at the top. Scroll down to view those with the fewest sessions.

To compare this data with the same data at a previous time:

  1. Open another large MLAG Service card.

  2. Move the new card next to the original card if needed.

  3. Change the time period for the data on the new card by hovering over the card and clicking .

  4. Select the time period that you want to compare with the current time. You can now see whether there are significant differences between this time period and the previous time period.

    If the changes are unexpected, you can investigate further by looking at another time frame, determining if more nodes are now running MLAG than previously, looking for changes in the topology, and so forth.

View Devices with the Most Unestablished MLAG Sessions

You can identify switches that are experiencing difficulties establishing MLAG sessions, both currently and in the past.

To view switches with the most unestablished MLAG sessions:

  1. Open the large MLAG Service card.

  2. Select Switches with Most Unestablished Sessions from the filter above the table.

    The table content is sorted by this characteristic, listing nodes with the most unestablished MLAG sessions at the top. Scroll down to view those with the fewest unestablished sessions.

Where to go next depends on what data you see, but a few options include:

Switches experiencing a large number of MLAG alarms may indicate a configuration or performance issue that needs further investigation. You can view the switches sorted by the number of MLAG alarms and then use the Switches card workflow or the Alarms card workflow to gather more information about possible causes for the alarms.

To view switches with most MLAG alarms:

  1. Open the large MLAG Service card.

  2. Hover over the header and click .

  3. Select Events by Most Active Device from the filter above the table.

    The table content is sorted by this characteristic, listing nodes with the most MLAG alarms at the top. Scroll down to view those with the fewest alarms.

Where to go next depends on what data you see, but a few options include:

View All MLAG Events

The MLAG Service card workflow enables you to view all of the MLAG events in the designated time period.

To view all MLAG events:

  1. Open the full screen MLAG Service card.

  2. Click the All Alarms tab.

Where to go next depends on what data you see, but a few options include:

View Details About All Switches Running MLAG

You can view all stored attributes of all switches running MLAG in your network in the full-screen card.

To view all switch details, open the full screen MLAG Service card, and click the All Switches tab.

To return to your workbench, click in the top right corner.

Use the icons above the table to select/deselect, filter, and export items in the list. Refer to Table Settings for more detail. To return to the original display of results, click the associated tab.

Monitor a Single MLAG Session

With NetQ, you can monitor the number of nodes running the MLAG service, view switches with the most peers alive and not alive, and view alarms triggered by the MLAG service. For an overview and how to configure MLAG in your data center network, refer to Multi-Chassis Link Aggregation - MLAG.

To access the single session cards, you must open the full screen MLAG Service, click the All Sessions tab, select the desired session, then click (Open Cards).

Granularity of Data Shown Based on Time Period

On the medium and large single MLAG session cards, the status of the peers is represented in heat maps stacked vertically; one for peers that are reachable (alive), and one for peers that are unreachable (not alive). Depending on the time period of data on the card, the number of smaller time blocks used to indicate the status varies. A vertical stack of time blocks, one from each map, includes the results from all checks during that time. The results are shown by how saturated the color is for each block. If all peers during that time period were alive for the entire time block, then the top block is 100% saturated (white) and the not alive block is zero percent saturated (gray). As peers that are not alive increase in saturation, the peers that are alive block is proportionally reduced in saturation. An example heat map for a time period of 24 hours is shown here with the most common time periods in the table showing the resulting time blocks.

Time Period | Number of Runs | Number of Time Blocks | Amount of Time in Each Block
6 hours | 18 | 6 | 1 hour
12 hours | 36 | 12 | 1 hour
24 hours | 72 | 24 | 1 hour
1 week | 504 | 7 | 1 day
1 month | 2,086 | 30 | 1 day
1 quarter | 7,000 | 13 | 1 week
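
The proportional saturation described above can be expressed as a small sketch. This is an illustration only, not NetQ's implementation; the function name and the per-block sample counts are assumptions based on the table above.

    # Illustrative only (not NetQ code): compute the saturation of the stacked
    # "alive" and "not alive" blocks for one time block of the peer-status heat map.
    # Each entry in check_results is True if the peer was alive at that sample.
    def block_saturation(check_results):
        if not check_results:
            return 0.0, 0.0  # no samples collected in this block
        alive_fraction = sum(1 for alive in check_results if alive) / len(check_results)
        # The two blocks are complementary: as "not alive" gains saturation,
        # "alive" loses it proportionally.
        return alive_fraction, 1.0 - alive_fraction

    # Example: a 24-hour card has 72 runs across 24 blocks, so roughly 3 samples per block.
    print(block_saturation([True, True, False]))  # approximately (0.67, 0.33)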

MLAG Session Card Workflow Summary

The small MLAG Session card displays:

ItemDescription
Indicates data is for a single session of a Network Service or Protocol
TitleCLAG Session
Device identifiers (hostname, IP address, or MAC address) for host and peer in session.
, Indication of host role, primary or secondary

The medium MLAG Session card displays:

ItemDescription
Time period (in header)Range of time in which the displayed data was collected; applies to all card sizes
Indicates data is for a single session of a Network Service or Protocol
TitleNetwork Services | MLAG Session
Device identifiers (hostname, IP address, or MAC address) for host and peer in session. Arrow points from the host to the peer. Click to open associated device card.
, Indication of host role, primary or secondary
Time period (above chart)Range of time for data displayed in peer status chart
Peer Status chartDistribution of peer availability, alive or not alive, during the designated time period. The number of time segments in a time period varies according to the length of the time period.
RoleRole that host device is playing. Values include primary and secondary.
CLAG sysmacSystem MAC address of the MLAG session
Peer RoleRole that peer device is playing. Values include primary and secondary.
Peer StateOperational state of the peer, up (true) or down (false)

The large MLAG Session card contains two tabs.

The Session Summary tab displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for a single session of a Network Service or Protocol
Title(Network Services | MLAG Session) Session Summary
Device identifiers (hostname, IP address, or MAC address) for host and peer in session. Arrow points from the host to the peer. Click to open associated device card.
, Indication of host role, primary or secondary
Alarm Count ChartDistribution and count of CLAG alarm events over the given time period
Info Count ChartDistribution and count of CLAG info events over the given time period
Peer Status chartDistribution of peer availability, alive or not alive, during the designated time period. The number of time segments in a time period varies according to the length of the time period.
Backup IPIP address of the interface to use if the peerlink (or bond) goes down
Backup IP ActiveIndicates whether the backup IP address is configured
CLAG SysMACSystem MAC address of the MLAG session
Peer StateOperational state of the peer, up (true) or down (false)
Count of Dual BondsNumber of bonds connecting to both switches
Count of Single BondsNumber of bonds connecting to only one switch
Count of Protocol Down BondsNumber of bonds with interfaces that were brought down by the clagd service
Count of Conflicted BondsNumber of bonds which have a set of interfaces that are not the same on both switches

The Configuration File Evolution tab displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates configuration file information for a single session of a Network Service or Protocol
Title(Network Services | MLAG Session) Configuration File Evolution
Device identifiers (hostname, IP address, or MAC address) for host and peer in session. Arrow points from the host to the peer. Click to open associated device card.
, Indication of host role, primary or secondary
TimestampsWhen changes to the configuration file have occurred, the date and time are indicated. Click the time to see the changed file.
Configuration File

When File is selected, the configuration file as it was at the selected time is shown.

When Diff is selected, the configuration file at the selected time is shown on the left and the configuration file at the previous timestamp is shown on the right. Differences are highlighted.

The full screen MLAG Session card provides tabs for all MLAG sessions and all events.

ItemDescription
TitleNetwork Services | MLAG
Closes full screen card and returns to workbench
Time periodRange of time in which the displayed data was collected; applies to all card sizes; select an alternate time period by clicking
Displays data refresh status. Click to pause data refresh. Click to resume data refresh. Current refresh rate is visible by hovering over icon.
ResultsNumber of results found for the selected tab
All MLAG Sessions tabDisplays all MLAG sessions for the devices in the given session. By default, the session list is sorted by hostname. This tab provides the following additional data about each session:
  • Backup Ip: IP address of the interface to use if the peerlink (or bond) goes down
  • Backup Ip Active: Indicates whether the backup IP address has been specified and is active (true) or not (false)
  • Bonds
    • Conflicted: Identifies the set of interfaces in a bond that do not match on each end of the bond
    • Single: Identifies a set of interfaces connecting to only one of the two switches
    • Dual: Identifies a set of interfaces connecting to both switches
    • Proto Down: Interface on the switch brought down by the clagd service. Value is blank if no interfaces are down due to clagd service.
  • Mlag Sysmac: Unique MAC address for each bond interface pair. Note: Must be a value between 44:38:39:ff:00:00 and 44:38:39:ff:ff:ff.
  • Peer:
    • If: Name of the peer interface
    • Role: Role of the peer device. Values include primary and secondary.
    • State: Indicates if peer device is up (true) or down (false)
  • Role: Role of the host device. Values include primary and secondary.
  • Timestamp: Date and time the MLAG session was started, deleted, updated, or marked dead (device went down)
  • Vxlan Anycast: Anycast IP address used for VXLAN termination
All Events tabDisplays all events network-wide. By default, the event list is sorted by time, with the most recent events listed first. The tab provides the following additional data about each event:
  • Message: Text description of an event. Example: Clag conflicted bond changed from swp7 swp8 to swp9 swp10
  • Source: Hostname of network device that generated the event
  • Severity: Importance of the event. Values include critical, warning, info, and debug.
  • Type: Network protocol or service generating the event. This always has a value of clag in this card workflow.
Table ActionsSelect, export, or filter the list. Refer to Table Settings.

View Session Status Summary

A summary of the MLAG session is available from the MLAG Session card workflow, showing the node and its peer and current status.

To view the summary:

  1. Open the full screen MLAG Service card.

  2. Select a session from the listing to view.

  3. Close the full screen card to view the medium MLAG Session card.

    In the left example, we see that the tor1 switch plays the secondary role in this session with the switch at 44:38:39:ff:01:01. In the right example, we see that the leaf03 switch plays the primary role in this session with leaf04.

View MLAG Session Peering State Changes

You can view the peering state for a given MLAG session from the medium and large MLAG Session cards. For a given time period, you can determine the stability of the MLAG session between two devices. If you experienced connectivity issues at a particular time, you can use these cards to help verify the state of the peer. If the peer was not alive more than it was alive, you can then investigate further into possible causes.

To view the state transitions for a given MLAG session:

  1. Open the full screen MLAG Service card.

  2. Select a session from the listing to view.

  3. Close the full screen card to view the medium MLAG Session card.

    In this example, the peer switch has been alive for the entire 24-hour period.

From this card, you can also view the node role, peer role and state, and MLAG system MAC address which identify the session in more detail.

To view the peering state transitions for a given MLAG session on the large MLAG Session card, open that card.

From this card, you can also view the alarm and info event counts, node role, peer role, state, and interface, MLAG system MAC address, active backup IP address, single, dual, conflicted, and protocol down bonds, and the VXLAN anycast address identifying the session in more detail.

View Changes to the MLAG Service Configuration File

Each time a change is made to the configuration file for the MLAG service, NetQ logs the change and enables you to compare it with the last version. This can be useful when you are troubleshooting potential causes for alarms or sessions losing their connections.

To view the configuration file changes:

  1. Open the large MLAG Session card.

  2. Hover over the card and click to open the Configuration File Evolution tab.

  3. Select the time of interest on the left, such as a time when a change may have impacted performance. Scroll down if needed.

  4. Choose between the File view and the Diff view (selected option is dark; File by default).

    The File view displays the content of the file for you to review.

    The Diff view displays the changes between this version (on left) and the most recent version (on right) side by side. The changes are highlighted in red and green. In this example, we don’t have any changes after this first creation, so the same file is shown on both sides and no highlighting is present.
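
The Diff view behaves like a standard text comparison between the selected version and the previous one. As a minimal sketch of the idea (not the NetQ UI code), assuming two saved versions of an MLAG-related configuration file retrieved as plain text, the snippet below compares them with Python's difflib; the file contents are made up for illustration.

    import difflib

    previous = "\n".join([
        "auto peerlink.4094",
        "iface peerlink.4094",
        "    clagd-peer-ip linklocal",
        "    clagd-sys-mac 44:38:39:ff:01:01",
    ])
    selected = previous + "\n    clagd-backup-ip 192.168.1.12"

    # Added lines are prefixed with '+', removed lines with '-', similar in spirit
    # to the highlighting in the side-by-side Diff view.
    for line in difflib.unified_diff(previous.splitlines(), selected.splitlines(),
                                     fromfile="previous", tofile="selected", lineterm=""):
        print(line)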

All MLAG Session Details

You can view all stored attributes of all of the MLAG sessions associated with the two devices on this card.

To view all session details, open the full screen MLAG Session card, and click the All MLAG Sessions tab.

Where to go next depends on what data you see, but a few options include:

View All MLAG Session Events

You can view all of the alarm and info events for the two devices on this card.

To view all events, open the full screen MLAG Session card, and click the All Events tab.

Where to go next depends on what data you see, but a few options include:

Monitor the OSPF Service

The Cumulus NetQ UI enables operators to view the health of the OSPF service on a network-wide and a per session basis, giving greater insight into all aspects of the service. This is accomplished through two card workflows, one for the service and one for the session. They are described separately here.

Monitor the OSPF Service (All Sessions)

With NetQ, you can monitor the number of nodes running the OSPF service, view switches with the most full and unestablished OSPF sessions, and view alarms triggered by the OSPF service. For an overview and how to configure OSPF to run in your data center network, refer to Open Shortest Path First - OSPF or Open Shortest Path First v3 - OSPFv3.

OSPF Service Card Workflow

The small OSPF Service card displays:

ItemDescription
Indicates data is for all sessions of a Network Service or Protocol
TitleOSPF: All OSPF Sessions, or the OSPF Service
Total number of switches and hosts with the OSPF service enabled during the designated time period
Total number of OSPF-related alarms received during the designated time period
ChartDistribution of OSPF-related alarms received during the designated time period

The medium OSPF Service card displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for all sessions of a Network Service or Protocol
TitleNetwork Services | All OSPF Sessions
Total number of switches and hosts with the OSPF service enabled during the designated time period
Total number of OSPF-related alarms received during the designated time period
Total Nodes Running chart

Distribution of switches and hosts with the OSPF service enabled during the designated time period, and a total number of nodes running the service currently.

Note: The node count here may be different than the count in the summary bar. For example, the number of nodes running OSPF last week or last month might be more or less than the number of nodes running OSPF currently.

Total Sessions Not Established chart

Distribution of unestablished OSPF sessions during the designated time period, and the total number of unestablished sessions in the network currently.

Note: The session count here may be different than the count in the summary bar. For example, the number of unestablished sessions last week or last month might be more or less than the number of unestablished sessions currently.

Total Sessions chartDistribution of OSPF sessions during the designated time period, and the total number of sessions running on the network currently.

The large OSPF service card contains two tabs.

The Sessions Summary tab displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for all sessions of a Network Service or Protocol
TitleSessions Summary (visible when you hover over card)
Total number of switches and hosts with the OSPF service enabled during the designated time period
Total number of OSPF-related alarms received during the designated time period
Total Nodes Running chart

Distribution of switches and hosts with the OSPF service enabled during the designated time period, and a total number of nodes running the service currently.

Note: The node count here may be different than the count in the summary bar. For example, the number of nodes running OSPF last week or last month might be more or less than the number of nodes running OSPF currently.

Total Sessions chartDistribution of OSPF sessions during the designated time period, and the total number of sessions running on the network currently.
Total Sessions Not Established chart

Distribution of unestablished OSPF sessions during the designated time period, and the total number of unestablished sessions in the network currently.

Note: The session count here may be different than the count in the summary bar. For example, the number of unestablished sessions last week or last month might be more or less than the number of unestablished sessions currently.

Table/Filter options

When the Switches with Most Sessions filter option is selected, the table displays the switches and hosts running OSPF sessions in decreasing order of session count; devices with the largest number of sessions are listed first

When the Switches with Most Unestablished Sessions filter option is selected, the table displays switches and hosts running OSPF sessions in decreasing order of unestablished session count; devices with the largest number of unestablished sessions are listed first

Show All SessionsLink to view data for all OSPF sessions in the full screen card

The Alarms tab displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
(in header)Indicates data is all alarms for all OSPF sessions
TitleAlarms (visible when you hover over card)
Total number of switches and hosts with the OSPF service enabled during the designated time period
(in summary bar)Total number of OSPF-related alarms received during the designated time period
Total Alarms chart

Distribution of OSPF-related alarms received during the designated time period, and the total number of current OSPF-related alarms in the network.

Note: The alarm count here may be different than the count in the summary bar. For example, the number of new alarms received in this time period does not take into account alarms that have already been received and are still active. You might have no new alarms, but still have a total number of alarms present on the network of 10.

Table/Filter optionsWhen the Switches with Most Alarms filter option is selected, the table displays switches and hosts running OSPF in decreasing order of alarm count; devices with the largest number of OSPF alarms are listed first
Show All SessionsLink to view data for all OSPF sessions in the full screen card

The full screen OSPF Service card provides tabs for all switches, all sessions, and all alarms.

ItemDescription
TitleNetwork Services | OSPF
Closes full screen card and returns to workbench
Time periodRange of time in which the displayed data was collected; applies to all card sizes; select an alternate time period by clicking
Displays data refresh status. Click to pause data refresh. Click to resume data refresh. Current refresh rate is visible by hovering over icon.
ResultsNumber of results found for the selected tab
All Switches tabDisplays all switches and hosts running the OSPF service. By default, the device list is sorted by hostname. This tab provides the following additional data about each device:
  • Agent
    • State: Indicates communication state of the NetQ Agent on a given device. Values include Fresh (heard from recently) and Rotten (not heard from recently).
    • Version: Software version number of the NetQ Agent on a given device. This should match the version number of the NetQ software loaded on your server or appliance; for example, 2.1.0.
  • ASIC
    • Core BW: Maximum sustained/rated bandwidth. Example values include 2.0 T and 720 G.
    • Model: Chip family. Example values include Tomahawk, Trident, and Spectrum.
    • Model Id: Identifier of networking ASIC model. Example values include BCM56960 and BCM56854.
    • Ports: Indicates port configuration of the switch. Example values include 32 x 100G-QSFP28, 48 x 10G-SFP+, and 6 x 40G-QSFP+.
    • Vendor: Manufacturer of the chip. Example values include Broadcom and Mellanox.
  • CPU
    • Arch: Microprocessor architecture type. Values include x86_64 (Intel), ARMv7 (ARM), and PowerPC.
    • Max Freq: Highest rated frequency for CPU. Example values include 2.40 GHz and 1.74 GHz.
    • Model: Chip family. Example values include Intel Atom C2538 and Intel Atom C2338.
    • Nos: Number of cores. Example values include 2, 4, and 8.
  • Disk Total Size: Total amount of storage space in physical disks (not total available). Example values: 10 GB, 20 GB, 30 GB.
  • License State: Indicator of validity. Values include ok and bad.
  • Memory Size: Total amount of local RAM. Example values include 8192 MB and 2048 MB.
  • OS
    • Vendor: Operating System manufacturer. Values include Cumulus Networks, RedHat, Ubuntu, and CentOS.
    • Version: Software version number of the OS. Example values include 3.7.3, 2.5.x, 16.04, 7.1.
    • Version Id: Identifier of the OS version. For Cumulus, this is the same as the Version (3.7.x).
  • Platform
    • Date: Date and time the platform was manufactured. Example values include 7/12/18 and 10/29/2015.
    • MAC: System MAC address. Example value: 17:01:AB:EE:C3:F5.
    • Model: Manufacturer's model name. Example values include AS7712-32X and S4048-ON.
    • Number: Manufacturer part number. Example values include FP3ZZ7632014A, 0J09D3.
    • Revision: Release version of the platform
    • Series: Manufacturer serial number. Example values include D2060B2F044919GD000060, CN046MRJCES0085E0004.
    • Vendor: Manufacturer of the platform. Example values include Cumulus Express, Dell, EdgeCore, Lenovo, Mellanox.
  • Time: Date and time the data was collected from device.
All Sessions tabDisplays all OSPF sessions network-wide. By default, the session list is sorted by hostname. This tab provides the following additional data about each session:
  • Area: Routing domain for this host device. Example values include 0.0.0.1, 0.0.0.23.
  • Ifname: Name of the interface on host device where session resides. Example values include swp5, peerlink-1.
  • Is IPv6: Indicates whether the address of the host device is IPv6 (true) or IPv4 (false)
  • Peer
    • Address: IPv4 or IPv6 address of the peer device
    • Hostname: User-defined name for peer device
    • ID: Network subnet address of router with access to the peer device
  • State: Current state of OSPF. Values include Full, 2-way, Attempt, Down, Exchange, Exstart, Init, and Loading.
  • Timestamp: Date and time session was started, deleted, updated or marked dead (device is down)
All Alarms tabDisplays all OSPF events network-wide. By default, the event list is sorted by time, with the most recent events listed first. The tab provides the following additional data about each event:
  • Message: Text description of an OSPF-related event. Example: swp4 area ID mismatch with peer leaf02
  • Source: Hostname of network device that generated the event
  • Severity: Importance of the event. Values include critical, warning, info, and debug.
  • Type: Network protocol or service generating the event. This always has a value of OSPF in this card workflow.
Table ActionsSelect, export, or filter the list. Refer to Table Settings.

View Service Status Summary

A summary of the OSPF service is available from the Network Services card workflow, including the number of nodes running the service, the number of OSPF-related alarms, and a distribution of those alarms.

To view the summary, open the small OSPF Service card.

For more detail, select a different size OSPF Service card.

View the Distribution of Sessions

It is useful to know the number of network nodes running the OSPF protocol over a period of time, as it gives you insight into the amount of traffic associated with the protocol and the breadth of its use. It is also useful to view the health of the sessions.

To view these distributions, open the medium OSPF Service card.

You can dig a little deeper with the large OSPF Service card tabs.

View Devices with the Most OSPF Sessions

You can view the load from OSPF on your switches and hosts using the large Network Services card. This data enables you to see which switches are handling the most OSPF traffic currently, validate that this is what you expect based on your network design, and compare it with data from an earlier time to look for any differences.

To view switches and hosts with the most OSPF sessions:

  1. Open the large OSPF Service card.

  2. Select Switches with Most Sessions from the filter above the table.

    The table content is sorted by this characteristic, listing nodes running the most OSPF sessions at the top. Scroll down to view those with the fewest sessions.

To compare this data with the same data at a previous time:

  1. Open another large OSPF Service card.

  2. Move the new card next to the original card if needed.

  3. Change the time period for the data on the new card by hovering over the card and clicking .

  4. Select the time period that you want to compare with the original time. We chose Past Week for this example.

    You can now see whether there are significant differences between this time and the original time. If the changes are unexpected, you can investigate further by looking at another time frame, determining if more nodes are now running OSPF than previously, looking for changes in the topology, and so forth.

View Devices with the Most Unestablished OSPF Sessions

You can identify switches and hosts that are experiencing difficulties establishing OSPF sessions, both currently and in the past.

To view switches with the most unestablished OSPF sessions:

  1. Open the large OSPF Service card.

  2. Select Switches with Most Unestablished Sessions from the filter above the table.

    The table content is sorted by this characteristic, listing nodes with the most unestablished OSPF sessions at the top. Scroll down to view those with the fewest unestablished sessions.

Where to go next depends on what data you see, but a couple of options include:

Switches or hosts experiencing a large number of OSPF alarms may indicate a configuration or performance issue that needs further investigation. You can view the devices sorted by the number of OSPF alarms and then use the Switches card workflow or the Alarms card workflow to gather more information about possible causes for the alarms. You can also compare the number of nodes running OSPF that have unestablished sessions with the alarms present at the same time to determine whether there is any correlation between the issues and the ability to establish an OSPF session.

To view switches with the most OSPF alarms:

  1. Open the large OSPF Service card.

  2. Hover over the header and click .

  3. Select Switches with Most Alarms from the filter above the table.

    The table content is sorted by this characteristic, listing nodes with the most OSPF alarms at the top. Scroll down to view those with the fewest alarms.

Where to go next depends on what data you see, but a few options include:

View All OSPF Events

The OSPF Network Services card workflow enables you to view all of the OSPF events in the designated time period.

To view all OSPF events:

  1. Open the full screen OSPF Service card.

  2. Click the All Alarms tab in the navigation panel. By default, events are listed from most recent to least recent.

Where to go next depends on what data you see, but a couple of options include:

View Details for All Devices Running OSPF

You can view all stored attributes of all switches and hosts running OSPF in your network in the full screen card.

To view all device details, open the full screen OSPF Service card and click the All Switches tab.

To return to your workbench, click in the top right corner.

View Details for All OSPF Sessions

You can view all stored attributes of all OSPF sessions in your network in the full-screen card.

To view all session details, open the full screen OSPF Service card and click the All Sessions tab.

To return to your workbench, click in the top right corner.

Use the icons above the table to select/deselect, filter, and export items in the list. Refer to Table Settings for more detail. To return to the original display of results, click the associated tab.

Monitor a Single OSPF Session

With NetQ, you can monitor a single session of the OSPF service, view session state changes, and compare with alarms occurring at the same time, as well as monitor the running OSPF configuration and changes to the configuration file. For an overview and how to configure OSPF to run in your data center network, refer to Open Shortest Path First - OSPF or Open Shortest Path First v3 - OSPFv3.

To access the single session cards, you must open the full screen OSPF Service, click the All Sessions tab, select the desired session, then click (Open Cards).

Granularity of Data Shown Based on Time Period

On the medium and large single OSPF session cards, the status of the sessions is represented in heat maps stacked vertically; one for established sessions, and one for unestablished sessions. Depending on the time period of data on the card, the number of smaller time blocks used to indicate the status varies. A vertical stack of time blocks, one from each map, includes the results from all checks during that time. The results are shown by how saturated the color is for each block. If all sessions during that time period were established for the entire time block, then the top block is 100% saturated (white) and the not established block is zero percent saturated (gray). As sessions that are not established increase in saturation, the sessions that are established block is proportionally reduced in saturation. An example heat map for a time period of 24 hours is shown here with the most common time periods in the table showing the resulting time blocks.

Time Period | Number of Runs | Number of Time Blocks | Amount of Time in Each Block
6 hours | 18 | 6 | 1 hour
12 hours | 36 | 12 | 1 hour
24 hours | 72 | 24 | 1 hour
1 week | 504 | 7 | 1 day
1 month | 2,086 | 30 | 1 day
1 quarter | 7,000 | 13 | 1 week
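
For quick reference, the granularity table above can be read as a lookup from the selected time period to the number of status runs, heat-map blocks, and per-block span. The sketch below is purely illustrative (the dictionary name is arbitrary) and simply derives the approximate number of checks aggregated into each block.

    # Values taken directly from the granularity table above.
    GRANULARITY = {
        "6 hours":   (18,   6,  "1 hour"),
        "12 hours":  (36,   12, "1 hour"),
        "24 hours":  (72,   24, "1 hour"),
        "1 week":    (504,  7,  "1 day"),
        "1 month":   (2086, 30, "1 day"),
        "1 quarter": (7000, 13, "1 week"),
    }

    for period, (runs, blocks, span) in GRANULARITY.items():
        # Each heat-map block aggregates roughly runs/blocks status checks.
        print(f"{period}: about {runs / blocks:.0f} checks per {span} block")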

OSPF Session Card Workflow Summary

The small OSPF Session card displays:

ItemDescription
Indicates data is for a single session of a Network Service or Protocol
TitleOSPF Session
Hostnames of the two devices in a session. Host appears on top with peer below.
, Current state of OSPF.
Full or 2-way, Attempt, Down, Exchange, Exstart, Init, and Loading.

The medium OSPF Session card displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for a single session of a Network Service or Protocol
TitleNetwork Services | OSPF Session
Hostnames of the two devices in a session. Host appears on top with peer below.
, Current state of OSPF.
Full or 2-way, Attempt, Down, Exchange, Exstart, Init, and Loading.
Time period for chartTime period for the chart data
Session State Changes ChartHeat map of the state of the given session over the given time period. The status is sampled at a rate consistent with the time period. For example, for a 24 hour period, a status is collected every hour. Refer to Granularity of Data Shown Based on Time Period.
IfnameInterface name on, or hostname of, the host device where the session resides
Peer AddressIP address of the peer device
Peer IDIP address of router with access to the peer device

The large OSPF Session card contains two tabs.

The Session Summary tab displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates data is for a single session of a Network Service or Protocol
TitleSession Summary (Network Services | OSPF Session)
Summary bar

Hostnames of the two devices in a session. Arrow points in the direction of the session.

Current state of OSPF. Full or 2-way, Attempt, Down, Exchange, Exstart, Init, and Loading.

Session State Changes ChartHeat map of the state of the given session over the given time period. The status is sampled at a rate consistent with the time period. For example, for a 24 hour period, a status is collected every hour. Refer to Granularity of Data Shown Based on Time Period.
Alarm Count ChartDistribution and count of OSPF alarm events over the given time period
Info Count ChartDistribution and count of OSPF info events over the given time period
IfnameName of the interface on the host device where the session resides
StateCurrent state of OSPF.
Full or 2-way, Attempt, Down, Exchange, Exstart, Init, and Loading.
Is UnnumberedIndicates if the session is part of an unnumbered OSPF configuration (true) or part of a numbered OSPF configuration (false)
Nbr CountNumber of routers in the OSPF configuration
Is PassiveIndicates if the host is in a passive state (true) or active state (false).
Peer IDIP address of router with access to the peer device
Is IPv6Indicates if the IP address of the host device is IPv6 (true) or IPv4 (false)
If UpIndicates if the interface on the host is up (true) or down (false)
Nbr Adj CountNumber of adjacent routers for this host
MTUMaximum transmission unit (MTU) on shortest path between the host and peer
Peer AddressIP address of the peer device
AreaRouting domain of the host device
Network TypeArchitectural design of the network. Values include Point-to-Point and Broadcast.
CostShortest path through the network between the host and peer devices
Dead TimeCountdown timer, starting at 40 seconds, that is constantly reset as messages are heard from the neighbor. If the dead time gets to zero, the neighbor is presumed dead, the adjacency is torn down, and the link removed from SPF calculations in the OSPF database.
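
The Dead Time behavior described in the last row above can be modeled with a tiny sketch. This is illustrative only, not OSPF or NetQ code; the 40-second starting value comes from the description above and the function name is hypothetical.

    DEAD_INTERVAL = 40  # seconds; the timer restarts at this value each time a hello is heard

    def neighbor_alive(seconds_since_last_hello: int) -> bool:
        # The neighbor is presumed dead once the dead time reaches zero.
        return DEAD_INTERVAL - seconds_since_last_hello > 0

    print(neighbor_alive(15))  # True: recent hellos keep resetting the timer
    print(neighbor_alive(40))  # False: dead time reached zero; the adjacency is torn down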

The Configuration File Evolution tab displays:

ItemDescription
Time periodRange of time in which the displayed data was collected; applies to all card sizes
Indicates configuration file information for a single session of a Network Service or Protocol
Title(Network Services | OSPF Session) Configuration File Evolution
Device identifiers (hostname, IP address, or MAC address) for host and peer in session. Arrow points from the host to the peer. Click to open associated device card.
, Current state of OSPF.
Full or 2-way, Attempt, Down, Exchange, Exstart, Init, and Loading.
TimestampsWhen changes to the configuration file have occurred, the date and time are indicated. Click the time to see the changed file.
Configuration File

When File is selected, the configuration file as it was at the selected time is shown.

When Diff is selected, the configuration file at the selected time is shown on the left and the configuration file at the previous timestamp is shown on the right. Differences are highlighted.

The full screen OSPF Session card provides tabs for all OSPF sessions and all events.

ItemDescription
TitleNetwork Services | OSPF
Closes full screen card and returns to workbench
Time periodRange of time in which the displayed data was collected; applies to all card sizes; select an alternate time period by clicking
Displays data refresh status. Click to pause data refresh. Click to resume data refresh. Current refresh rate is visible by hovering over icon.
ResultsNumber of results found for the selected tab
All OSPF Sessions tabDisplays all OSPF sessions running on the host device. The session list is sorted by hostname by default. This tab provides the following additional data about each session:
  • Area: Routing domain for this host device. Example values include 0.0.0.1, 0.0.0.23.
  • Ifname: Name of the interface on host device where session resides. Example values include swp5, peerlink-1.
  • Is IPv6: Indicates whether the address of the host device is IPv6 (true) or IPv4 (false)
  • Peer
    • Address: IPv4 or IPv6 address of the peer device
    • Hostname: User-defined name for peer device
    • ID: Network subnet address of router with access to the peer device
  • State: Current state of OSPF. Values include Full, 2-way, Attempt, Down, Exchange, Exstart, Init, and Loading.
  • Timestamp: Date and time session was started, deleted, updated or marked dead (device is down)
All Events tabDisplays all events network-wide. By default, the event list is sorted by time, with the most recent events listed first. The tab provides the following additional data about each event:
  • Message: Text description of an OSPF-related event. Example: OSPF session with peer tor-1 swp7 vrf default state changed from failed to Established
  • Source: Hostname of network device that generated the event
  • Severity: Importance of the event. Values include critical, warning, info, and debug.
  • Type: Network protocol or service generating the event. This always has a value of OSPF in this card workflow.
Table ActionsSelect, export, or filter the list. Refer to Table Settings.

View Session Status Summary

A summary of the OSPF session is available from the OSPF Session card workflow, showing the node and its peer and current status.

To view the summary:

  1. Add the Network Services | All OSPF Sessions card.

  2. Switch to the full screen card.

  3. Click the All Sessions tab.

  4. Double-click the session of interest. The full screen card closes automatically.

  5. Optionally, switch to the small OSPF Session card.

View OSPF Session State Changes

You can view the state of a given OSPF session from the medium and large OSPF Session Network Service cards. For a given time period, you can determine the stability of the OSPF session between two devices. If you experienced connectivity issues at a particular time, you can use these cards to help verify the state of the session. If it was not established more than it was established, you can then investigate further into possible causes.

To view the state transitions for a given OSPF session, on the medium OSPF Session card:

  1. Add the Network Services | All OSPF Sessions card.

  2. Switch to the full screen card.

  3. Open the large OSPF Service card.

  4. Click the All Sessions tab.

  5. Double-click the session of interest. The full screen card closes automatically.

The heat map indicates the status of the session over the designated time period. In this example, the session has been established for the entire time period.

From this card, you can also view the interface name, peer address, and peer id identifying the session in more detail.

To view the state transitions for a given OSPF session on the large OSPF Session card, follow the same steps to open the medium OSPF Session card and then switch to the large card.

From this card, you can view the alarm and info event counts, interface name, peer address and peer id, state, and several other parameters identifying the session in more detail.

View Changes to the OSPF Service Configuration File

Each time a change is made to the configuration file for the OSPF service, NetQ logs the change and enables you to compare it with the last version. This can be useful when you are troubleshooting potential causes for alarms or sessions losing their connections.

To view the configuration file changes:

  1. Open the large OSPF Session card.

  2. Hover over the card and click to open the Configuration File Evolution tab.

  3. Select the time of interest on the left, such as a time when a change may have impacted performance. Scroll down if needed.

  4. Choose between the File view and the Diff view (selected option is dark; File by default).

    The File view displays the content of the file for you to review.

    The Diff view displays the changes between this version (on left) and the most recent version (on right) side by side. The changes are highlighted in red and green. In this example, we don’t have a change to highlight, so it shows the same file on both sides.

View All OSPF Session Details

You can view all stored attributes of all of the OSPF sessions associated with the two devices on this card.

To view all session details, open the full screen OSPF Session card, and click the All OSPF Sessions tab.

To return to your workbench, click in the top right corner.

View All Events

You can view all of the alarm and info events for the two devices on this card.

To view all events, open the full screen OSPF Session card, and click the All Events tab.

To return to your workbench, click in the top right corner.

Monitor Network Connectivity

It is helpful to verify that communications are freely flowing between the various devices in your network. You can verify the connectivity between two devices both in an ad hoc fashion and by defining connectivity checks to occur on a scheduled basis. There are three card workflows that enable you to view connectivity: Trace Request, On-demand Trace Results, and Scheduled Trace Results.

Create a Trace Request

Two types of connectivity checks can be run: an immediate (on-demand) trace and a scheduled trace. The Trace Request card workflow is used to configure and run both of these trace types.

Trace Request Card Workflow Summary

The small Trace Request card displays:

ItemDescription
Indicates a trace request
Select Trace listSelect a scheduled trace request from the list
GoClick to start the trace now

The medium Trace Request card displays:

ItemDescription
Indicates a trace request
TitleNew Trace Request
New Trace RequestCreate a new layer 3 trace request. Use the large Trace Request card to create a new layer 2 or 3 request.
Source(Required) Hostname or IP address of device where to begin the trace
Destination(Required) IP address of device where to end the trace
Run NowStart the trace now

The large Trace Request card displays:

ItemDescription
Indicates a trace request
TitleNew Trace Request
Trace selectionLeave New Trace Request selected to create a new request, or choose a scheduled request from the list.
Source(Required) Hostname or IP address of device where to begin the trace.
Destination(Required) Ending point for the trace. For layer 2 traces, value must be a MAC address. For layer 3 traces, value must be an IP address.
VRFOptional for layer 3 traces. Virtual Route Forwarding interface to be used as part of the trace path.
VLAN IDRequired for layer 2 traces. Virtual LAN to be used as part of the trace path.
ScheduleSets the frequency with which to run a new trace (Run every) and when to start the trace for the first time (Starting).
Run NowStart the trace now
UpdateUpdate is available when a scheduled trace request is selected from the dropdown list and you make a change to its configuration. Clicking Update saves the changes to the existing scheduled trace.
Save As NewSave As New is available in two instances:
  • When you enter a source, destination, and schedule for a new trace. Clicking Save As New in this instance saves the new scheduled trace.
  • When changes are made to a selected scheduled trace request. Clicking Save As New in this instance saves the modified scheduled trace without changing the original trace on which it was based.

The full screen Trace Request card displays:

ItemDescription
TitleTrace Request
Closes full screen card and returns to workbench
Time periodRange of time in which the displayed data was collected; applies to all card sizes; select an alternate time period by clicking
ResultsNumber of results found for the selected tab
Schedule Preview tabDisplays all scheduled trace requests for the given user. By default, the listing is sorted by Start Time, with the most recently started traces listed at the top. The tab provides the following additional data about each event:
  • Action: Indicates latest action taken on the trace job. Values include Add, Deleted, Update.
  • Frequency: How often the trace is scheduled to run
  • Active: Indicates if trace is actively running (true), or stopped from running (false)
  • ID: Internal system identifier for the trace job
  • Trace Name: User-defined name for a trace
  • Trace Params: Indicates source and destination, optional VLAN or VRF specified, and whether to alert on failure
Table ActionsSelect, export, or filter the list. Refer to Table Settings.

Create a Layer 3 On-demand Trace Request

It is helpful to verify the connectivity between two devices when you suspect an issue is preventing proper communication between them. If you cannot find a path using a layer 3 trace, you might also try checking connectivity through a layer 2 path.
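
As a hedged illustration of the layer 2 versus layer 3 destination rules described on the Trace Request cards (not the NetQ API), the sketch below checks that a layer 3 trace gets an IP destination (VRF optional) and a layer 2 trace gets a MAC destination plus a VLAN ID; the function name and sample addresses are hypothetical.

    import ipaddress
    import re

    MAC_RE = re.compile(r"^([0-9a-f]{2}:){5}[0-9a-f]{2}$", re.IGNORECASE)

    def trace_layer(destination, vlan_id=None):
        if MAC_RE.match(destination):
            if vlan_id is None:
                raise ValueError("a layer 2 trace requires a VLAN ID")
            return "layer 2"
        ipaddress.ip_address(destination)  # raises ValueError if not a valid IP address
        return "layer 3"                   # VRF is optional for layer 3 traces

    print(trace_layer("10.1.3.103"))                     # layer 3
    print(trace_layer("00:03:00:11:11:77", vlan_id=13))  # layer 2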

To create a layer 3 trace request:

  1. Open the medium Trace Request card.

  2. In the Source field, enter the hostname or IP address of the device where you want to start the trace.

  3. In the Destination field, enter the IP address of the device where you want to end the trace.

    In this example, we are starting our trace at server02 and ending