Upgrade Cumulus Linux Using LCM

LCM provides the ability to upgrade Cumulus Linux on one or more switches in your network through the NetQ UI or the NetQ CLI. Up to five upgrade jobs can be run simultaneously; however, a given switch can only be contained in one running job at a time.
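The two scheduling limits above (at most five running jobs, and a switch in at most one running job) can be pictured as a simple admission check. This is a minimal sketch and not NetQ's implementation; the function name `can_start_job` and the shape of `running_jobs` are hypothetical and chosen only for illustration.

```python
MAX_RUNNING_JOBS = 5  # limit stated in the paragraph above


def can_start_job(running_jobs, new_job_switches):
    """Return (allowed, reason) for a proposed upgrade job.

    running_jobs: dict mapping a job name to the set of hostnames it contains
                  (hypothetical shape, for illustration only).
    new_job_switches: iterable of hostnames in the proposed job.
    """
    if len(running_jobs) >= MAX_RUNNING_JOBS:
        return False, "five upgrade jobs are already running"
    wanted = set(new_job_switches)
    for name, switches in running_jobs.items():
        # A switch may belong to only one running job at a time.
        overlap = sorted(wanted & set(switches))
        if overlap:
            return False, f"{', '.join(overlap)} already in running job {name}"
    return True, "ok"
```

A check like this can help when splitting a large upgrade into several jobs ahead of time.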

Upgrades can be performed between Cumulus Linux 3.x releases, and between Cumulus Linux 4.x releases. Lifecycle management does not support upgrades from Cumulus Linux 3.x to 4.x releases.
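The upgrade-path rule above can be sketched as a comparison of major version numbers. The helper name `is_supported_upgrade_path` is hypothetical, and the sketch covers only the rule stated here; it does not check that the target release is newer than the current one, nor any other validation LCM may perform.

```python
def is_supported_upgrade_path(current, target):
    """True when both releases share the same major version (3.x or 4.x)."""
    current_major = int(current.split(".")[0])
    target_major = int(target.split(".")[0])
    # Upgrades across major versions (3.x to 4.x) are not supported by LCM.
    return current_major == target_major and current_major in (3, 4)
```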

Workflows for Cumulus Linux Upgrades Using LCM

LCM offers three methods for upgrading Cumulus Linux on your switches. Which one you use depends on whether the NetQ Agent is already installed on the switch and whether you want to use the NetQ UI or the NetQ CLI:

  • Use the NetQ UI or the NetQ CLI for switches with NetQ Agent 2.4.x or later already installed
  • Use the NetQ UI for switches without the NetQ Agent installed

The workflow varies slightly with each approach. A separate workflow applies to each of the following:

  • Using the NetQ UI for switches with NetQ Agent installed
  • Using the NetQ CLI for switches with NetQ Agent installed
  • Using the NetQ UI for switches without NetQ Agent installed

Upgrade Cumulus Linux on Switches with NetQ Agent Installed

You can upgrade Cumulus Linux on switches that already have a NetQ Agent (version 2.4.x or later) installed using either the NetQ UI or NetQ CLI.

Prepare for Upgrade

  1. Click (Switches) in any workbench header, then click Manage switches.

  2. Upload the Cumulus Linux upgrade images.

  3. Optionally, specify a default upgrade version.

  4. Verify the switches you want to manage are running NetQ Agent 2.4 or later. Refer to Manage Switches.

  5. Optionally, create a new NetQ configuration profile.

  6. Configure switch access credentials.

  7. Assign a role to each switch (optional, but recommended).

After you complete these steps, your LCM dashboard is updated to reflect them.

If you are using the NetQ CLI, prepare as follows:

  1. Verify network access to the relevant Cumulus Linux license file.

  2. Upload the Cumulus Linux upgrade images.

  3. Verify the switches you want to manage are running NetQ Agent 2.4 or later. Refer to Manage Switches.

  4. Configure switch access credentials.

  5. Assign a role to each switch (optional, but recommended).

Perform a Cumulus Linux Upgrade

Upgrade Cumulus Linux on switches through either the NetQ UI or NetQ CLI:

  1. Click (Switches) in any workbench header, then select Manage switches.

  2. Click Manage on the Switches card.

  3. Select the individual switches (or click to select all switches) that you want to upgrade. If needed, use the filter to narrow the listing and find the relevant switches.

  4. Click (Upgrade CL) above the table.

    From this point forward, the software walks you through the upgrade process, beginning with a review of the switches that you selected for upgrade.

  5. Give the upgrade job a name. A name is required and can be no more than 22 characters, including spaces and special characters.

  6. Verify that the switches you selected are included, and that they have the correct IP address and roles assigned.

    • If you accidentally included a switch that you do NOT want to upgrade, hover over the switch information card and click to remove it from the upgrade job.
    • If the role is incorrect or missing, select a role for that switch from the dropdown. A role change can be discarded if needed.

  7. When you are satisfied that the list of switches is accurate for the job, click Next.

  8. Verify that you want to use the default Cumulus Linux or NetQ version for this upgrade job. If not, click Custom and select an alternate image from the list.

Default CL Version Selected

Custom CL Version Selected

  9. Note that the switch access authentication method, Using global access credentials, indicates you have chosen either basic authentication with a username and password or SSH key-based authentication for all of your switches. Authentication on a per-switch basis is not currently available.

  10. Click Next.

  11. Verify the upgrade job options.

    By default, NetQ takes a network snapshot before the upgrade and another after the upgrade is complete. It also rolls back to the original Cumulus Linux version on any switch that fails to upgrade.

    You can exclude selected services and protocols from the snapshots. By default, node and services are included, but you can deselect any of the other items. Click on one to remove it; click again to include it. This is helpful when you are not running a particular protocol or you have concerns about the amount of time it will take to run the snapshot. Note that removing services or protocols from the job may produce non-equivalent results compared with prior snapshots.

    These options make for a smoother upgrade process and are highly recommended, but you can disable either or both by clicking No next to the option.

  12. Click Next.

  13. After the pre-checks have completed successfully, click Preview. If there are failures, refer to Precheck Failures.

    These checks verify the following:

    • Selected switches are not currently scheduled for, or in the middle of, a Cumulus Linux or NetQ Agent upgrade
    • Selected versions of Cumulus Linux and NetQ Agent are valid upgrade paths
    • All mandatory parameters have valid values, including MLAG configurations
    • All switches are reachable
    • The order to upgrade the switches, based on roles and configurations

  14. Review the job preview.

    When all of your switches have roles assigned, this view displays the chosen job options (top center), the pre-checks status (top right and left in Pre-Upgrade Tasks), the order in which the switches are planned for upgrade (center; upgrade starts from the left), and the post-upgrade tasks status (right).

Roles assigned

When none of your switches have roles assigned or they are all of the same role, this view displays the chosen job options (top center), the pre-checks status (top right and left in Pre-Upgrade Tasks), a list of switches planned for upgrade (center), and the post-upgrade tasks status (right).
All roles the same

When some of your switches have roles assigned, any switches without roles are upgraded last and are grouped under the label *Stage1*.
Some roles assigned

  15. When you are happy with the job specifications, click Start Upgrade.

  16. Click Yes to confirm that you want to continue with the upgrade, or click Cancel to discard the upgrade job.
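
The ordering shown in the job preview can be pictured as grouping switches by role, with switches that have no role placed last. The sketch below illustrates only that idea. The function name `plan_upgrade_stages`, the "unassigned" label, and the caller-supplied `role_order` are all hypothetical; the actual order LCM applies to roles is not assumed here.

```python
def plan_upgrade_stages(switch_roles, role_order):
    """Group switches into ordered stages.

    switch_roles: dict mapping hostname -> role name, or None when unassigned.
    role_order:   list of role names in the order they should be upgraded
                  (supplied by the caller; not LCM's built-in order).
    """
    stages = []
    for role in role_order:
        members = sorted(h for h, r in switch_roles.items() if r == role)
        if members:
            stages.append((role, members))
    # Switches with no role (or a role missing from role_order) go last.
    leftover = sorted(h for h, r in switch_roles.items() if r not in role_order)
    if leftover:
        stages.append(("unassigned", leftover))
    return stages
```

For example, with two leaf switches, one spine switch, one switch without a role, and `role_order=["spine", "leaf"]`, the spine stage comes first and the switch without a role comes last.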

To upgrade using the NetQ CLI, run the netq lcm upgrade command, providing a name for the upgrade job, the Cumulus Linux and NetQ versions, and the hostname(s) to be upgraded:

cumulus@switch:~$ netq lcm upgrade name upgrade-cl410 cl-version 4.1.0 netq-version 3.1.0 hostnames spine01,spine02

Optionally, you can apply some job options, including creation of network snapshots and previous version restoration if a failure occurs.

Network Snapshot Creation

You can also generate a Network Snapshot before and after the upgrade by adding the run-before-after option to the command:

cumulus@switch:~$ netq lcm upgrade name upgrade-3712 cl-version 3.7.12 netq-version 3.1.0 hostnames spine01,spine02,leaf01,leaf02 order spine,leaf run-before-after

Restore on an Upgrade Failure

You can have LCM restore the previous version of Cumulus Linux if the upgrade job fails by adding the run-restore-on-failure option to the command. This is highly recommended.

cumulus@switch:~$ netq lcm upgrade name upgrade-3712 cl-version 3.7.12 netq-version 3.1.0 hostnames spine01,spine02,leaf01,leaf02 order spine,leaf run-restore-on-failure
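
When upgrade jobs are scripted, the command lines above can be assembled from their parts. The following is a minimal sketch that uses only the keywords appearing in the examples above (name, cl-version, netq-version, hostnames, order, run-before-after, run-restore-on-failure). The helper name `build_upgrade_command` and its parameters are hypothetical, and passing both optional flags in a single command is an assumption that the examples above do not confirm.

```python
import shlex


def build_upgrade_command(name, cl_version, netq_version, hostnames,
                          order=None, snapshots=False, restore_on_failure=False):
    """Assemble a netq lcm upgrade command line from the options shown above."""
    args = [
        "netq", "lcm", "upgrade",
        "name", name,
        "cl-version", cl_version,
        "netq-version", netq_version,
        "hostnames", ",".join(hostnames),
    ]
    if order:
        args += ["order", ",".join(order)]
    if snapshots:
        args.append("run-before-after")        # snapshot before and after
    if restore_on_failure:
        args.append("run-restore-on-failure")  # restore previous version on failure
    # shlex.join quotes any argument that contains spaces or shell metacharacters.
    return shlex.join(args)
```

Building the string from a list keeps the comma-separated hostname and order values consistent and quotes job names that contain spaces.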

Precheck Failures

If one or more of the pre-checks fail, resolve the related issue and start the upgrade again. In the NetQ UI, these failures appear on the Upgrade Preview page. In the NetQ CLI, they appear as error messages in the netq lcm show upgrade-jobs command output.

Analyze Results

After starting the upgrade you can monitor the progress of your upgrade job and the final results. While the views are different, essentially the same information is available from either the NetQ UI or the NetQ CLI.

You can track the progress of your upgrade job from the Preview page or the Upgrade History page of the NetQ UI.

From the preview page, a green circle with rotating arrows is shown above each step as it is working. Alternatively, you can close the detail of the job and see a summary of all current and past upgrade jobs on the Upgrade History page. The job started most recently is shown at the bottom, and the data is refreshed every minute.

If you are disconnected while the job is in progress, it may appear as if nothing is happening. Try closing and reopening your view, or refreshing the page.

Several viewing options are available for monitoring the upgrade job.

  • Monitor the job with full details open on the Preview page:

    Single role

    Multiple roles and some without roles

    Each switch goes through a number of steps. To view these steps, click Details and scroll down as needed. You can collapse the step detail or close the detail popup when you are done.

  • Monitor the job with summary information only in the CL Upgrade History page. Open this view from the full details view.

    This view is refreshed automatically. From here you can view what stage the job is in or open the detailed view.

  • Monitor the job through the CL Upgrade History card on the LCM dashboard. Close the open views to return to the LCM dashboard. As you perform more upgrades, the graph displays the success and failure of each job.

    Click View to return to the Upgrade History page as needed.

Sample Successful Upgrade

On successful completion, you can:

  • Compare the network snapshots taken before and after the upgrade.
    Click Compare Snapshots in the detail view. Refer to Interpreting the Comparison Data for information about analyzing these results.

  • Download details about the upgrade in the form of a JSON-formatted file, by clicking Download Report.

  • View the changes on the Switches card of the LCM dashboard.

    Click Main Menu, then Upgrade Switches.

In our example, all switches have been upgraded to Cumulus Linux 3.7.12.

Sample Failed Upgrade

If an upgrade job fails for any reason, you can view the associated error(s):

  1. From the CL Upgrade History dashboard, find the job of interest.

  2. Expand the job and open its detail view.

    Note that in this example, all of the pre-upgrade tasks were successful, but backup failed on the spine switches.

  3. To view which step in the upgrade process failed, open the step list and scroll down. Close the step list when you are done.

  4. To view details about the errors, either double-click the failed step or click Details and scroll down as needed. You can collapse the step detail or close the detail popup when you are done.

To see the progress of current upgrade jobs and the history of previous upgrade jobs, run netq lcm show upgrade-jobs:

cumulus@switch:~$ netq lcm show upgrade-jobs
Job ID       Name            CL Version           Pre-Check Status                 Warnings         Errors       Start Time
------------ --------------- -------------------- -------------------------------- ---------------- ------------ --------------------
job_cl_upgra Leafs upgr to C 4.2.0                COMPLETED                                                      Fri Sep 25 17:16:10
de_ff9c35bc4 L410                                                                                                2020
950e92cf49ac
bb7eb4fc6e3b
7feca7d82960
570548454c50
cd05802
job_cl_upgra Spines to 4.2.0 4.2.0                COMPLETED                                                      Fri Sep 25 16:37:08
de_9b60d3a1f                                                                                                     2020
dd3987f787c7
69fd92f2eef1
c33f56707f65
4a5dfc82e633
dc3b860
job_upgrade_ 3.7.12 Upgrade  3.7.12               WARNING                                                        Fri Apr 24 20:27:47
fda24660-866                                                                                                     2020
9-11ea-bda5-
ad48ae2cfafb
job_upgrade_ DataCenter      3.7.12               WARNING                                                        Mon Apr 27 17:44:36
81749650-88a                                                                                                     2020
e-11ea-bda5-
ad48ae2cfafb
job_upgrade_ Upgrade to CL3. 3.7.12               COMPLETED                                                      Fri Apr 24 17:56:59
4564c160-865 7.12                                                                                                2020
3-11ea-bda5-
ad48ae2cfafb
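
Because the Job ID column wraps across several rows in this output, the full ID has to be pieced together before it can be used with other commands. The following is a minimal sketch of that reassembly, based only on the layout of the sample output above. The function name `job_ids_from_wrapped_output` is hypothetical, and the sketch assumes that every job ID starts with job_ and that the dashed separator row gives the width of the first column.

```python
def job_ids_from_wrapped_output(text):
    """Rebuild full job IDs from column-wrapped `netq lcm show upgrade-jobs` output."""
    lines = text.splitlines()
    # The separator row of dashes reveals the width of the first column.
    sep_index = next(i for i, line in enumerate(lines) if line.startswith("---"))
    width = len(lines[sep_index].split()[0])
    job_ids = []
    for line in lines[sep_index + 1:]:
        fragment = line[:width].strip()
        if not fragment:
            continue
        if fragment.startswith("job_"):
            job_ids.append(fragment)   # first row of a new job record
        elif job_ids:
            job_ids[-1] += fragment    # continuation row of the same record
    return job_ids
```

Applied to the sample above, the third record reassembles to job_upgrade_fda24660-8669-11ea-bda5-ad48ae2cfafb.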

To see details of a particular upgrade job, run netq lcm show status job-ID:

cumulus@switch:~$ netq lcm show status job_upgrade_fda24660-8669-11ea-bda5-ad48ae2cfafb
Hostname    CL Version    Backup Status    Backup Start Time         Restore Status    Restore Start Time        Upgrade Status    Upgrade Start Time
----------  ------------  ---------------  ------------------------  ----------------  ------------------------  ----------------  ------------------------
spine02     4.1.0         FAILED           Fri Sep 25 16:37:40 2020  SKIPPED_ON_FAILURE  N/A                   SKIPPED_ON_FAILURE  N/A
spine03     4.1.0         FAILED           Fri Sep 25 16:37:40 2020  SKIPPED_ON_FAILURE  N/A                   SKIPPED_ON_FAILURE  N/A
spine04     4.1.0         FAILED           Fri Sep 25 16:37:40 2020  SKIPPED_ON_FAILURE  N/A                   SKIPPED_ON_FAILURE  N/A
spine01     4.1.0         FAILED           Fri Sep 25 16:40:26 2020  SKIPPED_ON_FAILURE  N/A                   SKIPPED_ON_FAILURE  N/A
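
To act on this output in a script, the table can be split into one record per switch. The sketch below is based only on the sample above and assumes that columns are separated by two or more spaces while values contain at most single spaces; a timestamp padded with a double space would break that assumption. The function names `parse_status_table` and `failed_backups` are hypothetical.

```python
import re


def parse_status_table(text):
    """Parse `netq lcm show status` output into a list of dicts keyed by column name."""
    rows = [line for line in text.splitlines() if line.strip()]
    header = re.split(r"\s{2,}", rows[0].strip())
    records = []
    for line in rows[1:]:
        if set(line.strip()) <= {"-", " "}:
            continue  # skip the dashed separator row
        values = re.split(r"\s{2,}", line.strip())
        records.append(dict(zip(header, values)))
    return records


def failed_backups(records):
    """Return the hostnames whose backup step failed."""
    return [r["Hostname"] for r in records if r.get("Backup Status") == "FAILED"]
```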

Postcheck Failures

Upgrades can be considered successful and still have post-check warnings. For example, the OS has been updated, but not all services are fully up and running after the upgrade. If one or more of the post-checks fail, warning messages are provided in the Post-Upgrade Tasks section of the preview. Click on the warning category to view the detailed messages.

Reasons for Upgrade Job Failure

Upgrades can fail at any of the stages of the process, including when backing up data, upgrading the Cumulus Linux software, and restoring the data. Failures can occur when attempting to connect to a switch or perform a particular task on the switch.

Some common reasons for upgrade failures, and the error messages they produce, are:

Reason: Switch is not reachable via SSH
Error Message: Data could not be sent to remote host “192.168.0.15”. Make sure this host can be reached over ssh: ssh: connect to host 192.168.0.15 port 22: No route to host

Reason: Switch is reachable, but user-provided credentials are invalid
Error Message: Invalid/incorrect username/password. Skipping remaining 2 retries to prevent account lockout: Warning: Permanently added ‘<hostname-ipaddr>’ to the list of known hosts. Permission denied, please try again.

Reason: Switch is reachable, but a valid Cumulus Linux license is not installed
Error Message: 1587866683.880463 2020-04-26 02:04:43 license.c:336 CRIT No license file. No license installed!

Reason: Upgrade task could not be run
Error Message: Failure message depends on why the task could not be run. For example: /etc/network/interfaces: No such file or directory

Reason: Upgrade task failed
Error Message: Failed at- <task that failed>. For example: Failed at- MLAG check for the peerLink interface status

Reason: Retry failed after five attempts
Error Message: FAILED In all retries to process the LCM Job
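
When triaging many failed switches, the messages above can be mapped back to their likely reason with simple substring matching. This is only a sketch: the substrings are taken from the sample messages listed above, real messages may vary, and the names `ERROR_PATTERNS` and `classify_failure` are hypothetical.

```python
# (substring to look for, reason) pairs taken from the sample messages above.
ERROR_PATTERNS = [
    ("No route to host", "Switch is not reachable via SSH"),
    ("Permission denied", "User-provided credentials are invalid"),
    ("No license", "Valid Cumulus Linux license is not installed"),
    ("Failed at-", "Upgrade task failed"),
    ("FAILED In all retries", "Retry failed after five attempts"),
]


def classify_failure(message):
    """Return the first matching reason, or 'unknown' when nothing matches."""
    for needle, reason in ERROR_PATTERNS:
        if needle in message:
            return reason
    # Messages for tasks that could not be run vary, so they are not matched here.
    return "unknown"
```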

Upgrade Cumulus Linux on Switches Without NetQ Agent Installed

When you want to upgrade Cumulus Linux on switches without NetQ installed, use the LCM switch discovery feature. The feature scans your network to find all Cumulus Linux switches, with and without NetQ currently installed, and determines the versions of Cumulus Linux and NetQ installed. The results of switch discovery are then used to install or upgrade Cumulus Linux and Cumulus NetQ on all discovered switches in a single procedure rather than in two steps. Up to five jobs can be run simultaneously; however, a given switch can only be contained in one running job at a time.

If all of your Cumulus Linux switches already have NetQ 2.4.x or later installed, you can upgrade them directly. Refer to Upgrade Cumulus Linux.

To discover switches running Cumulus Linux and upgrade Cumulus Linux and NetQ on them:

  1. Click Main Menu and select Upgrade Switches, or click (Switches) in the workbench header, then click Manage switches.

  2. On the Switches card, click Discover.

  3. Enter a name for the scan.

  4. Choose whether to look for switches by entering IP address ranges or by importing them from a comma-separated values (CSV) file.

    If you do not have a switch listing, then you can manually add the address ranges where your switches are located in the network. This has the advantage of catching switches that may have been missed in a file.

    A maximum of 50 addresses can be included in an address range. If necessary, break the range into smaller ranges.

    To discover switches using address ranges:

    1. Enter an IP address range in the IP Range field.

      Ranges can be contiguous, for example 192.168.0.24-64, or non-contiguous, for example 192.168.0.24-64,128-190,235, but they must be contained within a single subnet.

    2. Optionally, add another IP address range (in a different subnet).

      For example, 198.51.100.0-128 or 198.51.100.0-128,190,200-253.

    3. Add additional ranges as needed, and remove any range you no longer want.

    If you decide to use a CSV file instead, the ranges you entered are retained in case you return to using IP ranges.

    If you already have a file listing your switches, importing it can be easier than entering the IP address ranges manually.

    To import switches through a CSV file:

    1. Click Browse.

    2. Select the CSV file containing the list of switches.

      The CSV file must include a header containing hostname, ip, and port. The columns can be in any order, but the data in each row must follow the order of the header.

    Each row must include an IP address. The hostname is optional, and if the port is blank, NetQ uses switch port 22 by default.

    Click Remove if you decide to use a different file or want to use IP address ranges instead. Any ranges you entered before selecting the CSV file option are retained.

  5. Note that the switch access credentials defined in Manage Switch Credentials are used to access these switches. If you have issues accessing the switches, you may need to update your credentials.

  6. Click Next.

    When the network discovery is complete, NetQ presents the number of Cumulus Linux switches it has found. They are displayed in categories:

    • Discovered without NetQ: Switches found without NetQ installed
    • Discovered with NetQ: Switches found with some version of NetQ installed
    • Discovered but Rotten: Switches found that are unreachable
    • Incorrect Credentials: Switches found that cannot be reached because the provided access credentials do not match those for the switches
    • OS not Supported: Switches found that are running a Cumulus Linux version not supported by the LCM upgrade feature
    • Not Discovered: IP addresses which did not have an associated Cumulus Linux switch

    If no switches are found for a particular category, that category is not displayed.

  7. Select which switches you want to upgrade from each category by clicking the checkbox on each switch card.

  8. Click Next.

  9. Verify that the number of switches identified for upgrade and the configuration profile to be applied are correct.

  10. Accept the default NetQ version or click Custom and select an alternate version.

  11. By default, the NetQ Agent and CLI are upgraded on the selected switches. If you do not want to upgrade the NetQ CLI, click Advanced and change the selection to No.

  12. Click Next.

  13. Several checks are performed to eliminate preventable problems during the install process.

    These checks verify the following:

    • Selected switches are not currently scheduled for, or in the middle of, a Cumulus Linux or NetQ Agent upgrade
    • Selected versions of Cumulus Linux and NetQ Agent are valid upgrade paths
    • All mandatory parameters have valid values, including MLAG configurations
    • All switches are reachable
    • The order to upgrade the switches, based on roles and configurations

    If any of the pre-checks fail, review the error messages and take appropriate action.

    If all of the pre-checks pass, click Install to initiate the job.

  14. Monitor the job progress.

    After starting the upgrade you can monitor the progress from the preview page or the Upgrade History page.

    From the preview page, a green circle with rotating arrows is shown on each switch as it is working. Alternatively, you can close the detail of the job and see a summary of all current and past upgrade jobs on the NetQ Install and Upgrade History page. The job started most recently is shown at the top, and the data is refreshed periodically.

    If you are disconnected while the job is in progress, it may appear as if nothing is happening. Try closing and reopening your view, or refreshing the page.

    Several viewing options are available for monitoring the upgrade job.

    • Monitor the job with full details open.

    • Monitor the job with only summary information in the NetQ Install and Upgrade History page. Open this view from the full details view. It is useful when you have multiple jobs running simultaneously.

    • Monitor the job through the NetQ Install and Upgrade History card on the LCM dashboard. Close the open views to return to the LCM dashboard.

  15. Investigate any failures and create new jobs to reattempt the upgrade.
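
The CSV import rules from step 4 (a header containing hostname, ip, and port in any order, a required IP address, an optional hostname, and port 22 when the port is blank) can be checked before uploading a file. The following is a minimal sketch of such a check, not the parser NetQ uses; the function name `read_switch_csv` and any sample rows passed to it are hypothetical.

```python
import csv
import io

DEFAULT_SSH_PORT = 22  # used when the port column is blank


def read_switch_csv(text):
    """Parse discovery CSV text into a list of {'hostname', 'ip', 'port'} dicts."""
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    # The header may list the three columns in any order.
    missing = {"hostname", "ip", "port"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"CSV header is missing: {', '.join(sorted(missing))}")
    switches = []
    for row in reader:
        ip = (row["ip"] or "").strip()
        if not ip:
            raise ValueError(f"row {reader.line_num}: an IP address is required")
        port = (row["port"] or "").strip()
        switches.append({
            "hostname": (row["hostname"] or "").strip(),  # optional
            "ip": ip,
            "port": int(port) if port else DEFAULT_SSH_PORT,
        })
    return switches
```

Reading the file by column name rather than by position is what allows the header columns to appear in any order.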