RDMA over Converged Ethernet - RoCE
RDMA over Converged Ethernet (RoCE) provides the ability to write to compute or storage elements using remote direct memory access (RDMA) over an Ethernet network instead of using host CPUs. RoCE relies on congestion control and lossless Ethernet to operate. Cumulus Linux supports features that enable lossless Ethernet for RoCE environments. Note that the switch running Cumulus Linux only carries the traffic; the hosts send and receive the RoCE packets.
RoCE helps you obtain a converged network, where all services run over the Ethernet infrastructure, including InfiniBand applications.
There are two versions of RoCE, which run at separate layers of the stack:
- RoCEv1, which runs at the link layer and cannot run over a routed network. It therefore requires link-layer priority flow control (PFC) to be enabled.
- RoCEv2, which runs over layer 3. Because it’s a routed solution, Cumulus Networks recommends you use explicit congestion notification (ECN) with RoCEv2, as ECN bits are carried end to end across a routed network.
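The two versions can also be told apart on the wire. As a rough illustration (not Cumulus Linux code), the Python sketch below classifies a raw Ethernet frame using the RoCEv1 EtherType (0x8915) and the RoCEv2 UDP destination port (4791). The function name classify_roce is hypothetical, and the sketch assumes an untagged frame (no 802.1Q VLAN header) and IPv4 only.

```python
import struct

ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ROCE_V1 = 0x8915   # RoCEv1 rides directly on Ethernet
ROCE_V2_UDP_PORT = 4791      # UDP destination port used by RoCEv2
IPPROTO_UDP = 17
ETH_HEADER_LEN = 14          # assumption: no VLAN tag present


def classify_roce(frame: bytes) -> str:
    """Return 'RoCEv1', 'RoCEv2', or 'other' for an untagged Ethernet frame."""
    if len(frame) < ETH_HEADER_LEN:
        return "other"
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype == ETHERTYPE_ROCE_V1:
        return "RoCEv1"
    if ethertype == ETHERTYPE_IPV4 and len(frame) >= ETH_HEADER_LEN + 20:
        # IPv4 header length is the low nibble of the first byte, in 32-bit words.
        ihl = (frame[ETH_HEADER_LEN] & 0x0F) * 4
        protocol = frame[ETH_HEADER_LEN + 9]
        udp_offset = ETH_HEADER_LEN + ihl
        if protocol == IPPROTO_UDP and len(frame) >= udp_offset + 4:
            (dst_port,) = struct.unpack_from("!H", frame, udp_offset + 2)
            if dst_port == ROCE_V2_UDP_PORT:
                return "RoCEv2"
    return "other"
```

Because RoCEv2 is ordinary UDP/IP as far as the network is concerned, it can be routed like any other IP traffic, whereas a RoCEv1 frame has no IP header for a router to act on.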
Enabling RDMA over Converged Ethernet with PFC
RoCEv1 uses the InfiniBand (IB) protocol over converged Ethernet. The IB global route header rides directly on top of the Ethernet header. The lossless Ethernet layer handles congestion hop by hop.
While link pause is another way to provide lossless Ethernet, PFC is the preferred method. PFC allows more granular control by pausing the traffic flow for a given CoS group, rather than the entire link.
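To make the difference in granularity concrete, the sketch below builds both kinds of MAC control frame in Python. It follows the standard frame layouts (destination 01:80:C2:00:00:01, EtherType 0x8808, opcode 0x0001 for link pause and 0x0101 for PFC); the function names and the choice of priority 3 in the usage note are illustrative assumptions, not something Cumulus Linux exposes.

```python
import struct

MAC_CONTROL_DST = bytes.fromhex("0180c2000001")  # reserved MAC control address
ETHERTYPE_MAC_CONTROL = 0x8808
OPCODE_LINK_PAUSE = 0x0001   # pauses the whole link
OPCODE_PFC = 0x0101          # pauses selected priorities only
MIN_FRAME_LEN = 60           # minimum Ethernet frame size, excluding the FCS


def _header(src_mac: bytes, opcode: int) -> bytes:
    return MAC_CONTROL_DST + src_mac + struct.pack("!HH", ETHERTYPE_MAC_CONTROL, opcode)


def build_link_pause(src_mac: bytes, quanta: int) -> bytes:
    """One timer for the entire link (one quantum = 512 bit times)."""
    frame = _header(src_mac, OPCODE_LINK_PAUSE) + struct.pack("!H", quanta)
    return frame.ljust(MIN_FRAME_LEN, b"\x00")


def build_pfc(src_mac: bytes, pause_quanta: dict) -> bytes:
    """One timer per priority (0-7); only priorities in pause_quanta are affected."""
    enable_vector = 0
    timers = [0] * 8
    for priority, quanta in pause_quanta.items():
        if not 0 <= priority <= 7:
            raise ValueError("priority must be in the range 0-7")
        enable_vector |= 1 << priority   # mark this priority's timer as valid
        timers[priority] = quanta
    frame = _header(src_mac, OPCODE_PFC) + struct.pack("!H8H", enable_vector, *timers)
    return frame.ljust(MIN_FRAME_LEN, b"\x00")
```

For example, build_pfc(src_mac, {3: 0xFFFF}) asks the peer to pause only priority 3, so traffic in the other seven priorities keeps flowing, while build_link_pause(src_mac, 0xFFFF) stops everything on the link.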
Enabling RDMA over Converged Ethernet with ECN
RoCEv2 requires flow control for lossless Ethernet. RoCEv2 uses the InfiniBand (IB) transport protocol over UDP. The IB transport protocol includes an end-to-end reliable delivery mechanism and has its own sender notification mechanism.
RoCEv2 congestion management uses the ECN mechanism defined in RFC 3168 to signal Congestion Experienced (CE) to the receiver. The receiver then generates a RoCEv2 congestion notification packet (CNP) directed to the source of the marked packet.
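The ECN signalling described above can be sketched in a few lines of Python. RFC 3168 places the ECN field in the two low-order bits of the IPv4 ToS (or IPv6 traffic class) byte, with the DSCP in the upper six bits. The helper names below are hypothetical, and the real marking decision on a switch depends on queue thresholds that this sketch does not model.

```python
# ECN codepoints from RFC 3168 (two low-order bits of the ToS byte).
ECN_NOT_ECT = 0b00   # sender is not ECN-capable
ECN_ECT_1 = 0b01     # ECN-capable transport, codepoint 1
ECN_ECT_0 = 0b10     # ECN-capable transport, codepoint 0
ECN_CE = 0b11        # Congestion Experienced


def split_tos(tos: int) -> tuple:
    """Split a ToS byte into (dscp, ecn)."""
    return tos >> 2, tos & 0b11


def mark_congestion(tos: int) -> int:
    """What a congested switch does: set CE, but only on ECN-capable packets."""
    _, ecn = split_tos(tos)
    if ecn == ECN_NOT_ECT:
        return tos           # cannot be marked; left unchanged here
    return tos | ECN_CE      # the DSCP bits are preserved


def receiver_should_send_cnp(tos: int) -> bool:
    """What the receiving host checks before notifying the sender."""
    return split_tos(tos)[1] == ECN_CE
```

With an assumed DSCP of 26 and ECT(0), the ToS byte is (26 << 2) | 0b10. After mark_congestion the DSCP is still 26 and the ECN field is CE, at which point the receiver would generate a congestion notification packet toward the sender.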
- RoCE introduction - roceinitiative.org
- RoCEv2 congestion management - community.mellanox.com
- Configuring RoCE over a DSCP-based lossless network with a Mellanox Spectrum switch