= Core Offloads =
== Introduction ==

This specification defines a core feature set that constitutes a minimum requirement for a server-grade network interface card (NIC). It prescribes the feature set and its behavior, gives a rationale for the choices made, and lists conformance tests.

Network interface cards can relieve the host CPU by offloading network packet processing to fixed-function hardware on the NIC. Modern servers have come to depend on these I/O offloads, to the extent that a broken or missing implementation may disqualify a device for deployment.

<span id="why-standardization"></span>

=== Why Standardization ===

Experience shows that even well-known and widely deployed features have subtly different behavior between devices, due to incomplete feature specifications, protocol peculiarities and unclear operating conditions. This document aims to define features in sufficient detail to avoid ambiguity, warn about common implementation bugs and model representative workloads. Fundamentally, it aims to share and codify network device expertise in an open format that is publicly accessible, unencumbered by NDAs and based on broad input from across the industry.

<span id="target"></span>

=== Target ===

Target platforms are high-end servers with many CPU cores, 100+ Gbps and 100+ Mpps. Though applicability is likely wider, the principal target is large-scale deployment in data centers (“hyperscale”). That environment adds requirements for interoperability of heterogeneous hardware and monitoring at scale.

<span id="scope"></span>

=== Scope ===

This specification covers the core offloads that may be required of every NIC in the server domain. Sufficiently novel features that cannot be considered standard are out of scope. Sufficiently complex features are left for separate, targeted specifications. Specifically excluded complex features are inline cryptography and virtualization support, including smartNICs (“IPU”, “DPU”), virtual switch abstractions, PCI Virtual Functions, SR-IOV and Scalable IOV.

Also excluded are hardware requirements, such as power usage. Those are captured by the OCP NIC hardware spec 3.0 [ref_id:ocp_nic]. The two specifications are independent: a NIC can conform to this spec without conforming to the hardware spec, and vice versa.

Multi-host devices, as defined in the OCP NIC 3.0 spec [ref_id:ocp_nic], are in scope. All requirements are specified per device, and are assumed to be distributed equally across hosts.

<span id="workloads"></span>

==== Workloads ====

Hyperscale servers are deployed in planetary-scale environments, with many sites of tens of thousands of servers or more. Each server can run hundreds of tasks across hundreds of cores. Each task can communicate with tens of thousands of peers. Network incast and outcast can reach millions of connections per host, and tens of millions at the tail.

At this scale, connection establishment rate becomes a significant design consideration besides absolute connection count. From experimental results we derive 100K connections/sec per host as the minimum level that must be supported, and an order of magnitude higher to be future-proof for the expected lifetime of new devices. At these levels, systems that scale O(1) with connection count are strongly preferred over those that scale O(N). This is the principal reason to prefer stateless device offloads; the sketch at the end of this subsection makes the distinction concrete.

Hyperscale servers today should be expected to scale at the 99th percentile to at least:

* connection count: 10M TCP/IP connections
* connection rate: 100K connections/sec

Hyperscale workloads can be mixes of many applications. They can be generalized into three types:

* high priority, latency sensitive, such as user-facing traffic
* low priority, latency insensitive, such as map-reduce style jobs
* high performance computing: dominated by machine learning workloads

Machines may run a mix of workloads to increase hardware utilization. This way they can offer assured service to high-priority tasks, while scheduling low-priority tasks on surplus resources at best effort. This model requires strong quality-of-service isolation to meet the service level objectives (SLOs) of latency-sensitive traffic.
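To make the stateless preference concrete, below is a minimal sketch in C of the Toeplitz hash commonly used for receive-side scaling (RSS): the device chooses a receive queue from packet header bits and a fixed key alone, so per-connection cost stays O(1) with no per-flow table. The key contents, tuple layout and 128-entry indirection table are illustrative assumptions, not requirements of this specification.

<pre>
#include <stdint.h>
#include <stddef.h>

/*
 * Simplified Toeplitz hash, as commonly used for RSS.
 * key must be at least len + 4 bytes long.
 */
static uint32_t toeplitz_hash(const uint8_t *key, const uint8_t *data, size_t len)
{
        uint32_t hash = 0;
        /* 32-bit window over the key, advanced one bit per input bit. */
        uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                          ((uint32_t)key[2] << 8) | key[3];

        for (size_t i = 0; i < len; i++) {
                for (int bit = 7; bit >= 0; bit--) {
                        if (data[i] & (1u << bit))
                                hash ^= window;
                        /* Slide the window, pulling in the next key bit. */
                        window = (window << 1) | ((key[i + 4] >> bit) & 1);
                }
        }
        return hash;
}

/*
 * Steering: hash the flow tuple (e.g., IPv4 addresses and ports) and
 * map the low bits through an indirection table to a queue id.
 * A 128-entry table is typical, but illustrative here.
 */
static uint16_t rss_queue(const uint8_t *key, const uint8_t *tuple,
                          size_t tuple_len, const uint16_t *table)
{
        return table[toeplitz_hash(key, tuple, tuple_len) & 127];
}
</pre>

Because the queue choice is a pure function of the packet and a fixed key, such a design scales with packet rate rather than connection count; a stateful design that looks up per-flow context scales O(N) in flows and must size, populate and evict that state.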
<span id="interface"></span>

=== Interface ===

This specification covers the behavior of the device-dependent driver as observed by the host operating system. Hardware implementation details are expressly out of scope. No specific device API is prescribed.

The host operating system is not defined. The same features can be, and often are, used by multiple operating systems. In practice, requirements are derived from experience with Linux, the most widely used operating system in hyperscale deployments.
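As one concrete example of observing driver behavior from Linux, the sketch below queries whether TCP segmentation offload is enabled, via the ethtool ioctl. It is a minimal illustration, not a prescribed API: the interface name eth0 is an assumption, and the legacy ETHTOOL_GTSO command is used for brevity where current code would use the feature-flags interface (ETHTOOL_GFEATURES).

<pre>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(void)
{
        /* "eth0" is an assumption; substitute the device under test. */
        struct ethtool_value ev = { .cmd = ETHTOOL_GTSO };
        struct ifreq ifr;

        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1);
        ifr.ifr_data = (void *)&ev;

        int fd = socket(AF_INET, SOCK_DGRAM, 0);
        if (fd < 0 || ioctl(fd, SIOCETHTOOL, &ifr) < 0) {
                perror("SIOCETHTOOL");
                return 1;
        }
        printf("tcp-segmentation-offload: %s\n", ev.data ? "on" : "off");
        close(fd);
        return 0;
}
</pre>

On the command line, ''ethtool -k <dev>'' reports the same flags.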
<span id="validation"></span>

=== Validation ===

The specification has an accompanying open source testsuite. Where possible, feature sections conclude with the introduction of an open source conformance test. The testsuite covers the majority of offloads, but is a work in progress. It is intended to be an ongoing community effort.

<span id="certification"></span>

=== Certification ===

The goal is to allow for self-certification: vendors can qualify their hardware against the spec and publish the results. Doing so reduces repetitive work, as it moves qualification from a large community of customers to the smaller set of vendors. By introducing a shared language and test suite, both unencumbered by legal limitations, it can also simplify communication between customers and vendors.

Devices may not conform fully to the spec. This is understood and acceptable. The specification is a starting point: a customer and vendor can agree to drop certain requirements and add or adjust others. It is an initial blueprint for these conversations.

Vendors SHOULD publish their conformance to the specification, with an explicit list of known deviations. This can take the form of a vendor column added to the list in Appendix A, plus optional clarification text for the deviations.

<span id="style-and-terminology"></span>

=== Style and Terminology ===

This document adopts IETF style as specified in [RFC2119], [RFC2223] and [RFC7322].

The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

“Host” and “device” define the two sides of the specified network interface: the device-independent host operating system on the one hand, and the network interface card including its device-dependent driver code on the other. The specification covers the ''behavior'' of the device in this relationship.

<span id="glossary"></span>

=== Glossary ===

* Device: Synonym for network interface card (NIC).
* Driver: Device-dependent operating system code to interact with the device.
* Queue: Asynchronous communication channel between driver and device.

<span id="contact"></span>

=== Contact ===

This specification was created through the [https://www.opencompute.org/wiki/Networking/NIC_Software NIC software] effort within the OCP Networking project, by OCP member companies Google, Intel, Meta and NVIDIA.

Comments, questions, suggestions for revisions and requests to join the standards committee can be directed to the OCP Networking mailing list. See [https://www.opencompute.org/projects/networking opencompute.org/projects/networking] for details.

This document benefited tremendously from detailed feedback from the wider OCP networking community. The authors want to thank everyone who took the time to review the specification. Your contributions are invaluable.

<span id="i-o-api"></span>
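Finally, to make the ''Queue'' glossary term concrete, the following is a minimal illustrative sketch of a descriptor ring, assuming a single producer (driver), a single consumer (device) and a power-of-two ring size. Real devices define their own descriptor formats; doorbell and completion signaling is elided.

<pre>
#include <stdint.h>
#include <stdbool.h>

#define RING_SIZE 256  /* must be a power of two */

struct desc {
        uint64_t addr;  /* DMA address of the buffer */
        uint32_t len;   /* buffer length in bytes */
        uint32_t flags; /* e.g., ownership, offload requests */
};

struct ring {
        struct desc slots[RING_SIZE];
        uint32_t head;  /* next slot the driver fills */
        uint32_t tail;  /* next slot the device consumes */
};

/* Driver side: post one buffer to the device. */
static bool ring_post(struct ring *r, uint64_t addr, uint32_t len)
{
        if (r->head - r->tail == RING_SIZE)
                return false;  /* ring full; caller retries later */

        struct desc *d = &r->slots[r->head & (RING_SIZE - 1)];
        d->addr = addr;
        d->len = len;
        d->flags = 0;
        r->head++;  /* a real driver would then ring a doorbell */
        return true;
}
</pre>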