Re: [PATCH net-next v3] docs: networking: document NAPI

Hi,
I have a few comments for your consideration.

On 3/21/23 22:38, Jakub Kicinski wrote:
> Add basic documentation about NAPI. We can stop linking to the ancient
> doc on the LF wiki.
> 
> Signed-off-by: Jakub Kicinski <kuba@xxxxxxxxxx>
> Link: https://lore.kernel.org/all/20230315223044.471002-1-kuba@xxxxxxxxxx/
> Reviewed-by: Bagas Sanjaya <bagasdotme@xxxxxxxxx>
> Reviewed-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx>
> Acked-by: Pavel Pisa <pisa@xxxxxxxxxxxxxxxx> # for ctucanfd-driver.rst
> Reviewed-by: Tony Nguyen <anthony.l.nguyen@xxxxxxxxx>
> Reviewed-by: Florian Fainelli <f.fainelli@xxxxxxxxx>
> ---
> v3: rebase on net-next (to avoid ixgb conflict)
>     fold in grammar fixes from Stephen
> v2: https://lore.kernel.org/all/20230321050334.1036870-1-kuba@xxxxxxxxxx/
>     remove the links in CAN and in ICE as well
>     improve the start of the threaded NAPI section
>     name footnote
>     internal links from the intro to sections
>     various clarifications from Florian and Stephen
> 
> CC: corbet@xxxxxxx
> CC: jesse.brandeburg@xxxxxxxxx
> CC: anthony.l.nguyen@xxxxxxxxx
> CC: pisa@xxxxxxxxxxxxxxxx
> CC: mkl@xxxxxxxxxxxxxx
> CC: linux-doc@xxxxxxxxxxxxxxx
> CC: f.fainelli@xxxxxxxxx
> CC: stephen@xxxxxxxxxxxxxxxxxx
> CC: romieu@xxxxxxxxxxxxx
> ---
>  .../can/ctu/ctucanfd-driver.rst               |   3 +-
>  .../device_drivers/ethernet/intel/e100.rst    |   3 +-
>  .../device_drivers/ethernet/intel/i40e.rst    |   4 +-
>  .../device_drivers/ethernet/intel/ice.rst     |   4 +-
>  Documentation/networking/index.rst            |   1 +
>  Documentation/networking/napi.rst             | 251 ++++++++++++++++++
>  include/linux/netdevice.h                     |  13 +-
>  7 files changed, 266 insertions(+), 13 deletions(-)
>  create mode 100644 Documentation/networking/napi.rst


> diff --git a/Documentation/networking/napi.rst b/Documentation/networking/napi.rst
> new file mode 100644
> index 000000000000..4f848d86750c
> --- /dev/null
> +++ b/Documentation/networking/napi.rst
> @@ -0,0 +1,251 @@
> +.. _napi:
> +
> +====
> +NAPI
> +====
> +
> +NAPI is the event handling mechanism used by the Linux networking stack.
> +The name NAPI no longer stands for anything in particular [#]_.
> +
> +In basic operation device notifies the host about new events via an interrupt.

                     {a | the} device

> +The host then schedules a NAPI instance to process the events.
> +Device may also be polled for events via NAPI without receiving

A device may also be
The device may also be
Devices may also be

> +interrupts first (:ref:`busy polling<poll>`).
> +
> +NAPI processing usually happens in the software interrupt context,
> +but there is an option to use :ref:`separate kernel threads<threaded>`
> +for NAPI processing.
> +
> +All in all NAPI abstracts away from the drivers the context and configuration
> +of event (packet Rx and Tx) processing.
> +
> +Driver API
> +==========
> +
> +The two most important elements of NAPI are the struct napi_struct
> +and the associated poll method. struct napi_struct holds the state
> +of the NAPI instance while the method is the driver-specific event
> +handler. The method will typically free Tx packets that have been
> +transmitted and process newly received packets.
> +
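For anyone new to the API, a rough sketch of how these two pieces fit
together (the "my_*" names are invented, the usual netdevice.h include and
error handling are assumed/omitted, and the current three-argument
netif_napi_add() is used):

    struct my_priv {
            struct net_device *netdev;
            struct napi_struct napi;        /* per-instance NAPI state */
            /* ... ring pointers, IRQ number, etc. ... */
    };

    /* the driver-specific poll method / event handler */
    static int my_poll(struct napi_struct *napi, int budget);

    /* registration, typically at probe/open time */
    netif_napi_add(priv->netdev, &priv->napi, my_poll);
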
> +.. _drv_ctrl:
> +
> +Control API
> +-----------
> +

[snip]

> +
> +Datapath API
> +------------
> +
> +napi_schedule() is the basic method of scheduling a NAPI poll.
> +Drivers should call this function in their interrupt handler
> +(see :ref:`drv_sched` for more info). A successful call to napi_schedule()
> +will take ownership of the NAPI instance.
> +
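A typical shape for such an interrupt handler, as a sketch only (the
"my_*" helpers are placeholders for the driver's own IRQ masking):

    static irqreturn_t my_isr(int irq, void *data)
    {
            struct my_priv *priv = data;

            /* mask the device interrupt, then let NAPI do the work */
            my_irq_disable(priv);
            napi_schedule(&priv->napi);

            return IRQ_HANDLED;
    }
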
> +Later, after NAPI is scheduled, the driver's poll method will be
> +called to process the events/packets. The method takes a ``budget``
> +argument - drivers can process completions for any number of Tx
> +packets but should only process up to ``budget`` number of
> +Rx packets. Rx processing is usually much more expensive.
> +
> +In other words, it is recommended to ignore the budget argument when
> +performing TX buffer reclamation to ensure that the reclamation is not
> +arbitrarily bounded, however, it is required to honor the budget argument

               bounded; however
or
               bounded. However

> +for RX processing.
> +
> +.. warning::
> +
> +   The ``budget`` argument may be 0 if core tries to only process Tx completions
> +   and no Rx packets.
> +
> +The poll method returns the amount of work done. If the driver still
> +has outstanding work to do (e.g. ``budget`` was exhausted)
> +the poll method should return exactly ``budget``. In that case,

So it returns the original 'budget' parameter value that was passed to it?

> +the NAPI instance will be serviced/polled again (without the
> +need to be scheduled).
> +
> +If event processing has been completed (all outstanding packets
> +processed) the poll method should call napi_complete_done()
> +before returning. napi_complete_done() releases the ownership
> +of the instance.
> +
> +.. warning::
> +
> +   The case of finishing all events and using exactly ``budget``
> +   must be handled carefully. There is no way to report this
> +   (rare) condition to the stack, so the driver must either
> +   not call napi_complete_done() and wait to be called again,
> +   or return ``budget - 1``.
> +
> +   If the ``budget`` is 0 napi_complete_done() should never be called.
> +
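Putting the budget rules above together, a skeleton poll method could look
roughly like this (my_tx_clean()/my_rx_clean()/my_irq_enable() stand in for
the driver's ring cleanup and IRQ unmasking):

    static int my_poll(struct napi_struct *napi, int budget)
    {
            struct my_priv *priv = container_of(napi, struct my_priv, napi);
            int work_done;

            /* Tx completions are not bounded by the budget */
            my_tx_clean(priv);

            /* budget may be 0 (Tx-only poll): no Rx, no napi_complete_done() */
            if (!budget)
                    return 0;

            /* Rx processing must honor the budget */
            work_done = my_rx_clean(priv, budget);

            /* done early: release ownership and re-arm the interrupt */
            if (work_done < budget && napi_complete_done(napi, work_done))
                    my_irq_enable(priv);

            /* returning exactly 'budget' asks to be polled again */
            return work_done;
    }
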
> +Call sequence
> +-------------
> +

[snip]
> +
> +.. _drv_sched:
> +
> +Scheduling and IRQ masking
> +--------------------------
> +

[snip]

> +
> +Instance to queue mapping
> +-------------------------
> +
> +Modern devices have multiple NAPI instances (struct napi_struct) per
> +interface. There is no strong requirement on how the instances are
> +mapped to queues and interrupts. NAPI is primarily a polling/processing
> +abstraction without specific user-facing semantics. That said, most networking
> +devices end up using NAPI in fairly similar ways.
> +
> +NAPI instances most often correspond 1:1:1 to interrupts and queue pairs
> +(queue pair is a set of a single Rx and single Tx queue).
> +
> +In less common cases a NAPI instance may be used for multiple queues
> +or Rx and Tx queues can be serviced by separate NAPI instances on a single
> +core. Regardless of the queue assignment, however, there is usually still
> +a 1:1 mapping between NAPI instances and interrupts.
> +
> +It's worth noting that the ethtool API uses a "channel" terminology where
> +each channel can be either ``rx``, ``tx`` or ``combined``. It's not clear
> +what constitutes a channel, the recommended interpretation is to understand

                      channel;

> +a channel as an IRQ/NAPI which services queues of a given type. For example,
> +a configuration of 1 ``rx``, 1 ``tx`` and 1 ``combined`` channel is expected
> +to utilize 3 interrupts, 2 Rx and 2 Tx queues.
> +
> +User API
> +========
> +
> +User interactions with NAPI depend on NAPI instance ID. The instance IDs
> +are only visible to the user thru the ``SO_INCOMING_NAPI_ID`` socket option.
> +It's not currently possible to query IDs used by a given device.
> +
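In case an example helps, reading the ID from user space is just a
getsockopt() on a connected socket (error handling omitted):

    int napi_id = 0;
    socklen_t len = sizeof(napi_id);

    getsockopt(fd, SOL_SOCKET, SO_INCOMING_NAPI_ID, &napi_id, &len);
    /* 0 typically means no NAPI-delivered traffic seen on this socket yet */
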
> +Software IRQ coalescing
> +-----------------------
> +

[snip]

> +
> +.. _poll:
> +
> +Busy polling
> +------------
> +
> +Busy polling allows user process to check for incoming packets before

                allows a user process
or
                allows user processes

> +the device interrupt fires. As is the case with any busy polling it trades
> +off CPU cycles for lower latency (in fact production uses of NAPI busy

I would drop "in fact".

> +polling are not well known).
> +
> +Busy polling is enabled by either setting ``SO_BUSY_POLL`` on
> +selected sockets or using the global ``net.core.busy_poll`` and
> +``net.core.busy_read`` sysctls. An io_uring API for NAPI busy polling
> +also exists.
> +
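For the per-socket option, the value is a busy-poll time in microseconds,
e.g.:

    /* busy poll this socket for up to ~50us before sleeping */
    int usecs = 50;

    setsockopt(fd, SOL_SOCKET, SO_BUSY_POLL, &usecs, sizeof(usecs));
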
> +IRQ mitigation
> +---------------
> +

[snip]

and in any case:
Reviewed-by: Randy Dunlap <rdunlap@xxxxxxxxxxxxx>

Thanks.
-- 
~Randy


