Hi, I have a few comments for your consideration. On 3/21/23 22:38, Jakub Kicinski wrote: > Add basic documentation about NAPI. We can stop linking to the ancient > doc on the LF wiki. > > Signed-off-by: Jakub Kicinski <kuba@xxxxxxxxxx> > Link: https://lore.kernel.org/all/20230315223044.471002-1-kuba@xxxxxxxxxx/ > Reviewed-by: Bagas Sanjaya <bagasdotme@xxxxxxxxx> > Reviewed-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx> > Acked-by: Pavel Pisa <pisa@xxxxxxxxxxxxxxxx> # for ctucanfd-driver.rst > Reviewed-by: Tony Nguyen <anthony.l.nguyen@xxxxxxxxx> > Reviewed-by: Florian Fainelli <f.fainelli@xxxxxxxxx> > --- > v3: rebase on net-next (to avoid ixgb conflict) > fold in grammar fixes from Stephen > v2: https://lore.kernel.org/all/20230321050334.1036870-1-kuba@xxxxxxxxxx/ > remove the links in CAN and in ICE as well > improve the start of the threaded NAPI section > name footnote > internal links from the intro to sections > various clarifications from Florian and Stephen > > CC: corbet@xxxxxxx > CC: jesse.brandeburg@xxxxxxxxx > CC: anthony.l.nguyen@xxxxxxxxx > CC: pisa@xxxxxxxxxxxxxxxx > CC: mkl@xxxxxxxxxxxxxx > CC: linux-doc@xxxxxxxxxxxxxxx > CC: f.fainelli@xxxxxxxxx > CC: stephen@xxxxxxxxxxxxxxxxxx > CC: romieu@xxxxxxxxxxxxx > --- > .../can/ctu/ctucanfd-driver.rst | 3 +- > .../device_drivers/ethernet/intel/e100.rst | 3 +- > .../device_drivers/ethernet/intel/i40e.rst | 4 +- > .../device_drivers/ethernet/intel/ice.rst | 4 +- > Documentation/networking/index.rst | 1 + > Documentation/networking/napi.rst | 251 ++++++++++++++++++ > include/linux/netdevice.h | 13 +- > 7 files changed, 266 insertions(+), 13 deletions(-) > create mode 100644 Documentation/networking/napi.rst > diff --git a/Documentation/networking/napi.rst b/Documentation/networking/napi.rst > new file mode 100644 > index 000000000000..4f848d86750c > --- /dev/null > +++ b/Documentation/networking/napi.rst > @@ -0,0 +1,251 @@ > +.. _napi: > + > +==== > +NAPI > +==== > + > +NAPI is the event handling mechanism used by the Linux networking stack. > +The name NAPI no longer stands for anything in particular [#]_. > + > +In basic operation device notifies the host about new events via an interrupt. {a | the} device > +The host then schedules a NAPI instance to process the events. > +Device may also be polled for events via NAPI without receiving A device may also be The device may also be Devices may also be > +interrupts first (:ref:`busy polling<poll>`). > + > +NAPI processing usually happens in the software interrupt context, > +but there is an option to use :ref:`separate kernel threads<threaded>` > +for NAPI processing. > + > +All in all NAPI abstracts away from the drivers the context and configuration > +of event (packet Rx and Tx) processing. > + > +Driver API > +========== > + > +The two most important elements of NAPI are the struct napi_struct > +and the associated poll method. struct napi_struct holds the state > +of the NAPI instance while the method is the driver-specific event > +handler. The method will typically free Tx packets that have been > +transmitted and process newly received packets. > + > +.. _drv_ctrl: > + > +Control API > +----------- > + [snip] > + > +Datapath API > +------------ > + > +napi_schedule() is the basic method of scheduling a NAPI poll. > +Drivers should call this function in their interrupt handler > +(see :ref:`drv_sched` for more info). A successful call to napi_schedule() > +will take ownership of the NAPI instance. > + > +Later, after NAPI is scheduled, the driver's poll method will be > +called to process the events/packets. The method takes a ``budget`` > +argument - drivers can process completions for any number of Tx > +packets but should only process up to ``budget`` number of > +Rx packets. Rx processing is usually much more expensive. > + > +In other words, it is recommended to ignore the budget argument when > +performing TX buffer reclamation to ensure that the reclamation is not > +arbitrarily bounded, however, it is required to honor the budget argument bounded; however or bounded. However > +for RX processing. > + > +.. warning:: > + > + The ``budget`` argument may be 0 if core tries to only process Tx completions > + and no Rx packets. > + > +The poll method returns the amount of work done. If the driver still > +has outstanding work to do (e.g. ``budget`` was exhausted) > +the poll method should return exactly ``budget``. In that case, So it return the original 'budget' parameter value that was passed in to it? > +the NAPI instance will be serviced/polled again (without the > +need to be scheduled). > + > +If event processing has been completed (all outstanding packets > +processed) the poll method should call napi_complete_done() > +before returning. napi_complete_done() releases the ownership > +of the instance. > + > +.. warning:: > + > + The case of finishing all events and using exactly ``budget`` > + must be handled carefully. There is no way to report this > + (rare) condition to the stack, so the driver must either > + not call napi_complete_done() and wait to be called again, > + or return ``budget - 1``. > + > + If the ``budget`` is 0 napi_complete_done() should never be called. > + > +Call sequence > +------------- > + [snip] > + > +.. _drv_sched: > + > +Scheduling and IRQ masking > +-------------------------- > + [snip] > + > +Instance to queue mapping > +------------------------- > + > +Modern devices have multiple NAPI instances (struct napi_struct) per > +interface. There is no strong requirement on how the instances are > +mapped to queues and interrupts. NAPI is primarily a polling/processing > +abstraction without specific user-facing semantics. That said, most networking > +devices end up using NAPI in fairly similar ways. > + > +NAPI instances most often correspond 1:1:1 to interrupts and queue pairs > +(queue pair is a set of a single Rx and single Tx queue). > + > +In less common cases a NAPI instance may be used for multiple queues > +or Rx and Tx queues can be serviced by separate NAPI instances on a single > +core. Regardless of the queue assignment, however, there is usually still > +a 1:1 mapping between NAPI instances and interrupts. > + > +It's worth noting that the ethtool API uses a "channel" terminology where > +each channel can be either ``rx``, ``tx`` or ``combined``. It's not clear > +what constitutes a channel, the recommended interpretation is to understand channel; > +a channel as an IRQ/NAPI which services queues of a given type. For example, > +a configuration of 1 ``rx``, 1 ``tx`` and 1 ``combined`` channel is expected > +to utilize 3 interrupts, 2 Rx and 2 Tx queues. > + > +User API > +======== > + > +User interactions with NAPI depend on NAPI instance ID. The instance IDs > +are only visible to the user thru the ``SO_INCOMING_NAPI_ID`` socket option. > +It's not currently possible to query IDs used by a given device. > + > +Software IRQ coalescing > +----------------------- > + [snip] > + > +.. _poll: > + > +Busy polling > +------------ > + > +Busy polling allows user process to check for incoming packets before allows a user process or allows user processes > +the device interrupt fires. As is the case with any busy polling it trades > +off CPU cycles for lower latency (in fact production uses of NAPI busy I would drop "in fact". > +polling are not well known). > + > +Busy polling is enabled by either setting ``SO_BUSY_POLL`` on > +selected sockets or using the global ``net.core.busy_poll`` and > +``net.core.busy_read`` sysctls. An io_uring API for NAPI busy polling > +also exists. > + > +IRQ mitigation > +--------------- > + [snip] and in any case: Reviewed-by: Randy Dunlap <rdunlap@xxxxxxxxxxxxx> Thanks. -- ~Randy