Re: [PATCH v4 bpf-next 15/22] xsk: add multi-buffer documentation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Maciej Fijalkowski <maciej.fijalkowski@xxxxxxxxx> writes:

> From: Magnus Karlsson <magnus.karlsson@xxxxxxxxx>
>
> Add AF_XDP multi-buffer support documentation including two
> pseudo-code samples.
>
> Signed-off-by: Magnus Karlsson <magnus.karlsson@xxxxxxxxx>
> ---
>  Documentation/networking/af_xdp.rst | 177 ++++++++++++++++++++++++++++
>  1 file changed, 177 insertions(+)
>
> diff --git a/Documentation/networking/af_xdp.rst b/Documentation/networking/af_xdp.rst
> index 247c6c4127e9..2b583f58967b 100644
> --- a/Documentation/networking/af_xdp.rst
> +++ b/Documentation/networking/af_xdp.rst
> @@ -453,6 +453,93 @@ XDP_OPTIONS getsockopt
>  Gets options from an XDP socket. The only one supported so far is
>  XDP_OPTIONS_ZEROCOPY which tells you if zero-copy is on or not.
>  
> +Multi-Buffer Support
> +--------------------
> +
> +With multi-buffer support, programs using AF_XDP sockets can receive
> +and transmit packets consisting of multiple buffers both in copy and
> +zero-copy mode. For example, a packet can consist of two
> +frames/buffers, one with the header and the other one with the data,
> +or a 9K Ethernet jumbo frame can be constructed by chaining together
> +three 4K frames.
> +
> +Some definitions:
> +
> +* A packet consists of one or more frames
> +
> +* A descriptor in one of the AF_XDP rings always refers to a single
> +  frame. In the case the packet consists of a single frame, the
> +  descriptor refers to the whole packet.
> +
> +To enable multi-buffer support for an AF_XDP socket, use the new bind
> +flag XDP_USE_SG. If this is not provided, all multi-buffer packets
> +will be dropped just as before. Note that the XDP program loaded also
> +needs to be in multi-buffer mode. This can be accomplished by using
> +"xdp.frags" as the section name of the XDP program used.
> +
> +To represent a packet consisting of multiple frames, a new flag called
> +XDP_PKT_CONTD is introduced in the options field of the Rx and Tx
> +descriptors. If it is true (1) the packet continues with the next
> +descriptor and if it is false (0) it means this is the last descriptor
> +of the packet. Why the reverse logic of end-of-packet (eop) flag found
> +in many NICs? Just to preserve compatibility with non-multi-buffer
> +applications that have this bit set to false for all packets on Rx,
> +and the apps set the options field to zero for Tx, as anything else
> +will be treated as an invalid descriptor.
> +
> +These are the semantics for producing packets onto AF_XDP Tx ring
> +consisting of multiple frames:
> +
> +* When an invalid descriptor is found, all the other
> +  descriptors/frames of this packet are marked as invalid and not
> +  completed. The next descriptor is treated as the start of a new
> +  packet, even if this was not the intent (because we cannot guess
> +  the intent). As before, if your program is producing invalid
> +  descriptors you have a bug that must be fixed.
> +
> +* Zero length descriptors are treated as invalid descriptors.
> +
> +* For copy mode, the maximum supported number of frames in a packet is
> +  equal to CONFIG_MAX_SKB_FRAGS + 1. If it is exceeded, all
> +  descriptors accumulated so far are dropped and treated as
> +  invalid. To produce an application that will work on any system
> +  regardless of this config setting, limit the number of frags to 18,
> +  as the minimum value of the config is 17.
> +
> +* For zero-copy mode, the limit is up to what the NIC HW
> +  supports. Usually at least five on the NICs we have checked. We
> +  consciously chose to not enforce a rigid limit (such as
> +  CONFIG_MAX_SKB_FRAGS + 1) for zero-copy mode, as it would have
> +  resulted in copy actions under the hood to fit into what limit
> +  the NIC supports. Kind of defeats the purpose of zero-copy mode.

How is an application supposed to discover the actual limit for a given
NIC/driver?

> +* The ZC batch API guarantees that it will provide a batch of Tx
> +  descriptors that ends with full packet at the end. If not, ZC
> +  drivers would have to gather the full packet on their side. The
> +  approach we picked makes ZC drivers' life much easier (at least on
> +  Tx side).

This seems like it implies some constraint on how an application can use
the APIs, but it's not quite clear to me what those constraints are, nor
what happens if an application does something different. This should
probably be spelled out...

> +On the Rx path in copy-mode, the xsk core copies the XDP data into
> +multiple descriptors, if needed, and sets the XDP_PKT_CONTD flag as
> +detailed before. Zero-copy mode works the same, though the data is not
> +copied. When the application gets a descriptor with the XDP_PKT_CONTD
> +flag set to one, it means that the packet consists of multiple buffers
> +and it continues with the next buffer in the following
> +descriptor. When a descriptor with XDP_PKT_CONTD == 0 is received, it
> +means that this is the last buffer of the packet. AF_XDP guarantees
> +that only a complete packet (all frames in the packet) is sent to the
> +application.

In light of the comment on batch size below, I think it would be useful
to spell out what this means exactly. IIUC correctly, it means that the
kernel will check the ringbuffer before starting to copy the data, and
if there are not enough descriptors available, it will drop the packet
instead of doing a partial copy, right? And this is the case for both ZC
and copy mode?

> +If application reads a batch of descriptors, using for example the libxdp
> +interfaces, it is not guaranteed that the batch will end with a full
> +packet. It might end in the middle of a packet and the rest of the
> +buffers of that packet will arrive at the beginning of the next batch,
> +since the libxdp interface does not read the whole ring (unless you
> +have an enormous batch size or a very small ring size).
> +
> +An example program each for Rx and Tx multi-buffer support can be found
> +later in this document.
> +
>  Usage
>  =====
>  
> @@ -532,6 +619,96 @@ like this:
>  But please use the libbpf functions as they are optimized and ready to
>  use. Will make your life easier.
>  
> +Usage Multi-Buffer Rx
> +=====================
> +
> +Here is a simple Rx path pseudo-code example (using libxdp interfaces
> +for simplicity). Error paths have been excluded to keep it short:
> +
> +.. code-block:: c
> +
> +    void rx_packets(struct xsk_socket_info *xsk)
> +    {
> +        static bool new_packet = true;
> +        u32 idx_rx = 0, idx_fq = 0;
> +        static char *pkt;
> +
> +        int rcvd = xsk_ring_cons__peek(&xsk->rx, opt_batch_size, &idx_rx);
> +
> +        xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq);
> +
> +        for (int i = 0; i < rcvd; i++) {
> +            struct xdp_desc *desc = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++);
> +            char *frag = xsk_umem__get_data(xsk->umem->buffer, desc->addr);
> +            bool eop = !(desc->options & XDP_PKT_CONTD);
> +
> +        if (new_packet)
> +            pkt = frag;
> +        else
> +            add_frag_to_pkt(pkt, frag);
> +
> +        if (eop)
> +            process_pkt(pkt);
> +
> +        new_packet = eop;
> +
> +        *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = desc->addr;

Indentation is off here...

-Toke




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux