Re: [PATCH bpf-next v7 01/17] bpf: Document XDP RX metadata

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 17/01/2023 21.33, Stanislav Fomichev wrote:
On Mon, Jan 16, 2023 at 5:09 AM Jesper Dangaard Brouer
<jbrouer@xxxxxxxxxx> wrote:

On 12/01/2023 01.32, Stanislav Fomichev wrote:
Document all current use-cases and assumptions.

[...]
Acked-by: David Vernet <void@xxxxxxxxxxxxx>
Signed-off-by: Stanislav Fomichev <sdf@xxxxxxxxxx>
---
   Documentation/networking/index.rst           |   1 +
   Documentation/networking/xdp-rx-metadata.rst | 108 +++++++++++++++++++
   2 files changed, 109 insertions(+)
   create mode 100644 Documentation/networking/xdp-rx-metadata.rst

diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index 4f2d1f682a18..4ddcae33c336 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
[...cut...]
+AF_XDP
+======
+
+:doc:`af_xdp` use-case implies that there is a contract between the BPF
+program that redirects XDP frames into the ``AF_XDP`` socket (``XSK``) and
+the final consumer. Thus the BPF program manually allocates a fixed number of
+bytes out of metadata via ``bpf_xdp_adjust_meta`` and calls a subset
+of kfuncs to populate it. The userspace ``XSK`` consumer computes
+``xsk_umem__get_data() - METADATA_SIZE`` to locate that metadata.
+Note, ``xsk_umem__get_data`` is defined in ``libxdp`` and
+``METADATA_SIZE`` is an application-specific constant.

The main problem with AF_XDP and metadata is that, the AF_XDP descriptor
doesn't contain any info about the length METADATA_SIZE.

The text does says this, but in a very convoluted way.
I think this challenge should be more clearly spelled out.

(p.s. This was something that XDP-hints via BTF have a proposed solution
for)

Any suggestions on how to clarify it better? I have two hints:
1. ``METADATA_SIZE`` is an application-specific constant
2. note missing ``data_meta`` pointer

Do you prefer I also add a sentence where I spell it out more
explicitly? Something like:

Note, ``xsk_umem__get_data`` is defined in ``libxdp`` and
``METADATA_SIZE`` is an application-specific constant (``AF_XDP``
receive descriptor does _not_ explicitly carry the size of the
metadata).

That addition works for me.
(Later we can hopefully come up with a solution for this)

+
+Here is the ``AF_XDP`` consumer layout (note missing ``data_meta`` pointer)::

The "note" also hint to this issue.

This seems like an explicit design choice of the AF_XDP? In theory, I
don't see why we can't have a v2 receive descriptor format where we
return the size of the metadata?

(Cc. Magnus+Bjørn)
Yes, it was a design choice from AF_XDP not to include the metadata length.

The AF_XDP descriptor, see struct xdp_desc (below) from /include/uapi/linux/if_xdp.h.

 /* Rx/Tx descriptor */
 struct xdp_desc {
	__u64 addr;
	__u32 len;
	__u32 options;
 };

Does contain a 'u32 options' field, that we could use.

In previous discussions, the proposed solution (from Bjørn+Magnus) was
to use some bits in the 'options' field to say metadata is present, and
xsk_umem__get_data minus 4 (or 8) bytes contain a BTF_ID.  The AF_XDP
programmer can then get the metadata length by looking up the BTF_ID.


+
+  +----------+-----------------+------+
+  | headroom | custom metadata | data |
+  +----------+-----------------+------+
+                               ^
+                               |
+                        rx_desc->address
+
+XDP_PASS
+========
+
+This is the path where the packets processed by the XDP program are passed
+into the kernel. The kernel creates the ``skb`` out of the ``xdp_buff``
+contents. Currently, every driver has custom kernel code to parse
+the descriptors and populate ``skb`` metadata when doing this ``xdp_buff->skb``
+conversion, and the XDP metadata is not used by the kernel when building
+``skbs``. However, TC-BPF programs can access the XDP metadata area using
+the ``data_meta`` pointer.
+
+In the future, we'd like to support a case where an XDP program
+can override some of the metadata used for building ``skbs``.

Happy this is mentioned as future work.

As mentioned in a separate email, if you prefer to focus on that, feel

Yes, I'm going to work on PoC code that explore this area.

free to drive it since I'm gonna look into the TX side first.

Happy to hear you are going to look into TX-side.
Are your use case related to TX timestamping?

--Jesper




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux