Re: [RFC bpf-next 0/4] Add XDP rx hw hints support performing XDP_REDIRECT


On 04/10/2024 04.13, Daniel Xu wrote:
On Thu, Oct 03, 2024 at 01:26:08PM GMT, Stanislav Fomichev wrote:
On 10/03, Arthur Fabre wrote:
On Thu Oct 3, 2024 at 12:49 AM CEST, Stanislav Fomichev wrote:
On 10/02, Toke Høiland-Jørgensen wrote:
Stanislav Fomichev <stfomichev@xxxxxxxxx> writes:

On 10/01, Toke Høiland-Jørgensen wrote:
Lorenzo Bianconi <lorenzo@xxxxxxxxxx> writes:

On Mon Sep 30, 2024 at 1:49 PM CEST, Lorenzo Bianconi wrote:
Lorenzo Bianconi <lorenzo@xxxxxxxxxx> writes:

[...]

I like this 'fast' KV approach, but I guess we should really evaluate its
impact on performance (especially for XDP) since, based on the kfunc call
order in the eBPF program, we can have one or more memmove/memcpy
operations for each packet, right?

Yes, with Arthur's scheme, performance will be ordering dependent. Using [...]

I really like the *compact* Key-Value (KV) store idea from Arthur.
 - The question is: is it fast enough?

I've promised Arthur to micro-benchmark this in XDP, if he codes it up to
be usable in the XDP code path.  Listening to the LPC recording, I heard
that Alexei also saw potential and other use-cases for this kind of
fast-and-compact KV approach.

I have high hopes for the performance, as Arthur uses the POPCNT
instruction, which is *very* fast[1]. I checked[2]: AMD Zen 3 and 4 have
Ops/Latency=1 and a reciprocal throughput of 0.25.

 [1] https://www.agner.org/optimize/blog/read.php?i=853#848
 [2] https://www.agner.org/optimize/instruction_tables.pdf
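To make this concrete, here is a minimal sketch (my interpretation, not
Arthur's actual code) of a popcount-indexed compact KV store, assuming a
64-key ID space and fixed 4-byte values.  It also illustrates Lorenzo's
point above: a lookup costs one POPCNT, but inserting a new key memmoves
every value stored after it, so the cost is ordering dependent.

#include <stdint.h>
#include <string.h>

struct kv_store {
	uint64_t present;	/* bit k set => key k has a value */
	uint32_t values[64];	/* values packed in key order, no gaps */
};

/* Slot of key k = number of present keys below k: one POPCNT. */
static inline int kv_slot(const struct kv_store *kv, int key)
{
	return __builtin_popcountll(kv->present & ((1ULL << key) - 1));
}

static inline int kv_get(const struct kv_store *kv, int key, uint32_t *out)
{
	if (!(kv->present & (1ULL << key)))
		return -1;	/* key not set */
	*out = kv->values[kv_slot(kv, key)];
	return 0;
}

/* Inserting a new key shifts all later values: the memmove Lorenzo
 * mentioned, whose size depends on how many keys are already set. */
static inline void kv_set(struct kv_store *kv, int key, uint32_t val)
{
	int slot = kv_slot(kv, key);

	if (!(kv->present & (1ULL << key))) {
		int used = __builtin_popcountll(kv->present);

		memmove(&kv->values[slot + 1], &kv->values[slot],
			(used - slot) * sizeof(kv->values[0]));
		kv->present |= 1ULL << key;
	}
	kv->values[slot] = val;
}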

[...]
There are two different use-cases for the metadata:

* "Hardware" metadata (like the hash, rx_timestamp...). There are only a
   few well known fields, and only XDP can access them to set them as
   metadata, so storing them in a struct somewhere could make sense.

* Arbitrary metadata used by services. Eg a TC filter could set a field
   describing which service a packet is for, and that could be reused for
   iptables, routing, socket dispatch...
   Similarly we could set a "packet_id" field that uniquely identifies a
   packet so we can trace it throughout the network stack (through
   clones, encap, decap, userspace services...).
   The skb->mark, but with more room, and better support for sharing it.

We can only know the layout ahead of time for the first one. And they're
similar enough in their requirements (both need to be stored somewhere in
the SKB, and need a way of retrieving each one individually) that it seems
to make sense to use a common API.
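(For reference, the first category is already partially readable from XDP
today via kfuncs.  A minimal sketch follows: bpf_xdp_metadata_rx_hash()
and bpf_xdp_metadata_rx_timestamp() are the real kfuncs, while the
program around them is just illustrative.)

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

extern int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, __u32 *hash,
				    enum xdp_rss_hash_type *rss_type) __ksym;
extern int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx,
					 __u64 *timestamp) __ksym;

SEC("xdp")
int read_hw_hints(struct xdp_md *ctx)
{
	enum xdp_rss_hash_type rss_type;
	__u64 ts;
	__u32 hash;

	/* Each hint is fetched individually; drivers that do not
	 * support a hint return an error instead of filling it in. */
	if (!bpf_xdp_metadata_rx_hash(ctx, &hash, &rss_type))
		bpf_printk("rx hash: 0x%x", hash);
	if (!bpf_xdp_metadata_rx_timestamp(ctx, &ts))
		bpf_printk("rx timestamp: %llu", ts);

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";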

Why not have the following layout then?

+---------------+-------------------+----------------------------------------+------+
| more headroom | user-defined meta | hw-meta (potentially fixed skb format) | data |
+---------------+-------------------+----------------------------------------+------+
                 ^                                                            ^
             data_meta                                                      data

You obviously still have the problem of communicating the layout if you
have some redirects in between, but, in theory, you still have this
problem with user-defined metadata anyway (unless I'm missing
something).


Hmm, I think you are missing something... As far as I'm concerned, we are
discussing placing the KV data after the xdp_frame, and not in the XDP
data_meta area (as your drawing suggests).  The xdp_frame is stored at
the very top of the headroom.  Lorenzo's patchset is extending struct
xdp_frame, and now we are discussing how we can make a more flexible API
for extending this.  I understand that Toke confirmed this here [3].  Let
me know if I missed something :-)

 [3] https://lore.kernel.org/all/874j62u1lb.fsf@xxxxxxx/
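Roughly, as I understand the proposal (a simplified drawing, sizes not
to scale):

+-----------+---------+---------------+-----------+------+
| xdp_frame | KV meta | free headroom | data_meta | data |
+-----------+---------+---------------+-----------+------+
^                                      ^           ^
top of headroom                    data_meta      data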

As part of designing this flexible API, we/Toke are trying hard not to
tie it to a specific data area.  That is good API design: keeping it
flexible enough that we can move things around should the need arise.

I don't think it is viable to store this KV data in the XDP data_meta
area, because existing BPF programs already have direct memory (write)
access to it and can change the size of the area.  That creates too much
headache: (existing) BPF programs could unintentionally break the KV
store, which would then need extensive checks to handle random corruption
(slowing down the KV-store code).
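To illustrate: any existing XDP program can already resize and write the
data_meta area via the bpf_xdp_adjust_meta() helper, with no idea that a
KV store might live there.  A minimal sketch (the helper is real, the
program is illustrative):

#include <vmlinux.h>
#include <bpf/bpf_helpers.h>

SEC("xdp")
int clobber_meta(struct xdp_md *ctx)
{
	/* Grow the data_meta area by 8 bytes (negative delta grows). */
	if (bpf_xdp_adjust_meta(ctx, -8))
		return XDP_PASS;

	void *meta = (void *)(long)ctx->data_meta;
	void *data = (void *)(long)ctx->data;

	/* Bounds check for the verifier, then an arbitrary write that
	 * would silently corrupt any KV layout stored here. */
	if (meta + 8 <= data)
		*(__u64 *)meta = 0xdeadbeefULL;

	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";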

--Jesper



