On 08/09/2022 12.10, Maryam Tahhan wrote:
On 08/09/2022 09:06, Magnus Karlsson wrote:
On Wed, Sep 7, 2022 at 5:48 PM Jesper Dangaard Brouer
<brouer@xxxxxxxxxx> wrote:
From: Maryam Tahhan <mtahhan@xxxxxxxxxx>
Simply set AF_XDP descriptor options to XDP flags.
Jesper: Will this really be acceptable by AF_XDP maintainers?
Maryam, you guessed correctly that dedicating all these options bits
for a single feature will not be ok :-). E.g., I want one bit for the
AF_XDP multi-buffer support and who knows what other uses there might
be for this options field in the future. Let us try to solve this in
some other way. Here are some suggestions, all with their pros and
cons.
TBH it was Jespers question :)
True. I'm generally questioning this patch...
... and indirectly asking Magnus. (If you noticed, I didn't add my SoB)
* Put this feature flag at a known place in the metadata area, for
example just before the BTF ID. No need to fill this in if you are not
redirecting to AF_XDP, but at a redirect to AF_XDP, the XDP flags are
copied into this u32 in the metadata area so that user-space can
consume it. Will cost 4 bytes of the metadata area though.
If Jesper agrees I think this approach would make sense. Trying to
translate encodings into some other flags for AF_XDP I think will lead
to a growing set of translations as more options come along.
The other thing to be aware of is just making sure to clear/zero the
metadata space in the buffers at some point (ideally when the descriptor
is returned from the application) so when the buffers are used again
they are already in a "reset" state.
I don't like this option ;-)
First of all because this can give false positives, if "XDP flags copied
into metadata area" is used for something else. This can easily happen
as XDP BPF-progs are free to metadata for something else.
Second reason, because it would require AF_XDP to always read the
metadata cache-line (and write, if clearing on "return"). Not a good
optioon, given how performance sensitive AF_XDP workloads (at least
benchmarks).
* Instead encode this information into each metadata entry in the
metadata area, in some way so that a flags field is not needed (-1
signifies not valid, or whatever happens to make sense). This has the
drawback that the user might have to look at a large number of entries
just to find out there is nothing valid to read. To alleviate this, it
could be combined with the next suggestion.
* Dedicate one bit in the options field to indicate that there is at
least one valid metadata entry in the metadata area. This could be
combined with the two approaches above. However, depending on what
metadata you have enabled, this bit might be pointless. If some
metadata is always valid, then it serves no purpose. But it might if
all enabled metadata is rarely valid, e.g., if you get an Rx timestamp
on one packet out of one thousand.
I like this option better! Except that I have hoped to get 2 bits ;-)
The performance advantage is that the AF_XDP descriptor bits will
already be cache-hot, and if it indicates no-metadata-hints the AF_XDP
application can avoid reading the metadata cache-line :-).
When metadata is valid and contains valid XDP-hints can change between
two packets. E.g. XDP-hints can be enabled/disabled via ethtool, and
the content can be enabled/disabled by other ethtool commands, and even
setsockopt calls (e.g timestamping). An XDP prog can also choose to use
the area for something else for a subset of the packets.
It is a design choice in this patchset to avoid locking down the NIC
driver to a fixed XDP-hints layout, and avoid locking/disabling other
ethtool config setting to keeping XDP-hints layout stable. Originally I
wanted this, but I realized that it would be impossible (and annoying
for users) if we had to control every config interface to NIC hardware
offload hints, to keep XDP-hints "always-valid".
--Jesper
Signed-off-by: Maryam Tahhan <mtahhan@xxxxxxxxxx>
---
include/uapi/linux/if_xdp.h | 2 +-
net/xdp/xsk.c | 2 +-
net/xdp/xsk_queue.h | 3 ++-
3 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/include/uapi/linux/if_xdp.h b/include/uapi/linux/if_xdp.h
index a78a8096f4ce..9335b56474e7 100644
--- a/include/uapi/linux/if_xdp.h
+++ b/include/uapi/linux/if_xdp.h
@@ -103,7 +103,7 @@ struct xdp_options {
struct xdp_desc {
__u64 addr;
__u32 len;
- __u32 options;
+ __u32 options; /* set to the values of xdp_hints_flags*/
};
/* UMEM descriptor is __u64 */
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 5b4ce6ba1bc7..32095d78f06b 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -141,7 +141,7 @@ static int __xsk_rcv_zc(struct xdp_sock *xs,
struct xdp_buff *xdp, u32 len)
int err;
addr = xp_get_handle(xskb);
- err = xskq_prod_reserve_desc(xs->rx, addr, len);
+ err = xskq_prod_reserve_desc(xs->rx, addr, len, xdp->flags);
if (err) {
xs->rx_queue_full++;
return err;
diff --git a/net/xdp/xsk_queue.h b/net/xdp/xsk_queue.h
index fb20bf7207cf..7a66f082f97e 100644
--- a/net/xdp/xsk_queue.h
+++ b/net/xdp/xsk_queue.h
@@ -368,7 +368,7 @@ static inline u32
xskq_prod_reserve_addr_batch(struct xsk_queue *q, struct xdp_d
}
static inline int xskq_prod_reserve_desc(struct xsk_queue *q,
- u64 addr, u32 len)
+ u64 addr, u32 len, u32 flags)
{
struct xdp_rxtx_ring *ring = (struct xdp_rxtx_ring *)q->ring;
u32 idx;
@@ -380,6 +380,7 @@ static inline int xskq_prod_reserve_desc(struct
xsk_queue *q,
idx = q->cached_prod++ & q->ring_mask;
ring->desc[idx].addr = addr;
ring->desc[idx].len = len;
+ ring->desc[idx].options = flags;
return 0;
}