On Wed, Sep 22, 2021 at 1:01 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>
> Jakub Kicinski <kuba@xxxxxxxxxx> writes:
>
> > On Wed, 22 Sep 2021 00:20:19 +0200 Toke Høiland-Jørgensen wrote:
> >> >> Neither of those are desirable outcomes, I think; and if we add a
> >> >> separate "XDP multi-buff" switch, we might as well make it system-wide?
> >> >
> >> > If we have an internal flag 'this driver supports multi-buf xdp' cannot we
> >> > make xdp_redirect to linearize in case the packet is being redirected
> >> > to non multi-buf aware driver (potentially with corresponding non mb aware xdp
> >> > progs attached) from mb aware driver?
> >>
> >> Hmm, the assumption that XDP frames take up at most one page has been
> >> fundamental from the start of XDP. So what does linearise mean in this
> >> context? If we get a 9k packet, should we dynamically allocate a
> >> multi-page chunk of contiguous memory and copy the frame into that, or
> >> were you thinking something else?
> >
> > My $.02 would be to not care about redirect at all.
> >
> > It's not like the user experience with redirect is anywhere close
> > to amazing right now. Besides (with the exception of SW devices which
> > will likely gain mb support quickly) mixed-HW setups are very rare.
> > If the source of the redirect supports mb so will likely the target.
>
> It's not about device support it's about XDP program support: If I run
> an MB-aware XDP program on a physical interface and redirect the (MB)
> frame into a container, and there's an XDP program running inside that
> container that isn't MB-aware, bugs will ensue. Doesn't matter if the
> veth driver itself supports MB...
>
> We could leave that as a "don't do that, then" kind of thing, but that
> was what we were proposing (as the "do nothing" option) and got some
> pushback on, hence why we're having this conversation :)
>
> -Toke
>

I hadn't even considered the case of redirecting to a veth pair on the
same system. I'm assuming from your statement that the buffers are
passed directly to the ingress inside the container and don't go
through the sort of egress processing they would if leaving the system?
And I'm assuming that's an optimization?

I'm not sure that makes a difference, though. It's not about whether
the driver's code is mb-capable; it's about whether the driver _as
currently configured_ could generate multiple buffers. If it can, then
only mb-aware programs should be allowed to attach to it (or be
tail-called from whatever is attached to it). If it can't, then there
should be no way for multiple buffers to reach it. So in the situation
you've described, either the veth driver should be in a state where it
coalesces the multiple buffers into one (fragmenting the frame if
necessary) or drops the frame, or the program attached inside the
container would need to be mb-aware. I'm assuming that with the veth
driver as written, this might mean all programs attached to it would
need to be mb-aware, which is obviously undesirable.

All of which significantly adds to the complexity of supporting mb
awareness, so maybe this could be developed later? Initially we could
have a sysctl toggling the state: 0 = single-buffer only, 1 =
multi-buffer allowed. Then later we _could_ add a state for dynamic
control once all XDP-supporting drivers support the necessary dynamic
functionality (if ever). At that point we'd have actual experience with
the sysctl and could see how much of a burden having static control is.
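
To make that concrete, here's a rough sketch of how I imagine the
attach-time check could look. Everything in it is made up for
illustration: the net.core.xdp_multibuf sysctl name, the
dev_can_generate_xdp_mb() and prog_is_xdp_mb_aware() helpers, and the
error code are all hypothetical, not existing kernel interfaces.

	/*
	 * Rough sketch only -- none of these names exist in the kernel
	 * today; they just illustrate the attach-time check discussed
	 * above.
	 */
	#include <linux/bpf.h>
	#include <linux/errno.h>
	#include <linux/netdevice.h>

	/* Backing value for a hypothetical net.core.xdp_multibuf sysctl:
	 * 0 = single-buffer only, 1 = multi-buffer allowed.
	 */
	static int sysctl_xdp_multibuf __read_mostly;

	/* Placeholders for checks that would have to exist somewhere. */
	bool dev_can_generate_xdp_mb(const struct net_device *dev);
	bool prog_is_xdp_mb_aware(const struct bpf_prog *prog);

	static int xdp_attach_check_mb(struct net_device *dev,
				       struct bpf_prog *prog)
	{
		/* State 0: the system never produces multi-buffer XDP
		 * frames, so any program (mb-aware or not) may attach.
		 */
		if (!sysctl_xdp_multibuf)
			return 0;

		/* State 1: multi-buffer allowed. Reject non-mb-aware
		 * programs only on devices that, as currently configured
		 * (MTU, LRO, ...), could actually hand the program a
		 * multi-buffer frame.
		 */
		if (dev_can_generate_xdp_mb(dev) &&
		    !prog_is_xdp_mb_aware(prog))
			return -EOPNOTSUPP;

		return 0;
	}

Whether the per-device part of that check is even knowable for every
driver is exactly the kind of complexity that makes me think the static
sysctl is the simpler starting point.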
I may have been misinterpreting your use case, though, and you were
actually talking about the XDP program running on the egress side of
the redirect? Is that the case you meant?

--Zvi