On Wed, Sep 22, 2021 at 1:01 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>
> Jakub Kicinski <kuba@xxxxxxxxxx> writes:
>
> > On Wed, 22 Sep 2021 00:20:19 +0200 Toke Høiland-Jørgensen wrote:
> >> >> Neither of those are desirable outcomes, I think; and if we add a
> >> >> separate "XDP multi-buff" switch, we might as well make it system-wide?
> >> >
> >> > If we have an internal flag 'this driver supports multi-buf xdp' cannot we
> >> > make xdp_redirect to linearize in case the packet is being redirected
> >> > to non multi-buf aware driver (potentially with corresponding non mb aware xdp
> >> > progs attached) from mb aware driver?
> >>
> >> Hmm, the assumption that XDP frames take up at most one page has been
> >> fundamental from the start of XDP. So what does linearise mean in this
> >> context? If we get a 9k packet, should we dynamically allocate a
> >> multi-page chunk of contiguous memory and copy the frame into that, or
> >> were you thinking something else?
> >
> > My $.02 would be to not care about redirect at all.
> >
> > It's not like the user experience with redirect is anywhere close
> > to amazing right now. Besides (with the exception of SW devices which
> > will likely gain mb support quickly) mixed-HW setups are very rare.
> > If the source of the redirect supports mb so will likely the target.
>
> It's not about device support it's about XDP program support: If I run
> an MB-aware XDP program on a physical interface and redirect the (MB)
> frame into a container, and there's an XDP program running inside that
> container that isn't MB-aware, bugs will ensue. Doesn't matter if the
> veth driver itself supports MB...
>
> We could leave that as a "don't do that, then" kind of thing, but that
> was what we were proposing (as the "do nothing" option) and got some
> pushback on, hence why we're having this conversation :)
>
> -Toke
>

I hadn't even considered the case of redirecting to a veth pair on the
same system. I'm assuming from your statement that the buffers are
passed directly to the ingress inside the container and don't go
through the sort of egress processing they would if leaving the system?
And I'm assuming that's an optimization?

I'm not sure that makes a difference, though. It's not about whether
the driver's code is mb-capable; it's about whether the driver _as
currently configured_ could generate multiple buffers. If it can, then
only mb-aware programs should be allowed to attach to it (or be
tail-called from whatever is attached to it). If it can't, then there
should be no way for multiple buffers to reach it. So in the situation
you've described, either the veth driver should be in a state where it
coalesces the multiple buffers into one (fragmenting the frame if
necessary) or drops the frame, or the program attached inside the
container would need to be mb-aware. I'm assuming that with the veth
driver as written, this might mean all programs attached to it would
need to be mb-aware, which is obviously undesirable.

All of which significantly adds to the complexity of supporting mb
awareness, so maybe this could be developed later? Initially we could
have a sysctl toggling the state: 0 = single-buffer only, 1 =
multi-buffer allowed. Then later we _could_ add a state for dynamic
control once all XDP-supporting drivers support the necessary dynamic
functionality (if ever). At that point we'd have actual experience with
the sysctl and could see how much of a burden having static control is.
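
To make that concrete, here's a rough sketch of how I imagine the
attach-time check could look. Everything in it is made up for
illustration: the net.core.xdp_multibuf sysctl name, the
dev_can_generate_xdp_mb() and prog_is_xdp_mb_aware() helpers, and the
error code are all hypothetical, not existing kernel interfaces.

	/*
	 * Rough sketch only -- none of these names exist in the kernel
	 * today; they just illustrate the attach-time check discussed
	 * above.
	 */
	#include <linux/bpf.h>
	#include <linux/errno.h>
	#include <linux/netdevice.h>

	/* Backing value for a hypothetical net.core.xdp_multibuf sysctl:
	 * 0 = single-buffer only, 1 = multi-buffer allowed.
	 */
	static int sysctl_xdp_multibuf __read_mostly;

	/* Placeholders for checks that would have to exist somewhere. */
	bool dev_can_generate_xdp_mb(const struct net_device *dev);
	bool prog_is_xdp_mb_aware(const struct bpf_prog *prog);

	static int xdp_attach_check_mb(struct net_device *dev,
				       struct bpf_prog *prog)
	{
		/* State 0: the system never produces multi-buffer XDP
		 * frames, so any program (mb-aware or not) may attach.
		 */
		if (!sysctl_xdp_multibuf)
			return 0;

		/* State 1: multi-buffer allowed. Reject non-mb-aware
		 * programs only on devices that, as currently configured
		 * (MTU, LRO, ...), could actually hand the program a
		 * multi-buffer frame.
		 */
		if (dev_can_generate_xdp_mb(dev) &&
		    !prog_is_xdp_mb_aware(prog))
			return -EOPNOTSUPP;

		return 0;
	}

Whether the per-device part of that check is even knowable for every
driver is exactly the kind of complexity that makes me think the static
sysctl is the simpler starting point.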
I may have been misinterpreting your use case, though, and you were
actually talking about the XDP program running on the egress side of
the redirect? Is that the case you meant?

--Zvi