On Fri, 16 Jun 2023 at 02:09, Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
>
> On Mon, Jun 12, 2023 at 2:01 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> >
> > Some immediate thoughts after glancing through this:
> >
> > > --- Use cases ---
> > >
> > > The goal of this series is to add two new standard-ish places
> > > in the transmit path:
> > >
> > > 1. Right before the packet is transmitted (with access to TX
> > >    descriptors)
> > > 2. Right after the packet is actually transmitted and we've received the
> > >    completion (again, with access to TX completion descriptors)
> > >
> > > Accessing TX descriptors unlocks the following use-cases:
> > >
> > > - Setting device hints at TX: XDP/AF_XDP might use these new hooks to
> > >   use device offloads. The existing case implements TX timestamp.
> > > - Observability: global per-netdev hooks can be used for tracing
> > >   the packets and exploring completion descriptors for all sorts of
> > >   device errors.
> > >
> > > Accessing TX descriptors also means that the hooks have to be called
> > > from the drivers.
> > >
> > > The hooks are a light-weight alternative to XDP at egress and currently
> > > don't provide any packet modification abilities. However, eventually,
> > > can expose new kfuncs to operate on the packet (or, rather, the actual
> > > descriptors; for performance sake).
> >
> > dynptr?
> >
> > > --- UAPI ---
> > >
> > > The hooks are implemented in a HID-BPF style. Meaning they don't
> > > expose any UAPI and are implemented as tracing programs that call
> > > a bunch of kfuncs. The attach/detach operation happen via BPF syscall
> > > programs. The series expands device-bound infrastructure to tracing
> > > programs.
> >
> > Not a fan of the "attach from BPF syscall program" thing. These are part
> > of the XDP data path API, and I think we should expose them as proper
> > bpf_link attachments from userspace with introspection etc. But I guess
> > the bpf_mprog thing will give us that?
> >
> > > --- skb vs xdp ---
> > >
> > > The hooks operate on a new light-weight devtx_frame which contains:
> > > - data
> > > - len
> > > - sinfo
> > >
> > > This should allow us to have a unified (from BPF POW) place at TX
> > > and not be super-taxing (we need to copy 2 pointers + len to the stack
> > > for each invocation).
> >
> > Not sure what I think about this one. At the very least I think we
> > should expose xdp->data_meta as well. I'm not sure what the use case for
> > accessing skbs is? If that *is* indeed useful, probably there will also
> > end up being a use case for accessing the full skb?
>
> I spent some time looking at data_meta story on AF_XDP TX and it
> doesn't look like it's supported (at least in a general way).
> You obviously get some data_meta when you do XDP_TX, but if you want
> to pass something to the bpf prog when doing TX via the AF_XDP ring,
> it gets complicated.

When we designed this some 5 - 6 years ago, we thought that there would
be an XDP for egress action in the "nearish" future that could be used
to interpret the metadata field in front of the packet. Basically, the
user would load an XDP egress program that would define the metadata
layout by the operations it would perform on the metadata area. But
since XDP on egress has not happened, you are right, there is definitely
something missing to be able to use metadata on Tx. Or could your
proposed hook points be used for something like this?

> In zerocopy mode, we can probably use XDP_UMEM_UNALIGNED_CHUNK_FLAG
> and pass something in the headroom.

This feature is mainly used to allow for multiple packets on the same
chunk (to save space) and also to be able to have packets spanning two
chunks. Even in aligned mode, you can start a packet at an arbitrary
address in the chunk as long as the whole packet fits into the chunk. So
no problem having headroom in any of the modes.

> If copy-mode, there is no support to do skb_metadata_set.
>
> Probably makes sense to have something like tx_metalen on the xsk? And
> skb_metadata_set it in copy more and skip it in zerocopy mode?
> Or maybe I'm missing something?
>
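
To make the headroom point about aligned mode concrete, here is a minimal
user-space sketch. It is not taken from the series and only assumes what is
said above: the packet may start at any offset in a chunk as long as it
fits. The struct tx_desc_sketch type, both helper names and the meta_len
parameter are hypothetical; real code would fill in a struct xdp_desc from
<linux/if_xdp.h> instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical local mirror of the addr/len pair carried by an AF_XDP TX
 * descriptor; real code would use struct xdp_desc from <linux/if_xdp.h>. */
struct tx_desc_sketch {
	uint64_t addr; /* offset of the first packet byte within the umem */
	uint32_t len;  /* packet length, metadata not included */
};

/* Place a packet in an aligned chunk, leaving meta_len bytes of headroom
 * in front of it. Returns false if headroom plus packet would not fit
 * into the chunk. */
static bool place_with_headroom(uint64_t chunk_idx, uint32_t chunk_size,
				uint32_t meta_len, uint32_t pkt_len,
				struct tx_desc_sketch *desc)
{
	assert(chunk_size > 0);

	/* Written this way to avoid overflow in meta_len + pkt_len. */
	if (meta_len > chunk_size || pkt_len > chunk_size - meta_len)
		return false;

	desc->addr = chunk_idx * (uint64_t)chunk_size + meta_len;
	desc->len = pkt_len;
	return true;
}

/* Offset of the metadata area that sits directly in front of the packet. */
static uint64_t meta_addr(const struct tx_desc_sketch *desc, uint32_t meta_len)
{
	return desc->addr - meta_len;
}
```

With 2048-byte chunks, a 16-byte metadata area and a 1500-byte packet in
chunk 3, the descriptor address would be 3 * 2048 + 16 and the metadata
would start at the chunk base. How the kernel side learns meta_len (for
example via something like the tx_metalen idea above) is left open here.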