On 2024-01-17 17:37:29 [+0100], Toke Høiland-Jørgensen wrote:
> This is all back-of-the-envelope calculations, of course. Having some
> actual numbers to look at would be great; I don't suppose you have a
> setup where you can run xdp-bench and see how your patches affect the
> throughput?

No, but I probably could set it up.

> I chatted with Jesper about this, and he had an idea not too far from
> this: split up the XDP and regular stack processing in two stages, each
> with their individual batching. So whereas right now we're doing
> something like:
>
> run_napi()
>   bh_disable()
>   for pkt in budget:
>     act = run_xdp(pkt)
>     if (act == XDP_PASS)
>       run_netstack(pkt) // this is the expensive bit
>   bh_enable()
>
> We could instead do:
>
> run_napi()
>   bh_disable()
>   for pkt in budget:
>     act = run_xdp(pkt)
>     if (act == XDP_PASS)
>       add_to_list(pkt, to_stack_list)
>   bh_enable()
>   // sched point
>   bh_disable()
>   for pkt in to_stack_list:
>     run_netstack(pkt)
>   bh_enable()
>
>
> This would limit the batching that blocks everything to only the XDP
> processing itself, which should limit the maximum time spent in the
> blocking state significantly compared to what we have today. The caveat
> being that rearranging things like this is potentially a pretty major
> refactoring task that needs to touch all the drivers (even if some of
> the logic can be moved into the core code in the process). So not really
> sure if this approach is feasible, TBH.

This does not work because bh_disable() does not disable scheduling;
scheduling may still happen. bh_disable() acquires a lock which is
currently the only synchronisation point between, say, two network
drivers doing NAPI, and that lock is what I want to get rid of.
Regarding the expensive bit, as in XDP_PASS: per the proposal this
doesn't need the locking, only the REDIRECT piece does.

> > Daniel said netkit doesn't need this locking because it is not
> > supporting this redirect and it made me think. Would it work to make
> > the redirect structures part of the bpf_prog structure instead of
> > per-CPU? My understanding is that an eBPF program's data structures
> > are part of it and contain locking allowing one eBPF program to
> > preempt another one.
> > Having the redirect structures part of the program would obsolete
> > the locking. Do I miss anything?
>
> This won't work, unfortunately: the same XDP program can be attached to
> multiple interfaces simultaneously, and for hardware with multiple
> receive queues (which is most of the hardware that supports XDP), it can
> even run simultaneously on multiple CPUs on the same interface. This is
> the reason why this is all being kept in per-CPU variables today.

So I started hacking on this and noticed yesterday that you can run
multiple bpf programs, which is how I learned that it won't work. My
plan B is now to move the redirect state into task_struct.

> -Toke

Sebastian
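
For illustration, plan B could look roughly like the sketch below: the
bpf_redir member in task_struct and the accessor names are hypothetical,
not the actual patches. The point is only that the redirect state would
be reached through "current" instead of through a per-CPU variable that
relies on local_bh_disable() as the synchronisation point.

  /* Sketch only. Assumes a hypothetical new member added to
   *     struct task_struct:
   *         struct bpf_redirect_info bpf_redir;
   * The accessor names below are illustrative.
   */

  /* Today (net/core/filter.c): per-CPU state, only valid while BHs are
   * disabled, so local_bh_disable() is what keeps two NAPI contexts
   * from clobbering each other's redirect state.
   */
  static inline struct bpf_redirect_info *xdp_redirect_info_percpu(void)
  {
          return this_cpu_ptr(&bpf_redirect_info);
  }

  /* Plan B: per-task state, private to the (threaded) NAPI context, so
   * no cross-driver synchronisation via the BH lock is needed even if
   * the task is preempted.
   */
  static inline struct bpf_redirect_info *xdp_redirect_info_task(void)
  {
          return &current->bpf_redir;
  }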