Toke Høiland-Jørgensen wrote:
> This series adds support for executing multiple XDP programs on a single
> interface in sequence, through the use of chain calls, as discussed at the Linux
> Plumbers Conference last month:
>
> https://linuxplumbersconf.org/event/4/contributions/460/
>
> # HIGH-LEVEL IDEA
>
> The basic idea is to express the chain call sequence through a special map type,
> which contains a mapping from a (program, return code) tuple to another program
> to run next in the sequence. Userspace can populate this map to express
> arbitrary call sequences, and update the sequence by updating or replacing the
> map.
>
> The actual execution of the program sequence is done in bpf_prog_run_xdp(),
> which will look up the chain sequence map, and if found, will loop through calls
> to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
> previous program ID and return code.
>
> An XDP chain call map can be installed on an interface by means of a new netlink
> attribute containing an fd pointing to a chain call map. This can be supplied
> along with the XDP prog fd, so that a chain map is always installed together
> with an XDP program.
>
> # PERFORMANCE
>
> I performed a simple performance test to get an initial feel for the overhead of
> the chain call mechanism. This test consists of running only two programs in
> sequence: One that returns XDP_PASS and another that returns XDP_DROP. I then
> measure the drop PPS performance and compare it to a baseline of just a single
> program that only returns XDP_DROP.
>
> For comparison, a test case that uses regular eBPF tail calls to sequence two
> programs together is also included. Finally, because 'perf' showed that the
> hashmap lookup was the largest single source of overhead, I also added a test
> case where I removed the jhash() call from the hashmap code, and just use the
> u32 key directly as an index into the hash bucket structure.
>
> The performance for these different cases is as follows (with retpolines disabled):

Retpolines enabled would also be interesting.

>
> | Test case                       | Perf      | Add. overhead | Total overhead |
> |---------------------------------+-----------+---------------+----------------|
> | Before patch (XDP DROP program) | 31.0 Mpps |               |                |
> | After patch (XDP DROP program)  | 28.9 Mpps |        2.3 ns |         2.3 ns |

IMO even 1 Mpps overhead is too much for a feature that is primarily
about ease of use. Sacrificing performance to make userland a bit easier
is hard to justify for me when XDP _is_ singularly about performance.
Also, that is nearly 10% overhead, which is fairly large. So I think
going forward the performance gap needs to be removed.

> | XDP tail call                   | 26.6 Mpps |        3.0 ns |         5.3 ns |
> | XDP chain call (no jhash)       | 19.6 Mpps |       13.4 ns |        18.7 ns |
> | XDP chain call (this series)    | 17.0 Mpps |        7.9 ns |        26.6 ns |
>
> From this it is clear that there is some overhead from this mechanism, but
> the jhash removal example indicates that it is probably possible to optimise the
> code to the point where the overhead becomes low enough that it is acceptable.

I'm missing why, in theory at least, this can't be made as fast as tail
calls? Again, I can't see why someone would lose 30% of their performance
when a userland program could populate a tail call map for the same
effect. Sure, userland would also have to enforce some program
standards/conventions, but it could be done, and at 30% overhead that
pain is probably worth it IMO.

My thinking, though, is that if we are a bit clever, chaining and tail
calls could be performance-wise equivalent?

I'll go read the patches now ;)

.John

>
> # PATCH SET STRUCTURE
> This series is structured as follows:
>
> - Patch 1: Prerequisite
> - Patch 2: New map type
> - Patch 3: Netlink hooks to install the chain call map
> - Patch 4: Core chain call logic
> - Patch 5-7: Bookkeeping updates to tools
> - Patch 8: Libbpf support for installing chain call maps
> - Patch 9: Selftests with example user space code
>
> The whole series is also available in my git repo on kernel.org:
> https://git.kernel.org/pub/scm/linux/kernel/git/toke/linux.git/log/?h=xdp-multiprog-01
>