Daniel Borkmann <daniel@xxxxxxxxxxxxx> writes:

> On Wed, Oct 02, 2019 at 09:43:49AM -0700, John Fastabend wrote:
>> Toke Høiland-Jørgensen wrote:
>> > This series adds support for executing multiple XDP programs on a
>> > single interface in sequence, through the use of chain calls, as
>> > discussed at the Linux Plumbers Conference last month:
>> >
>> > https://linuxplumbersconf.org/event/4/contributions/460/
>> >
>> > # HIGH-LEVEL IDEA
>> >
>> > The basic idea is to express the chain call sequence through a
>> > special map type, which contains a mapping from a (program, return
>> > code) tuple to another program to run next in the sequence.
>> > Userspace can populate this map to express arbitrary call
>> > sequences, and update the sequence by updating or replacing the
>> > map.
>> >
>> > The actual execution of the program sequence is done in
>> > bpf_prog_run_xdp(), which will look up the chain sequence map and,
>> > if found, will loop through calls to BPF_PROG_RUN, looking up the
>> > next XDP program in the sequence based on the previous program ID
>> > and return code.
>> >
>> > An XDP chain call map can be installed on an interface by means of
>> > a new netlink attribute containing an fd pointing to a chain call
>> > map. This can be supplied along with the XDP prog fd, so that a
>> > chain map is always installed together with an XDP program.
>> >
>> > # PERFORMANCE
>> >
>> > I performed a simple performance test to get an initial feel for
>> > the overhead of the chain call mechanism. This test consists of
>> > running only two programs in sequence: one that returns XDP_PASS
>> > and another that returns XDP_DROP. I then measure the drop PPS
>> > performance and compare it to a baseline of just a single program
>> > that only returns XDP_DROP.
>> >
>> > For comparison, a test case that uses regular eBPF tail calls to
>> > sequence two programs together is also included.
>> > Finally, because 'perf' showed that the hashmap lookup was the
>> > largest single source of overhead, I also added a test case where
>> > I removed the jhash() call from the hashmap code and just used the
>> > u32 key directly as an index into the hash bucket structure.
>> >
>> > The performance for these different cases is as follows (with
>> > retpolines disabled):
>>
>> Retpolines enabled would also be interesting.
>>
>> >
>> > | Test case                       | Perf      | Add. overhead | Total overhead |
>> > |---------------------------------+-----------+---------------+----------------|
>> > | Before patch (XDP DROP program) | 31.0 Mpps |               |                |
>> > | After patch (XDP DROP program)  | 28.9 Mpps |        2.3 ns |         2.3 ns |
>>
>> IMO even 1 Mpps overhead is too much for a feature that is primarily
>> about ease of use. Sacrificing performance to make userland a bit
>> easier is hard to justify for me when XDP _is_ singularly about
>> performance. Also, that is nearly 10% overhead, which is fairly
>> large. So I think going forward the performance gap needs to be
>> removed.
>
> Fully agree: for the case where this facility is not used, it must
> have *zero* overhead. This is /one/ map flavor; in the future there
> will be other facilities with different use cases, but we cannot
> place them all into the critical fast path. Given this is BPF, we
> have the flexibility that this can be hidden behind the scenes by
> rewriting, and therefore only add overhead when used.
>
> What I also see as a red flag with this proposal is the fact that it
> is tied to XDP only because you need to go and hack
> bpf_prog_run_xdp() all the way to fetch
> xdp->rxq->dev->xdp_chain_map, even though the map/concept itself is
> rather generic and could be used in various other program types as
> well. I'm very sure that once there, people would request it.
> Therefore, better to explore a way where this has no changes to
> BPF_PROG_RUN(), similar to the original tail call work.
As I said in the other reply, I actually went out of my way to make
this XDP only. But since you're now the third person requesting that
it not be, I guess I'll take the hint and look at a more general way
to hook this in :)

-Toke