Re: [PATCH bpf-next 0/9] xdp: Support multiple programs on a single interface through chain calls

Song Liu <songliubraving@xxxxxx> · Wed, 2 Oct 2019 18:38:59 +0000

> On Oct 2, 2019, at 6:30 AM, Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> 
> This series adds support for executing multiple XDP programs on a single
> interface in sequence, through the use of chain calls, as discussed at the Linux
> Plumbers Conference last month:
> 
> https://linuxplumbersconf.org/event/4/contributions/460/
> 
> # HIGH-LEVEL IDEA
> 
> The basic idea is to express the chain call sequence through a special map type,
> which contains a mapping from a (program, return code) tuple to another program
> to run next in the sequence. Userspace can populate this map to express
> arbitrary call sequences, and update the sequence by updating or replacing the
> map.
> 
> The actual execution of the program sequence is done in bpf_prog_run_xdp(),
> which will look up the chain sequence map, and if found, will loop through calls
> to BPF_PROG_RUN, looking up the next XDP program in the sequence based on the
> previous program ID and return code.
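To make the lookup-and-loop above concrete, here is a userspace C sketch of the idea as described. It is not the kernel code from the series. The `prog`, `chain_key` and `chain_entry` types, `chain_lookup()`, `run_chain()` and the `MAX_CHAIN_CALLS` bound are all hypothetical. Programs are modelled as plain function pointers carrying an ID, and a linear array search stands in for the BPF hashmap lookup.

```c
#include <stddef.h>
#include <stdint.h>

/* Same numeric values as the XDP return codes (XDP_ABORTED = 0 ... XDP_REDIRECT = 4). */
enum xdp_action { XDP_ABORTED = 0, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT };

/* Hypothetical stand-in for a loaded XDP program: an ID plus a function. */
struct prog {
	uint32_t id;
	uint32_t (*run)(void *ctx);
};

/* Hypothetical chain map entry: (program ID, return code) -> next program. */
struct chain_key {
	uint32_t prog_id;
	uint32_t retcode;
};

struct chain_entry {
	struct chain_key key;
	const struct prog *next;
};

/* Hypothetical cap so a misconfigured map cannot make the loop run forever. */
#define MAX_CHAIN_CALLS 32

/* Linear search stands in for the hashmap lookup used by the series. */
static const struct prog *chain_lookup(const struct chain_entry *map, size_t n,
				       uint32_t prog_id, uint32_t retcode)
{
	for (size_t i = 0; i < n; i++)
		if (map[i].key.prog_id == prog_id && map[i].key.retcode == retcode)
			return map[i].next;
	return NULL;
}

/* Run a program, look up (its ID, its return code) to find the next one, and
 * repeat until no entry matches; in this sketch the last return code wins. */
static uint32_t run_chain(const struct prog *prog, const struct chain_entry *map,
			  size_t n, void *ctx)
{
	uint32_t ret = XDP_ABORTED;

	for (int i = 0; prog && i < MAX_CHAIN_CALLS; i++) {
		ret = prog->run(ctx);
		prog = chain_lookup(map, n, prog->id, ret);
	}
	return ret;
}

/* Toy programs mirroring the benchmark below: one passes, one drops. */
static uint32_t prog_pass(void *ctx) { (void)ctx; return XDP_PASS; }
static uint32_t prog_drop(void *ctx) { (void)ctx; return XDP_DROP; }

/* Program 1 returns XDP_PASS, which the map routes to program 2 (XDP_DROP). */
static uint32_t demo_pass_then_drop(void)
{
	static const struct prog drop = { 2, prog_drop };
	static const struct prog pass = { 1, prog_pass };
	const struct chain_entry map[] = {
		{ { 1, XDP_PASS }, &drop },
	};

	return run_chain(&pass, map, sizeof(map) / sizeof(map[0]), NULL);
}
```

Here `demo_pass_then_drop()` ends with `XDP_DROP`. Without the single map entry, the chain would stop after the first program and return `XDP_PASS`.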
> 
> An XDP chain call map can be installed on an interface by means of a new netlink
> attribute containing an fd pointing to a chain call map. This can be supplied
> along with the XDP prog fd, so that a chain map is always installed together
> with an XDP program.

Interesting work!

Quick question: could we achieve the same result by adding a "retval to call_tail_next"
map to each program? I think one issue is how to avoid loops like A->B->C->A,
but that should be solvable?
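One way to handle the A->B->C->A concern is to treat the per-program "retval to next program" maps as a directed graph and reject any configuration that contains a cycle before installing it. A runtime hop limit, like the one the existing tail-call mechanism enforces, would be the other option. The sketch below assumes each program's map is one row of a small table. `next_table`, `table_init()`, `chain_has_cycle()` and the size limits are hypothetical names, not an existing API.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PROGS 8    /* hypothetical maximum number of programs in one chain */
#define NUM_RETCODES 5 /* XDP_ABORTED (0) .. XDP_REDIRECT (4) */

/* next[p][r]: index of the program to call when program p returns r, or -1. */
typedef int next_table[NUM_PROGS][NUM_RETCODES];

enum dfs_state { UNVISITED, ON_PATH, DONE };

static void table_init(next_table next)
{
	for (int p = 0; p < NUM_PROGS; p++)
		for (int r = 0; r < NUM_RETCODES; r++)
			next[p][r] = -1;
}

/* Depth-first search from p; reaching a node that is still ON_PATH is a loop. */
static bool dfs_has_cycle(next_table next, int p, enum dfs_state state[])
{
	state[p] = ON_PATH;
	for (int r = 0; r < NUM_RETCODES; r++) {
		int q = next[p][r];

		if (q < 0)
			continue;
		assert(q < NUM_PROGS);
		if (state[q] == ON_PATH)
			return true; /* back edge: q -> ... -> p -> q */
		if (state[q] == UNVISITED && dfs_has_cycle(next, q, state))
			return true;
	}
	state[p] = DONE; /* every path leaving p terminates */
	return false;
}

/* True if any sequence of return codes can revisit a program. */
static bool chain_has_cycle(next_table next, int nprogs)
{
	enum dfs_state state[NUM_PROGS] = { UNVISITED };

	for (int p = 0; p < nprogs; p++)
		if (state[p] == UNVISITED && dfs_has_cycle(next, p, state))
			return true;
	return false;
}
```

With A=0, B=1 and C=2 chained on return code 2 (`XDP_PASS`), the table is accepted. Adding `next[2][2] = 0` closes the A->B->C->A loop, and `chain_has_cycle()` then reports it.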

> 
> # PERFORMANCE
> 
> I performed a simple performance test to get an initial feel for the overhead of
> the chain call mechanism. This test consists of running only two programs in
> sequence: One that returns XDP_PASS and another that returns XDP_DROP. I then
> measure the drop PPS performance and compare it to a baseline of just a single
> program that only returns XDP_DROP.
> 
> For comparison, a test case that uses regular eBPF tail calls to sequence two
> programs together is also included. Finally, because 'perf' showed that the
> hashmap lookup was the largest single source of overhead, I also added a test
> case where I removed the jhash() call from the hashmap code, and just used the
> u32 key directly as an index into the hash bucket structure.
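The difference between the two hashmap variants comes down to how a key is turned into a bucket index. The sketch below contrasts the two. It assumes a power-of-two bucket count selected with a mask. The mixer is the 32-bit finalizer from MurmurHash3, a stand-in for `jhash()` and not the kernel's hash. `mix32()`, `bucket_hashed()` and `bucket_direct()` are hypothetical names.

```c
#include <stdint.h>

/* Stand-in for jhash(): MurmurHash3's 32-bit finalizer (NOT the kernel jhash). */
static uint32_t mix32(uint32_t h)
{
	h ^= h >> 16;
	h *= 0x85ebca6bU;
	h ^= h >> 13;
	h *= 0xc2b2ae35U;
	h ^= h >> 16;
	return h;
}

/* Normal path: hash the key, then mask; n_buckets must be a power of two. */
static uint32_t bucket_hashed(uint32_t key, uint32_t n_buckets)
{
	return mix32(key) & (n_buckets - 1);
}

/* The "no jhash" experiment: use the u32 key itself as the bucket index. */
static uint32_t bucket_direct(uint32_t key, uint32_t n_buckets)
{
	return key & (n_buckets - 1);
}
```

Skipping the mixer saves a few shifts and multiplies per lookup. The cost is that keys differing only in their high bits (for example 5 and 21 with 16 buckets) share a bucket. That is harmless for a small, userspace-controlled chain map, but it would not be a general-purpose replacement.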
> 
> The performance for these different cases is as follows (with retpolines disabled;
> "Add. overhead" is relative to the previous row, "Total overhead" to the baseline):
> 
> | Test case                       | Perf      | Add. overhead | Total overhead |
> |---------------------------------+-----------+---------------+----------------|
> | Before patch (XDP DROP program) | 31.0 Mpps |               |                |
> | After patch (XDP DROP program)  | 28.9 Mpps |        2.3 ns |         2.3 ns |
> | XDP tail call                   | 26.6 Mpps |        3.0 ns |         5.3 ns |
> | XDP chain call (no jhash)       | 19.6 Mpps |       13.4 ns |        18.7 ns |
> | XDP chain call (this series)    | 17.0 Mpps |        7.9 ns |        26.6 ns |
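The overhead columns follow from converting throughput into time per packet: at R Mpps, each packet takes 1000 / R ns. For example, 1000/28.9 - 1000/31.0 is about 2.3 ns. The helpers below just reproduce that arithmetic, and their names are illustrative.

```c
/* Time per packet in nanoseconds at a rate of `mpps` million packets/s:
 * 1 s / (mpps * 10^6 packets) = 1000 / mpps ns. */
static double ns_per_packet(double mpps)
{
	return 1000.0 / mpps;
}

/* Extra per-packet time when running at `mpps` instead of `ref_mpps`. */
static double overhead_ns(double mpps, double ref_mpps)
{
	return ns_per_packet(mpps) - ns_per_packet(ref_mpps);
}
```

Recomputing from the raw Mpps figures gives about 18.8 ns total for the no-jhash row and 7.8 ns added for this series. The table's 18.7 ns and 7.9 ns therefore appear to come from summing and differencing the rounded per-row values. The comparison between rows is unaffected.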
> 
> From this it is clear that there is some overhead from this mechanism, but the
> jhash removal example indicates that it is probably possible to optimise the
> code to the point where the overhead becomes low enough to be acceptable.

I think we can probably re-JIT multiple programs into one based on the mapping,
which should give the best performance.
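A full re-JIT would emit direct calls between the programs in machine code. The plain-C analogue below only shows the intermediate step: resolving the mapping once, when the chain is installed, into per-program successor pointers indexed by return code, so the per-packet path does no hash lookup at all. `resolved_prog`, `run_resolved()` and the demo programs are hypothetical names, and `max_hops` is an assumed guard against loops.

```c
#include <stddef.h>
#include <stdint.h>

#define NUM_RETCODES 5 /* XDP_ABORTED (0) .. XDP_REDIRECT (4) */

/* Hypothetical "resolved" program: the chain map lookup is replaced by a
 * successor array filled in once, at install time. */
struct resolved_prog {
	uint32_t (*run)(void *ctx);
	const struct resolved_prog *next[NUM_RETCODES]; /* NULL ends the chain */
};

/* Per-packet path: one array index per hop instead of a hash lookup. */
static uint32_t run_resolved(const struct resolved_prog *p, void *ctx, int max_hops)
{
	uint32_t ret = 0; /* XDP_ABORTED if there is nothing to run */

	while (p && max_hops-- > 0) {
		ret = p->run(ctx);
		p = ret < NUM_RETCODES ? p->next[ret] : NULL;
	}
	return ret;
}

static uint32_t ret_pass(void *ctx) { (void)ctx; return 2; /* XDP_PASS */ }
static uint32_t ret_drop(void *ctx) { (void)ctx; return 1; /* XDP_DROP */ }

/* Same PASS -> DROP chain as the benchmark, resolved ahead of time. */
static uint32_t demo_resolved(void)
{
	static const struct resolved_prog drop = { ret_drop, { NULL } };
	static const struct resolved_prog pass = { ret_pass, { [2] = &drop } };

	return run_resolved(&pass, NULL, 32);
}
```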

Thanks,
Song