On Thu, Sep 02, 2021 at 02:57:11PM +0200, Jiri Olsa wrote: > > > > Let's say we have 5 kernel functions: a, b, c, d, e. Say a, b, c all > > have 1 input args, and d and e have 2. > > > > Now let's say we attach just normal fentry program A to function a. > > Also we attach normal fexit program E to func e. > > > > We'll have A attached to a with trampoline T1. We'll also have E > > attached to e with trampoline T2. Right? > > > > And now we try to attach generic fentry (fentry.multi in your > > terminology) prog X to all 5 of them. If A and E weren't attached, > > we'd need two generic trampolines, one for a, b, c (because 1 input > > argument) and another for d,e (because 2 input arguments). But because > > we already have A and B attached, we'll end up needing 4: > > > > T1 (1 arg) for func a calling progs A and X > > T2 (2 args) for func e calling progs E and X > > T3 (1 arg) for func b and c calling X > > T4 (2 args) for func d calling X > > so current code would group T3/T4 together, but if we keep > them separated, then we won't need to use new model and > cut off some of the code, ok We've brainstormed this idea further with Andrii. (thankfully we could do it in-person now ;) which saved a ton of time) It seems the following should work: 5 kernel functions: a(int), b(long), c(void*), d(int, int), e(long, long). fentry prog A is attached to 'a'. fexit prog E is attached to 'e'. multi-prog X wants to attach to all of them. It can be achieved with 4 trampolines. The trampolines called from funcs 'a' and 'e' can be patched to call A+X and E+X programs correspondingly. The multi program X needs to be able to access return values and arguments of all functions it was attached to. We can achieve that by always generating a trampoline (both multi and normal) with extra constant stored in the stack. This constant is the number of arguments served by this trampoline. The trampoline 'a' will store nr_args=1. The tramopline 'e' will store nr_args=2. We need two multi trampolines. The multi tramopline X1 that will serve 'b' and 'c' and store nr_args=1 and multi-tramopline X2 that will serve 'd' and store nr_args=2 into hidden stack location (like ctx[-2]). The multi prog X can look like: int BPF_PROG(x, __u64 arg1, __u64 arg2, __u64 ret) in such case it will read correct args and ret when called from 'd' and 'e' and only correct arg1 when called from 'a', 'b', 'c'. To always correctly access arguments and the return value the program can use two new helpers: bpf_arg(ctx, N) and bpf_ret_value(ctx). Both will be fully inlined helpers similar to bpf_get_func_ip(). u64 bpf_arg(ctx, int n) { u64 nr_args = ctx[-2]; /* that's the place where _all_ trampoline will store nr_args */ if (n > nr_args) return 0; return ctx[n]; } u64 bpf_ret_value(ctx) { u64 nr_args = ctx[-2]; return ctx[nr_args]; } These helpers will be the only recommended way to access args and ret value in multi progs. The nice advantage is that normal fentry/fexit progs can use them too. We can rearrange ctx[-1] /* func_ip */ and ctx[-2] /* nr_args */ if it makes things easier. If multi prog knows that it is attaching to 100 kernel functions and all of them have 2 arguments it can still do int BPF_PROG(x, __u64 arg1, __u64 arg2, __u64 ret) { // access arg1, arg2, ret directly and it will work correctly. We can make it really strict in the verifier and disallow such direct access to args from the multi prog and only allow access via bpf_arg/bpf_ret_value helpers, but I think it's overkill. Reading garbage values from stack isn't great, but it's not a safety issue. It means that the verifier will allow something like 16 u64-s args in multi program. It cannot allow large number, since ctx[1024] might become a safety issue, while ctx[4] could be a garbage or a valid value depending on the call site. Thoughts?