Re: [PATCH bpf-next v4 18/27] bpf, x64: Store properly return value for trampoline with multi func programs

Jiri Olsa <jolsa@xxxxxxxxxx> · Fri, 3 Sep 2021 11:50:09 +0200



On Thu, Sep 02, 2021 at 02:55:38PM -0700, Alexei Starovoitov wrote:
> On Thu, Sep 02, 2021 at 02:57:11PM +0200, Jiri Olsa wrote:
> > > 
> > > Let's say we have 5 kernel functions: a, b, c, d, e. Say a, b, c all
> > > have 1 input arg, and d and e have 2.
> > > 
> > > Now let's say we attach just normal fentry program A to function a.
> > > Also we attach normal fexit program E to func e.
> > > 
> > > We'll have A attached to a with trampoline T1. We'll also have E
> > > attached to e with trampoline T2. Right?
> > > 
> > > And now we try to attach generic fentry (fentry.multi in your
> > > terminology) prog X to all 5 of them. If A and E weren't attached,
> > > we'd need two generic trampolines, one for a, b, c (because 1 input
> > > argument) and another for d,e (because 2 input arguments). But because
> > > we already have A and E attached, we'll end up needing 4:
> > > 
> > > T1 (1 arg)  for func a calling progs A and X
> > > T2 (2 args) for func e calling progs E and X
> > > T3 (1 arg)  for func b and c calling X
> > > T4 (2 args) for func d calling X
> > 
> > so the current code would group T3/T4 together, but if we keep
> > them separate, then we won't need the new model and can
> > cut some of the code, ok
> 
> We've brainstormed this idea further with Andrii.
> (thankfully we could do it in-person now ;) which saved a ton of time)
> 
> It seems the following should work:
> 5 kernel functions: a(int), b(long), c(void*), d(int, int), e(long, long).
> fentry prog A is attached to 'a'.
> fexit prog E is attached to 'e'.
> multi-prog X wants to attach to all of them.
> It can be achieved with 4 trampolines.
> 
> The trampolines called from funcs 'a' and 'e' can be patched to
> call A+X and E+X programs correspondingly.
> The multi program X needs to be able to access return values
> and arguments of all functions it was attached to.
> We can achieve that by always generating a trampoline (both multi and normal)
> with extra constant stored in the stack. This constant is the number of
> arguments served by this trampoline.
> The trampoline 'a' will store nr_args=1.
> The trampoline 'e' will store nr_args=2.
> We need two multi trampolines.
> The multi trampoline X1 that will serve 'b' and 'c' and store nr_args=1
> and multi-trampoline X2 that will serve 'd' and store nr_args=2
> into hidden stack location (like ctx[-2]).
> 
> The multi prog X can look like:
> int BPF_PROG(x, __u64 arg1, __u64 arg2, __u64 ret)
> in such case it will read correct args and ret when called from 'd' and 'e'
> and only correct arg1 when called from 'a', 'b', 'c'.
> 
> To always correctly access arguments and the return value
> the program can use two new helpers: bpf_arg(ctx, N) and bpf_ret_value(ctx).
> Both will be fully inlined helpers similar to bpf_get_func_ip().
> u64 bpf_arg(ctx, int n)
> {
>   u64 nr_args = ctx[-2]; /* that's the place where _all_ trampoline will store nr_args */
>   if (n >= nr_args) /* ctx[nr_args] holds the return value, not an arg */
>     return 0;
>   return ctx[n];
> }
> u64 bpf_ret_value(ctx)
> {
>   u64 nr_args = ctx[-2];
>   return ctx[nr_args];
> }
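
To make the proposed layout concrete, here is a small user-space model of it. This is only a sketch, not trampoline or verifier code: struct frame, frame_init(), model_arg() and model_ret_value() are made-up names, MAX_ARGS is an arbitrary limit, and the argument index is assumed to be 0-based (so n >= nr_args is treated as out of range, since ctx[nr_args] holds the return value).

```c
#include <stddef.h>
#include <stdint.h>

#define HIDDEN_SLOTS 2	/* nr_args and func_ip sit below ctx */
#define MAX_ARGS     6	/* illustrative limit only */

/* Backing storage for one modelled trampoline frame. */
struct frame {
	uint64_t slot[HIDDEN_SLOTS + MAX_ARGS + 1];
};

/* ctx points just past the hidden slots, so ctx[-2] is slot[0]. */
static uint64_t *frame_ctx(struct frame *f)
{
	return &f->slot[HIDDEN_SLOTS];
}

/* Models what a trampoline would store before calling a program. */
static uint64_t *frame_init(struct frame *f, uint64_t func_ip,
			    const uint64_t *args, uint64_t nr_args,
			    uint64_t ret)
{
	uint64_t *ctx = frame_ctx(f);
	uint64_t i;

	if (nr_args > MAX_ARGS)
		return NULL;

	ctx[-2] = nr_args;	/* stored by every trampoline */
	ctx[-1] = func_ip;
	for (i = 0; i < nr_args; i++)
		ctx[i] = args[i];
	ctx[nr_args] = ret;	/* return value follows the last arg */
	return ctx;
}

/* Model of bpf_arg(): 0-based index, 0 for anything out of range. */
static uint64_t model_arg(const uint64_t *ctx, uint64_t n)
{
	uint64_t nr_args = ctx[-2];

	if (n >= nr_args)
		return 0;
	return ctx[n];
}

/* Model of bpf_ret_value(): the slot right after the arguments. */
static uint64_t model_ret_value(const uint64_t *ctx)
{
	uint64_t nr_args = ctx[-2];

	return ctx[nr_args];
}
```

With a frame built for a 2-arg function, model_arg(ctx, 2) returns 0 rather than the return value, and model_ret_value(ctx) finds the right slot no matter how many arguments the trampoline served.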

ok, this is much better than rewiring args access in verifier

> 
> These helpers will be the only recommended way to access args and ret value
> in multi progs.
> The nice advantage is that normal fentry/fexit progs can use them too.
> 
> We can rearrange ctx[-1] /* func_ip */ and ctx[-2] /* nr_args */
> if it makes things easier.

so nr_args will be there all the time, while func_ip is optional
at the moment (based on get_func_ip helper presence in program),
so we can either switch that:

   func_ip in ctx[-2]
   nr_args in ctx[-1]

or make func_ip not optional to avoid confusion

I think moving func_ip to ctx[-2] is ok
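
A rough user-space sketch of that swapped layout, assuming nr_args is always stored at ctx[-1] and func_ip takes ctx[-2] only when the program uses the get_func_ip helper. layout_ctx() and MAX_ARGS are made-up names for illustration, not existing trampoline code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ARGS 6	/* illustrative limit only */

/*
 * Place ctx inside buf so that the hidden slots sit below it.
 * Returns the ctx pointer, or NULL if the frame does not fit.
 */
static uint64_t *layout_ctx(uint64_t *buf, size_t buf_len,
			    uint64_t nr_args, bool need_func_ip,
			    uint64_t func_ip)
{
	size_t hidden = need_func_ip ? 2 : 1;
	uint64_t *ctx;

	/* hidden slots + args + one slot for the return value */
	if (nr_args > MAX_ARGS || buf_len < hidden + nr_args + 1)
		return NULL;

	ctx = buf + hidden;
	ctx[-1] = nr_args;		/* always present */
	if (need_func_ip)
		ctx[-2] = func_ip;	/* only when the helper is used */
	return ctx;
}
```

The point of this ordering is that the always-present value keeps a fixed offset, while the optional one extends the frame downward only when needed.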

> 
> If multi prog knows that it is attaching to 100 kernel functions
> and all of them have 2 arguments it can still do
> int BPF_PROG(x, __u64 arg1, __u64 arg2, __u64 ret)
> { // access arg1, arg2, ret directly
> and it will work correctly.

ok, it's the user's decision, because at load time we don't know which
functions it will be attached to, so the verifier can't check anything

> 
> We can make it really strict in the verifier and disallow such
> direct access to args from the multi prog and only allow
> access via bpf_arg/bpf_ret_value helpers, but I think it's overkill.
> Reading garbage values from stack isn't great, but it's not a safety issue.

we could also check it at attach time and refuse to attach if the
target functions have different nr_args and the program does not use
the arg helpers
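
Roughly, the attach-time check could look like the sketch below. This is a user-space illustration only: multi_attach_ok() is a hypothetical name, and it assumes the per-function argument counts and a "program uses the arg helpers" flag are already available at attach time.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Allow the attach when the program goes through the arg helpers,
 * or when every target function has the same number of arguments.
 */
static bool multi_attach_ok(const unsigned int *nr_args, size_t cnt,
			    bool uses_arg_helpers)
{
	size_t i;

	if (uses_arg_helpers)
		return true;

	for (i = 1; i < cnt; i++) {
		if (nr_args[i] != nr_args[0])
			return false;
	}
	return true;
}
```

For the a, b, c, d, e example (1, 1, 1, 2, 2 arguments), a program that reads args directly would be refused, while one that uses the helpers would attach.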


> It means that the verifier will allow something like 16 u64-s args
> in multi program. It cannot allow a large number, since ctx[1024]
> might become a safety issue, while ctx[4] could be garbage
> or a valid value depending on the call site.
> 
> Thoughts?
> 
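
For illustration, the bound described above could be modelled like this in user space. It is only a sketch: multi_ctx_access_ok() and MULTI_CTX_MAX_SLOTS are made-up names, the limit of 16 is taken from the "something like 16 u64-s" wording, and rejecting negative offsets for direct access (leaving the hidden slots to the helpers) is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define MULTI_CTX_MAX_SLOTS 16	/* "something like 16 u64-s" */

/*
 * Accept only aligned u64 reads of ctx[0] .. ctx[15].
 * ctx[4] passes (garbage or valid, depending on the call site),
 * ctx[1024] does not.
 */
static bool multi_ctx_access_ok(int64_t off, uint32_t size)
{
	if (off < 0 || size != sizeof(uint64_t))
		return false;
	if (off % (int64_t)sizeof(uint64_t))
		return false;
	return off / (int64_t)sizeof(uint64_t) < MULTI_CTX_MAX_SLOTS;
}
```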

looks good, thanks for solving this ;-)

jirka