Re: [PATCH v3 bpf-next 15/18] bpf: Support attaching tracing BPF program to other BPF programs

On Sat, Nov 09, 2019 at 11:17:37PM -0800, Andrii Nakryiko wrote:
> On Thu, Nov 7, 2019 at 10:41 PM Alexei Starovoitov <ast@xxxxxxxxxx> wrote:
> >
> > Allow FENTRY/FEXIT BPF programs to attach to other BPF programs of any type,
> > including their subprograms. This feature allows snooping on the input and
> > output packets of XDP and TC programs, including their return values. In order
> > to do that the verifier needs to track not only vmlinux types, but the types
> > of other BPF programs as well. The verifier also needs to translate
> > uapi/linux/bpf.h types used by networking programs into the kernel internal
> > BTF types used by FENTRY/FEXIT BPF programs. In some cases LLVM optimizations
> > can remove arguments from BPF subprograms without adjusting the BTF info that
> > the LLVM backend knows about. When the BTF info disagrees with the actual types
> > that the verifier sees, the BPF trampoline has to fall back to a conservative
> > mode and treat all arguments as u64. The FENTRY/FEXIT program can still attach
> > to such subprograms, but it won't be able to recognize pointer types like
> > 'struct sk_buff *' and won't be able to pass them to bpf_skb_output() for
> > dumping to user space.
> >
> > The BPF_PROG_LOAD command is extended with an attach_prog_fd field. When it's
> > set to zero the attach_btf_id is one of the vmlinux BTF type ids. When
> > attach_prog_fd points to a previously loaded BPF program, the attach_btf_id is
> > the BTF type id of the main function or of one of its subprograms.
> >
> > Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
> > ---
> >  arch/x86/net/bpf_jit_comp.c |  3 +-
> >  include/linux/bpf.h         |  2 +
> >  include/linux/btf.h         |  1 +
> >  include/uapi/linux/bpf.h    |  1 +
> >  kernel/bpf/btf.c            | 58 +++++++++++++++++++---
> >  kernel/bpf/core.c           |  2 +
> >  kernel/bpf/syscall.c        | 19 +++++--
> >  kernel/bpf/verifier.c       | 98 +++++++++++++++++++++++++++++--------
> >  kernel/trace/bpf_trace.c    |  2 -
> >  9 files changed, 151 insertions(+), 35 deletions(-)
> >
> 
> [...]
> 
> > +
> > +static bool btf_translate_to_vmlinux(struct bpf_verifier_log *log,
> > +                                    struct btf *btf,
> > +                                    const struct btf_type *t,
> > +                                    struct bpf_insn_access_aux *info)
> > +{
> > +       const char *tname = __btf_name_by_offset(btf, t->name_off);
> > +       int btf_id;
> > +
> > +       if (!tname) {
> > +               bpf_log(log, "Program's type doesn't have a name\n");
> > +               return false;
> > +       }
> > +       if (strcmp(tname, "__sk_buff") == 0) {
> 
> might be a good idea to ensure that t's type is also a struct?
> 
> > +               btf_id = btf_resolve_helper_id(log, &bpf_skb_output_proto, 0);
> 
> This is kind of ugly and high-maintenance. Have you considered having
> something like this, to do this mapping:
> 
> struct bpf_ctx_mapping {
>     struct sk_buff *__sk_buff;
>     struct xdp_buff *xdp_md;
> };
> 
> So the field name is the name you are trying to match, while the field
> type is the actual type you are mapping to? You won't need to find
> special function protos (like bpf_skb_output_proto), it will be easy to
> extend, and you'll have the real vmlinux types automatically captured
> for you (you'll just have to find bpf_ctx_mapping's btf_id initially).

I was thinking something along these lines.
The problem with a single struct like the above is that it's centralized,
while the convert_ctx_access callbacks are all over the place.
So I'm thinking of adding a macro like this to bpf.h:
+#define BPF_RECORD_CTX_CONVERSION(user_type, kernel_type) \
+       ({typedef kernel_type (*bpf_ctx_convert)(user_type); \
+        (void) (bpf_ctx_convert) (void *) 0;})

and then do
BPF_RECORD_CTX_CONVERSION(struct bpf_xdp_sock, struct xdp_sock);
inside convert_ctx_access functions (like bpf_xdp_sock_convert_ctx_access).
There will be several typedefs with the 'bpf_ctx_convert' name, and
btf_translate_to_vmlinux() will iterate over them. Speed is not critical here,
but long term we probably need to merge the prog's BTF with vmlinux's BTF, so
that most of the type comparison is done during prog load. It should probably
reduce the size of the prog's BTF too. Renumbering of the prog's BTF will be
annoying though. Something to consider long term.
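
To make the idea concrete, a rough sketch of what a call site could look like.
The callback signature matches the existing convert_ctx_access hook and
bpf_xdp_sock_convert_ctx_access is the callback mentioned above, but the body
here is only a placeholder, not the real implementation:

	/* Sketch only: the macro call would sit at the top of the existing
	 * callback so the uapi -> kernel type pair ends up recorded as a
	 * 'bpf_ctx_convert' typedef that btf_translate_to_vmlinux() can walk.
	 */
	static u32 bpf_xdp_sock_convert_ctx_access(enum bpf_access_type type,
						   const struct bpf_insn *si,
						   struct bpf_insn *insn_buf,
						   struct bpf_prog *prog,
						   u32 *target_size)
	{
		struct bpf_insn *insn = insn_buf;

		BPF_RECORD_CTX_CONVERSION(struct bpf_xdp_sock, struct xdp_sock);

		/* ... existing field conversions elided ... */

		return insn - insn_buf;
	}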

> 
> > +               if (btf_id < 0)
> > +                       return false;
> > +               info->btf_id = btf_id;
> > +               return true;
> > +       }
> > +       return false;
> > +}
> >
> 
> [...]
> 
> > +               if (tgt_prog && conservative) {
> > +                       struct btf_func_model *m = &tr->func.model;
> > +
> > +                       /* BTF function prototype doesn't match the verifier types.
> > +                        * Fall back to 5 u64 args.
> > +                        */
> > +                       for (i = 0; i < 5; i++)
> > +                               m->arg_size[i] = 8;
> > +                       m->ret_size = 8;
> > +                       m->nr_args = 5;
> > +                       prog->aux->attach_func_proto = NULL;
> > +               } else {
> > +                       ret = btf_distill_func_proto(&env->log, btf, t,
> > +                                                    tname, &tr->func.model);
> 
> there is nothing preventing some parallel thread to modify
> tr->func.model in parallel, right? Should these modifications be
> either locked or at least WRITE_ONCE, similar to
> btf_resolve_helper_id?

hmm. Right. There is a race with bpf_trampoline_lookup. One thread could have
just created the trampoline and still be doing the distill, while another
thread is trying to use it after getting it from bpf_trampoline_lookup. The
fix choices are not pretty: either add a mutex to check_attach_btf_id(), or
give bpf_trampoline_lookup_or_create() an extra callback that does
btf_distill_func_proto while bpf_trampoline_lookup_or_create() is holding
trampoline_mutex, or move most of the check_attach_btf_id() logic into
bpf_trampoline_lookup_or_create().
I tried to keep the trampoline as an abstract concept, but with either the
callback or the move, the verifier and BTF logic will bleed into the
trampoline. Hmm.
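
Just to sketch the shape of the callback option (only trampoline_mutex,
btf_distill_func_proto and bpf_trampoline_lookup are real; the init callback
type and the __bpf_trampoline_* helpers below are made-up names for
illustration):

	/* Hypothetical shape: the caller's callback fills tr->func.model while
	 * trampoline_mutex is held, so no other thread can observe a
	 * half-initialized func model through bpf_trampoline_lookup().
	 */
	typedef int (*bpf_trampoline_init_t)(struct bpf_trampoline *tr, void *ctx);

	struct bpf_trampoline *bpf_trampoline_lookup_or_create(u64 key,
							       bpf_trampoline_init_t init,
							       void *ctx)
	{
		struct bpf_trampoline *tr;

		mutex_lock(&trampoline_mutex);
		tr = __bpf_trampoline_lookup(key);	/* made-up unlocked lookup */
		if (!tr) {
			tr = __bpf_trampoline_alloc(key);	/* made-up alloc */
			if (tr && init && init(tr, ctx)) {
				__bpf_trampoline_free(tr);	/* made-up free */
				tr = NULL;
			}
		}
		mutex_unlock(&trampoline_mutex);
		return tr;
	}

	/* check_attach_btf_id() would then pass a callback that calls
	 * btf_distill_func_proto(&env->log, btf, t, tname, &tr->func.model)
	 * (or applies the conservative 5 x u64 fallback) instead of touching
	 * tr->func.model after the lookup.
	 */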



