On 11/11/19 8:38 PM, Andrii Nakryiko wrote:
> On Mon, Nov 11, 2019 at 3:04 PM Alexei Starovoitov
> <alexei.starovoitov@xxxxxxxxx> wrote:
>>
>> On Sat, Nov 09, 2019 at 11:17:37PM -0800, Andrii Nakryiko wrote:
>>> On Thu, Nov 7, 2019 at 10:41 PM Alexei Starovoitov <ast@xxxxxxxxxx> wrote:
>>>>
>>>> Allow FENTRY/FEXIT BPF programs to attach to other BPF programs of any type
>>>> including their subprograms. This feature allows snooping on input and output
>>>> packets in XDP and TC programs, including their return values. In order to do
>>>> that the verifier needs to track types not only of vmlinux, but types of other
>>>> BPF programs as well. The verifier also needs to translate uapi/linux/bpf.h
>>>> types used by networking programs into kernel-internal BTF types used by
>>>> FENTRY/FEXIT BPF programs. In some cases LLVM optimizations can remove
>>>> arguments from BPF subprograms without adjusting the BTF info that the LLVM
>>>> backend emits. When the BTF info disagrees with the actual types that the
>>>> verifier sees, the BPF trampoline has to fall back to conservative mode and
>>>> treat all arguments as u64. The FENTRY/FEXIT program can still attach to such
>>>> subprograms, but won't be able to recognize pointer types like
>>>> 'struct sk_buff *' and won't be able to pass them to bpf_skb_output() for
>>>> dumping to user space.
>>>>
>>>> The BPF_PROG_LOAD command is extended with an attach_prog_fd field. When it's
>>>> set to zero the attach_btf_id is one of the vmlinux BTF type ids. When
>>>> attach_prog_fd points to a previously loaded BPF program the attach_btf_id is
>>>> the BTF type id of the main function or one of its subprograms.
>>>>
>>>> Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
>>>> ---
>>>>  arch/x86/net/bpf_jit_comp.c |  3 +-
>>>>  include/linux/bpf.h         |  2 +
>>>>  include/linux/btf.h         |  1 +
>>>>  include/uapi/linux/bpf.h    |  1 +
>>>>  kernel/bpf/btf.c            | 58 +++++++++++++++++++---
>>>>  kernel/bpf/core.c           |  2 +
>>>>  kernel/bpf/syscall.c        | 19 +++++--
>>>>  kernel/bpf/verifier.c       | 98 +++++++++++++++++++++++++++++--------
>>>>  kernel/trace/bpf_trace.c    |  2 -
>>>>  9 files changed, 151 insertions(+), 35 deletions(-)
>>>>
>>>
>>> [...]
>>>
>>>> +
>>>> +static bool btf_translate_to_vmlinux(struct bpf_verifier_log *log,
>>>> +                                     struct btf *btf,
>>>> +                                     const struct btf_type *t,
>>>> +                                     struct bpf_insn_access_aux *info)
>>>> +{
>>>> +        const char *tname = __btf_name_by_offset(btf, t->name_off);
>>>> +        int btf_id;
>>>> +
>>>> +        if (!tname) {
>>>> +                bpf_log(log, "Program's type doesn't have a name\n");
>>>> +                return false;
>>>> +        }
>>>> +        if (strcmp(tname, "__sk_buff") == 0) {
>>>
>>> might be a good idea to ensure that t's type is also a struct?
>>>
>>>> +                btf_id = btf_resolve_helper_id(log, &bpf_skb_output_proto, 0);
>>>
>>> This is kind of ugly and high-maintenance. Have you considered having
>>> something like this, to do this mapping:
>>>
>>> struct bpf_ctx_mapping {
>>>     struct sk_buff *__sk_buff;
>>>     struct xdp_buff *xdp_md;
>>> };
>>>
>>> So the field name is the name you are trying to match, while the field
>>> type is the actual type you are mapping to? You won't need to find
>>> special function protos (like bpf_skb_output_proto), it will be easy to
>>> extend, and you'll have real vmlinux types automatically captured for
>>> you (you'll just have to initially find bpf_ctx_mapping's btf_id).
>>
>> I was thinking something along these lines.
>> The problem with a single struct like the above is that it's centralized.
>> convert_ctx_access callbacks are all over the place.
>> So I'm thinking to add a macro like this to bpf.h:
>> +#define BPF_RECORD_CTX_CONVERSION(user_type, kernel_type) \
>> +        ({typedef kernel_type (*bpf_ctx_convert)(user_type); \
>> +         (void) (bpf_ctx_convert) (void *) 0;})
>>
>> and then do
>> BPF_RECORD_CTX_CONVERSION(struct bpf_xdp_sock, struct xdp_sock);
>> inside convert_ctx_access functions (like bpf_xdp_sock_convert_ctx_access).
>> There will be several typedefs with the 'bpf_ctx_convert' name. The
>> btf_translate_to_vmlinux() will iterate over them. Speed is not critical here,
>
> I guess that works as well. Please leave a comment explaining the idea
> behind this distributed mapping :)
>
>> but long term we probably need to merge the prog's BTF with vmlinux's BTF, so
>> most of the type comparison is done during prog load. It should probably
>> reduce the size of the prog's BTF too. Renumbering of the prog's BTF will be
>> annoying though. Something to consider long term.
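
For the lookup side, something along these lines could work (completely
untested sketch; btf_find_ctx_conversion is a made-up name, it assumes the
scan lives in kernel/bpf/btf.c where struct btf internals are visible, and
that the macro's local typedefs actually survive into vmlinux BTF):

/* Walk vmlinux BTF, pick every typedef named 'bpf_ctx_convert' that
 * BPF_RECORD_CTX_CONVERSION() emitted, and match the func_proto's
 * single argument against the uapi context type name.  Returns the
 * BTF id of the kernel-side type or 0 when no conversion is recorded.
 */
static u32 btf_find_ctx_conversion(const struct btf *btf,
                                   const char *user_tname)
{
        const struct btf_type *t, *proto;
        const struct btf_param *arg;
        u32 id;

        for (id = 1; id <= btf->nr_types; id++) {
                t = btf_type_by_id(btf, id);
                if (BTF_INFO_KIND(t->info) != BTF_KIND_TYPEDEF)
                        continue;
                if (strcmp(btf_name_by_offset(btf, t->name_off),
                           "bpf_ctx_convert"))
                        continue;
                /* typedef -> ptr -> func_proto(user_type) -> kernel_type */
                proto = btf_type_by_id(btf, t->type);
                proto = btf_type_by_id(btf, proto->type);
                if (BTF_INFO_VLEN(proto->info) != 1)
                        continue;
                arg = (const struct btf_param *)(proto + 1);
                t = btf_type_by_id(btf, arg->type);
                if (!strcmp(btf_name_by_offset(btf, t->name_off),
                            user_tname))
                        return proto->type; /* kernel_type's BTF id */
        }
        return 0;
}

Since it runs once at prog load time, the linear scan shouldn't matter.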
>>
>>>
>>>> +                if (btf_id < 0)
>>>> +                        return false;
>>>> +                info->btf_id = btf_id;
>>>> +                return true;
>>>> +        }
>>>> +        return false;
>>>> +}
>>>
>>> [...]
>>>
>>>> +        if (tgt_prog && conservative) {
>>>> +                struct btf_func_model *m = &tr->func.model;
>>>> +
>>>> +                /* BTF function prototype doesn't match the verifier types.
>>>> +                 * Fall back to 5 u64 args.
>>>> +                 */
>>>> +                for (i = 0; i < 5; i++)
>>>> +                        m->arg_size[i] = 8;
>>>> +                m->ret_size = 8;
>>>> +                m->nr_args = 5;
>>>> +                prog->aux->attach_func_proto = NULL;
>>>> +        } else {
>>>> +                ret = btf_distill_func_proto(&env->log, btf, t,
>>>> +                                             tname, &tr->func.model);
>>>
>>> there is nothing preventing some parallel thread from modifying
>>> tr->func.model in parallel, right? Should these modifications be
>>> either locked or at least WRITE_ONCE, similar to
>>> btf_resolve_helper_id?
>>
>> hmm. Right. There is a race with bpf_trampoline_lookup. One thread could
>> have just created the trampoline and still be doing distill, while another
>> thread is trying to use it after getting it from bpf_trampoline_lookup. The
>> fix choices are not pretty. Either add a mutex to check_attach_btf_id(), or
>> do bpf_trampoline_lookup_or_create() with an extra callback that does
>> btf_distill_func_proto while bpf_trampoline_lookup_or_create is holding
>> trampoline_mutex, or move most of the check_attach_btf_id() logic into
>> bpf_trampoline_lookup_or_create().
>> I tried to keep the trampoline as an abstract concept, but with the callback
>> or the move, the verifier and btf logic will bleed into the trampoline. Hmm.
>
> yeah, that sounds too intrusive. I'd change btf_distill_func_proto to
> accept struct btf_func_model **m, allocate the model dynamically, and then
> compare_exchange the final constructed model pointer.

cmpxchg is too ugly and also doesn't cover other fields that may need
serialized access in the future. I went with the simpler model of an
additional mutex per trampoline. It also helped to avoid a global mutex
for link/unlink.
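
Roughly the shape I have in mind (illustrative sketch only; the field and
function names here are guesses at what the patch will look like, not the
final code):

#include <linux/mutex.h>
#include <linux/refcount.h>
#include <linux/filter.h>       /* struct bpf_prog */

struct bpf_trampoline {
        struct hlist_node hlist;
        /* serializes writes to func.model done by the verifier in
         * check_attach_btf_id() as well as the prog list changes
         * done by link/unlink below
         */
        struct mutex mutex;
        refcount_t refcnt;
        u64 key;
        struct hlist_head progs_hlist;
        struct {
                struct btf_func_model model;
                void *addr;
        } func;
};

int bpf_trampoline_update(struct bpf_trampoline *tr);

int bpf_trampoline_link_prog(struct bpf_prog *prog)
{
        struct bpf_trampoline *tr = prog->aux->trampoline;
        int err;

        /* per-trampoline lock instead of the global trampoline_mutex,
         * so link/unlink on unrelated trampolines don't serialize
         */
        mutex_lock(&tr->mutex);
        hlist_add_head(&prog->aux->tramp_hlist, &tr->progs_hlist);
        err = bpf_trampoline_update(tr);
        mutex_unlock(&tr->mutex);
        return err;
}

check_attach_btf_id() would take the same tr->mutex around
btf_distill_func_proto(), which closes the race with a second thread that
found the trampoline via bpf_trampoline_lookup() and is about to read
func.model.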