Artem Savkov <asavkov@xxxxxxxxxx> writes: > On Sun, Nov 13, 2022 at 07:04:43PM +0100, Toke Høiland-Jørgensen wrote: >> Lorenzo Bianconi <lorenzo.bianconi@xxxxxxxxxx> writes: >> >> >> >> >> Hi everyone >> >> >> >> There seems to be some issue with BTF mismatch when trying to run the >> >> bpf_ct_set_nat_info() kfunc from a module. I was under the impression >> >> that this is supposed to work, so is there some kind of BTF dedup issue >> >> here or something? >> >> >> >> Steps to reproduce: >> >> >> >> 1. Compile kernel with nf_conntrack built-in and run selftests; >> >> './test_progs -a bpf_nf' works >> >> >> >> 2. Change the kernel config so nf_conntrack is build as a module >> >> >> >> 3. Start the test kernel and manually modprobe nf_conntrack and nf_nat >> >> >> >> 4. Run ./test_progs -a bpf_nf; this now fails with an error like: >> >> >> >> kernel function bpf_ct_set_nat_info args#0 expected pointer to STRUCT nf_conn___init but R1 has a pointer to STRUCT nf_conn___init >> > >> > This week Kumar and I took a look at this issue and we ended up >> > identifying a duplication of nf_conn___init structure. In particular: >> > >> > [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file >> > net/netfilter/nf_conntrack.ko format raw | grep nf_conn__ >> > [110941] STRUCT 'nf_conn___init' size=248 vlen=1 >> > [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file >> > net/netfilter/nf_nat.ko format raw | grep nf_conn__ >> > [107488] STRUCT 'nf_conn___init' size=248 vlen=1 >> > >> > Is it the root cause of the problem? >> >> It certainly seems to be related to it, at least. Amending the log >> message to include the BTF object IDs of the two versions shows that the >> register has a reference to nf_conn__init in nf_conntrack.ko, while the kernel >> expects it to point to nf_nat.ko. >> >> Not sure what's the right fix for this? Should libbpf be smart enough to >> pull the kfunc arg ID from the same BTF ID as the function itself? Or > > Verifier chose the ID based on where the variable was allocated, which > is bpf_(skb|xdp)_ct_alloc() from nf_conntrack.ko. Assuming btf id based > on usage location might be error prone. Yeah, I think we need something more robust. >> should the kernel compare structs and allow things if they're identical? >> Andrii, WDYT? > > This works but might make verifier slower. > > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c > index 195d24316750..562d2c15906d 100644 > --- a/kernel/bpf/verifier.c > +++ b/kernel/bpf/verifier.c > @@ -24,6 +24,7 @@ > #include <linux/bpf_lsm.h> > #include <linux/btf_ids.h> > #include <linux/poison.h> > +#include "../tools/lib/bpf/relo_core.h" > > #include "disasm.h" > > @@ -8236,7 +8237,8 @@ static int process_kf_arg_ptr_to_btf_id(struct bpf_verifier_env *env, > > reg_ref_t = btf_type_skip_modifiers(reg_btf, reg_ref_id, ®_ref_id); > reg_ref_tname = btf_name_by_offset(reg_btf, reg_ref_t->name_off); > - if (!btf_struct_ids_match(&env->log, reg_btf, reg_ref_id, reg->off, meta->btf, ref_id, strict_type_match)) { > + if (!btf_struct_ids_match(&env->log, reg_btf, reg_ref_id, reg->off, meta->btf, ref_id, strict_type_match) && \ > + !__bpf_core_types_match(reg_btf, reg_ref_id, meta->btf, ref_id, false, 10)) { > verbose(env, "kernel function %s args#%d expected pointer to %s %s but R%d has a pointer to %s %s\n", > meta->func_name, argno, btf_type_str(ref_t), ref_tname, argno + 1, > btf_type_str(reg_ref_t), reg_ref_tname); Ah, cool! This is a lot of code to important into the kernel, though; do we really want to do that? I guess an alternative could be to make sure that every data type used by a kfunc is always referenced in a built-in module somewhere, so modules will use the definition from vmlinux. Andrii, Alexei, any opinions on which way to go with this? -Toke