On Wed, Nov 30, 2022 at 05:09:03PM -0800, Andrii Nakryiko wrote:
> On Tue, Nov 29, 2022 at 12:12 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> > >> > This week Kumar and I took a look at this issue and we ended up
> > >> > identifying a duplication of nf_conn___init structure. In particular:
> > >> >
> > >> > [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
> > >> > net/netfilter/nf_conntrack.ko format raw | grep nf_conn__
> > >> > [110941] STRUCT 'nf_conn___init' size=248 vlen=1
> > >> > [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
> > >> > net/netfilter/nf_nat.ko format raw | grep nf_conn__
> > >> > [107488] STRUCT 'nf_conn___init' size=248 vlen=1
> > >> >
> > >> > Is it the root cause of the problem?
> > >>
> > >> It certainly seems to be related to it, at least. Amending the log
> > >> message to include the BTF object IDs of the two versions shows that the
> > >> register has a reference to nf_conn__init in nf_conntrack.ko, while the kernel
> > >> expects it to point to nf_nat.ko.
> > >>
> > >> Not sure what's the right fix for this? Should libbpf be smart enough to
> > >> pull the kfunc arg ID from the same BTF ID as the function itself? Or
>
> Libbpf is doing just that. Or rather this just happens automatically.
> Libbpf finds the FUNC type corresponding to a kfunc, and then all the
> types of all the arguments are consistent with that FUNC definition.
>
> I think the problem is that test is getting `struct nf_conn` from
> bpf_xdp_ct_alloc() kfunc, which is defined in nf_conntrack module (and
> so specifies that it returns `struct nf_conn` coming from
> nf_conntrack's module BTF), while bpf_ct_set_nat_info() kfunc is
> defined in nf_nat module and specifies that it expects `struct
> nf_conn` defined in nf_nat's module BTF.
>
> And those two types are two completely different types, with different
> BTF object ID and BTF type ID, as far as all the BTF stuff is
> concerned.
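
To make the mismatch concrete, here is a rough userspace sketch. It is
not kernel code: struct type_ref and the three helpers are names
invented for illustration, and KF_RELAXED_ARG_CHECK is only the flag
name floated further down in this mail, not an existing kernel flag.
It assumes a type is identified by (BTF object ID, BTF type ID) and
contrasts that strict identity with a name-and-size comparison.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified type reference: the BTF object a type lives in
 * plus its type ID inside that object. Name and size are carried along
 * only so the relaxed comparison has something to look at. */
struct type_ref {
	uint32_t btf_obj_id;   /* which BTF object (vmlinux or a module) */
	uint32_t btf_type_id;  /* type ID within that object */
	const char *name;      /* struct name */
	uint32_t size;         /* struct size in bytes */
};

/* Hypothetical per-kfunc flag; the name is only a proposal. */
#define KF_RELAXED_ARG_CHECK (1u << 0)

/* Strict identity: same BTF object and same type ID. */
bool type_ref_same(const struct type_ref *a, const struct type_ref *b)
{
	return a->btf_obj_id == b->btf_obj_id &&
	       a->btf_type_id == b->btf_type_id;
}

/* Relaxed match: compare struct name and size only. */
bool type_ref_name_size_match(const struct type_ref *a,
			      const struct type_ref *b)
{
	return a->size == b->size && strcmp(a->name, b->name) == 0;
}

/* Accept an argument if it is strictly identical to the expected type,
 * or, for kfuncs that opted in via the flag, if name and size match. */
bool kfunc_arg_ok(const struct type_ref *reg,
		  const struct type_ref *expected,
		  uint32_t kfunc_flags)
{
	if (type_ref_same(reg, expected))
		return true;
	if (kfunc_flags & KF_RELAXED_ARG_CHECK)
		return type_ref_name_size_match(reg, expected);
	return false;
}
```

Fed the two entries from the bpftool output above (type IDs 110941 and
107488, both 'nf_conn___init' with size=248, and any two distinct
object IDs), the strict check rejects the pair, while the flagged path
accepts it.
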
>
> I don't know what the solution here is, but it's not on the libbpf
> side at all for sure. As Toke said, bringing BTF dedup into the kernel
> seems like an overkill. So some hacky "let's compare struct name and
> size" approach perhaps?

Wouldn't that be a bit too relaxed for the general case?

I wonder how often this issue can come up. If it is relatively rare,
maybe the known kfuncs that need this could be marked with a new flag
(KF_RELAXED_ARG_CHECK or similar) to allow this shortcut?

--
Regards,
Artem