Alan Maguire <alan.maguire@xxxxxxxxxx> writes:

> On 13/11/2022 18:04, Toke Høiland-Jørgensen wrote:
>> Lorenzo Bianconi <lorenzo.bianconi@xxxxxxxxxx> writes:
>>
>>>>
>>>> Hi everyone
>>>>
>>>> There seems to be some issue with BTF mismatch when trying to run the
>>>> bpf_ct_set_nat_info() kfunc from a module. I was under the impression
>>>> that this is supposed to work, so is there some kind of BTF dedup issue
>>>> here or something?
>>>>
>>>> Steps to reproduce:
>>>>
>>>> 1. Compile the kernel with nf_conntrack built-in and run the selftests;
>>>>    './test_progs -a bpf_nf' works.
>>>>
>>>> 2. Change the kernel config so nf_conntrack is built as a module.
>>>>
>>>> 3. Start the test kernel and manually modprobe nf_conntrack and nf_nat.
>>>>
>>>> 4. Run ./test_progs -a bpf_nf; this now fails with an error like:
>>>>
>>>> kernel function bpf_ct_set_nat_info args#0 expected pointer to STRUCT nf_conn___init but R1 has a pointer to STRUCT nf_conn___init
>>>
>>> This week Kumar and I took a look at this issue and we ended up
>>> identifying a duplication of the nf_conn___init structure. In particular:
>>>
>>> [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
>>> net/netfilter/nf_conntrack.ko format raw | grep nf_conn__
>>> [110941] STRUCT 'nf_conn___init' size=248 vlen=1
>>> [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
>>> net/netfilter/nf_nat.ko format raw | grep nf_conn__
>>> [107488] STRUCT 'nf_conn___init' size=248 vlen=1
>>>
>>> Is this the root cause of the problem?
>>
>> It certainly seems to be related to it, at least. Amending the log
>> message to include the BTF object IDs of the two versions shows that the
>> register has a reference to nf_conn___init in nf_conntrack.ko, while the
>> kernel expects it to point to the one in nf_nat.ko.
>>
>> I'm not sure what the right fix for this is. Should libbpf be smart enough
>> to pull the kfunc arg ID from the same BTF ID as the function itself? Or
>> should the kernel compare structs and allow things if they're identical?
>> Andrii, WDYT?
>>
>
> There were some dedup issues fixed recently in pahole
> and libbpf; since the libbpf in dwarves hasn't been synced
> with upstream libbpf recently, as far as I can see it won't
> have the fix for [1]; I suspect it may help with dedup-ing
> here. It would probably be worth rebuilding dwarves with a
> libbpf that has [1] applied and seeing if the dedup issue
> goes away before we go any further. If it fixes the issue,
> would it be worth updating the libbpf that dwarves uses,
> Arnaldo? I saw some pretty large improvements in removing
> redundant definitions.

I don't think it's a deduplication issue; the type is simply not defined
in vmlinux, so each module has its own "version" of it. And since each
module's BTF is semantically independent of other modules, the
duplication is expected in this case.

-Toke