On 13/11/2022 18:04, Toke Høiland-Jørgensen wrote:
> Lorenzo Bianconi <lorenzo.bianconi@xxxxxxxxxx> writes:
>
>>>
>>> Hi everyone
>>>
>>> There seems to be some issue with BTF mismatch when trying to run the
>>> bpf_ct_set_nat_info() kfunc from a module. I was under the impression
>>> that this is supposed to work, so is there some kind of BTF dedup issue
>>> here or something?
>>>
>>> Steps to reproduce:
>>>
>>> 1. Compile kernel with nf_conntrack built-in and run selftests;
>>>    './test_progs -a bpf_nf' works
>>>
>>> 2. Change the kernel config so nf_conntrack is build as a module
>>>
>>> 3. Start the test kernel and manually modprobe nf_conntrack and nf_nat
>>>
>>> 4. Run ./test_progs -a bpf_nf; this now fails with an error like:
>>>
>>> kernel function bpf_ct_set_nat_info args#0 expected pointer to STRUCT nf_conn___init but R1 has a pointer to STRUCT nf_conn___init
>>
>> This week Kumar and I took a look at this issue and we ended up
>> identifying a duplication of nf_conn___init structure. In particular:
>>
>> [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
>> net/netfilter/nf_conntrack.ko format raw | grep nf_conn__
>> [110941] STRUCT 'nf_conn___init' size=248 vlen=1
>> [~/workspace/bpf-next]$ bpftool btf --base-btf vmlinux dump file
>> net/netfilter/nf_nat.ko format raw | grep nf_conn__
>> [107488] STRUCT 'nf_conn___init' size=248 vlen=1
>>
>> Is it the root cause of the problem?
>
> It certainly seems to be related to it, at least. Amending the log
> message to include the BTF object IDs of the two versions shows that the
> register has a reference to nf_conn__init in nf_conntrack.ko, while the kernel
> expects it to point to nf_nat.ko.
>
> Not sure what's the right fix for this? Should libbpf be smart enough to
> pull the kfunc arg ID from the same BTF ID as the function itself? Or
> should the kernel compare structs and allow things if they're identical?
> Andrii, WDYT?
>

There were some dedup issues fixed recently in pahole and libbpf. As far
as I can see, the copy of libbpf that dwarves uses hasn't been synced
with upstream libbpf recently, so it won't have the fix in [1]; I suspect
that fix may help with dedup-ing here. It would probably be worth
rebuilding dwarves against a libbpf with [1] applied and seeing whether
the dedup issue goes away before we go any further.

If it fixes the issue, would it be worth updating the libbpf that dwarves
uses, Arnaldo? I saw some pretty large improvements in removing redundant
definitions.

Thanks!

Alan

[1] https://lore.kernel.org/bpf/1666622309-22289-1-git-send-email-alan.maguire@xxxxxxxxxx/

> -Toke
>
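
To check whether a rebuilt dwarves removes the duplication, the two bpftool dumps quoted above can be compared mechanically. The following is only a minimal sketch: it assumes each dump was produced with `--base-btf vmlinux` so that it lists only the module's own (split) types, and it relies on the raw line format shown in the thread. The helper names `parse_structs` and `duplicated_structs` are hypothetical, not part of bpftool or libbpf.

```python
import re
from typing import Dict, List, Tuple

# Matches raw dump lines such as:
#   [110941] STRUCT 'nf_conn___init' size=248 vlen=1
STRUCT_RE = re.compile(
    r"^\[(\d+)\]\s+STRUCT\s+'([^']+)'\s+size=(\d+)\s+vlen=(\d+)"
)


def parse_structs(raw_dump: str) -> Dict[str, Tuple[int, int, int]]:
    """Map struct name -> (type id, size, vlen) from raw BTF dump text."""
    structs: Dict[str, Tuple[int, int, int]] = {}
    for line in raw_dump.splitlines():
        match = STRUCT_RE.match(line.strip())
        if not match:
            continue  # member lines and non-STRUCT kinds are skipped
        type_id, name, size, vlen = match.groups()
        if name == "(anon)":
            continue  # anonymous structs cannot be compared by name
        structs[name] = (int(type_id), int(size), int(vlen))
    return structs


def duplicated_structs(dump_a: str, dump_b: str) -> List[str]:
    """Return struct names that appear in both module dumps."""
    a = parse_structs(dump_a)
    b = parse_structs(dump_b)
    return sorted(set(a) & set(b))


if __name__ == "__main__":
    # Sample lines copied from the bpftool output quoted in this thread.
    conntrack = "[110941] STRUCT 'nf_conn___init' size=248 vlen=1"
    nat = "[107488] STRUCT 'nf_conn___init' size=248 vlen=1"
    print(duplicated_structs(conntrack, nat))
```

If the dedup fix works as hoped, feeding the full dumps of nf_conntrack.ko and nf_nat.ko through this should no longer report nf_conn___init in both.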