On 14/10/2022 07:47, Jiri Olsa wrote:
> On Thu, Oct 13, 2022 at 03:24:59PM -0700, Andrii Nakryiko wrote:
>> On Thu, Oct 13, 2022 at 3:12 PM Jiri Olsa <olsajiri@xxxxxxxxx> wrote:
>>>
>>> On Thu, Oct 13, 2022 at 08:05:17AM -0700, Jakub Kicinski wrote:
>>>> On Wed, 5 Oct 2022 22:07:57 +0200 Jiri Olsa wrote:
>>>>>> Yeah, it's there on linux-next, too.
>>>>>>
>>>>>> Let me grab a fresh VM and try there. Maybe it's my system. Somehow.
>>>>>
>>>>> ok, I will look around what's the way to install that centos 8 thing
>>>>
>>>> Any luck?
>>>
>>> now BTFIDS warnings..
>>>
>>> I can see following on centos8 with gcc 8.5:
>>>
>>>   BTFIDS  vmlinux
>>> WARN: multiple IDs found for 'task_struct': 300, 56614 - using 300
>>> WARN: multiple IDs found for 'file': 540, 56649 - using 540
>>> WARN: multiple IDs found for 'vm_area_struct': 549, 56652 - using 549
>>> WARN: multiple IDs found for 'seq_file': 953, 56690 - using 953
>>> WARN: multiple IDs found for 'inode': 1132, 56966 - using 1132
>>> WARN: multiple IDs found for 'path': 1164, 56995 - using 1164
>>> WARN: multiple IDs found for 'task_struct': 300, 61905 - using 300
>>> WARN: multiple IDs found for 'file': 540, 61943 - using 540
>>> WARN: multiple IDs found for 'vm_area_struct': 549, 61946 - using 549
>>> WARN: multiple IDs found for 'inode': 1132, 62029 - using 1132
>>> WARN: multiple IDs found for 'path': 1164, 62058 - using 1164
>>> WARN: multiple IDs found for 'cgroup': 1190, 62067 - using 1190
>>> WARN: multiple IDs found for 'seq_file': 953, 62253 - using 953
>>> WARN: multiple IDs found for 'sock': 7960, 62374 - using 7960
>>> WARN: multiple IDs found for 'sk_buff': 1876, 62485 - using 1876
>>> WARN: multiple IDs found for 'bpf_prog': 6094, 62542 - using 6094
>>> WARN: multiple IDs found for 'socket': 7993, 62545 - using 7993
>>> WARN: multiple IDs found for 'xdp_buff': 6191, 62836 - using 6191
>>> WARN: multiple IDs found for 'sock_common': 8164, 63152 - using 8164
>>> WARN: multiple IDs found for 'request_sock': 17296, 63204 - using 17296
>>> WARN: multiple IDs found for 'inet_request_sock': 36292, 63222 - using 36292
>>> WARN: multiple IDs found for 'inet_sock': 32700, 63225 - using 32700
>>> WARN: multiple IDs found for 'inet_connection_sock': 33944, 63240 - using 33944
>>> WARN: multiple IDs found for 'tcp_request_sock': 36299, 63260 - using 36299
>>> WARN: multiple IDs found for 'tcp_sock': 33969, 63264 - using 33969
>>> WARN: multiple IDs found for 'bpf_map': 6623, 63343 - using 6623
>>>
>>> I'll need to check on that..
>>>
>>> and I just actually saw the 'nf_conn' warning on linux-next/master with
>>> latest fedora/gcc-12:
>>>
>>>   BTF [M] net/netfilter/nf_nat.ko
>>> WARN: multiple IDs found for 'nf_conn': 106518, 120156 - using 106518
>>> WARN: multiple IDs found for 'nf_conn': 106518, 121853 - using 106518
>>> WARN: multiple IDs found for 'nf_conn': 106518, 123126 - using 106518
>>> WARN: multiple IDs found for 'nf_conn': 106518, 124537 - using 106518
>>> WARN: multiple IDs found for 'nf_conn': 106518, 126442 - using 106518
>>> WARN: multiple IDs found for 'nf_conn': 106518, 128256 - using 106518
>>>   LD [M]  net/netfilter/nf_nat_tftp.ko
>>>
>>> looks like maybe dedup missed this struct for some reason
>>>
>>> nf_conn dump from module:
>>>
>>> [120155] PTR '(anon)' type_id=120156
>>> [120156] STRUCT 'nf_conn' size=320 vlen=14
>>>         'ct_general' type_id=105882 bits_offset=0
>>>         'lock' type_id=180 bits_offset=64
>>>         'timeout' type_id=113 bits_offset=640
>>>         'zone' type_id=106520 bits_offset=672
>>>         'tuplehash' type_id=106533 bits_offset=704
>>>         'status' type_id=1 bits_offset=1600
>>>         'ct_net' type_id=3215 bits_offset=1664
>>>         'nat_bysource' type_id=139 bits_offset=1728
>>>         '__nfct_init_offset' type_id=949 bits_offset=1856
>>>         'master' type_id=120155 bits_offset=1856
>>>         'mark' type_id=106351 bits_offset=1920
>>>         'secmark' type_id=106351 bits_offset=1952
>>>         'ext' type_id=106536 bits_offset=1984
>>>         'proto' type_id=106532 bits_offset=2048
>>>
>>> nf_conn dump from vmlinux:
>>>
>>> [106517] PTR '(anon)' type_id=106518
>>> [106518] STRUCT 'nf_conn' size=320 vlen=14
>>>         'ct_general' type_id=105882 bits_offset=0
>>>         'lock' type_id=180 bits_offset=64
>>>         'timeout' type_id=113 bits_offset=640
>>>         'zone' type_id=106520 bits_offset=672
>>>         'tuplehash' type_id=106533 bits_offset=704
>>>         'status' type_id=1 bits_offset=1600
>>>         'ct_net' type_id=3215 bits_offset=1664
>>>         'nat_bysource' type_id=139 bits_offset=1728
>>>         '__nfct_init_offset' type_id=949 bits_offset=1856
>>>         'master' type_id=106517 bits_offset=1856
>>>         'mark' type_id=106351 bits_offset=1920
>>>         'secmark' type_id=106351 bits_offset=1952
>>>         'ext' type_id=106536 bits_offset=1984
>>>         'proto' type_id=106532 bits_offset=2048
>>>
>>> look identical.. Andrii, any idea?
>>
>> I'm pretty sure they are not identical. There is somewhere a STRUCT vs
>> FWD difference. We had a similar discussion recently with Alan
>> Maguire.
>>
>>>         'master' type_id=120155 bits_offset=1856
>>
>> vs
>>
>>>         'master' type_id=106517 bits_offset=1856
>
> master is pointer to same 'nf_conn' object, and rest of the ids are same
>
> jirka
>

I tried digging into this problem a bit - in my case I was seeing
"struct sk_buff" duplicated in kernel/module BTF. Here's what I found..

Consider a situation like this, where one header file defining a
struct s1 has a pointer field, pointing at struct s2. But struct s2
is a fwd definition.

$ cat s1.h
#include <stdio.h>

struct s2;

struct s1 {
        struct s1 *f1;
        struct s2 *f2;
};

$ cat s1.c
#include "s1.h"

int main(int argc, char *argv[])
{
        struct s1 s1;

        return 0;
}

Now consider a separate program s2, that #includes definitions for
both s1 and s2:

$ cat s2.h
#include <stdio.h>

struct s1;

struct s2 {
        struct s1 *f1;
};

$ cat s2.c
#include "s2.h"
#include "s1.h"

int main(int argc, char *argv[])
{
        struct s1 s1 = {};
        struct s2 s2 = {};

        return 0;
}

In this case the generated base BTF contains a definition for s1,
and a FWD for s2, but the "module" BTF for s2 contains a full
definition for s2, so the dedup fails:

$ bpftool btf dump file s1

[29] STRUCT 's1' size=16 vlen=2
        'f1' type_id=30 bits_offset=0
        'f2' type_id=32 bits_offset=64
[30] PTR '(anon)' type_id=29
[31] FWD 's2' fwd_kind=struct

$ bpftool btf dump -B s1 file s2

[36] STRUCT 's2' size=8 vlen=1
        'f1' type_id=38 bits_offset=0
[37] STRUCT 's1' size=16 vlen=2
        'f1' type_id=38 bits_offset=0
        'f2' type_id=39 bits_offset=64
[38] PTR '(anon)' type_id=37
[39] PTR '(anon)' type_id=36

So we had to redefine struct s1 in the "module" because the FWD
wasn't resolved in the base BTF. This is by design as I understand
it; in effect we can't supplement base BTF with info we've gotten
from module BTF about forward resolution (at least that's my
understanding of the reason).

Now does this sort of thing happen in the kernel?
It looks like it; consider struct nf_conn; it contains a possible_net_t:

typedef struct {
        struct net *               net;                  /*     0     8 */

        /* size: 8, cachelines: 1, members: 1 */
        /* last cacheline: 8 bytes */
} possible_net_t;

...and a struct net * contains pointers to structures that aren't in
the vmlinux BTF (because they are in modules); for example:

        struct netns_ipvs *        ipvs;                 /*  3912     8 */

$ pahole netns_ipvs
pahole: type 'netns_ipvs' not found

...and in vmlinux BTF it is:

[2983] FWD 'netns_ipvs' fwd_kind=struct
[2984] PTR '(anon)' type_id=2983

...and in struct net we can see the fwd type is referenced alright:

[2021] STRUCT 'net' size=4288 vlen=52
...
        'ipvs' type_id=2984 bits_offset=31808

So we'd expect any ipvs-related modules to not dedup struct net,
since they'll have the full definition for netns_ipvs. In xt_ipvs.ko
we see:

[111924] STRUCT 'netns_ipvs' size=2176 vlen=78
        'gen' type_id=21 bits_offset=0
        'enable' type_id=21 bits_offset=32
        'rs_table' type_id=4044 bits_offset=64
        'app_list' type_id=83 bits_offset=1088

...and when we look at 'struct net' we see:

[111786] STRUCT 'net' size=4288 vlen=52
...
        'ipvs' type_id=111925 bits_offset=31808

And then if we don't dedup struct net, it seems likely that
structures referencing struct net (like skbs, nf_conn etc) won't
dedup either since they'll point at "their" version of struct net.

Not sure if that's the root cause here, but it seems like it is
happening in other modules at least.

More subtle effects are also possible I think; if a type is defined
in a header file but not referenced anywhere (as might well happen
for a module-related type in vmlinux), it might not always make it
into the DWARF description, and as a result might not have a BTF
representation.

Alan
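
For reference, the s1/s2 case above can be modelled in a few lines of
C. This is only a sketch of the idea and not libbpf's dedup algorithm:
the toy_* names, the small type ids, and the strict rule that a FWD in
the base never matches a STRUCT in the module are all assumptions made
for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy model, not libbpf code. */
enum toy_kind { TOY_FWD, TOY_STRUCT, TOY_PTR };

#define TOY_MAX_MEMBERS 4
#define TOY_MAX_TYPES   16

struct toy_type {
        enum toy_kind kind;
        const char *name;            /* "" for anonymous pointers */
        int nr_members;              /* used by TOY_STRUCT only */
        int member[TOY_MAX_MEMBERS]; /* STRUCT: member type ids; PTR: member[0] is the target */
};

/* "Base" graph, as built from s1.c: struct s1 plus a FWD for s2. */
static const struct toy_type toy_base[] = {
        { TOY_STRUCT, "s1", 2, { 1, 2 } }, /* [0] */
        { TOY_PTR,    "",   0, { 0 } },    /* [1] -> s1 */
        { TOY_PTR,    "",   0, { 3 } },    /* [2] -> FWD s2 */
        { TOY_FWD,    "s2", 0, { 0 } },    /* [3] */
};

/* "Module" graph, as built from s2.c: s2 is fully defined here. */
static const struct toy_type toy_mod[] = {
        { TOY_STRUCT, "s2", 1, { 2 } },    /* [0] */
        { TOY_STRUCT, "s1", 2, { 2, 3 } }, /* [1] */
        { TOY_PTR,    "",   0, { 1 } },    /* [2] -> s1 */
        { TOY_PTR,    "",   0, { 0 } },    /* [3] -> STRUCT s2 */
};

/*
 * Strict structural comparison. hypot[m] records which base id module
 * id m is currently assumed to match (-1 if none), which lets the walk
 * terminate on self-referential types such as s1->f1.
 */
static bool toy_equiv(const struct toy_type *base, int b,
                      const struct toy_type *mod, int m, int *hypot)
{
        int i;

        if (hypot[m] >= 0)
                return hypot[m] == b;
        if (base[b].kind != mod[m].kind)
                return false;   /* FWD vs STRUCT ends up here */
        if (strcmp(base[b].name, mod[m].name) != 0)
                return false;
        hypot[m] = b;

        switch (mod[m].kind) {
        case TOY_FWD:
                return true;
        case TOY_PTR:
                return toy_equiv(base, base[b].member[0],
                                 mod, mod[m].member[0], hypot);
        case TOY_STRUCT:
                if (base[b].nr_members != mod[m].nr_members)
                        return false;
                for (i = 0; i < mod[m].nr_members; i++)
                        if (!toy_equiv(base, base[b].member[i],
                                       mod, mod[m].member[i], hypot))
                                return false;
                return true;
        }
        return false;
}

/* True if module type m could be folded into base type b in this model. */
static bool toy_dedups_against_base(const struct toy_type *base, int b,
                                    const struct toy_type *mod, int m,
                                    int nr_mod)
{
        int hypot[TOY_MAX_TYPES];
        int i;

        assert(nr_mod <= TOY_MAX_TYPES);
        for (i = 0; i < nr_mod; i++)
                hypot[i] = -1;
        return toy_equiv(base, b, mod, m, hypot);
}
```

In this model toy_dedups_against_base(toy_base, 0, toy_mod, 1, 4)
returns false: the walk from the module's s1 reaches a STRUCT s2 where
the base only has a FWD, so s1 is kept as a separate type, and anything
pointing at it would be too. Comparing the base graph against itself
returns true.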