On Thu, Oct 28, 2021 at 10:31:33AM +0100, Douglas Raillard wrote:
> On 10/27/21 9:47 PM, Arnaldo Carvalho de Melo wrote:
> > So, what is now in the 'next' branch (an alias for the tmp.master branch
> > that is tested in the libbpf CI) produces:
> >
> > --- /tmp/btfdiff.dwarf.8098nf	2021-10-27 17:29:02.788601053 -0300
> > +++ /tmp/btfdiff.btf.eTagYI	2021-10-27 17:29:02.994606515 -0300
> > @@ -107,7 +107,7 @@ struct Qdisc {
> >
> >  	/* XXX 24 bytes hole, try to pack */
> >
> >  	/* --- cacheline 2 boundary (128 bytes) --- */
> > -	struct sk_buff_head gso_skb __attribute__((__aligned__(64))); /* 128 24 */
> > +	struct sk_buff_head gso_skb __attribute__((__aligned__(32))); /* 128 24 */
> >  	struct qdisc_skb_head q; /* 152 24 */
> >  	struct gnet_stats_basic_packed bstats; /* 176 16 */
> >  	/* --- cacheline 3 boundary (192 bytes) --- */
> >
> > The one above is a bit difficult to solve; perhaps we can use a
> > heuristic for kernel files and assume this is dependent on the
> > typical cacheline sizes? As in the kernel sources it is:
> >
> > 	struct sk_buff_head gso_skb ____cacheline_aligned_in_smp;
> >
> > I.e. if it is a multiple of 64, then we use that instead of 32?
> >
> > @@ -117,18 +117,18 @@ struct Qdisc {
> >  	struct Qdisc * next_sched; /* 224 8 */
> >  	struct sk_buff_head skb_bad_txq; /* 232 24 */
> >  	/* --- cacheline 4 boundary (256 bytes) --- */
> > -	spinlock_t busylock __attribute__((__aligned__(64))); /* 256 4 */
> > +	spinlock_t busylock; /* 256 4 */
> >
> > Originally:
> >
> > 	spinlock_t busylock ____cacheline_aligned_in_smp;

> Sounds like a good idea, I can't think of another reason for large alignments anyway.
> Larger power-of-2 alignments are harmless anyway, so we might as well try to guess.
> I'll prepare a v3 with that update.

Cool

> > But since it is already naturally aligned (232 + 24 = 256), we can't
> > infer it from what BTF carries.
> >  	spinlock_t seqlock; /* 260 4 */
> > -	struct callback_head rcu __attribute__((__aligned__(8))); /* 264 16 */
> > +	struct callback_head rcu; /* 264 16 */
> >
> > Ditto
> >
> >  	/* XXX 40 bytes hole, try to pack */
> >
> >  	/* --- cacheline 5 boundary (320 bytes) --- */
> > -	long int privdata[] __attribute__((__aligned__(64))); /* 320 0 */
> > +	long int privdata[]; /* 320 0 */
> >
> > But this one should've been inferred, right?

> Assuming you talk about offset 320 aligned on 64, we cannot infer a specific alignment since
> 5 * 64 = 320.

I mean this line:

	/* XXX 40 bytes hole, try to pack */

I.e. without an explicit __attribute__((__aligned__(64))) it wouldn't be at 320, but at 280:

$ cat no_forced_alignment.c
struct foo {
	char rcu[264 + 16];
	long int privdata[];
} bar;
$ gcc -g -c no_forced_alignment.c
$ pahole no_forced_alignment.o
struct foo {
	char rcu[280];           /*   0   280 */
	/* --- cacheline 4 boundary (256 bytes) was 24 bytes ago --- */
	long int privdata[];     /* 280     0 */

	/* size: 280, cachelines: 5, members: 2 */
	/* last cacheline: 24 bytes */
};
$

It's only when we explicitly add the __attribute__((__aligned__(64))) that we get that hole:

$ cat forced_alignment.c
struct foo {
	char rcu[264 + 16];
	long int privdata[] __attribute__((__aligned__(64)));
} bar;
$ gcc -g -c forced_alignment.c
$ pahole forced_alignment.o
struct foo {
	char rcu[280];           /*   0   280 */

	/* XXX 40 bytes hole, try to pack */

	/* --- cacheline 5 boundary (320 bytes) --- */
	long int privdata[] __attribute__((__aligned__(64))); /* 320 0 */

	/* size: 320, cachelines: 5, members: 2 */
	/* sum members: 280, holes: 1, sum holes: 40 */
	/* forced alignments: 1, forced holes: 1, sum forced holes: 40 */
} __attribute__((__aligned__(64)));
$

I.e. you have to look at the hole right after the previous class member to notice that, yeah, there is a forced alignment.
> If you mean the number of forced alignments, I'm not surprised that BTF finds fewer forced alignments,
> as some of the technically forced alignments (in the original source) also happen to be already
> satisfied by the normal layout, so we don't infer anything special.

I meant the hole before privdata :-)

Yeah, there are some alignments that won't be detectable purely from looking at BTF.

What I want is to reduce the btfdiff output and, in this case, fix a bug, as the BTF generated from this struct would be incorrectly aligned: anyone using that privdata would get it at offset 280, not at 320, as intended.

- Arnaldo