Re: [PATCH pahole 4/4] reorganize: shift tail members for non-bitfields only

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On April 3, 2019 8:16:03 PM GMT-03:00, Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> wrote:
>On Wed, Apr 3, 2019 at 2:50 PM Andrii Nakryiko
><andrii.nakryiko@xxxxxxxxx> wrote:
>>
>> On Wed, Apr 3, 2019 at 2:36 PM Arnaldo Carvalho de Melo
>> <arnaldo.melo@xxxxxxxxx> wrote:
>> >
>> > Em Wed, Apr 03, 2019 at 06:06:18PM -0300, Arnaldo Carvalho de Melo
>escreveu:
>> > > Em Wed, Apr 03, 2019 at 12:52:01PM -0700, Andrii Nakryiko
>escreveu:
>> > > > On Wed, Apr 3, 2019 at 6:45 AM Arnaldo Carvalho de Melo
><arnaldo.melo@xxxxxxxxx> wrote:
>> > > > > Em Wed, Apr 03, 2019 at 12:01:48AM -0700,
>andrii.nakryiko@xxxxxxxxx escreveu:
>> > > > > > +++ b/dwarves_reorganize.c
>> > > > > > @@ -321,7 +321,7 @@ static void class__move_member(struct
>class *class, struct class_member *dest,
>> > > > > >               }
>> > > > > >       } else if (from_was_last) {
>> > > > > >               class->type.size -= from_size +
>class->padding;
>> > > > > > -     } else {
>> > > > > > +     } else if (from->bitfield_size == 0) {
>> > > > > >               /*
>> > > > > >                * See if we are adding a new hole that is
>bigger than
>> > > > > >                * sizeof(long), this may have problems with
>explicit alignment
>> >
>> > > > > After applying this patch I still get a BFA with a vmlinux
>for x86_64:
>> >
>> > > > Yeah, I've been using older vmlinux and its net_device didn't
>hit this
>> > > > case. I've recompiled latest vmlinux an now see BFAs as well.
>> >
>> > > > This happens because at some step there is a bitfield that is
>moved
>> > > > from the end of a struct into somewhere in the middle, causing
>some
>> > > > padding adjustments to be done using from_size (which is
>incorrect for
>> > > > bitfield and shouldn't be used for adjustments). Slightly
>better way
>> > > > to prevent that would be to add if (from->bitfield_size == 0)
>{}
>> > > > around this whole if/else if/else statement. But even with
>that, there
>> > > > are other cases where bitfields are mishandled. Ultimately,
>> > > > correctness and BFAs depend on all the byte offsets and bit
>offsets
>> > > > adjustments this algorithm does, and in some cases it doesn't
>handle
>> > > > bitfields correctly.
>> >
>> > > > As you said, it will take some time to fix/rewrite this whole
>> > > > algorithm. You might as well discard this patch, because it
>helps only
>> > > > in some cases (and causes some of the padding to not be
>adjusted).
>> >
>> > > Perhaps we can just disable moving bitfields around for now, then
>> > > revisit this, so that we leave, again, for now, just the moves
>that do
>> > > not produce inconsistencies. I'll look into that.
>> >
>> > So, with the patch below we get to:
>> >
>> > [acme@quaco pahole]$ pahole --show_reorg_steps -F dwarf -C
>net_device --reorganize vmlinux | egrep '^/|BRAIN'
>> > /* Moving 'min_header_len' from after 'hard_header_len' to after
>'irq' */
>> > /* Moving 'hard_header_len' from after 'type' to after
>'min_header_len' */
>> > /* Moving 'uc_promisc' from after 'name_assign_type' to after
>'min_header_len' */
>> > /* Moving 'dev_port' from after 'dev_id' to after 'type' */
>> > /* Moving 'dev_id' from after 'neigh_priv_len' to after 'dev_port'
>*/
>> > /* Moving 'name_assign_type' from after 'addr_list_lock' to after
>'neigh_priv_len' */
>> > /* Moving 'watchdog_timeo' from after 'tx_global_lock' to after
>'addr_list_lock' */
>> > /* Moving 'needs_free_netdev' from after 'rtnl_link_state' to after
>'name_assign_type' */
>> > /* Moving 'gso_max_segs' from after 'gso_max_size' to after
>'rtnl_link_state' */
>> > /* Moving 'gso_max_segs' from after 'rtnl_link_state' to after
>'needs_free_netdev' */
>> > /* Moving 'gso_max_size' from after 'rtnl_link_ops' to after
>'rtnl_link_state' */
>> > /* Moving 'prio_tc_map' from after 'tc_to_txq' to after
>'index_hlist' */
>> > /* Moving 'proto_down' from after 'qdisc_running_key' to after
>'gso_max_segs' */
>> > /* Moving 'dismantle' from after 'reg_state' to after 'proto_down'
>*/
>> > /* Final reorganized struct: */
>> > [acme@quaco pahole]$ pahole --show_reorg_steps -F btf -C net_device
>--reorganize vmlinux | egrep '^/|BRAIN'
>> > /* Moving 'min_header_len' from after 'hard_header_len' to after
>'irq' */
>> > /* Moving 'hard_header_len' from after 'type' to after
>'min_header_len' */
>> > /* Moving 'uc_promisc' from after 'name_assign_type' to after
>'min_header_len' */
>> > /* Moving 'dev_port' from after 'dev_id' to after 'type' */
>> > /* Moving 'dev_id' from after 'neigh_priv_len' to after 'dev_port'
>*/
>> > /* Moving 'name_assign_type' from after 'addr_list_lock' to after
>'neigh_priv_len' */
>> > /* Moving 'watchdog_timeo' from after 'tx_global_lock' to after
>'addr_list_lock' */
>> > /* Moving 'needs_free_netdev' from after 'rtnl_link_state' to after
>'name_assign_type' */
>> > /* Moving 'gso_max_segs' from after 'gso_max_size' to after
>'rtnl_link_state' */
>> > /* Moving 'gso_max_segs' from after 'rtnl_link_state' to after
>'needs_free_netdev' */
>> > /* Moving 'gso_max_size' from after 'rtnl_link_ops' to after
>'rtnl_link_state' */
>> > /* Moving 'prio_tc_map' from after 'tc_to_txq' to after
>'index_hlist' */
>> > /* Moving 'proto_down' from after 'qdisc_running_key' to after
>'gso_max_segs' */
>> > /* Moving 'dismantle' from after 'reg_state' to after 'proto_down'
>*/
>> > /* Final reorganized struct: */
>> > [acme@quaco pahole]$ pahole --show_reorg_steps -F btf -C net_device
>--reorganize vmlinux | tail
>> >         unsigned int               wol_enabled:1;        /*  2216:
>8  4 */
>> >
>> >         /* size: 2224, cachelines: 35, members: 135 */
>> >         /* sum members: 2194, holes: 2, sum holes: 18 */
>> >         /* sum bitfield members: 25 bits, bit holes: 2, sum bit
>holes: 16 bits */
>> >         /* padding: 4 */
>> >         /* paddings: 5, sum paddings: 23 */
>> >         /* bit_padding: 23 bits */
>> >         /* last cacheline: 48 bytes */
>> > };   /* saved 80 bytes and 1 cacheline! */
>> > [acme@quaco pahole]$
>> >
>> > [acme@quaco pahole]$ pahole -F btf --reorganize -C task_struct
>vmlinux  | grep BRAIN
>> > [acme@quaco pahole]$ pahole -F btf --reorganize -C task_struct
>vmlinux  | tail
>> >
>> >         /* --- cacheline 103 boundary (6592 bytes) was 16 bytes ago
>--- */
>> >         struct thread_struct       thread;               /*  6608 
>4352 */
>> >
>> >         /* size: 10960, cachelines: 172, members: 207 */
>> >         /* sum members: 10898, holes: 3, sum holes: 54 */
>> >         /* sum bitfield members: 10 bits, bit holes: 2, sum bit
>holes: 54 bits */
>> >         /* paddings: 3, sum paddings: 14 */
>> >         /* last cacheline: 16 bytes */
>> > };   /* saved 48 bytes! */
>> > [acme@quaco pahole]$ pahole -F btf --reorganize -C super_block
>vmlinux  | grep BRAIN
>> > [acme@quaco pahole]$ pahole -F btf --reorganize -C super_block
>vmlinux  | tail
>> >         /* --- cacheline 21 boundary (1344 bytes) was 24 bytes ago
>--- */
>> >         struct list_head           s_inodes_wb;          /*  1368  
> 16 */
>> >         struct list_head           s_inodes;             /*  1384  
> 16 */
>> >         spinlock_t                 s_inode_wblist_lock;  /*  1400  
>  4 */
>> >         spinlock_t                 s_inode_list_lock;    /*  1404  
>  4 */
>> >
>> >         /* size: 1408, cachelines: 22, members: 58 */
>> >         /* sum members: 1405, holes: 1, sum holes: 3 */
>> >         /* paddings: 2, sum paddings: 8 */
>> > };   /* saved 128 bytes and 2 cachelines! */
>> > [acme@quaco pahole]$
>> >
>> > [acme@quaco pahole]$ pahole -F btf --packable vmlinux  | cut -f 1 |
>while read struct ; do pahole -F btf -C $struct --reorganize vmlinux |
>grep BRAIN ; done
>> >         /* BRAIN FART ALERT! 200 bytes != 176 (member bytes) + 0
>(member bits) + 80 (byte holes) + 0 (bit holes), diff = -448 bits */
>> >         /* BRAIN FART ALERT! 760 bytes != 756 (member bytes) + 0
>(member bits) + 4 (byte holes) + 0 (bit holes), diff = -32 bits */
>> >         /* BRAIN FART ALERT! 72 bytes != 28 (member bytes) + 0
>(member bits) + 100 (byte holes) + 0 (bit holes), diff = -448 bits */
>> >         /* BRAIN FART ALERT! 40 bytes != 36 (member bytes) + 0
>(member bits) + 44 (byte holes) + 0 (bit holes), diff = -352 bits */
>> >         /* BRAIN FART ALERT! 184 bytes != 182 (member bytes) + 0
>(member bits) + 6 (byte holes) + 0 (bit holes), diff = -32 bits */
>> > [acme@quaco pahole]$ pahole -F btf --packable vmlinux  | wc -l
>> > 503
>> > [acme@quaco pahole]$
>> >
>> > 1% getting closer...
>> >
>> > [acme@quaco pahole]$ pahole -F btf --reorganize -C netns_frags
>vmlinux
>> > struct netns_frags {
>> >         long int                   high_thresh;          /*     0  
>  8 */
>> >         long int                   low_thresh;           /*     8  
>  8 */
>> >         int                        timeout;              /*    16  
>  4 */
>> >         int                        max_dist;             /*    20  
>  4 */
>> >         struct inet_frags *        f;                    /*    24  
>  8 */
>> >         atomic_long_t              mem;                  /*    32  
>  8 */
>> >
>> >         /* XXX 24 bytes hole, try to pack */
>> >
>> >         /* --- cacheline 1 boundary (64 bytes) --- */
>> >         struct rhashtable          rhashtable;           /*    64  
>136 */
>> >
>> >         /* XXX 56 bytes hole, try to pack */
>>
>> This one is strange, there should be no hole. Will take a look.
>
>Ok, so this one is due to calling class__find_holes() multiple times
>on the same struct. Last member's bit_hole/hole are not zeroed out. I
>have a fix, which I'll add to my last patchset. I'll also drop last
>patch (partially disabling padding recalc


I dropped it already

 for bitfield). Will send
>soon.

Thanks, 


>
>>
>> >
>> >         /* size: 200, cachelines: 4, members: 7 */
>> >         /* sum members: 176, holes: 1, sum holes: 80 */
>> >         /* last cacheline: 8 bytes */
>> >
>> >         /* BRAIN FART ALERT! 200 bytes != 176 (member bytes) + 0
>(member bits) + 80 (byte holes) + 0 (bit holes), diff = -448 bits */
>> > };   /* saved 120 bytes and 1 cacheline! */
>> > [acme@quaco pahole]$
>> >
>> > So I'll apply this one and aim for tagging v1.13. Then we get back
>and
>> > get the bitfield reorg nailed.
>>
>> Sounds good, thanks!
>>
>> >
>> > - Arnaldo
>> >
>> > diff --git a/dwarves_reorganize.c b/dwarves_reorganize.c
>> > index f1fc2158d03f..3e4cd9c66b82 100644
>> > --- a/dwarves_reorganize.c
>> > +++ b/dwarves_reorganize.c
>> > @@ -226,12 +226,20 @@ static struct class_member *
>> >         return NULL;
>> >  }
>> >
>> > -static void class__move_member(struct class *class, struct
>class_member *dest,
>> > -                              struct class_member *from, const
>struct cu *cu,
>> > -                              int from_padding, const int verbose,
>FILE *fp)
>> > +static bool class__move_member(struct class *class, struct
>class_member *dest,
>> > +                             struct class_member *from, const
>struct cu *cu,
>> > +                             int from_padding, const int verbose,
>FILE *fp)
>> >  {
>> >         const size_t from_size = from->byte_size;
>> >         const size_t dest_size = dest->byte_size;
>> > +
>> > +#ifndef BITFIELD_REORG_ALGORITHMS_ENABLED
>> > +       /*
>> > +        * For now refuse to move a bitfield, we need to first
>fixup some BRAIN FARTs
>> > +        */
>> > +       if (from->bitfield_size != 0)
>> > +               return false;
>> > +#endif
>> >         const bool from_was_last = from->tag.node.next ==
>class__tags(class);
>> >         struct class_member *tail_from = from;
>> >         struct class_member *from_prev =
>list_entry(from->tag.node.prev,
>> > @@ -347,6 +355,8 @@ static void class__move_member(struct class
>*class, struct class_member *dest,
>> >                 class__fprintf(class, cu, fp);
>> >                 fputc('\n', fp);
>> >         }
>> > +
>> > +       return true;
>> >  }
>> >
>> >  static void class__move_bit_member(struct class *class, const
>struct cu *cu,
>> > @@ -743,11 +753,11 @@ void class__reorganize(struct class *class,
>const struct cu *cu,
>> >         size_t alignment_size;
>> >
>> >         class__find_holes(class);
>> > +#ifdef BITFIELD_REORG_ALGORITHMS_ENABLED
>> >         class__fixup_member_types(class, cu, verbose, fp);
>> > -
>> >         while (class__demote_bitfields(class, cu, verbose, fp))
>> >                 class__reorganize_bitfields(class, cu, verbose,
>fp);
>> > -
>> > +#endif
>> >         /* Now try to combine holes */
>> >  restart:
>> >         alignment_size = 0;
>> > @@ -807,10 +817,8 @@ restart:
>> >                                  * kernel.
>> >                                  */
>> >                                 if (brother_prev != member) {
>> > -                                       class__move_member(class,
>member,
>> > -                                                          brother,
>cu, 0,
>> > -                                                          verbose,
>fp);
>> > -                                       goto restart;
>> > +                                       if
>(class__move_member(class, member, brother, cu, 0, verbose, fp))
>> > +                                               goto restart;
>> >                                 }
>> >                         }
>> >                         /*
>> > @@ -824,9 +832,8 @@ restart:
>> >                             member != last_member &&
>> >                             last_member->byte_size != 0 &&
>> >                             last_member->byte_size <= member->hole)
>{
>> > -                               class__move_member(class, member,
>last_member,
>> > -                                                  cu, 1, verbose,
>fp);
>> > -                               goto restart;
>> > +                               if (class__move_member(class,
>member, last_member, cu, 1, verbose, fp))
>> > +                                       goto restart;
>> >                         }
>> >                 }
>> >         }
>> > @@ -852,10 +859,8 @@ restart:
>> >                                  * kernel.
>> >                                  */
>> >                                 if (brother_prev != member) {
>> > -                                       class__move_member(class,
>member,
>> > -                                                          brother,
>cu, 0,
>> > -                                                          verbose,
>fp);
>> > -                                       goto restart;
>> > +                                       if
>(class__move_member(class, member, brother, cu, 0, verbose, fp))
>> > +                                               goto restart;
>> >                                 }
>> >                         }
>> >                 }
>>
>> Looks good.
>>
>> Acked-by: Andrii Nakryiko <andriin@xxxxxx>





[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux