Re: [PATCH v2] md: do not _put wrong device in md_seq_next

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Sep 13, 2023 at 1:26 AM Mariusz Tkaczyk
<mariusz.tkaczyk@xxxxxxxxxxxxxxx> wrote:
>
> On Tue, 12 Sep 2023 15:49:12 -0700
> Song Liu <song@xxxxxxxxxx> wrote:
>
> > On Tue, Sep 12, 2023 at 6:02 AM Mariusz Tkaczyk
> > <mariusz.tkaczyk@xxxxxxxxxxxxxxx> wrote:
> > >
> > > During working on changes proposed by Kuai [1], I determined that
> > > mddev->active is continusly decremented for array marked by MD_CLOSING.
> > > It brought me to md_seq_next() changed by [2]. I determined the regression
> > > here, if mddev_get() fails we updated mddev pointer and as a result we
> > > _put failed device.
> > >
> > > I isolated the change in md_seq_next() and tested it- issue is no longer
> > > reproducible but I don't see the root cause in this scenario. The bug
> > > is obvious so I proceed with fixing. I will submit MD_CLOSING patches
> > > separatelly.
> > >
> > > Put the device which has been _get with previous md_seq_next() call.
> > > Add guard for inproper mddev_put usage(). It shouldn't be called if
> > > there are less than 1 for mddev->active.
> > >
> > > I didn't convert atomic_t to refcount_t because refcount warns when 0 is
> > > achieved which is likely to happen for mddev->active.
> > >
> > > [1]
> > > https://lore.kernel.org/linux-raid/028a21df-4397-80aa-c2a5-7c754560f595@xxxxxxxxx/T/#m6a534677d9654a4236623661c10646d45419ee1b
> > > [2] https://bugzilla.kernel.org/show_bug.cgi?id=217798
> > >
> > > Fixes: 12a6caf27324 ("md: only delete entries from all_mddevs when the disk
> > > is freed") Cc: Yu Kuai <yukuai3@xxxxxxxxxx>
> > > Cc: AceLan Kao <acelan@xxxxxxxxx>
> > > Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@xxxxxxxxxxxxxxx>
> > > ---
> > >  drivers/md/md.c | 20 +++++++++++---------
> > >  1 file changed, 11 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > > index 0fe7ab6e8ab9..bb654ff62765 100644
> > > --- a/drivers/md/md.c
> > > +++ b/drivers/md/md.c
> > > @@ -618,6 +618,12 @@ static void mddev_delayed_delete(struct work_struct
> > > *ws);
> > >
> > >  void mddev_put(struct mddev *mddev)
> > >  {
> > > +       /* Guard for ambiguous put. */
> > > +       if (unlikely(atomic_read(&mddev->active) < 1)) {
> > > +               pr_warn("%s: active refcount is lower than 1\n",
> > > mdname(mddev));
> > > +               return;
> > > +       }
> > > +
> >
> > Could you please explain why we need this guard? We should probably fix
> > the caller that causes this.
>
> We have an issue, so I added warning to catch similar mistakes. Please
> note that for refcount_t such warning is implemented.
>
> If you don't see it useful I can remove it.

Let's add it (or use refcount_t) in a separate patch.

> >
> > >         if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock))
> > >                 return;
> > >         if (!mddev->raid_disks && list_empty(&mddev->disks) &&
> > > @@ -8227,19 +8233,16 @@ static void *md_seq_next(struct seq_file *seq, void
> > > *v, loff_t *pos) {
> > >         struct list_head *tmp;
> > >         struct mddev *next_mddev, *mddev = v;
> > > -       struct mddev *to_put = NULL;
> >
> > IIUC, all we need is the following:
> >
> > diff --git i/drivers/md/md.c w/drivers/md/md.c
> > index 73758b754127..a104a025084d 100644
> > --- i/drivers/md/md.c
> > +++ w/drivers/md/md.c
> > @@ -8265,7 +8265,7 @@ static void *md_seq_next(struct seq_file *seq,
> > void *v, loff_t *pos)
> >         spin_unlock(&all_mddevs_lock);
> >
> >         if (to_put)
> > -               mddev_put(mddev);
> > +               mddev_put(to_put);
> >         return next_mddev;
> >
> >  }
> >
> > Is this sufficient? If so, how about we ship this first and refactor
> > the code later
> > in a separate patch?
>
> That is correct it should be sufficient. If you don't want it in one patch I
> will drop refactor because Kuai re-implemented it already.
>
> Anyway, changes I made are simple and tested, I don't see it risk in it.
> Your proposal will make merge conflicts less likely to appear. I that matters I
> can simplify the patch.

Let's keep the fix as clean as possible. Please test the one line fix and
resubmit the patch.

Thanks,
Song




[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux