Re: [PATCH V3 md-6.12 1/1] md: add new_level sysfs interface

Xiao Ni <xni@xxxxxxxxxx> · Fri, 6 Sep 2024 10:07:26 +0800

On Thu, Sep 5, 2024 at 1:55 PM Song Liu <song@xxxxxxxxxx> wrote:
>
> On Wed, Sep 4, 2024 at 4:55 PM Xiao Ni <xni@xxxxxxxxxx> wrote:
> >
> > Reshape needs to specify a backup file when it can't update data offset
> > of member disks. For this situation, first, it starts reshape and then
> > it kicks off mdadm-grow-continue service which does backup job and
> > monitors the reshape process. The service is a new process, so it needs
> > to read superblock from member disks to get information.
> >
> > But in the first step, it doesn't update new level in superblock. So
> > in the second step, the new level that systemd service reads from
> > superblock is wrong. It can't change to the right new level after reshape
> > finishes. This interface is used to update new level during reshape
> > progress.
> >
> > Reproduce steps:
> > mdadm -CR /dev/md0 -l6 -n4 /dev/loop[0-3]
> > mdadm --wait /dev/md0
> > mdadm /dev/md0 --grow -l5 --backup=backup
> > cat /proc/mdstat
> > Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10]
> > md0 : active raid6 loop3[3] loop2[2] loop1[1] loop0[0]
> >
> > Test case 07changelevels from mdadm regression tests can trigger this
> > problem.
> >
> > Signed-off-by: Xiao Ni <xni@xxxxxxxxxx>
> > ---
> > V3: explain more about the root cause
> > V2: add detail about test information
> >  drivers/md/md.c | 29 +++++++++++++++++++++++++++++
> >  1 file changed, 29 insertions(+)
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index d3a837506a36..3c354e7a7825 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -4141,6 +4141,34 @@ level_store(struct mddev *mddev, const char *buf, size_t len)
> >  static struct md_sysfs_entry md_level =
> >  __ATTR(level, S_IRUGO|S_IWUSR, level_show, level_store);
> >
> > +static ssize_t
> > +new_level_show(struct mddev *mddev, char *page)
> > +{
> > +       return sprintf(page, "%d\n", mddev->new_level);
> > +}
> > +
> > +static ssize_t
> > +new_level_store(struct mddev *mddev, const char *buf, size_t len)
> > +{
> > +       unsigned int n;
> > +       int err;
> > +
> > +       err = kstrtouint(buf, 10, &n);
> > +       if (err < 0)
> > +               return err;
> > +       err = mddev_lock(mddev);
> > +       if (err)
> > +               return err;
> > +
> > +       mddev->new_level = n;
> > +       md_update_sb(mddev, 1);
> > +
> > +       mddev_unlock(mddev);
> > +       return len;
> > +}
> > +static struct md_sysfs_entry md_new_level =
> > +__ATTR(new_level, 0664, new_level_show, new_level_store);
>
> Actually, since we can read and write "new_level", please describe how
> it will be used (read and write).
>
> Thanks,
> Song
>

Hi Song

I rewrite the comments like this:

Now reshape supports two ways: with backup file or without backup file.
For the situation without backup file, it needs to change data offset.
It doesn't need systemd service mdadm-grow-continue. So it can finish
the reshape job in one process environment. It can know the new level
from mdadm --grow command and can change to new level after reshape
finishes.

For the situation with backup file, it needs systemd service
mdadm-grow-continue to monitor reshape progress. So there are two process
environments. One is mdadm --grow command whick kicks off reshape. It
doesn't wait reshape to finish and wake up mdadm-grow-continue service.
Two is the service. But it doesn't know the new level. Because in the
first step, it doesn't sync the information to kernel space. So the new
level which reads from superblock is wrong.

In kernel space mddev->new_level is used to record the new level when
doing reshape. This patch tries to add a new interface to help mdadm
can update new level and sync it to metadata. So it can read the right
new level when mdadm-grow-continue starts. And it can change to the new
level after reshape finishes.

Reproduce steps:
mdadm -CR /dev/md0 -l6 -n4 /dev/loop[0-3]
mdadm --wait /dev/md0
mdadm /dev/md0 --grow -l5 --backup=backup
cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4] [raid0] [raid1] [raid10]
md0 : active raid6 loop3[3] loop2[2] loop1[1] loop0[0]

Test case 07changelevels from mdadm regression tests can trigger this
problem.

If you want me to re-send a formal patch, I'll do it.

Best Regards
Xiao