Re: [QUESTION] How to fix the race of "mdadm --add" and "mdadm mdadm --incremental --export"

On Tue, 2023-03-14 at 16:59 +0100, Mariusz Tkaczyk wrote:
> On Tue, 14 Mar 2023 16:04:23 +0100
> Martin Wilck <mwilck@xxxxxxxx> wrote:
> 
> > On Tue, 2023-03-14 at 22:58 +0800, Li Xiao Keng wrote:
> > > Hi,
> > > >    Here we meet a question. When we add a new disk to a raid, it
> > > > may return -EBUSY.
> > >    The main process of --add(for example md0, sdf):
> > >        1.dev_open(sdf)
> > >        2.add_to_super
> > >        3.write_init_super
> > >        4.fsync(fd)
> > >        5.close(fd)
> > >        6.ioctl(ADD_NEW_DISK).
> > > >    However, there will be some udev(change of sdf) event after
> > > > step5. Then
> > > > "/usr/sbin/mdadm --incremental --export $devnode --offroot $env{DEVLINKS}"
> > > > will be run, and the sdf will be added to md0. After that, step6
> > > > will return -EBUSY.
> > > >    It is a problem to user. First time adding disk does not return
> > > > success but disk is actually added. And I have no good idea to deal
> > > > with it. Please give some great advice.
> > 
> > > I haven't looked at the code in detail, but off the top of my head,
> > > it should help to execute step 5 after step 6. The close() in step 5
> > > triggers the uevent via inotify; doing it after the ioctl should
> > > avoid the above problem.
> Hi,
> That will result in EBUSY every time. mdadm will still hold the
> descriptor and the kernel will refuse to add the drive.

Why would it cause EBUSY? Please elaborate. My suggestion would avoid
the race described by Li Xiao Keng. I only suggested to postpone the
close(), nothing else. The fsync() would still be done before the
ioctl, so the metadata should be safely on disk when the ioctl is run.
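
To make the ordering concrete, here is a minimal sketch. It is not the
actual mdadm code: write_meta_then_add() and fake_add_disk() are
hypothetical names, and the callback merely stands in for the
ioctl(ADD_NEW_DISK) step, so that the sequence write, fsync, add, close
can be tried on an ordinary file.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical callback standing in for ioctl(mdfd, ADD_NEW_DISK, ...). */
typedef int (*add_disk_fn)(int devfd, void *ctx);

/* Stand-in for the real ioctl: only records whether devfd was still open. */
static int fake_add_disk(int devfd, void *ctx)
{
    int *was_open = ctx;

    *was_open = (fcntl(devfd, F_GETFD) != -1);
    return 0;
}

/*
 * Write the metadata, flush it, run the "add" step, and only then close().
 * Returns 0 on success, -1 on any error.
 */
static int write_meta_then_add(const char *devpath, const void *meta,
                               size_t len, add_disk_fn add, void *ctx)
{
    int rc = -1;
    int fd = open(devpath, O_RDWR);

    if (fd < 0)
        return -1;
    if (write(fd, meta, len) != (ssize_t)len)
        goto out;
    if (fsync(fd) != 0)     /* metadata is flushed before the add step */
        goto out;
    if (add(fd, ctx) != 0)  /* the descriptor is still open at this point */
        goto out;
    rc = 0;
out:
    close(fd);              /* closing the written fd comes last */
    return rc;
}
```

Whether the kernel accepts ADD_NEW_DISK while this descriptor is still
open is exactly the open question above; the sketch only shows the
ordering I have in mind.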

This is a recurring pattern. Tools that manipulate block devices must
be aware that close-after-write triggers a uevent, and should make
sure that they don't close() such files prematurely.

> > 
> > Another obvious workaround in mdadm would be to check the state of
> > the array in the EBUSY case and find out that the disk had already
> > been added.
> > 
> > But again, this was just a high-level guess.
> > 
> > Martin
> > 
> 
> Hmm... I'm not a native expert but why we cannot write metadata after
> adding drive to array? Why kernel can't handle that?
> 
> Ideally, we should lock device and block udev- I know that there is
> flock based API to do that but I'm not sure if flock() won't cause the
> same problem.

That doesn't work reliably. At least not in general. The mechanism is
disabled for dm devices (e.g. multipath), for example. See
https://github.com/systemd/systemd/blob/a5c0ad9a9a2964079a19a1db42f79570a3582bee/src/udev/udevd.c#L483
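
For reference, the flock() convention in question looks roughly like the
sketch below. As far as I understand it, the tool takes an exclusive
BSD lock on the device node, and the event handler probes with a
non-blocking shared lock and backs off if that fails. The function
names are made up for illustration, and the probe only imitates what
udevd does where the mechanism is enabled.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Tool side: open the node and take an exclusive lock. Returns fd or -1. */
static int lock_node_exclusive(const char *path)
{
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_EX) != 0) {
        close(fd);
        return -1;
    }
    return fd;  /* the lock is held until this fd is closed */
}

/*
 * Event-handler side: probe with a shared, non-blocking lock.
 * Returns 1 if someone holds an exclusive lock, 0 if free, -1 on error.
 */
static int node_is_locked(const char *path)
{
    int busy;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (flock(fd, LOCK_SH | LOCK_NB) == 0)
        busy = 0;
    else
        busy = (errno == EWOULDBLOCK) ? 1 : -1;
    close(fd);  /* drops the shared lock again if we obtained it */
    return busy;
}
```

Note that this only helps if the event handler actually probes the
lock, which is the part that is not guaranteed for every device type.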


> There is also something like "udev-md-raid-creating.rules". Maybe we
> can reuse it?
> 

Unless I am mistaken, these rules are exactly those that cause the
issue we are discussing. Since these rules are also part of the mdadm
package, it might be possible to set some flag under /run that would
indicate to the rules that auto-assembly should be skipped. But that
might be racy, too.
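
The mdadm side of such a flag could look like the sketch below. This is
purely an illustration: the "skip-assembly-<dev>" file name and the
helper functions are invented here, and the matching check in the udev
rules (for instance a TEST match on the same path) is not shown.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Build "<rundir>/skip-assembly-<devname>" (hypothetical naming scheme). */
static int skip_flag_path(char *buf, size_t len,
                          const char *rundir, const char *devname)
{
    int n = snprintf(buf, len, "%s/skip-assembly-%s", rundir, devname);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Create the flag; fails with -1 if it already exists or cannot be made. */
static int set_skip_flag(const char *rundir, const char *devname)
{
    char path[256];
    int fd;

    if (skip_flag_path(path, sizeof(path), rundir, devname) != 0)
        return -1;
    fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;
    close(fd);
    return 0;
}

/* Remove the flag again once the add operation has finished. */
static int clear_skip_flag(const char *rundir, const char *devname)
{
    char path[256];

    if (skip_flag_path(path, sizeof(path), rundir, devname) != 0)
        return -1;
    return unlink(path);
}
```

mdadm would set the flag before writing the metadata and clear it after
the ioctl. An event that is processed only after the flag has been
removed would still trigger auto-assembly, which is the raciness I
mean.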

Martin

> Thanks,
> Mariusz




