Re: [PATCH v2] Fix race of "mdadm --add" and "mdadm --incremental"

Martin Wilck <mwilck@xxxxxxxx> · Thu, 07 Sep 2023 10:33:21 +0200

On Thu, 2023-09-07 at 11:02 +0800, Li Xiao Keng wrote:
> 
> 
> On 2023/9/6 21:31, Martin Wilck wrote:
> > On Wed, 2023-09-06 at 16:51 +0800, Li Xiao Keng wrote:
> > > 
> > > 
> > > On 2023/9/6 3:08, Martin Wilck wrote:
> > > > On Wed, 2023-09-06 at 00:17 +0800, Coly Li wrote:
> > > > > Hi Xiao Keng,
> > > > > 
> > > > > Thanks for the updated version, I add my comments inline.
> > > > > 
> > > > > On Tue, Sep 05, 2023 at 08:02:06PM +0800, Li Xiao Keng wrote:
> > > > > > When we add a new disk to a raid, it may return -EBUSY.
> > > > > 
> > > > > Where is above -EBUSY from? do you mean mdadm command returns
> > > > > -EBUSY or it is returned by some specific function in mdadm
> > > > > source code.
> > > > > 
> > > 
> > > Because the new disk is added to the raid by "mdadm --
> > > incremental",
> > > the "mdadm --add" will return the err.
> > > 
> > > > > > 
> > > > > > The main process of --add:
> > > > > > 1. dev_open
> > > > > > 2. store_super1(st, di->fd) in write_init_super1
> > > > > > 3. fsync(di->fd) in write_init_super1
> > > > > > 4. close(di->fd)
> > > > > > 5. ioctl(ADD_NEW_DISK)
> > > > > > 
> > > > > > However, there will be some udev(change) event after step4.
> > > > > > Then
> > > > > > "/usr/sbin/mdadm --incremental ..." will be run, and the
> > > > > > new
> > > > > > disk
> > > > > > will be add to md device. After that, ioctl will return -
> > > > > > EBUSY.
> > > > > > 
> > > > > 
> > > > > Dose returning -EBUSY hurt anything? Or only returns -EBUSY
> > > > > and
> > > > > other stuffs all work as expected?
> > > > 
> > > > IIUC, it does not. The manual --add command will fail. Li Xiao
> > > > Keng
> > > > has
> > > > described the problem in earlier emails.
> > > Yes！ The disk is add to the raid, but the manual --add command
> > > will
> > > fail.
> > > We will decide the next action based on the return value.
> > > 
> > > >  
> > > > > > Here we add map_lock before write_init_super in "mdadm --
> > > > > > add"
> > > > > > to fix this race.
> > > > > > 
> > > > > 
> > > > > I am not familiar this part of code, but I see ignoring the
> > > > > failure
> > > > > from map_lock() in Assemble() is on purpose by Neil.
> > > > > Therefore I
> > > > > just guess simply return from Assemble when map_lock() fails
> > > > > might
> > > > > not be what you wanted.
> > > > > 
> > > > > 
> > > > > > Signed-off-by: Li Xiao Keng <lixiaokeng@xxxxxxxxxx>
> > > > > > Signed-off-by: Guanqin Miao <miaoguanqin@xxxxxxxxxx>
> > > > > > ---
> > > > > >  Assemble.c |  5 ++++-
> > > > > >  Manage.c   | 25 +++++++++++++++++--------
> > > > > >  2 files changed, 21 insertions(+), 9 deletions(-)
> > > > > > 
> > > > > > diff --git a/Assemble.c b/Assemble.c
> > > > > > index 49804941..086890ed 100644
> > > > > > --- a/Assemble.c
> > > > > > +++ b/Assemble.c
> > > > > > @@ -1479,8 +1479,11 @@ try_again:
> > > > > >          * to our list.  We flag them so that we don't try
> > > > > > to
> > > > > > re-
> > > > > > add,
> > > > > >          * but can remove if they turn out to not be
> > > > > > wanted.
> > > > > >          */
> > > > > > -       if (map_lock(&map))
> > > > > > +       if (map_lock(&map)) {
> > > > > >                 pr_err("failed to get exclusive lock on
> > > > > > mapfile
> > > > > > -
> > > > > > continue anyway...\n");
> > > > > > +               return 1;
> > > > > 
> > > > > Especially when the error message noticed "continue anyway"
> > > > > but a
> > > > > return 1
> > > > > followed, the behavior might be still confusing.
> > > > 
> > > > Now as you're saying it, I recall I had the same comment last
> > > > time
> > > > ;-)
> > > > 
> > > I'm very sorry for this stupid mistake. I I find I send v1 patch
> > > but
> > > not
> > > v2. I will send patch v2 to instead of it.
> > > 
> > > -       if (map_lock(&map))
> > > -               pr_err("failed to get exclusive lock on mapfile -
> > > continue anyway...\n");
> > > +       if (map_lock(&map)) {
> > > +               pr_err("failed to get exclusive lock on mapfile
> > > when
> > > assemble raid.\n");
> > > +               return 1;
> > > +       }
> > > 
> > > > I might add that "return 1" is dangerous, as it pretends that
> > > > Manage_add() was successful and actually added a device, which
> > > > is
> > > > not
> > > > the case. In the special case that Li Xiao Keng wants to fix,
> > > > it's
> > > > true
> > > > (sort of) because the asynchronous "mdadm -I" will have added
> > > > the
> > > > device already. But there could be other races where
> > > > Assemble_map()
> > > > can't obtain the lock and still the device will not be added
> > > > later.
> > > > 
> > > 
> > > Do I missunstandings
> > > "AFAICS it would only help if the code snipped above did not only
> > > pr_err() but exit if it can't get an exclusive lock." ?
> > > 
> > 
> > > Anyway, map_lock is a blocking function. If it can't get the
> > > lock, it
> > blocks.
> > > If map_lock() return error, Assemble() return 1. When -add unlock
> > > it,
> > > Assemble() will go ahead but not return at map_lock().
> > 
> > Maybe *I* was misunderstanding. I thought map_lock() returned error
> > if
> > the lock was held by the other process. What exactly does an error
> > return from map_lock() mean? If it does not mean "lock held by
> > another
> > process", why does your patch solve the race issue?
> > 
> 
> The -add locks map_lock() before udev(change) event happen, and
> unlocks it
> until ioctl(ADD_NEW_DISK) finishing to solve the race issue. This
> makes
> manual -add return success and -incremental (triggered by uevent)
> will
> fail, which is same as the previous successful execution of the -add
> command.

OK. Then maybe you should just leave the code for handling the
map_lock() error condition as-is. It tends to confuse readers.

Martin