Re: [PATCH v2] Fix race of "mdadm --add" and "mdadm --incremental"

Li Xiao Keng <lixiaokeng@xxxxxxxxxx> · Thu, 7 Sep 2023 11:02:35 +0800

On 2023/9/6 21:31, Martin Wilck wrote:
> On Wed, 2023-09-06 at 16:51 +0800, Li Xiao Keng wrote:
>>
>>
>> On 2023/9/6 3:08, Martin Wilck wrote:
>>> On Wed, 2023-09-06 at 00:17 +0800, Coly Li wrote:
>>>> Hi Xiao Keng,
>>>>
>>>> Thanks for the updated version, I add my comments inline.
>>>>
>>>> On Tue, Sep 05, 2023 at 08:02:06PM +0800, Li Xiao Keng wrote:
>>>>> When we add a new disk to a raid, it may return -EBUSY.
>>>>
>>>> Where is above -EBUSY from? do you mean mdadm command returns
>>>> -EBUSY or it is returned by some specific function in mdadm
>>>> source code.
>>>>
>>
>> Because the new disk is added to the raid by "mdadm --incremental",
>> the "mdadm --add" will return the err.
>>
>>>>>
>>>>> The main process of --add:
>>>>> 1. dev_open
>>>>> 2. store_super1(st, di->fd) in write_init_super1
>>>>> 3. fsync(di->fd) in write_init_super1
>>>>> 4. close(di->fd)
>>>>> 5. ioctl(ADD_NEW_DISK)
>>>>>
>>>>> However, there will be some udev(change) event after step4.
>>>>> Then
>>>>> "/usr/sbin/mdadm --incremental ..." will be run, and the new
>>>>> disk
>>>>> will be add to md device. After that, ioctl will return -EBUSY.
>>>>>
>>>>
>>>> Dose returning -EBUSY hurt anything? Or only returns -EBUSY and
>>>> other stuffs all work as expected?
>>>
>>> IIUC, it does not. The manual --add command will fail. Li Xiao Keng
>>> has
>>> described the problem in earlier emails.
>> Yes！ The disk is add to the raid, but the manual --add command will
>> fail.
>> We will decide the next action based on the return value.
>>
>>>  
>>>>> Here we add map_lock before write_init_super in "mdadm --add"
>>>>> to fix this race.
>>>>>
>>>>
>>>> I am not familiar this part of code, but I see ignoring the
>>>> failure
>>>> from map_lock() in Assemble() is on purpose by Neil. Therefore I
>>>> just guess simply return from Assemble when map_lock() fails
>>>> might
>>>> not be what you wanted.
>>>>
>>>>
>>>>> Signed-off-by: Li Xiao Keng <lixiaokeng@xxxxxxxxxx>
>>>>> Signed-off-by: Guanqin Miao <miaoguanqin@xxxxxxxxxx>
>>>>> ---
>>>>>  Assemble.c |  5 ++++-
>>>>>  Manage.c   | 25 +++++++++++++++++--------
>>>>>  2 files changed, 21 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/Assemble.c b/Assemble.c
>>>>> index 49804941..086890ed 100644
>>>>> --- a/Assemble.c
>>>>> +++ b/Assemble.c
>>>>> @@ -1479,8 +1479,11 @@ try_again:
>>>>>          * to our list.  We flag them so that we don't try to
>>>>> re-
>>>>> add,
>>>>>          * but can remove if they turn out to not be wanted.
>>>>>          */
>>>>> -       if (map_lock(&map))
>>>>> +       if (map_lock(&map)) {
>>>>>                 pr_err("failed to get exclusive lock on mapfile
>>>>> -
>>>>> continue anyway...\n");
>>>>> +               return 1;
>>>>
>>>> Especially when the error message noticed "continue anyway" but a
>>>> return 1
>>>> followed, the behavior might be still confusing.
>>>
>>> Now as you're saying it, I recall I had the same comment last time
>>> ;-)
>>>
>> I'm very sorry for this stupid mistake. I I find I send v1 patch but
>> not
>> v2. I will send patch v2 to instead of it.
>>
>> -       if (map_lock(&map))
>> -               pr_err("failed to get exclusive lock on mapfile -
>> continue anyway...\n");
>> +       if (map_lock(&map)) {
>> +               pr_err("failed to get exclusive lock on mapfile when
>> assemble raid.\n");
>> +               return 1;
>> +       }
>>
>>> I might add that "return 1" is dangerous, as it pretends that
>>> Manage_add() was successful and actually added a device, which is
>>> not
>>> the case. In the special case that Li Xiao Keng wants to fix, it's
>>> true
>>> (sort of) because the asynchronous "mdadm -I" will have added the
>>> device already. But there could be other races where Assemble_map()
>>> can't obtain the lock and still the device will not be added later.
>>>
>>
>> Do I missunstandings
>> "AFAICS it would only help if the code snipped above did not only
>> pr_err() but exit if it can't get an exclusive lock." ?
>>
> 
>> Anyway, map_lock is a blocking function. If it can't get the lock, it
> blocks.
>> If map_lock() return error, Assemble() return 1. When -add unlock it,
>> Assemble() will go ahead but not return at map_lock().
> 
> Maybe *I* was misunderstanding. I thought map_lock() returned error if
> the lock was held by the other process. What exactly does an error
> return from map_lock() mean? If it does not mean "lock held by another
> process", why does your patch solve the race issue?
> 

The -add locks map_lock() before udev(change) event happen, and unlocks it
until ioctl(ADD_NEW_DISK) finishing to solve the race issue. This makes
manual -add return success and -incremental (triggered by uevent) will
fail, which is same as the previous successful execution of the -add command.

> Martin
> 
> 
> .
>