On Wed, 2023-09-06 at 00:17 +0800, Coly Li wrote: > Hi Xiao Keng, > > Thanks for the updated version, I add my comments inline. > > On Tue, Sep 05, 2023 at 08:02:06PM +0800, Li Xiao Keng wrote: > > When we add a new disk to a raid, it may return -EBUSY. > > Where is above -EBUSY from? do you mean mdadm command returns > -EBUSY or it is returned by some specific function in mdadm > source code. > > > > > The main process of --add: > > 1. dev_open > > 2. store_super1(st, di->fd) in write_init_super1 > > 3. fsync(di->fd) in write_init_super1 > > 4. close(di->fd) > > 5. ioctl(ADD_NEW_DISK) > > > > However, there will be some udev(change) event after step4. Then > > "/usr/sbin/mdadm --incremental ..." will be run, and the new disk > > will be add to md device. After that, ioctl will return -EBUSY. > > > > Dose returning -EBUSY hurt anything? Or only returns -EBUSY and > other stuffs all work as expected? IIUC, it does not. The manual --add command will fail. Li Xiao Keng has described the problem in earlier emails. > > Here we add map_lock before write_init_super in "mdadm --add" > > to fix this race. > > > > I am not familiar this part of code, but I see ignoring the failure > from map_lock() in Assemble() is on purpose by Neil. Therefore I > just guess simply return from Assemble when map_lock() fails might > not be what you wanted. > > > > Signed-off-by: Li Xiao Keng <lixiaokeng@xxxxxxxxxx> > > Signed-off-by: Guanqin Miao <miaoguanqin@xxxxxxxxxx> > > --- > > Assemble.c | 5 ++++- > > Manage.c | 25 +++++++++++++++++-------- > > 2 files changed, 21 insertions(+), 9 deletions(-) > > > > diff --git a/Assemble.c b/Assemble.c > > index 49804941..086890ed 100644 > > --- a/Assemble.c > > +++ b/Assemble.c > > @@ -1479,8 +1479,11 @@ try_again: > > * to our list. We flag them so that we don't try to re- > > add, > > * but can remove if they turn out to not be wanted. > > */ > > - if (map_lock(&map)) > > + if (map_lock(&map)) { > > pr_err("failed to get exclusive lock on mapfile - > > continue anyway...\n"); > > + return 1; > > Especially when the error message noticed "continue anyway" but a > return 1 > followed, the behavior might be still confusing. Now as you're saying it, I recall I had the same comment last time ;-) I might add that "return 1" is dangerous, as it pretends that Manage_add() was successful and actually added a device, which is not the case. In the special case that Li Xiao Keng wants to fix, it's true (sort of) because the asynchronous "mdadm -I" will have added the device already. But there could be other races where Assemble_map() can't obtain the lock and still the device will not be added later. Regards Martin > > > + } > > + > > if (c->update == UOPT_UUID) > > mp = NULL; > > else > > diff --git a/Manage.c b/Manage.c > > index f54de7c6..6a101bae 100644 > > --- a/Manage.c > > +++ b/Manage.c > > @@ -703,6 +703,7 @@ int Manage_add(int fd, int tfd, struct > > mddev_dev *dv, > > struct supertype *dev_st; > > int j; > > mdu_disk_info_t disc; > > + struct map_ent *map = NULL; > > > > if (!get_dev_size(tfd, dv->devname, &ldsize)) { > > if (dv->disposition == 'M') > > @@ -900,6 +901,10 @@ int Manage_add(int fd, int tfd, struct > > mddev_dev *dv, > > disc.raid_disk = 0; > > } > > > > + if (map_lock(&map)) { > > + pr_err("failed to get exclusive lock on mapfile > > when add disk\n"); > > + return -1; > > + } > > if (array->not_persistent==0) { > > int dfd; > > if (dv->disposition == 'j') > > @@ -911,9 +916,9 @@ int Manage_add(int fd, int tfd, struct > > mddev_dev *dv, > > dfd = dev_open(dv->devname, O_RDWR | > > O_EXCL|O_DIRECT); > > if (tst->ss->add_to_super(tst, &disc, dfd, > > dv->devname, > > INVALID_SECTORS)) > > - return -1; > > + goto unlock; > > if (tst->ss->write_init_super(tst)) > > - return -1; > > + goto unlock; > > } else if (dv->disposition == 'A') { > > /* this had better be raid1. > > * As we are "--re-add"ing we must find a spare > > slot > > @@ -971,14 +976,14 @@ int Manage_add(int fd, int tfd, struct > > mddev_dev *dv, > > pr_err("add failed for %s: could not get > > exclusive access to container\n", > > dv->devname); > > tst->ss->free_super(tst); > > - return -1; > > + goto unlock; > > } > > > > /* Check if metadata handler is able to accept the > > drive */ > > if (!tst->ss->validate_geometry(tst, > > LEVEL_CONTAINER, 0, 1, NULL, > > 0, 0, dv->devname, NULL, 0, 1)) { > > close(container_fd); > > - return -1; > > + goto unlock; > > } > > > > Kill(dv->devname, NULL, 0, -1, 0); > > @@ -987,7 +992,7 @@ int Manage_add(int fd, int tfd, struct > > mddev_dev *dv, > > dv->devname, > > INVALID_SECTORS)) { > > close(dfd); > > close(container_fd); > > - return -1; > > + goto unlock; > > } > > if (!mdmon_running(tst->container_devnm)) > > tst->ss->sync_metadata(tst); > > @@ -998,7 +1003,7 @@ int Manage_add(int fd, int tfd, struct > > mddev_dev *dv, > > dv->devname); > > close(container_fd); > > tst->ss->free_super(tst); > > - return -1; > > + goto unlock; > > } > > sra->array.level = LEVEL_CONTAINER; > > /* Need to set data_offset and component_size */ > > @@ -1013,7 +1018,7 @@ int Manage_add(int fd, int tfd, struct > > mddev_dev *dv, > > pr_err("add new device to external metadata > > failed for %s\n", dv->devname); > > close(container_fd); > > sysfs_free(sra); > > - return -1; > > + goto unlock; > > } > > ping_monitor(devnm); > > sysfs_free(sra); > > @@ -1027,7 +1032,7 @@ int Manage_add(int fd, int tfd, struct > > mddev_dev *dv, > > else > > pr_err("add new device failed for > > %s as %d: %s\n", > > dv->devname, j, > > strerror(errno)); > > - return -1; > > + goto unlock; > > } > > if (dv->disposition == 'j') { > > pr_err("Journal added successfully, making > > %s read-write\n", devname); > > @@ -1038,7 +1043,11 @@ int Manage_add(int fd, int tfd, struct > > mddev_dev *dv, > > } > > if (verbose >= 0) > > pr_err("added %s\n", dv->devname); > > + map_unlock(&map); > > return 1; > > +unlock: > > + map_unlock(&map); > > + return -1; > > } > > > > int Manage_remove(struct supertype *tst, int fd, struct mddev_dev > > *dv, > > I am not challenging, just before I understand what you are trying to > fix, > it is quite hard for me to join the discussion with you for this > change. > > And, this version is much more informative, and thank you for the > effort. >