On Thu, 7 Sep 2023 19:37:44 +0800 Li Xiao Keng <lixiaokeng@xxxxxxxxxx> wrote: > There is a raid1 with sda and sdb. And we add sdc to this raid, > it may return -EBUSY. > > The main process of --add: > 1. dev_open(sdc) in Manage_add > 2. store_super1(st, di->fd) in write_init_super1 > 3. fsync(fd) in store_super1 > 4. close(di->fd) in write_init_super1 > 5. ioctl(ADD_NEW_DISK) > > Step 2 and 3 will add sdc to metadata of raid1. There will be > udev(change of sdc) event after step4. Then "/usr/sbin/mdadm > --incremental --export $devnode --offroot $env{DEVLINKS}" > will be run, and the sdc will be added to the raid1. Then > step 5 will return -EBUSY because it checks if device isn't > claimed in md_import_device()->lock_rdev()->blkdev_get_by_dev() > ->blkdev_get(). > > It will be confusing for users because sdc is added first time. > The "incremental" will get map_lock before add sdc to raid1. > So we add map_lock before write_init_super in "mdadm --add" > to fix the race of "add" and "incremental". > > Signed-off-by: Li Xiao Keng <lixiaokeng@xxxxxxxxxx> > Signed-off-by: Guanqin Miao <miaoguanqin@xxxxxxxxxx> Hello Li Xiao Keng, Thanks for the patch, I'm trying to get my head around this. As I understand, you added a mapfile locking and you are failing when the log is already claimed, but you prefer to return -1 and print error instead of returning EBUSY, right? If yes, it has nothing to do with fixing. It is just a obscure hack to make it, let say more reliable. We can either get EBUSY and know that is expected. I see no difference, your solution is same confusing as current implementation. Also, you used map_lock(). I remember that initially I proposed - my bad. We also rejected udev locking, as a buggy. In fact- I don't see ready synchronization method to avoid this race in mdadm right now. The best way, I see is: 1. Write metadata. 2. Give udev chance to process the device. 3. Add it manually if udev failed (or D_NO_UDEV). If it does not work for you, we need to add synchronization, I know that there is pidfile used for mdmon, perhaps we can reuse it? We don't need something fancy, just something to keep all synchronised and block incremental to led --add finish the action. We cannot use map_lock() and map_unlock() in this context. It claims lock on mapfile which describes every array in your system (base details, there is no info about members), see /var/run/mdadm. Add is just a tiny action to add disk and we are not going to update the map. Taking this lock will make --add more vulnerable, for example it will fail because another array is assembled or created in the same time (I know that it will be hard to achieve but it is another race - we don't want it). And we cannot allow for partially done action, you added this lock after writing metadata (according to description), we are not taking this lock to complete the action, just to hide race. Lock should be taken before writing metadata, but we know that it will not hide an issue. In this for it cannot be merged I don't see a gain, we are returning -1 instead -22, what is the difference? I don't see error message as more meaningful now. Thanks, Mariusz