On Wed, Dec 6, 2023 at 3:36 AM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: > > Hi, > > 在 2023/12/06 16:30, Song Liu 写道: > > On Sun, Dec 3, 2023 at 7:18 PM Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: > >> > >> From: Yu Kuai <yukuai3@xxxxxxxxxx> > >> > >> New mddev_resume() calls are added to synchroniza IO with array > >> reconfiguration, however, this introduce a regression while adding it in > >> md_start_sync(): > >> > >> 1) someone set MD_RECOVERY_NEEDED first; > >> 2) daemon thread grab reconfig_mutex, then clear MD_RECOVERY_NEEDED and > >> queue a new sync work; > >> 3) daemon thread release reconfig_mutex; > >> 4) in md_start_sync > >> a) check that there are spares that can be added/removed, then suspend > >> the array; > >> b) remove_and_add_spares may not be called, or called without really > >> add/remove spares; > >> c) resume the array, then set MD_RECOVERY_NEEDED again! > >> > >> Loop between 2 - 4, then mddev_suspend() will be called quite often, for > >> consequence, normal IO will be quite slow. > >> > >> Fix this problem by spliting MD_RECOVERY_NEEDED out of mddev_resume(), so > >> that md_start_sync() won't set such flag and hence the loop will be broken. > > > > I hope we don't leak set_bit MD_RECOVERY_NEEDED to all call > > sites of mddev_resume(). > > There are also some other mddev_resume() that is added later and don't > need recovery, so md_start_sync() is not the only place: > > - md_setup_drive > - rdev_attr_store > - suspend_lo_store > - suspend_hi_store > - autorun_devices > - md_ioct > - r5c_disable_writeback_async > - error path from new_dev_store(), ... > > I'm not sure add a new helper is a good idea, because all above apis > should use new helper as well. I think for most of these call sites, it is OK to set MD_RECOVERY_NEEDED (although it is not needed), and md_start_sync() is the only one that may trigger "loop between 2 - 4" scenario. Did I miss something? It is already rc4, so we need to send the fix soon. Thanks, Song