On Thu, 18 Jan 2024, Yu Kuai wrote: > Hi, > > 在 2024/01/18 21:07, Mikulas Patocka 写道: > > > > > > On Thu, 18 Jan 2024, Yu Kuai wrote: > > > >> Hi, > >> > >> 在 2024/01/18 2:18, Mikulas Patocka 写道: > >>> Note that md_wakeup_thread_directly is racy - it will do nothing if the > >>> thread is already running or it may cause spurious wake-up if the thread > >>> is blocked in another subsystem. > >> > >> No, as the comment said, md_wakeup_thread_directly() is just to prevent > >> that md_wakeup_thread() can't wake up md_do_sync() if it's waiting for > >> metadata update. > > > > Yes - but what happens if you wake up the thread just a few instructions > > before it is going to sleep for metadata update? wake_up_process does > > nothing on a running process and the thread proceeds with waiting. This is > > what I thought could happen when I was making the patch. > > Please notice that in the orginal code md_wakeup_thread_directly() is > used for sync_thread, and md_wakeup_thread() should be used for > *mddev->thread* (mddev_unlock always do that) to clear > MD_RECOVERY_RUNNING. > > By the way, the root cause that MD_RECOVERY_RUNNING is not cleared is > that mddev_suspend() never stop sync_thread at all, while > md_check_recovery() won't do anything when mddev is suspended. > > Before: > 1. suspend > 2. call md_reap_sync_thread() directly to unregister sync_thread > -> notice that this is not safe. > 3. resume > > Now: > 1. suspend > 2. call stop_sync_thread() to unregister sync_thread interrupt > md_do_sync() and wait for md_check_recovery() to clear > MD_RECOVERY_RUNNING. > -> which will never happen now; > 3. resume > > I fixed this locally and the test integrity-caching.sh passed in my VM. > > Thanks, > Kuai OK, Thanks. Mikulas