On 2023-03-15 02:30, Paul Menzel wrote: > Am 15.03.23 um 07:18 schrieb Yu Kuai: >> I tested this pathset with mdadm tests, and there are no new regression, >> by the way, following test will failed with or without this patchset: >> >> 01raid6integ >> 04r1update >> 05r6tor0 >> 10ddf-create >> 10ddf-fail-spare >> 10ddf-fail-stop-readd >> 10ddf-geometry > > As you improved the tests in the past, can you confirm, these failed on > your test systems too and are fixed now? Hmm, well Yu did not claim that those tests were fixed. If you re-read what was said, the tests listed failed with or without the new changes. As I read it, Yu asserts no new regressions were created with the patch set, not that failing tests were fixed. Unfortunately, the tests listed are largely not ones I saw failing the last time I ran the tests (though it's been a few months since I last tried). I know 01raid6integ used to fail some of the time, but the other 6 tests mentioned worked the last time I ran them; and there are many other tests that failed when I ran them. (My notes on which tests are broken are included in the most recent mdadm tree in tests/*.broken) I was going to try and confirm that no new regressions were introduced by Yu's patches, but seems the tests are getting worse. I tried running the tests on the current md-next branch and found that one of the early tests, 00raid5-zero, hangs indefinitely. I quickly ran the same test on v6.3-rc2 and found that it runs just fine there. So it looks like there's already a regression in md-next that is not part of this series and I don't have the time to dig into the root cause right now. Yu's patches don't apply cleanly to v6.3-rc2 and I can't run the tests against md-next; so I didn't bother running them, but I did do a quick review. The locking changes make sense to me so it might be worth merging for correctness. However, I'm not entirely sure it's the best solution -- the md thread stuff seems like a bit of a mess and passing an mddev to thread functions that were not related to the mddev to get a lock seems to just make the mess a bit worse. For example, it seems a bit ugly to me for the lock mddev->thread_lock to protect the access of a pointer in struct r5l_log. Just spit-balling, but perhaps RCU would be more appropriate here. Then md_wakeup_thread() would just need to hold the RCU read lock when dereferencing, and md_unregister_thread() would just need to synchronize_rcu() before stopping and freeing the thread. This has the benefit of not requiring the mddev object for every md_thread and would probably require a lot less churn than the current patches. Logan