Hi,
在 2023/03/16 6:55, Logan Gunthorpe 写道:
On 2023-03-15 02:30, Paul Menzel wrote:
Am 15.03.23 um 07:18 schrieb Yu Kuai:
I tested this pathset with mdadm tests, and there are no new regression,
by the way, following test will failed with or without this patchset:
01raid6integ
04r1update
05r6tor0
10ddf-create
10ddf-fail-spare
10ddf-fail-stop-readd
10ddf-geometry
As you improved the tests in the past, can you confirm, these failed on
your test systems too and are fixed now?
Hmm, well Yu did not claim that those tests were fixed. If you re-read
what was said, the tests listed failed with or without the new changes.
As I read it, Yu asserts no new regressions were created with the patch
set, not that failing tests were fixed.
Unfortunately, the tests listed are largely not ones I saw failing the
last time I ran the tests (though it's been a few months since I last
tried). I know 01raid6integ used to fail some of the time, but the other
6 tests mentioned worked the last time I ran them; and there are many
other tests that failed when I ran them. (My notes on which tests are
broken are included in the most recent mdadm tree in tests/*.broken)
I was going to try and confirm that no new regressions were introduced
by Yu's patches, but seems the tests are getting worse. I tried running
the tests on the current md-next branch and found that one of the early
tests, 00raid5-zero, hangs indefinitely. I quickly ran the same test on
v6.3-rc2 and found that it runs just fine there. So it looks like
there's already a regression in md-next that is not part of this series
and I don't have the time to dig into the root cause right now.
Yu's patches don't apply cleanly to v6.3-rc2 and I can't run the tests
against md-next; so I didn't bother running them, but I did do a quick
review. The locking changes make sense to me so it might be worth
merging for correctness. However, I'm not entirely sure it's the best
solution -- the md thread stuff seems like a bit of a mess and passing
an mddev to thread functions that were not related to the mddev to get a
lock seems to just make the mess a bit worse.
For example, it seems a bit ugly to me for the lock mddev->thread_lock
to protect the access of a pointer in struct r5l_log. Just spit-balling,
but perhaps RCU would be more appropriate here. Then md_wakeup_thread()
would just need to hold the RCU read lock when dereferencing, and
md_unregister_thread() would just need to synchronize_rcu() before
stopping and freeing the thread. This has the benefit of not requiring
the mddev object for every md_thread and would probably require a lot
less churn than the current patches.
Thanks for your suggestion, this make sense to me. I'll try to use rcu.
Thanks,
Kuai
Logan
.