On 2022-05-30 03:55, Guoqing Jiang wrote:
> I tried with 5.18.0-rc3; no problem with 07reshape5intr (I will
> investigate why it failed with this patch), but 07revert-grow still
> failed without applying this one.
>
> fail07revert-grow.log shows the issues below.
>
> [ 7856.233515] mdadm[25246]: segfault at 0 ip 000000000040fe56 sp
> 00007ffdcf252800 error 4 in mdadm[400000+81000]
> [ 7856.233544] Code: 00 48 8d 7c 24 30 e8 79 30 ff ff 48 8d 7c 24 30 31
> f6 31 c0 e8 db 34 ff ff 85 c0 79 77 bf 26 50 46 00 b9 04 00 00 00 48 89
> de <f3> a6 0f 97 c0 1c 00 84 c0 75 18 e8 fa 36 ff ff 48 0f be 53 04 48
>
> [ 7866.871747] mdadm[25463]: segfault at 0 ip 000000000040fe56 sp
> 00007ffe91e39800 error 4 in mdadm[400000+81000]
> [ 7866.871760] Code: 00 48 8d 7c 24 30 e8 79 30 ff ff 48 8d 7c 24 30 31
> f6 31 c0 e8 db 34 ff ff 85 c0 79 77 bf 26 50 46 00 b9 04 00 00 00 48 89
> de <f3> a6 0f 97 c0 1c 00 84 c0 75 18 e8 fa 36 ff ff 48 0f be 53 04 48
>
> [ 7876.779855] ======================================================
> [ 7876.779858] WARNING: possible circular locking dependency detected
> [ 7876.779861] 5.18.0-rc3-57-default #28 Tainted: G E
> [ 7876.779864] ------------------------------------------------------
> [ 7876.779867] mdadm/25444 is trying to acquire lock:
> [ 7876.779870] ffff991817749938 ((wq_completion)md_misc){+.+.}-{0:0},
> at: flush_workqueue+0x87/0x470
> [ 7876.779879]
> but task is already holding lock:
> [ 7876.779882] ffff9917c5c1c2c0 (&mddev->reconfig_mutex){+.+.}-{3:3},
> at: action_store+0x11a/0x2c0 [md_mod]
> [ 7876.779892]
> which lock already depends on the new lock.

Hmm, strange. I'm definitely running with lockdep, and even when I run
the test on my machine on v5.18-rc3, I don't get this error. Not sure
why.

In any case, it looks like we recently added a flush_workqueue(md_misc_wq)
call in action_store() which runs with the mddev_lock() held. According
to your lockdep warning, that can deadlock: flush_workqueue() waits for
the pending work items on md_misc_wq to finish, and if one of those
items tries to take mddev_lock() itself, neither side can make progress.

That call was added in this commit:

Fixes: cc1ffe61c026 ("md: add new workqueue for delete rdev")

Can we maybe run flush_workqueue() before we take mddev_lock()? A rough
sketch of what I mean is below.

Logan
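
Something like this, perhaps. This is untested and paraphrased from
memory of the v5.18-era action_store() in drivers/md/md.c, not a real
diff, so treat it as a sketch of the idea only:

	/*
	 * Sketch only: flush md_misc_wq before taking the reconfig
	 * mutex. flush_workqueue() waits for pending work items, and
	 * an item such as del_work may take mddev_lock() itself, so
	 * flushing while holding the lock can deadlock.
	 */
	if (work_pending(&mddev->del_work))
		flush_workqueue(md_misc_wq);

	if (test_bit(MD_RECOVERY_RUNNING, &mddev->recovery) &&
	    mddev_lock(mddev) == 0) {
		if (mddev->sync_thread) {
			set_bit(MD_RECOVERY_INTR, &mddev->recovery);
			md_reap_sync_thread(mddev);
		}
		mddev_unlock(mddev);
	}

There's presumably still a window where new del_work could be queued
between the flush and taking the lock, so this is just the general
shape, not a final fix.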