On Fri, 30 Aug 2024 15:27:14 +0800
Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

> From: Yu Kuai <yukuai3@xxxxxxxxxx>
>
> The lifetime of badblocks:
>
> - IO error, and decide to record badblocks, and record sb_flags;
> - write IO found rdev has badblocks and not yet acknowledged, then this
>   IO is blocked;
> - daemon found sb_flags is set, update superblock and flush badblocks;
> - write IO continue;
>
> Main idea is that badblocks will be set in memory first, before badblocks
> are acknowledged, new write request must be blocked to prevent reading
> old data after power failure, and this behaviour is not necessary if rdev
> is faulty in the first place.
>
> Yu Kuai (7):
>   md: add a new helper rdev_blocked()
>   md: don't wait faulty rdev in md_wait_for_blocked_rdev()
>   md: don't record new badblocks for faulty rdev
>   md/raid1: factor out helper to handle blocked rdev from
>     raid1_write_request()
>   md/raid1: don't wait for Faulty rdev in wait_blocked_rdev()
>   md/raid10: don't wait for Faulty rdev in wait_blocked_rdev()
>   md/raid5: don't set Faulty rdev for blocked_rdev
>
>  drivers/md/md.c     |  8 +++--
>  drivers/md/md.h     | 24 +++++++++++++++
>  drivers/md/raid1.c  | 75 +++++++++++++++++++++++---------------------
>  drivers/md/raid10.c | 40 +++++++++++-------------
>  drivers/md/raid5.c  | 13 ++++----
>  5 files changed, 92 insertions(+), 68 deletions(-)
>

Hi,

We tested this patchset.

mdmon rework:
https://github.com/md-raid-utilities/mdadm/pull/66

Kernel build:
torvalds/linux.git master, commit e32cde8d2bd7d251a8f9b434143977ddf13dcec6

I applied this patchset on top of that commit.

My tests showed that:
- If only the mdmon PR is applied, hangs are reproducible.
- If only this patchset is applied, hangs are reproducible.
- If both the kernel patchset and the mdmon rework are applied, hangs are
  not reproducible (at least so far).

It was a tricky topic (I needed to deal with weird issues related to shared
descriptors in mdmon).

Most importantly, no regression was detected.

Thanks,
Mariusz