Re: [PATCH md-6.12 0/7] md: enhance faulty chekcing for blocked handling

Mariusz Tkaczyk <mariusz.tkaczyk@xxxxxxxxxxxxxxx> · Wed, 9 Oct 2024 09:14:32 +0200

On Fri, 30 Aug 2024 15:27:14 +0800
Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote:

> From: Yu Kuai <yukuai3@xxxxxxxxxx>
> 
> The lifetime of badblocks:
> 
> - IO error, and decide to record badblocks, and record sb_flags;
> - write IO found rdev has badblocks and not yet acknowledged, then this
> IO is blocked;
> - daemon found sb_flags is set, update superblock and flush badblocks;
> - write IO continue;
> 
> Main idea is that badblocks will be set in memory fist, before badblocks
> are acknowledged, new write request must be blocked to prevent reading
> old data after power failure, and this behaviour is not necessary if rdev
> is faulty in the first place.
> 
> Yu Kuai (7):
>   md: add a new helper rdev_blocked()
>   md: don't wait faulty rdev in md_wait_for_blocked_rdev()
>   md: don't record new badblocks for faulty rdev
>   md/raid1: factor out helper to handle blocked rdev from
>     raid1_write_request()
>   md/raid1: don't wait for Faulty rdev in wait_blocked_rdev()
>   md/raid10: don't wait for Faulty rdev in wait_blocked_rdev()
>   md/raid5: don't set Faulty rdev for blocked_rdev
> 
>  drivers/md/md.c     |  8 +++--
>  drivers/md/md.h     | 24 +++++++++++++++
>  drivers/md/raid1.c  | 75 +++++++++++++++++++++++----------------------
>  drivers/md/raid10.c | 40 +++++++++++-------------
>  drivers/md/raid5.c  | 13 ++++----
>  5 files changed, 92 insertions(+), 68 deletions(-)
> 

Hi,
We tested this patchset.

mdmon rework:
https://github.com/md-raid-utilities/mdadm/pull/66 

Kernel build torvalds/linux.git master:
commit e32cde8d2bd7d251a8f9b434143977ddf13dcec6

I applied this patchset on top of that.

My tests proved that:
- If only mdmon PR is applied - hangs are reproducible.
- If only this patchset is applied - hangs are reproducible.
- If both kernel patchset and mdmon rework are applied- hangs are not
  reproducible (at least until now).

It was tricky topic (I needed to deal with weird issues related to shared
descriptors in mdmon).

What the most important- there is no regression detected.

Thanks,
Mariusz