On Sat, 31 Aug 2024 09:14:39 +0800 Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: > Hi, > > 在 2024/08/30 18:28, Mariusz Tkaczyk 写道: > > On Fri, 30 Aug 2024 15:27:17 +0800 > > Yu Kuai <yukuai1@xxxxxxxxxxxxxxx> wrote: > > > >> From: Yu Kuai <yukuai3@xxxxxxxxxx> > >> > >> Faulty will be checked before issuing IO to the rdev, however, rdev can > >> be faulty at any time, hence it's possible that rdev_set_badblocks() > >> will be called for faulty rdev. In this case, mddev->sb_flags will be > >> set and some other path can be blocked by updating super block. > >> > >> Since faulty rdev will not be accesed anymore, there is no need to > >> record new babblocks for faulty rdev and forcing updating super block. > >> > >> Noted this is not a bugfix, just prevent updating superblock in some > >> corner cases, and will help to slice a bug related to external > >> metadata[1]. > >> > >> [1] > >> https://lore.kernel.org/all/f34452df-810b-48b2-a9b4-7f925699a9e7@xxxxxxxxxxxxxxx/ > >> > >> Signed-off-by: Yu Kuai <yukuai3@xxxxxxxxxx> > >> --- > >> drivers/md/md.c | 4 ++++ > >> 1 file changed, 4 insertions(+) > >> > >> diff --git a/drivers/md/md.c b/drivers/md/md.c > >> index 675d89597c7b..a3f7f407fe42 100644 > >> --- a/drivers/md/md.c > >> +++ b/drivers/md/md.c > >> @@ -9757,6 +9757,10 @@ int rdev_set_badblocks(struct md_rdev *rdev, > >> sector_t s, int sectors, { > >> struct mddev *mddev = rdev->mddev; > >> int rv; > >> + > >> + if (test_bit(Faulty, &rdev->flags)) > >> + return 1; > >> + > > > > Blame is volatile, this is why we need a comment here :) > > Otherwise, someone may remove that. > > Perhaps something like following? > > /* > * record new babblocks for faulty rdev will force unnecessary > * super block updating. > */ > Almost, we need to refer to external context because this is important to mention where to expect issues: /* * Recording new badblocks for faulty rdev will force unnecessary * super block updating. This is fragile for external management because * userspace daemon may trying to remove this device and deadlock may * occur. This will be probably solved in the mdadm, but it is safer to avoid * it. */ In my testing, I observed that it improves failing bios and device removal path (recording badblock is simply expensive if there are many badblocks) so the devices are removed faster but I don't have data here, this is what I saw. Obviously, it is optimization. Mariusz