On Sat, 12 Feb 2022 09:12:00 +0800
Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:

> On 1/27/22 11:39 PM, Mariusz Tkaczyk wrote:
> > Patch 62f7b1989c0 ("md raid0/linear: Mark array as 'broken' and
> > fail BIOs if a member is gone") allowed writes to be finished
> > earlier (before the level-dependent actions) for non-redundant
> > arrays.
> >
> > To achieve that, MD_BROKEN is added to mddev->flags if drive
> > disappearance is detected. This is done in is_mddev_broken(), which
> > is confusing and not consistent with the other levels, where
> > error_handler() is used. This patch adds an appropriate
> > error_handler for raid0 and linear.
>
> I think their purposes are quite different, as said before:
> error_handler is mostly against the rdev, while is_mddev_broken is
> for the mddev, though it needs to test the rdev first.

I changed is_mddev_broken() to is_rdev_broken() because it now checks
the device. On error, md_error() is called, and later error_handler().
This unifies error handling across the levels. Do you consider this
approach wrong?

> > It also adopts md_error(); we only want to call .error_handler for
> > those levels. mddev->pers->sync_request is additionally checked:
> > its existence implies a level with redundancy.
> >
> > Usage of error_handler means that a disk failure can be requested
> > from userspace. A user can fail the array via the
> > #mdadm --set-faulty command. This is not safe and will be fixed
> > in mdadm.
>
> What is the safety issue here? It would be better to post the mdadm
> fix together.

We can and should block the user from damaging the raid, even if the
damage is recoverable. It is a regression, and I will fix mdadm. I
don't consider it a big risk (because it is recoverable), so I focused
on the kernel part first.

> > It is correctable because the failed state is not recorded in the
> > metadata. After the next assembly the array will be read-write
> > again.
>
> I don't think it is a problem; care to explain why it can't be RW
> again?

The failed state is not recoverable at runtime, so you need to
recreate the array.

> > For safety reasons it is better to keep MD_BROKEN in runtime only.
>
> Isn't MD_BROKEN runtime-only already? It is in mddev->flags, not
> mddev->sb_flags.

Yes, and this is why I didn't propagate it.

> > Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@xxxxxxxxxxxxxxx>
> > ---
> >  drivers/md/md-linear.c | 15 ++++++++++++++-
> >  drivers/md/md.c        |  6 +++++-
> >  drivers/md/md.h        | 10 ++--------
> >  drivers/md/raid0.c     | 15 ++++++++++++++-
> >  4 files changed, 35 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/md/md-linear.c b/drivers/md/md-linear.c
> > index 1ff51647a682..3c368e3e4641 100644
> > --- a/drivers/md/md-linear.c
> > +++ b/drivers/md/md-linear.c
> > @@ -233,7 +233,8 @@ static bool linear_make_request(struct mddev *mddev, struct bio *bio)
> >  		     bio_sector < start_sector))
> >  		goto out_of_bounds;
> >
> > -	if (unlikely(is_mddev_broken(tmp_dev->rdev, "linear"))) {
> > +	if (unlikely(is_rdev_broken(tmp_dev->rdev))) {
> > +		md_error(mddev, tmp_dev->rdev);
>
> [ ... ]
>
> > +static void linear_error(struct mddev *mddev, struct md_rdev *rdev)
> > +{
> > +	if (!test_and_set_bit(MD_BROKEN, &rdev->mddev->flags)) {
>
> s/rdev->mddev/mddev/

Noted.

> > +		char *md_name = mdname(mddev);
> > +
> > +		pr_crit("md/linear%s: Disk failure on %pg detected.\n"
> > +			"md/linear:%s: Cannot continue, failing array.\n",
> > +			md_name, rdev->bdev, md_name);
>
> The second md_name is not needed.

Could you elaborate a bit more? Do you want to skip the device name in
the second message?
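Or do you mean collapsing it into a single message, something like this
(just a sketch of what I understood, not tested):

	pr_crit("md/linear:%s: Disk failure on %pg detected, failing array.\n",
		mdname(mddev), rdev->bdev);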
>
> > +	}
> > +}
> > +
> >  static void linear_quiesce(struct mddev *mddev, int state)
> >  {
> >  }
> > @@ -297,6 +309,7 @@ static struct md_personality linear_personality =
> >  	.hot_add_disk	= linear_add,
> >  	.size		= linear_size,
> >  	.quiesce	= linear_quiesce,
> > +	.error_handler	= linear_error,
> >  };
> >
> >  static int __init linear_init (void)
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index e8666bdc0d28..f888ef197765 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -7982,7 +7982,11 @@ void md_error(struct mddev *mddev, struct md_rdev *rdev)
> >
> >  	if (!mddev->pers || !mddev->pers->error_handler)
> >  		return;
> > -	mddev->pers->error_handler(mddev,rdev);
> > +	mddev->pers->error_handler(mddev, rdev);
> > +
> > +	if (!mddev->pers->sync_request)
> > +		return;
>
> The above is only valid for raid0 and linear. I guess it is fine if
> DM doesn't create an LV on top of them, but the new check deserves a
> comment above it.

Will do. Could you propose a comment?

> > +
> >  	if (mddev->degraded)
> >  		set_bit(MD_RECOVERY_RECOVER, &mddev->recovery);
> >  	sysfs_notify_dirent_safe(rdev->sysfs_state);
>
> [ ... ]
>
> > +static void raid0_error(struct mddev *mddev, struct md_rdev *rdev)
> > +{
> > +	if (!test_and_set_bit(MD_BROKEN, &rdev->mddev->flags)) {
> > +		char *md_name = mdname(mddev);
> > +
> > +		pr_crit("md/raid0%s: Disk failure on %pg detected.\n"
> > +			"md/raid0:%s: Cannot continue, failing array.\n",
> > +			md_name, rdev->bdev, md_name);
>
> The comments for linear_error are also valid here.

Noted.

Thanks,
Mariusz
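P.S. As a starting point for the comment above the sync_request check,
how about something like this (a rough draft only, happy to take your
wording instead):

	/*
	 * Levels without redundancy have no ->sync_request: their
	 * error_handler marks the whole array as broken, so there is
	 * no recovery to schedule and nothing more to do here.
	 */
	if (!mddev->pers->sync_request)
		return;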