On Fri, 8 Apr 2022 09:18:28 -0700 Song Liu <song@xxxxxxxxxx> wrote: > On Fri, Apr 8, 2022 at 7:35 AM Mariusz Tkaczyk > <mariusz.tkaczyk@xxxxxxxxxxxxxxx> wrote: > > > > On Thu, 7 Apr 2022 17:16:37 -0700 > > Song Liu <song@xxxxxxxxxx> wrote: > > > > > On Tue, Mar 22, 2022 at 8:24 AM Mariusz Tkaczyk > > > <mariusz.tkaczyk@xxxxxxxxxxxxxxx> wrote: > > > > > > > > Patch 62f7b1989c0 ("md raid0/linear: Mark array as 'broken' and > > > > fail BIOs if a member is gone") allowed to finish writes earlier > > > > (before level dependent actions) for non-redundant arrays. > > > > > > > > To achieve that MD_BROKEN is added to mddev->flags if drive > > > > disappearance is detected. This is done in is_mddev_broken() which > > > > is confusing and not consistent with other levels where > > > > error_handler() is used. This patch adds appropriate error_handler > > > > for raid0 and linear and adopt md_error() to the change. > > > > > > > > Usage of error_handler causes that disk failure can be requested > > > > from userspace. User can fail the array via #mdadm --set-faulty > > > > command. This is not safe and will be fixed in mdadm. It is > > > > correctable because failed state is not recorded in the metadata. > > > > After next assembly array will be read-write again. For safety > > > > reason is better to keep MD_BROKEN in runtime only. > > > > > > > > Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@xxxxxxxxxxxxxxx> > > > > > > Sorry for the late response. > > > > > > > --- > > > > drivers/md/md-linear.c | 14 +++++++++++++- > > > > drivers/md/md.c | 6 +++++- > > > > drivers/md/md.h | 10 ++-------- > > > > drivers/md/raid0.c | 14 +++++++++++++- > > > > 4 files changed, 33 insertions(+), 11 deletions(-) > > > > > > > > diff --git a/drivers/md/md-linear.c b/drivers/md/md-linear.c > > > > index 1ff51647a682..c33cd28f1dba 100644 > > > > --- a/drivers/md/md-linear.c > > > > +++ b/drivers/md/md-linear.c > > > > @@ -233,7 +233,8 @@ static bool linear_make_request(struct mddev > > > > *mddev, struct bio *bio) bio_sector < start_sector)) > > > > goto out_of_bounds; > > > > > > > > - if (unlikely(is_mddev_broken(tmp_dev->rdev, "linear"))) { > > > > + if (unlikely(is_rdev_broken(tmp_dev->rdev))) { > > > > + md_error(mddev, tmp_dev->rdev); > > > > > > I apologize if we discussed this before. Shall we just call > > > linear_error() here?If we go this way, we don't really need ... > > > > > > > bio_io_error(bio); > > > > return true; > > > > } > > > > @@ -281,6 +282,16 @@ static void linear_status (struct seq_file > > > > *seq, struct mddev *mddev) seq_printf(seq, " %dk rounding", > > > > mddev->chunk_sectors / 2); } > > > > > > > > +static void linear_error(struct mddev *mddev, struct md_rdev *rdev) > > > > +{ > > > > + if (!test_and_set_bit(MD_BROKEN, &mddev->flags)) { > > > > + char *md_name = mdname(mddev); > > > > + > > > > + pr_crit("md/linear%s: Disk failure on %pg detected, > > > > failing array.\n", > > > > + md_name, rdev->bdev); > > > > + } > > > > +} > > > > + > > > > static void linear_quiesce(struct mddev *mddev, int state) > > > > { > > > > } > > > > @@ -297,6 +308,7 @@ static struct md_personality linear_personality > > > > = .hot_add_disk = linear_add, > > > > .size = linear_size, > > > > .quiesce = linear_quiesce, > > > > + .error_handler = linear_error, > > > > > > ... set error_handler here, and ... > > > > > > > }; > > > > > > > > static int __init linear_init (void) > > > > diff --git a/drivers/md/md.c b/drivers/md/md.c > > > > index 0a89f072dae0..3354afc9d2a3 100644 > > > > --- a/drivers/md/md.c > > > > +++ b/drivers/md/md.c > > > > @@ -7985,7 +7985,11 @@ void md_error(struct mddev *mddev, struct > > > > md_rdev *rdev) > > > > > > > > if (!mddev->pers || !mddev->pers->error_handler) > > > > return; > > > > - mddev->pers->error_handler(mddev,rdev); > > > > + mddev->pers->error_handler(mddev, rdev); > > > > + > > > > + if (mddev->pers->level == 0 || mddev->pers->level == > > > > LEVEL_LINEAR) > > > > + return; > > > > > > ... this check here. > > > > > > Did I miss something? > > > > > Hi Song, > > That is correct, we can do the same for raid0. I did it this way to > > make it similar to redundant levels. > > If you think that it is overhead, I can drop it. > > Yeah, it feels like more overhead to me. > > I applied 2/3 and 3/3 to md-next. Please take a look and let me know > if anything needs to be fixed. > Now the first patch is just a code refactor with different error message. I think that we can drop it. Do you agree? Thanks, Mariusz