After thinking for a while, my words from my last email don't describe properly. For raid1/raid10, if fail_last_dev is true. The bios which are sent to member disks all have MD_FAILFAST. If there are no errors, failfast works well until the last device failure. It will not re-send the bio without MD_FAILFAST when fail_last_dev is true, because the last device has been set faulty. There is no meaning to send the bio again in this situation. So it should be right to only check faulty flag here. On Fri, Feb 11, 2022 at 3:24 PM Xiao Ni <xni@xxxxxxxxxx> wrote: > > And for raid1/raid10, it looks like fail_last_dev and FailFast want to > do opposite things. > It can fail the last and it doesn't send a rewrite bio when > fail_last_dev is true. Because the > last dev has been set faulty. There is no meaning to send the rewrite > bio. So FailFast only > works when fail_last_dev is false. > > On Fri, Feb 11, 2022 at 2:48 PM Xiao Ni <xni@xxxxxxxxxx> wrote: > > > > Hi Marisuz > > > > We don't need to consider MD_FAILFAST for raid456. Because only raid1 > > and raid10 support it. > > MD_FAILFAST_SUPPORTED is only set in raid1_run/raid10_run. So LastDev > > only be useful for > > raid1/raid10. It should be good to only check Faulty here. > > > > Best Regards > > Xiao > > > > On Wed, Feb 9, 2022 at 5:40 PM Mariusz Tkaczyk > > <mariusz.tkaczyk@xxxxxxxxxxxxxxx> wrote: > > > > > > Hi All, > > > During my work under failed arrays handling[1] improvements, I > > > discovered potential issue with "failfast" and metadata writes. In > > > commit message[2] Neil mentioned that: > > > "If we get a failure writing metadata but the device doesn't > > > fail, it must be the last device so we re-write without > > > FAILFAST". > > > > > > Obviously, this is not true for RAID456 (again)[1] but it is also not > > > true for RAID1 and RAID10 with "fail_las_dev"[3] functionality enabled. > > > > > > I did a quick check and can see that setter for "LastDev" flag is > > > called if "Faulty" on device is not set. I proposed some changes in the > > > area in my patchset[4] but after discussion we decided to drop changes > > > here. Current approach is not correct for all branches, so my proposal > > > is to change: > > > > > > diff --git a/drivers/md/md.c b/drivers/md/md.c > > > index 7b024912f1eb..3daec14ef6b2 100644 > > > --- a/drivers/md/md.c > > > +++ b/drivers/md/md.c > > > @@ -931,7 +931,7 @@ static void super_written(struct bio *bio) > > > pr_err("md: %s gets error=%d\n", __func__, > > > blk_status_to_errno(bio->bi_status)); > > > md_error(mddev, rdev); > > > - if (!test_bit(Faulty, &rdev->flags) > > > + if (test_bit(MD_BROKEN, mddev->flag) > > > && (bio->bi_opf & MD_FAILFAST)) { > > > set_bit(MD_SB_NEED_REWRITE, &mddev->sb_flags); > > > set_bit(LastDev, &rdev->flags); > > > > > > > > > It will force "LastDev" to be set on every metadata rewrite if mddevice > > > is known to be failed. > > > Do you have any other suggestions? > > > > > > + Guoqing - author of fail_last_dev. > > > + Xiao - you are familiarized with FailFast so please take a look. > > > > > > [1]https://lore.kernel.org/linux-raid/CAPhsuW54_9CTR6sh7mnQ6O77F2HNArLHGWHYsUdbNGy7pXgipQ@xxxxxxxxxxxxxx/T/#m8cf7c57429b6fd332220157186151900ce23865d > > > [2]https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?id=46533ff7fefb7e9e3539494f5873b00091caa8eb > > > [3]https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git/commit/?id=9a567843f7ce > > > [4]https://lore.kernel.org/linux-raid/CAPhsuW5bV+Bz=Od9jomNHoedaEMFAXymN11J80G62GVPwSp41g@xxxxxxxxxxxxxx/ > > > > > > Thanks, > > > Mariusz > > >