On Sun, 25 Jan 2015 21:06:20 +0100 Heinz Mauelshagen <heinzm@xxxxxxxxxx> wrote: > From: Heinz Mauelshagen <heinzm@xxxxxxxxxx> > > Hi Neil, > > the reconstruct write optimization in raid5, function fetch_block causes > livelocks in LVM raid4/5 tests. > > Test scenarios: > the tests wait for full initial array resynchronization before making a > filesystem > on the raid4/5 logical volume, mounting it, writing to the filesystem > and failing > one physical volume holding a raiddev. > > In short, we're seeing livelocks on fully synchronized raid4/5 arrays > with a failed device. > > This patch fixes the issue but likely in a suboptimnal way. > > Do you think there is a better solution to avoid livelocks on > reconstruct writes? > > Regards, > Heinz > > Signed-off-by: Heinz Mauelshagen <heinzm@xxxxxxxxxx> > Tested-by: Jon Brassow <jbrassow@xxxxxxxxxx> > Tested-by: Heinz Mauelshagen <heinzm@xxxxxxxxxx> > > --- > drivers/md/raid5.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > index c1b0d52..0fc8737 100644 > --- a/drivers/md/raid5.c > +++ b/drivers/md/raid5.c > @@ -2915,7 +2915,7 @@ static int fetch_block(struct stripe_head *sh, > struct stripe_head_state *s, > (s->failed >= 1 && fdev[0]->toread) || > (s->failed >= 2 && fdev[1]->toread) || > (sh->raid_conf->level <= 5 && s->failed && fdev[0]->towrite && > - (!test_bit(R5_Insync, &dev->flags) || > test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) && > + (!test_bit(R5_Insync, &dev->flags) || > test_bit(STRIPE_PREREAD_ACTIVE, &sh->state) || s->non_overwrite) && > !test_bit(R5_OVERWRITE, &fdev[0]->flags)) || > ((sh->raid_conf->level == 6 || > sh->sector >= sh->raid_conf->mddev->recovery_cp) That is a bit heavy handed, but knowing that fixes the problem helps a lot. I think the problem happens when processes a non-overwrite write to a failed device. fetch_block() should, in that case, pre-read all of the working device, but since (!test_bit(R5_Insync, &dev->flags) || test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) && was added, it sometimes doesn't. The root problem is that handle_stripe_dirtying is getting confused because neither rmw or rcw seem to work, so it doesn't start the chain of events to set STRIPE_PREREAD_ACTIVE. The following (which is against mainline) might fix it. Can you test? Thanks, NeilBrown diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index c1b0d52bfcb0..793cf2861e97 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -3195,6 +3195,10 @@ static void handle_stripe_dirtying(struct r5conf *conf, (unsigned long long)sh->sector, rcw, qread, test_bit(STRIPE_DELAYED, &sh->state)); } + if (rcw > disks && rmw > disks && + !test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) + set_bit(STRIPE_DELAYED, &sh->state); + /* now if nothing is locked, and if we have enough data, * we can start a write request */ This code really really needs to be tidied up and commented better!!! Thanks, NeilBrown
Attachment:
pgp5aCAbD2yfT.pgp
Description: OpenPGP digital signature