2014-05-20 7:42 GMT+02:00 NeilBrown <neilb@xxxxxxx>:
> On Thu, 15 May 2014 09:50:49 +0200 Patrik Horník <patrik@xxxxxx> wrote:
>
>> OK, it seems that because of that my copy operations will not be
>> finished yet by next week... :)
>>
>> BTW this time the layout is left-symmetric, but I guess the problem is in
>> whole-stripe write detection with degraded RAID6.
>>
>> Patrik
>>
>> 2014-05-15 9:18 GMT+02:00 NeilBrown <neilb@xxxxxxx>:
>> > On Thu, 15 May 2014 09:04:27 +0200 Patrik Horník <patrik@xxxxxx> wrote:
>> >
>> >> Hello Neil,
>> >>
>> >> did you make some progress on this issue by any chance?
>> >
>> > No I haven't - sorry.
>> > After 2 years, I guess I really should.
>> >
>> > I'll make another note for first thing next week.
>
> Can you try the following patch and let me know if it helps?

I don't want to test it on a production system... But I have a
degraded array which does not have production data on it, so I will
think about how to test it.

> I definitely reduced the number of reads significantly, but my measurements
> (of a very simple test case) didn't show much speed-up.
>

I did not look at the patch itself, but according to your description
it should eliminate the problem, should it not? What was your read /
write ratio after the patch?

Thanks.

Patrik

> This is against current mainline.  If you want it against another version and
> it doesn't apply easily, just ask.
>
> Thanks,
> NeilBrown
>
> From 98c411f93391be0dbda98d43835dd9e042faa78f Mon Sep 17 00:00:00 2001
> From: NeilBrown <neilb@xxxxxxx>
> Date: Mon, 19 May 2014 11:16:49 +1000
> Subject: [PATCH] md/raid56: Don't perform reads to support writes until stripe
>  is ready.
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
>
> If it is found that we need to pre-read some blocks before a write
> can succeed, we normally set STRIPE_DELAYED and don't actually perform
> the read until STRIPE_PREREAD_ACTIVE subsequently gets set.
>
> However for a degraded RAID6 we currently perform the reads as soon
> as we see that a write is pending.  This significantly hurts
> throughput.
>
> So:
>  - when handle_stripe_dirtying finds a block that it wants on a device
>    that is failed, set STRIPE_DELAYED, instead of doing nothing, and
>  - when fetch_block detects that a read might be required to satisfy a
>    write, only perform the read if STRIPE_PREREAD_ACTIVE is set,
>    and if we would actually need to read something to complete the write.
>
> This also helps RAID5, though less often as RAID5 supports a
> read-modify-write cycle.  For RAID5 the read is performed too early
> only if the write is not a full 4K aligned write (i.e. not an
> R5_OVERWRITE).
>
> Also clean up a couple of horrible bits of formatting.
>
> Reported-by: Patrik Horník <patrik@xxxxxx>
> Signed-off-by: NeilBrown <neilb@xxxxxxx>
>
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 633e20a96b34..d67202bd9118 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -292,9 +292,12 @@ static void do_release_stripe(struct r5conf *conf, struct stripe_head *sh,
>          BUG_ON(atomic_read(&conf->active_stripes)==0);
>          if (test_bit(STRIPE_HANDLE, &sh->state)) {
>                  if (test_bit(STRIPE_DELAYED, &sh->state) &&
> -                    !test_bit(STRIPE_PREREAD_ACTIVE, &sh->state))
> +                    !test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
>                          list_add_tail(&sh->lru, &conf->delayed_list);
> -                else if (test_bit(STRIPE_BIT_DELAY, &sh->state) &&
> +                        if (atomic_read(&conf->preread_active_stripes)
> +                            < IO_THRESHOLD)
> +                                md_wakeup_thread(conf->mddev->thread);
> +                } else if (test_bit(STRIPE_BIT_DELAY, &sh->state) &&
>                             sh->bm_seq - conf->seq_write > 0)
>                          list_add_tail(&sh->lru, &conf->bitmap_list);
>                  else {
> @@ -2908,8 +2911,11 @@ static int fetch_block(struct stripe_head *sh, struct stripe_head_state *s,
>               (s->failed >= 1 && fdev[0]->toread) ||
>               (s->failed >= 2 && fdev[1]->toread) ||
>               (sh->raid_conf->level <= 5 && s->failed && fdev[0]->towrite &&
> +              (!test_bit(R5_Insync, &dev->flags) || test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) &&
>                !test_bit(R5_OVERWRITE, &fdev[0]->flags)) ||
> -             (sh->raid_conf->level == 6 && s->failed && s->to_write))) {
> +             (sh->raid_conf->level == 6 && s->failed && s->to_write &&
> +              s->to_write < sh->raid_conf->raid_disks - 2 &&
> +              (!test_bit(R5_Insync, &dev->flags) || test_bit(STRIPE_PREREAD_ACTIVE, &sh->state))))) {
>                  /* we would like to get this block, possibly by computing it,
>                   * otherwise read it if the backing disk is insync
>                   */
> @@ -3115,7 +3121,8 @@ static void handle_stripe_dirtying(struct r5conf *conf,
>                      !test_bit(R5_LOCKED, &dev->flags) &&
>                      !(test_bit(R5_UPTODATE, &dev->flags) ||
>                        test_bit(R5_Wantcompute, &dev->flags))) {
> -                        if (test_bit(R5_Insync, &dev->flags)) rcw++;
> +                        if (test_bit(R5_Insync, &dev->flags))
> +                                rcw++;
>                          else
>                                  rcw += 2*disks;
>                  }
> @@ -3136,10 +3143,10 @@ static void handle_stripe_dirtying(struct r5conf *conf,
>                              !(test_bit(R5_UPTODATE, &dev->flags) ||
>                              test_bit(R5_Wantcompute, &dev->flags)) &&
>                              test_bit(R5_Insync, &dev->flags)) {
> -                                if (
> -                                  test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
> -                                        pr_debug("Read_old block "
> -                                                "%d for r-m-w\n", i);
> +                                if (test_bit(STRIPE_PREREAD_ACTIVE,
> +                                             &sh->state)) {
> +                                        pr_debug("Read_old block %d for r-m-w\n",
> +                                                 i);
>                                          set_bit(R5_LOCKED, &dev->flags);
>                                          set_bit(R5_Wantread, &dev->flags);
>                                          s->locked++;
> @@ -3162,10 +3169,9 @@ static void handle_stripe_dirtying(struct r5conf *conf,
>                              !(test_bit(R5_UPTODATE, &dev->flags) ||
>                                test_bit(R5_Wantcompute, &dev->flags))) {
>                                  rcw++;
> -                                if (!test_bit(R5_Insync, &dev->flags))
> -                                        continue; /* it's a failed drive */
> -                                if (
> -                                  test_bit(STRIPE_PREREAD_ACTIVE, &sh->state)) {
> +                                if (test_bit(R5_Insync, &dev->flags) &&
> +                                    test_bit(STRIPE_PREREAD_ACTIVE,
> +                                             &sh->state)) {
>                                          pr_debug("Read_old block "
>                                                  "%d for Reconstruct\n", i);
>                                          set_bit(R5_LOCKED, &dev->flags);
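
For anyone who wants to reason about the fetch_block change without
building a kernel, here is a rough userspace sketch of only the two
write-related clauses of the patched condition. It is not kernel code:
struct toy_stripe, its fields and want_block_for_write() are
hypothetical names made up for illustration, and the real fetch_block
also handles toread, sync and replacement cases that are left out here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's stripe state. */
struct toy_stripe {
	int level;            /* RAID level: 5 or 6 */
	int raid_disks;       /* total devices, parity included */
	int failed;           /* number of failed devices in the stripe */
	int to_write;         /* blocks with a pending write (models s->to_write) */
	bool preread_active;  /* models STRIPE_PREREAD_ACTIVE */
	bool dev_insync;      /* models R5_Insync on the block being considered */
	bool fdev_towrite;    /* RAID5: the failed device has a pending write */
	bool fdev_overwrite;  /* RAID5: that write covers the block (R5_OVERWRITE) */
};

/*
 * Should this block be fetched (read or computed) in order to support
 * a pending write on a degraded stripe?
 */
static bool want_block_for_write(const struct toy_stripe *s)
{
	/* In-sync blocks are only read once the stripe has been promoted. */
	bool gate = !s->dev_insync || s->preread_active;

	if (s->failed == 0)
		return false;
	if (s->level <= 5)
		return s->fdev_towrite && gate && !s->fdev_overwrite;
	if (s->level == 6)
		/* Writing every data block needs no old data at all. */
		return s->to_write > 0 &&
		       s->to_write < s->raid_disks - 2 && gate;
	return false;
}

/* A few illustrative cases for a 6-device RAID6 with one failed device. */
static void toy_examples(void)
{
	struct toy_stripe s = {
		.level = 6, .raid_disks = 6, .failed = 1,
		.to_write = 1, .preread_active = false, .dev_insync = true,
	};

	assert(!want_block_for_write(&s));  /* partial write: read is deferred */
	s.preread_active = true;
	assert(want_block_for_write(&s));   /* promoted: read may go ahead */
	s.to_write = 4;
	assert(!want_block_for_write(&s));  /* full-stripe write: nothing to read */
}
```

Calling toy_examples() exercises the three cases the commit message
describes: a partial write is held back until the stripe is promoted,
and a write covering all raid_disks - 2 data blocks never triggers a
read.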