On Tue, 26 May 2015 17:10:05 -0700 Shaohua Li <shli@xxxxxxxxxx> wrote: > On Wed, May 27, 2015 at 09:34:51AM +1000, NeilBrown wrote: > > On Wed, 27 May 2015 08:35:32 +1000 NeilBrown <neilb@xxxxxxx> wrote: > > > > > Could you please review and possibly test the patch below? > > > > > > > well... that patch had a fairly obvious double-lock bug. > > Try this one. > > (oh, just saw your email that you spotted the lock bug :-) > > > > > > NeilBrown > > > > From: NeilBrown <neilb@xxxxxxx> > > Date: Wed, 27 May 2015 08:43:45 +1000 > > Subject: [PATCH] md/raid5: close race between STRIPE_BIT_DELAY and batching. > > > > The first time a write is added to a stripe, we need to set the > > bitmap bits (if a bitmap is active). > > While doing that the stripe is not locked and other writes could > > be added and then the stripe could be added to a batch. > > Once it has entered the batch it is too large to set STRIPE_BIT_DELAY > > as the batch head has taken over when the stripe will be written. > > > > We cannot hold the spinlock while adding the bitmap bit, > > so introduce a new stripe_head flag 'STRIPE_BITMAP_PENDING' which > > indicates that adding to the bitmap is pending. This prevents > > the stripe from being added to a batch. > > > > Only the first thread to add a write to a stripe can set this bit, > > so it is safe for it to clear it again when it is done. > > > > Reported-by: Shaohua Li <shli@xxxxxxxxxx> > > Signed-off-by: NeilBrown <neilb@xxxxxxx> > > > > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > > index 73b5376dad3b..dae587ecdf71 100644 > > --- a/drivers/md/raid5.c > > +++ b/drivers/md/raid5.c > > @@ -760,6 +760,7 @@ static void unlock_two_stripes(struct stripe_head *sh1, struct stripe_head *sh2) > > static bool stripe_can_batch(struct stripe_head *sh) > > { > > return test_bit(STRIPE_BATCH_READY, &sh->state) && > > + !test_bit(STRIPE_BITMAP_PENDING, &sh->state) && > > is_full_stripe_write(sh); > > } > > > > @@ -3007,14 +3008,27 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, > > pr_debug("added bi b#%llu to stripe s#%llu, disk %d.\n", > > (unsigned long long)(*bip)->bi_iter.bi_sector, > > (unsigned long long)sh->sector, dd_idx); > > - spin_unlock_irq(&sh->stripe_lock); > > > > if (conf->mddev->bitmap && firstwrite) { > > + /* Cannot hold spinlock over bitmap_startwrite, > > + * but must ensure this isn't added to a batch until > > + * we have added to the bitmap and set bm_seq. > > + * So set STRIPE_BITMAP_PENDING to prevent > > + * batching. > > + * Only the first thread to add a write to a stripe > > + * can set this bit, so we "own" it. > > + */ > > + WARN_ON(test_bit(STRIPE_BITMAP_PENDING, &sh->state)); > I keep hitting this. the firstwrite is set for every device. > I wonder why I'm not.... maybe because I'm using /dev/loop devices and they behave a bit differently. Of course 'firstwrite' is not per-stripe, it is per-device-in-a-stripe. So it will be set for every device. Which is rather pointless really, we only need to set the bitmap once for the whole stripe. Maybe it was easier the other way. Could you try this? Thanks! NeilBrown From: NeilBrown <neilb@xxxxxxx> Date: Wed, 27 May 2015 08:43:45 +1000 Subject: [PATCH] md/raid5: close race between STRIPE_BIT_DELAY and batching. When we add a write to a stripe we need to make sure the bitmap bit is set. While doing that the stripe is not locked so it could be added to a batch after which further changes to STRIPE_BIT_DELAY and ->bm_seq are ineffective. So we need to hold off adding to a stripe until bitmap_startwrite has completed at least once, and we need to avoid further changes to STRIPE_BIT_DELAY once the stripe has been added to a batch. If a bitmap_startwrite() completes after the stripe was added to a batch, it will not have set the bit, only incremented a counter, so no extra delay of the stripe is needed. Reported-by: Shaohua Li <shli@xxxxxxxxxx> Signed-off-by: NeilBrown <neilb@xxxxxxx> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 73b5376dad3b..5d5ce97c2210 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -760,6 +760,7 @@ static void unlock_two_stripes(struct stripe_head *sh1, struct stripe_head *sh2) static bool stripe_can_batch(struct stripe_head *sh) { return test_bit(STRIPE_BATCH_READY, &sh->state) && + !test_bit(STRIPE_BITMAP_PENDING, &sh->state) && is_full_stripe_write(sh); } @@ -3007,14 +3008,32 @@ static int add_stripe_bio(struct stripe_head *sh, struct bio *bi, int dd_idx, pr_debug("added bi b#%llu to stripe s#%llu, disk %d.\n", (unsigned long long)(*bip)->bi_iter.bi_sector, (unsigned long long)sh->sector, dd_idx); - spin_unlock_irq(&sh->stripe_lock); if (conf->mddev->bitmap && firstwrite) { + /* Cannot hold spinlock over bitmap_startwrite, + * but must ensure this isn't added to a batch until + * we have added to the bitmap and set bm_seq. + * So set STRIPE_BITMAP_PENDING to prevent + * batching. + * If multiple add_stripe_bio() calls race here they + * much all set STRIPE_BITMAP_PENDING. So only the first one + * to complete "bitmap_startwrite" gets to set + * STRIPE_BIT_DELAY. This is important as once a stripe + * is added to a batch, STRIPE_BIT_DELAY cannot be changed + * any more. + */ + set_bit(STRIPE_BITMAP_PENDING, &sh->state); + spin_unlock_irq(&sh->stripe_lock); bitmap_startwrite(conf->mddev->bitmap, sh->sector, STRIPE_SECTORS, 0); - sh->bm_seq = conf->seq_flush+1; - set_bit(STRIPE_BIT_DELAY, &sh->state); + spin_lock_irq(&sh->stripe_lock); + clear_bit(STRIPE_BITMAP_PENDING, &sh->state); + if (!sh->batch_head) { + sh->bm_seq = conf->seq_flush+1; + set_bit(STRIPE_BIT_DELAY, &sh->state); + } } + spin_unlock_irq(&sh->stripe_lock); if (stripe_can_batch(sh)) stripe_add_to_batch_list(conf, sh); diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h index d7b2bc8b756f..02c3bf8fbfe7 100644 --- a/drivers/md/raid5.h +++ b/drivers/md/raid5.h @@ -337,6 +337,9 @@ enum { STRIPE_ON_RELEASE_LIST, STRIPE_BATCH_READY, STRIPE_BATCH_ERR, + STRIPE_BITMAP_PENDING, /* Being added to bitmap, don't add + * to batch yet. + */ }; #define STRIPE_EXPAND_SYNC_FLAGS \
Attachment:
pgplrXCNZYoDK.pgp
Description: OpenPGP digital signature