On Wed, 12 Sep 2012 12:09:53 +0800 Shaohua Li <shli@xxxxxxxxxx> wrote: > On Tue, Sep 11, 2012 at 02:10:08PM +1000, NeilBrown wrote: > > > + bi = orig_bi; > > > + if (bi->bi_rw & REQ_DISCARD) { > > > + dd_idx++; > > > + while (dd_idx == sh->pd_idx || dd_idx == sh->qd_idx) > > > + dd_idx++; > > > + if (dd_idx < sh->disks) > > > + goto again; > > > + } > > > > I'm afraid there there is something else here that I can't make my self happy > > with. > > > > You added a new "goto again" loop inside add_stripe_bio, and to compensate > > the increase logical_sector in make_request so that it doesn't call > > add_stripe_bio so many times. > > This means that to control flow between make_request and add_stripe_bio is > > very different depending on whether it is a discard or not. That make the > > code harder to understand and easier to break later. > > > > I think it would to create a completely separate "make_trim_request()" which > > handles the trim/discard case, and leave the current code as it is. > > > > If/when you do send a patch to do that, please also resend the other raid5 > > patch which comes after this one. I tend not to keep patches once I've > > devices not to apply them immediately. It also reduces the chance of > > confusion if you just send whatever you want me to apply. > > I'm afraid there will be a lot of duplicate code doing this way. How about > below patch? I made it clearer for discard. If you like it, I'll send other > patches. I don't think there will be that much duplicate code. And even if there is some, it is worth it to get a clearer control flow. I tried writing something and realised that your code is (I think) racy. You really need the whole stripe to be either all-discard, or no-discards. So you need to hold the stripe_lock while adding the discard bios to every device. Your code doesn't do that. Here is what I came up with. It only addresses the 'make_request' part of the patch and isn't even compile tested but it should show the general shape of the solution, and show how little code is duplicated. NeilBrown diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 7031b86..ca80290 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -4071,6 +4071,73 @@ static void release_stripe_plug(struct mddev *mddev, release_stripe(sh); } +static void make_discard_request(struct mddev *mddev, struct bio *bi) +{ + sector_t logical_sector, last_sector; + int stripe_sectors; + struct r5conf *conf = mddev->private; + + if (mddev->reshape_position != MaxSector) + /* Skip discard while reshape is happening */ + return; + + logical_sector = bi->bi_sector & ~((sector_t)STRIPE_SECTORS-1); + last_sector = bi->bi_sector + (bi->bi_size>>9); + + bi->bi_next = NULL; + bi->bi_phys_segments = 1; /* over-loaded to count active stripes */ + + stripe_sectors = conf->chunk_sectors * + (conf->raid_disks - conf->max_degraded); + logical_sector = DIV_ROUND_UP_SECTOR_T(logical_sector, + stripe_sectors); + sector_div(last_sector, stripe_sectors); + + for (;logical_sector < last_sector; + logical_sector += STRIPE_SECTORS) { + sh = get_active_stripe(conf, logical_sector, previous, + 0, 0); + again: + prepare_to_wait(&conf->wait_for_overlap, &w, + TASK_UNINTERRUPTIBLE); + spin_lock_irq(&sh->stripe_lock); + for (d = 0; d < conf->raid_disks; d++) { + if (sh->dev[d].towrite || + sh->dev[d].toread) { + set_bit(R5_Overwrite, &sh->dev[d].flags); + spin_unlock_irq(&sh->stripe_lock); + schedule(); + goto again; + } + } + finish_wait(&conf->wait_for_overlap, &w); + for (d = 0; d < conf->raid_disks; d++) + if (d != conf->pd_idx && + d != conf->qd_idx) { + sh->dev[d].towrite = bi; + set_bit(R5_OVERWRITE, &sh->dev[d].flags); + } + spin_unlock_irq(&sh->stripe_lock); + if (conf->mddev->bitmap) { + for (d = 0; d < conf->raid_disks - conf->max_degraded; + d++) + bitmap_startwrite(mddev->bitmap, + sh->sector, + STRIPE_SECTORS, + 0); + sh->bm_seq = conf->seq_flush + 1; + set_bit(STRIPE_BIT_DELAY, &sh->state); + } + + set_bit(STRIPE_HANDLE, &sh->state); + clear_bit(STRIPE_DELAYED, &sh->state); + if (!test_and_set_bit(STRIPE_PREREAD_ACTIVE, + &sh->state)) + atomic_inc(&conf->preread_active_stripes); + release_stripe_plug(mddev, sh); + } +} + static void make_request(struct mddev *mddev, struct bio * bi) { struct r5conf *conf = mddev->private; @@ -4093,6 +4160,11 @@ static void make_request(struct mddev *mddev, struct bio * bi) chunk_aligned_read(mddev,bi)) return; + if (unlikely(bi->bi_rw & REQ_DISCARD)) { + make_discard_request(mddev, bi); + return; + } + logical_sector = bi->bi_sector & ~((sector_t)STRIPE_SECTORS-1); last_sector = bi->bi_sector + (bi->bi_size>>9); bi->bi_next = NULL;
Attachment:
signature.asc
Description: PGP signature