Re: [patch 6/7 v2] MD: raid5 trim support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 12 Sep 2012 12:09:53 +0800 Shaohua Li <shli@xxxxxxxxxx> wrote:

> On Tue, Sep 11, 2012 at 02:10:08PM +1000, NeilBrown wrote:
> > > +	bi = orig_bi;
> > > +	if (bi->bi_rw & REQ_DISCARD) {
> > > +		dd_idx++;
> > > +		while (dd_idx == sh->pd_idx || dd_idx == sh->qd_idx)
> > > +			dd_idx++;
> > > +		if (dd_idx < sh->disks)
> > > +			goto again;
> > > +	}
> > 
> > I'm afraid there there is something else here that I can't make my self happy
> > with.
> > 
> > You added a new "goto again" loop inside add_stripe_bio, and to compensate
> > the increase logical_sector in make_request so that it doesn't call
> > add_stripe_bio so many times.
> > This means that to control flow between make_request and add_stripe_bio is
> > very different depending on whether it is a discard or not.  That make the
> > code harder to understand and easier to break later.
> > 
> > I think it would to create a completely separate "make_trim_request()" which
> > handles the trim/discard case, and leave the current code as it is.
> > 
> > If/when you do send a patch to do that, please also resend the other raid5
> > patch which comes after this one.  I tend not to keep patches once I've
> > devices not to apply them immediately.  It also reduces the chance of
> > confusion if you just send whatever you want me to apply.
> 
> I'm afraid there will be a lot of duplicate code doing this way. How about
> below patch? I made it clearer for discard. If you like it, I'll send other
> patches.

I don't think there will be that much duplicate code.  And even if there is
some, it is worth it to get a clearer control flow.

I tried writing something and realised that your code is (I think) racy.
You really need the whole stripe to be either all-discard, or no-discards.
So you need to hold the stripe_lock while adding the discard bios to every
device.  Your code doesn't do that.

Here is what I came up with.  It only addresses the 'make_request' part of
the patch and isn't even compile tested but it should show the general shape
of the solution, and show how little code is duplicated.

NeilBrown

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 7031b86..ca80290 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4071,6 +4071,73 @@ static void release_stripe_plug(struct mddev *mddev,
 		release_stripe(sh);
 }
 
+static void make_discard_request(struct mddev *mddev, struct bio *bi)
+{
+	sector_t logical_sector, last_sector;
+	int stripe_sectors;
+	struct r5conf *conf = mddev->private;
+
+	if (mddev->reshape_position != MaxSector)
+		/* Skip discard while reshape is happening */
+		return;
+
+	logical_sector = bi->bi_sector & ~((sector_t)STRIPE_SECTORS-1);
+	last_sector = bi->bi_sector + (bi->bi_size>>9);
+
+	bi->bi_next = NULL;
+	bi->bi_phys_segments = 1;	/* over-loaded to count active stripes */
+
+	stripe_sectors = conf->chunk_sectors *
+		(conf->raid_disks - conf->max_degraded);
+	logical_sector = DIV_ROUND_UP_SECTOR_T(logical_sector,
+					       stripe_sectors);
+	sector_div(last_sector, stripe_sectors);
+
+	for (;logical_sector < last_sector;
+	     logical_sector += STRIPE_SECTORS) {
+		sh = get_active_stripe(conf, logical_sector, previous,
+				       0, 0);
+	again:
+		prepare_to_wait(&conf->wait_for_overlap, &w,
+				TASK_UNINTERRUPTIBLE);
+		spin_lock_irq(&sh->stripe_lock);
+		for (d = 0; d < conf->raid_disks; d++) {
+			if (sh->dev[d].towrite ||
+			    sh->dev[d].toread) {
+				set_bit(R5_Overwrite, &sh->dev[d].flags);
+				spin_unlock_irq(&sh->stripe_lock);
+				schedule();
+				goto again;
+			}
+		}
+		finish_wait(&conf->wait_for_overlap, &w);
+		for (d = 0; d < conf->raid_disks; d++)
+			if (d != conf->pd_idx &&
+			    d != conf->qd_idx) {
+				sh->dev[d].towrite = bi;
+				set_bit(R5_OVERWRITE, &sh->dev[d].flags);
+			}
+		spin_unlock_irq(&sh->stripe_lock);
+		if (conf->mddev->bitmap) {
+			for (d = 0; d < conf->raid_disks - conf->max_degraded;
+			     d++)
+				bitmap_startwrite(mddev->bitmap,
+						  sh->sector,
+						  STRIPE_SECTORS,
+						  0);
+			sh->bm_seq = conf->seq_flush + 1;
+			set_bit(STRIPE_BIT_DELAY, &sh->state);
+		}
+
+		set_bit(STRIPE_HANDLE, &sh->state);
+		clear_bit(STRIPE_DELAYED, &sh->state);
+		if (!test_and_set_bit(STRIPE_PREREAD_ACTIVE,
+				      &sh->state))
+			atomic_inc(&conf->preread_active_stripes);
+		release_stripe_plug(mddev, sh);
+	}
+}
+
 static void make_request(struct mddev *mddev, struct bio * bi)
 {
 	struct r5conf *conf = mddev->private;
@@ -4093,6 +4160,11 @@ static void make_request(struct mddev *mddev, struct bio * bi)
 	     chunk_aligned_read(mddev,bi))
 		return;
 
+	if (unlikely(bi->bi_rw & REQ_DISCARD)) {
+		make_discard_request(mddev, bi);
+		return;
+	}
+
 	logical_sector = bi->bi_sector & ~((sector_t)STRIPE_SECTORS-1);
 	last_sector = bi->bi_sector + (bi->bi_size>>9);
 	bi->bi_next = NULL;

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux