Re: MD Raid1, ext4 and write same

On Wed, 12 Dec 2012, Martin K. Petersen wrote:

> >>>>> "Joe" == Joe Lawrence <Joe.Lawrence@xxxxxxxxxxx> writes:
> 
> Joe> I can confirm the same issue with 3.7.0.  If anyone else is running
> Joe> with raid1 and disks that support write same, can you give this a
> Joe> try?
> 
> Your patch looks good to me (the do_same one). We'll need raid10.c and
> raid5.c to be fixed up in a similar fashion.
> 
> -- 
> Martin K. Petersen	Oracle Linux Engineering

Hi Martin,

I took a look at raid5 and I don't think it suffers from the same problem
(i.e., cloned write bios missing the REQ_WRITE_SAME flag).  A quick
mkfs/mount test showed that the blkdev_issue_write_same() calls all
succeeded anyway.
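
To make the failure mode concrete, here is a rough userspace sketch (not
kernel code) of what happens when a clone's flags are rebuilt from a fixed
list of masks.  The SK_* bit values and both function names are made up
for illustration; the real REQ_* definitions live in the kernel headers
and differ from these.

```c
#include <assert.h>

/* Hypothetical flag bits, for illustration only (not the kernel's values). */
#define SK_WRITE       (1UL << 0)
#define SK_SYNC        (1UL << 1)
#define SK_WRITE_SAME  (1UL << 2)

/*
 * Models the unpatched pattern: the clone's flags are rebuilt from an
 * explicit list of masks, so any parent flag not in the list is dropped.
 */
unsigned long clone_rw_without_same(unsigned long parent_rw)
{
	const unsigned long do_sync = parent_rw & SK_SYNC;

	return SK_WRITE | do_sync;
}

/*
 * Models the patched pattern: the WRITE SAME bit is masked out of the
 * parent and OR'ed into the clone along with the other flags.
 */
unsigned long clone_rw_with_same(unsigned long parent_rw)
{
	const unsigned long do_sync = parent_rw & SK_SYNC;
	const unsigned long do_same = parent_rw & SK_WRITE_SAME;

	return SK_WRITE | do_sync | do_same;
}
```

With a parent of SK_WRITE | SK_WRITE_SAME, the first helper returns a
plain write while the second keeps the WRITE SAME bit.  A parent without
the bit comes out identical from both.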

So I added the same logic to raid10 and it similarly passes my quick
tests.  I don't know what else might generate WRITE SAME commands at the
moment (I tried dd'ing a bunch of zeros and that didn't seem to produce
any), so all I did was mkfs/mount/fio/umount/fsck.  MD recovery seemed
happy when I did this with a degraded array and brought the partner in
later.

One question I do have, though: I'm not sure about the write-intent bitmap
implications of this.  I noticed that in raid0_run() you call:

  blk_queue_max_write_same_sectors(mddev->queue, mddev->chunk_sectors);

which should keep the acceptable LBA range inside a bitmap 'chunk'?  Am I
right in understanding that this would keep any write same from spanning
more than one bitmap bit?  In my testing, my MD chunk size was 512K but my
SAS disks' write_same_sectors was only 64K... so I think I inadvertently
missed this necessary step.
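
For reference, the arithmetic I have in mind looks roughly like the sketch
below.  It is plain userspace C, the function name is hypothetical, and it
assumes 512-byte sectors (so 512K is 1024 sectors and 64K is 128 sectors).
It only checks whether a given range straddles a chunk boundary; it says
nothing about how the block layer actually splits or aligns requests,
which I have not verified.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return 1 if the sector range [start, start + len) touches more than one
 * chunk of chunk_sectors sectors, 0 if it stays inside a single chunk.
 * Hypothetical helper for illustration; not an MD or block-layer function.
 */
int range_crosses_chunk(uint64_t start, uint64_t len, uint64_t chunk_sectors)
{
	assert(chunk_sectors > 0);
	assert(len > 0);

	/* Compare the chunk index of the first and last sector in the range. */
	return (start / chunk_sectors) != ((start + len - 1) / chunk_sectors);
}
```

With my numbers, a 128-sector request starting at sector 896 ends at
sector 1023 and stays inside the first 1024-sector chunk, while the same
size starting at sector 960 runs through sector 1087 and crosses into the
next one.  So in this model a length cap by itself bounds the size of a
request but not where it starts.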

Thanks,

-- Joe


From c3ebb7a21850f1ff83c5498655e4f5a18aa883fd Mon Sep 17 00:00:00 2001
From: Joe Lawrence <joe.lawrence@xxxxxxxxxxx>
Date: Wed, 12 Dec 2012 17:03:40 -0500
Subject: [PATCH] md: raid1,10: Copy REQ_WRITE_SAME flag in cloned write
 bios

If the mddev's max_write_same_sectors are non-zero, the block layer may send
WRITE_SAME requests.  When cloning these bios in raid1,10 write cases, make
sure we add this flag to the new bios.

Signed-off-by: Joe Lawrence <joe.lawrence@xxxxxxxxxxx>
---
 drivers/md/raid1.c  | 4 +++-
 drivers/md/raid10.c | 7 +++++--
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index a0f7309..85aba6a 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1001,6 +1001,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 	const unsigned long do_flush_fua = (bio->bi_rw & (REQ_FLUSH | REQ_FUA));
 	const unsigned long do_discard = (bio->bi_rw
 					  & (REQ_DISCARD | REQ_SECURE));
+	const unsigned long do_same = (bio->bi_rw & REQ_WRITE_SAME);
 	struct md_rdev *blocked_rdev;
 	struct blk_plug_cb *cb;
 	struct raid1_plug_cb *plug = NULL;
@@ -1302,7 +1303,8 @@ read_again:
 				   conf->mirrors[i].rdev->data_offset);
 		mbio->bi_bdev = conf->mirrors[i].rdev->bdev;
 		mbio->bi_end_io	= raid1_end_write_request;
-		mbio->bi_rw = WRITE | do_flush_fua | do_sync | do_discard;
+		mbio->bi_rw =
+			WRITE | do_flush_fua | do_sync | do_discard | do_same;
 		mbio->bi_private = r1_bio;
 
 		atomic_inc(&r1_bio->remaining);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index c9acbd7..fdb4a6e 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1106,6 +1106,7 @@ static void make_request(struct mddev *mddev, struct bio * bio)
 	const unsigned long do_fua = (bio->bi_rw & REQ_FUA);
 	const unsigned long do_discard = (bio->bi_rw
 					  & (REQ_DISCARD | REQ_SECURE));
+	const unsigned long do_same = (bio->bi_rw & REQ_WRITE_SAME);
 	unsigned long flags;
 	struct md_rdev *blocked_rdev;
 	struct blk_plug_cb *cb;
@@ -1461,7 +1462,8 @@ retry_write:
 							      rdev));
 			mbio->bi_bdev = rdev->bdev;
 			mbio->bi_end_io	= raid10_end_write_request;
-			mbio->bi_rw = WRITE | do_sync | do_fua | do_discard;
+			mbio->bi_rw =
+				WRITE | do_sync | do_fua | do_discard | do_same;
 			mbio->bi_private = r10_bio;
 
 			atomic_inc(&r10_bio->remaining);
@@ -1503,7 +1505,8 @@ retry_write:
 						   r10_bio, rdev));
 			mbio->bi_bdev = rdev->bdev;
 			mbio->bi_end_io	= raid10_end_write_request;
-			mbio->bi_rw = WRITE | do_sync | do_fua | do_discard;
+			mbio->bi_rw =
+				WRITE | do_sync | do_fua | do_discard | do_same;
 			mbio->bi_private = r10_bio;
 
 			atomic_inc(&r10_bio->remaining);
-- 
1.7.11.7

