Re: An old "write-mostly" read balance issue

On Sun, 08 Feb 2015 18:40:07 +0300 Dark Penguin <darkpenguin@xxxxxxxxx> wrote:

> There is an old issue about RAID1 read-balancing when "write-mostly" 
> disks are present.
> 
> The problem is, according to the manual, "md driver will avoid reading 
> from these devices if at all possible".
> 
> One way to understand this statement is that these drives will never be 
> read from, except when the main drive can not be read from. There are A 
> LOT of situations when this is the expected and desired behaviour:
> - People mirroring an SSD with an HDD and suffering a performance loss;
> - People mirroring a fast HDD with a slow HDD for reliability, for 
> example, mirroring a 300GB WD Raptor to a 300GB partition on a 3TB
> 5900 RPM "green" drive for backup; since the larger drive may be used for 
> something other than this RAID, many would prefer it to be spared the 
> workload.
> - In my case, I have a home RAID1 storage, which is idle 95% of the 
> time, and 95% of the remaining 5% I only read from it. So I want one of 
> the drives to spin down and never turn on, in order to avoid wearing 
> down the mechanics. They say, "The best way to keep a device from 
> breaking is to turn it off and not use it". :) But even if I simply 
> retrieve the contents of my volume, that request is apparently enough to 
> load the first drive to 100% for a split second, which causes the second 
> drive to spin up, which is extremely undesirable.
> 
> I've spent a lot of time looking for the answer "why does it spin up", 
> and "normal forum users" could not even help me, but then I found out 
> that there is another way to read that statement: apparently, there are 
> other people who would like to see whatever little benefit reading from 
> the second drive could give them. I can not say which side is a 
> majority, but I respect their wishes as well, and personally I'm fine 
> with any default behaviour as long as I have what I need.
> 
> I've found a patch for that:
> http://marc.info/?l=linux-raid&m=135982797322422
> Apparently, it can be used with any kernel, but I'm not good enough to 
> make sure nothing's broken every time I upgrade the kernel, and frankly, 
> I think there are A LOT of people who wish to see the behaviour I would 
> expect. So my plea is for the developers to accept this patch and make 
> this behaviour optional, if not default. At least give us a compile 
> option to build the kernel this way! There are people out there who use 
> RAID1 at home and not in production, and therefore care less about 
> performance than home storage idling, and who understand the words "if 
> at all possible" in the more obvious way! I think that's the whole 
> reason why the "write-mostly" option is there in the first place, but if 
> there are people who don't agree with me - I'm not going to argue, they 
> can have it their way, just give us the option to do what we want, too!
> 
> 

Hi,
 thanks for reporting this.  It is definitely a bug.  It was introduced by 

commit 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc
    md/raid1: read balance chooses idlest disk for SSD


I don't recall seeing the patch from Tomas Hodek which you provided a link
for - sorry Tomas.

I prefer the second of the two patches.  I will submit the following to Linus
some time this week.

Thanks for pursuing this Dark Penguin.

NeilBrown

From: Tomas Hodek <tomas.hodek@xxxxxxxx>
Date: Mon, 23 Feb 2015 11:00:38 +1100
Subject: [PATCH] md/raid1: fix read balance when a drive is
 write-mostly.

When a drive is marked write-mostly it should only be the
target of reads if there is no other option.

This behaviour was broken by

commit 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc
    md/raid1: read balance chooses idlest disk for SSD

which causes a write-mostly device to be *preferred* in some cases.

Restore correct behaviour by checking and setting
best_dist_disk and best_pending_disk rather than best_disk.

We only need to test one of these as they are both changed
from -1 to >=0 at the same time.

As we leave min_pending and best_dist unchanged, any non-write-mostly
device will appear better than the write-mostly device.

Reported-by: tomas.hodek@xxxxxxxx
Reported-by: Dark Penguin <darkpenguin@xxxxxxxxx>
Signed-off-by: NeilBrown <neilb@xxxxxxx>
Link: http://marc.info/?l=linux-raid&m=135982797322422
Fixes: 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc
Cc: stable@xxxxxxxxxxxxxxx (3.6+)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 0b6349f9c5c5..7742e0999bf2 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -560,7 +560,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 		if (test_bit(WriteMostly, &rdev->flags)) {
 			/* Don't balance among write-mostly, just
 			 * use the first as a last resort */
-			if (best_disk < 0) {
+			if (best_dist_disk < 0) {
 				if (is_badblock(rdev, this_sector, sectors,
 						&first_bad, &bad_sectors)) {
 					if (first_bad < this_sector)
@@ -569,7 +569,8 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 					best_good_sectors = first_bad - this_sector;
 				} else
 					best_good_sectors = sectors;
-				best_disk = disk;
+				best_dist_disk = disk;
+				best_pending_disk = disk;
 			}
 			continue;
 		}
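
For readers who want to see why leaving min_pending and best_dist
untouched is enough, here is a rough user-space sketch of the selection
rule the patch restores. It is not the kernel code: struct mirror,
pick_read_disk() and the has_nonrot flag are made-up names for
illustration, and bad-block handling, the sequential-read shortcut and
the idle-disk shortcut in the real read_balance() are all left out.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical, simplified view of one RAID1 member (not struct md_rdev). */
struct mirror {
	bool write_mostly;	/* member carries the write-mostly flag */
	unsigned int pending;	/* outstanding requests on this member */
	unsigned long dist;	/* head distance to the requested sector */
};

/*
 * Return the index of the member to read from, or -1 if there is none.
 * has_nonrot models "the array contains a non-rotational device", in
 * which case the least-busy member is chosen instead of the closest one.
 */
static int pick_read_disk(const struct mirror *m, int n, bool has_nonrot)
{
	int best_dist_disk = -1;
	int best_pending_disk = -1;
	unsigned long best_dist = ULONG_MAX;
	unsigned int min_pending = UINT_MAX;

	for (int disk = 0; disk < n; disk++) {
		if (m[disk].write_mostly) {
			/*
			 * Record only the first write-mostly member, and only
			 * if nothing has been recorded yet.  best_dist and
			 * min_pending stay at their initial maxima, so any
			 * normal member seen later still compares as better.
			 */
			if (best_dist_disk < 0) {
				best_dist_disk = disk;
				best_pending_disk = disk;
			}
			continue;
		}
		if (m[disk].pending < min_pending) {
			min_pending = m[disk].pending;
			best_pending_disk = disk;
		}
		if (m[disk].dist < best_dist) {
			best_dist = m[disk].dist;
			best_dist_disk = disk;
		}
	}
	return has_nonrot ? best_pending_disk : best_dist_disk;
}
```

With members { write-mostly, idle } and { normal, 5 pending, dist 100 },
this picks the normal member under either policy. The write-mostly
member is returned only when every member is write-mostly.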


