On Sun, 08 Feb 2015 18:40:07 +0300 Dark Penguin <darkpenguin@xxxxxxxxx>
wrote:

> There is an old issue about RAID1 read-balancing when "write-mostly"
> disks are present.
>
> The problem is, according to the manual, "md driver will avoid reading
> from these devices if at all possible".
>
> One way to understand this statement is that these drives will never be
> read from, except when the main drive cannot be read from. There are A
> LOT of situations when this is the expected and desired behaviour:
> - People mirroring an SSD with an HDD and suffering a performance loss;
> - People mirroring a fast HDD with a slow HDD for reliability, for
> example, mirroring a 300GB WD Raptor to a 300GB partition on a 3TB 5900
> "green" drive for backup; since the larger drive may be used for
> something other than this RAID, many would prefer it to be spared the
> workload.
> - In my case, I have a home RAID1 storage, which is idle 95% of the
> time, and 95% of the remaining 5% I only read from it. So I want one of
> the drives to spin down and never turn on, in order to avoid wearing
> down the mechanics. They say, "The best way to keep a device from
> breaking is to turn it off and not use it". :) But even if I simply
> retrieve the contents of my volume, that request is apparently enough to
> load the first drive to 100% for a split second, which causes the second
> drive to spin up, which is extremely undesirable.
>
> I've spent a lot of time looking for the answer "why does it spin up",
> and "normal forum users" could not even help me, but then I found out
> that there is another way to read that statement: apparently, there are
> other people who would like to see whatever little benefit reading from
> the second drive could give them. I cannot say which side is a
> majority, but I respect their wishes as well, and personally I'm fine
> with any default behaviour as long as I have what I need.
>
> I've found a patch for that:
> http://marc.info/?l=linux-raid&m=135982797322422
> Apparently, it can be used with any kernel, but I'm not good enough to
> make sure nothing's broken every time I upgrade the kernel, and frankly,
> I think there are A LOT of people who wish to see the behaviour I would
> expect. So my plea is for the developers to accept this patch and make
> this behaviour optional, if not default. At least give us a compile
> option to build the kernel this way! There are people out there who use
> RAID1 at home and not in production, and therefore care less about
> performance than home storage idling, and who understand the words "if
> at all possible" in the more obvious way! I think that's the whole
> reason why the "write-mostly" option is there in the first place, but if
> there are people who don't agree with me - I'm not going to argue, they
> can have it their way, just give us the option to do what we want, too!
>
>

Hi,
 thanks for reporting this. It is definitely a bug.
It was introduced by

commit 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc
    md/raid1: read balance chooses idlest disk for SSD

I don't recall seeing the patch from Tomas Hodek which you provided a
link for - sorry Tomas.

I prefer the second of the two patches. I will submit the following to
Linus some time this week.

Thanks for pursuing this, Dark Penguin.

NeilBrown

From: Tomas Hodek <tomas.hodek@xxxxxxxx>
Date: Mon, 23 Feb 2015 11:00:38 +1100
Subject: [PATCH] md/raid1: fix read balance when a drive is write-mostly.

When a drive is marked write-mostly it should only be the target of
reads if there is no other option.

This behaviour was broken by

commit 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc
    md/raid1: read balance chooses idlest disk for SSD

which causes a write-mostly device to be *preferred* in some cases.

Restore correct behaviour by checking and setting
best_dist_disk and best_pending_disk rather than best_disk.
We only need to test one of these as they are both changed
from -1 to >=0 at the same time.
As we leave min_pending and best_dist unchanged, any non-write-mostly
device will appear better than the write-mostly device.

Reported-by: tomas.hodek@xxxxxxxx
Reported-by: Dark Penguin <darkpenguin@xxxxxxxxx>
Signed-off-by: NeilBrown <neilb@xxxxxxx>
Link: http://marc.info/?l=linux-raid&m=135982797322422
Fixes: 9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc
Cc: stable@xxxxxxxxxxxxxxx (3.6+)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 0b6349f9c5c5..7742e0999bf2 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -560,7 +560,7 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 		if (test_bit(WriteMostly, &rdev->flags)) {
 			/* Don't balance among write-mostly, just
 			 * use the first as a last resort */
-			if (best_disk < 0) {
+			if (best_dist_disk < 0) {
 				if (is_badblock(rdev, this_sector, sectors,
 						&first_bad, &bad_sectors)) {
 					if (first_bad < this_sector)
@@ -569,7 +569,8 @@ static int read_balance(struct r1conf *conf, struct r1bio *r1_bio, int *max_sect
 					best_good_sectors = first_bad - this_sector;
 				} else
 					best_good_sectors = sectors;
-				best_disk = disk;
+				best_dist_disk = disk;
+				best_pending_disk = disk;
 			}
 			continue;
 		}
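
For readers who want to see why the change works, here is a minimal
userspace sketch of the selection logic described in the commit message.
It is not the kernel code: struct member, pick_read_disk() and the
"fixed" and "nonrot" parameters are hypothetical names invented for this
illustration, and bad-block handling, sequential-read shortcuts and
locking are all left out. It only assumes, as the commit message says,
that the final choice falls back to best_pending_disk or best_dist_disk
when best_disk was never set.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for one RAID1 member device. */
struct member {
	bool write_mostly;	/* device is flagged write-mostly */
	unsigned int pending;	/* outstanding requests on the device */
	unsigned long dist;	/* head distance to the requested sector */
};

/*
 * Pick the index of the member to read from, or -1 if there is none.
 * fixed == false models the behaviour before the patch,
 * fixed == true models the behaviour after it.
 * nonrot selects "least pending" instead of "closest head" at the end.
 */
int pick_read_disk(const struct member *m, size_t n, bool nonrot, bool fixed)
{
	int best_disk = -1, best_dist_disk = -1, best_pending_disk = -1;
	unsigned long best_dist = ULONG_MAX;
	unsigned int min_pending = UINT_MAX;

	for (size_t i = 0; i < n; i++) {
		if (m[i].write_mostly) {
			if (!fixed) {
				/* Old: claims best_disk, so the fallback
				 * below is never consulted. */
				if (best_disk < 0)
					best_disk = (int)i;
			} else if (best_dist_disk < 0) {
				/* New: recorded only as a fallback;
				 * best_dist and min_pending stay at their
				 * maximum, so any normal disk beats it. */
				best_dist_disk = (int)i;
				best_pending_disk = (int)i;
			}
			continue;
		}
		if (m[i].pending < min_pending) {
			min_pending = m[i].pending;
			best_pending_disk = (int)i;
		}
		if (m[i].dist < best_dist) {
			best_dist = m[i].dist;
			best_dist_disk = (int)i;
		}
	}

	if (best_disk < 0)
		best_disk = nonrot ? best_pending_disk : best_dist_disk;
	return best_disk;
}
```

With a write-mostly device at index 0 and a normal device at index 1,
the old path returns 0 and the fixed path returns 1; with only a
write-mostly device present, the fixed path still returns it as the
last resort.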