[PATCH] fix write-mostly logic

Hi, RAID developers!

I noticed that the write-mostly logic in the current kernel is broken.
It seems that read_balance() always chooses a write-mostly disk when one
exists, unless another normal disk happens to have zero outstanding requests.


This patch fixes it. It was tested on 3.13.7 but should apply cleanly to git trunk.
BTW, the good_sectors logic looks broken too, but I couldn't figure out
what that code is supposed to do, so there is no fix for that.

I think that the following commit broke it:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/drivers/md/raid1.c?id=9dedf60313fa4dddfd5b9b226a0ef12a512bf9dc

In that commit the "best" logic was split into "best_dist" and "best_pending",
but the write-mostly branch and the good_sectors handling were not updated
to match: the write-mostly branch still assigns best_disk directly.
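
To illustrate the selection problem outside the kernel, here is a minimal
userspace sketch. It is a simplified model and not the raid1.c code: the
struct mirror type, its fields, and both pick_*() functions are hypothetical
names invented for this example, and bad-block handling, sequential-read
detection and the non-rotational case are left out on purpose.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-mirror state read_balance() looks at. */
struct mirror {
	int write_mostly;	/* models the WriteMostly flag */
	unsigned int pending;	/* models outstanding requests */
	unsigned long dist;	/* models head distance to the target sector */
};

/* Models the behaviour before the patch: write-mostly claims best_disk. */
static int pick_before_patch(const struct mirror *m, size_t n)
{
	int best_disk = -1, best_dist_disk = -1;
	unsigned long best_dist = ULONG_MAX;

	for (size_t i = 0; i < n; i++) {
		if (m[i].write_mostly) {
			if (best_disk < 0)
				best_disk = (int)i;
			continue;
		}
		if (m[i].pending == 0) {	/* idle disk: use it right away */
			best_disk = (int)i;
			break;
		}
		if (m[i].dist < best_dist) {
			best_dist = m[i].dist;
			best_dist_disk = (int)i;
		}
	}
	/* Never taken once a write-mostly disk has set best_disk. */
	if (best_disk == -1)
		best_disk = best_dist_disk;
	return best_disk;
}

/* Models the behaviour after the patch: write-mostly is a last resort. */
static int pick_after_patch(const struct mirror *m, size_t n)
{
	int best_disk = -1, best_dist_disk = -1, writemostly_disk = -1;
	unsigned long best_dist = ULONG_MAX;

	for (size_t i = 0; i < n; i++) {
		if (m[i].write_mostly) {
			if (writemostly_disk < 0)
				writemostly_disk = (int)i;
			continue;
		}
		if (m[i].pending == 0) {
			best_disk = (int)i;
			break;
		}
		if (m[i].dist < best_dist) {
			best_dist = m[i].dist;
			best_dist_disk = (int)i;
		}
	}
	if (best_disk == -1)
		best_disk = best_dist_disk;
	/* Fall back to write-mostly only if no normal disk was usable. */
	if (best_disk == -1 && writemostly_disk >= 0)
		best_disk = writemostly_disk;
	return best_disk;
}
```

With one write-mostly disk at index 0 and one busy normal disk at index 1,
the first function returns 0 and the second returns 1. Both return the normal
disk once it is idle, and both still return the write-mostly disk when it is
the only mirror.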

                                        With best regards, 
                                           Vladimir Savkin. 

--- linux-3.10.33/drivers/md/raid1.c.orig	2014-03-16 01:11:43.000000000 +0400
+++ linux-3.10.33/drivers/md/raid1.c	2014-03-16 01:23:31.000000000 +0400
@@ -498,6 +498,8 @@
 	int sectors;
 	int best_good_sectors;
 	int best_disk, best_dist_disk, best_pending_disk;
+	int writemostly_disk;
+	int writemostly_good_sectors;
 	int has_nonrot_disk;
 	int disk;
 	sector_t best_dist;
@@ -519,7 +521,9 @@
 	best_dist = MaxSector;
 	best_pending_disk = -1;
 	min_pending = UINT_MAX;
+	writemostly_disk = -1;
 	best_good_sectors = 0;
+	writemostly_good_sectors = 0;
 	has_nonrot_disk = 0;
 	choose_next_idle = 0;
 
@@ -548,16 +552,16 @@
 		if (test_bit(WriteMostly, &rdev->flags)) {
 			/* Don't balance among write-mostly, just
 			 * use the first as a last resort */
-			if (best_disk < 0) {
+			if (writemostly_disk < 0) {
 				if (is_badblock(rdev, this_sector, sectors,
 						&first_bad, &bad_sectors)) {
 					if (first_bad < this_sector)
 						/* Cannot use this */
 						continue;
-					best_good_sectors = first_bad - this_sector;
+					writemostly_good_sectors = first_bad - this_sector;
 				} else
-					best_good_sectors = sectors;
-				best_disk = disk;
+					writemostly_good_sectors = sectors;
+				writemostly_disk = disk;
 			}
 			continue;
 		}
@@ -664,6 +668,14 @@
 			best_disk = best_dist_disk;
 	}
 
+	/*
+	 * If there is still no good disk, try write-mostly.
+	 */
+	if (best_disk == -1 && writemostly_disk >= 0) {
+		best_disk = writemostly_disk;
+		best_good_sectors = writemostly_good_sectors;
+	}
+
 	if (best_disk >= 0) {
 		rdev = rcu_dereference(conf->mirrors[best_disk].rdev);
 		if (!rdev)
