[RFC PATCH NOT FOR INCLUSION] Concerns about logic of raid1.c::fix_sync_read_error()

Alex Lyakas <alex@xxxxxxxxxx> · Tue, 9 Jul 2024 12:46:55 +0300

fix_sync_read_error() is called when a SYNC read (resync, repair or check) failes to read from any devices.
This funciton retries the reads starting from "read_disk" until it suceeds, otherwise it aborts.
If it succeeds to read from a particular device, it attempts to write the same data backwards until it returns to "read_disk".
Then it retries the reads backwards, starting from the device that it succeeded to read from, till "read_disk".
On any failure of these reads/writes, assuming bad blocks feature is disabled, md_error() is called on a particular rdev,
which will kick it out the array, and set MD_RECOVERY_INTR, which will eventually abort the resync/check/repair.

The issue that I see, is that this function is oriented on the "read_disk" to not fail.
I have marked in the patch some suspicious lines.
This function reads and writes always from/into pages of the "read_disk" bio, and eventually it marks the "read_disk"
bio as "success".

Consider the following scenario:
- We have RAID1 going through "repair" with drives A, B; all were selected for reading in raid1_sync_request().
- A was marked as "read_disk".
- Reads from both drives failed, so fix_sync_read_error() is called.
- fix_sync_read_error() retries the reads starting from A, but A still fails the read, whereas B succeeds.
- Now the code writes the same data back to A, which let's assume fails.
- md_error() is called, A is marked as Faulty, MD_RECOVERY_INTR is set
- r1_bio->bios[A]->bi_end_io = NULL;
- Now the code skips reading data from A, because bi_end_io is set to NULL.
- bio->bi_status = 0; // this marks the A's bios as success, but bi_end_io is NULL on it
- fix_sync_read_error() returns success
- process_checks() is called
- process_checks() executes the loop to find "primary", but no primary can be selected, because:
  - A has bi_end_io set to NULL
  - B has bad bi_status
- As a result, "primary" will be set to out of bounds, and r1_bio->bios[primary] will access some invalid memory location.

Can you please review the scenario above, and please let me know whether I am missing something?
---
 drivers/md/raid1.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 7b8a71c..2eb1d0b 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2119,10 +2119,11 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 	 */
 	struct mddev *mddev = r1_bio->mddev;
 	struct r1conf *conf = mddev->private;
 	struct bio *bio = r1_bio->bios[r1_bio->read_disk];
 	struct page **pages = get_resync_pages(bio)->pages;
+	===> these pages belong to the bio on "read_disk"
 	sector_t sect = r1_bio->sector;
 	int sectors = r1_bio->sectors;
 	int idx = 0;
 	struct md_rdev *rdev;
 
@@ -2153,10 +2154,11 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 				 * active, and resync is currently active
 				 */
 				rdev = conf->mirrors[d].rdev;
 				if (sync_page_io(rdev, sect, s<<9,
 						 pages[idx],
+						 ===> we are reading into pages of "read_disk"
 						 REQ_OP_READ, false)) {
 					success = 1;
 					break;
 				}
 			}
@@ -2206,10 +2208,11 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 			if (r1_bio->bios[d]->bi_end_io != end_sync_read)
 				continue;
 			rdev = conf->mirrors[d].rdev;
 			if (r1_sync_page_io(rdev, sect, s,
 					    pages[idx],
+					    ===> we are writing pages from "read_disk"
 					    REQ_OP_WRITE) == 0) {
 				r1_bio->bios[d]->bi_end_io = NULL;
 				rdev_dec_pending(rdev, mddev);
 			}
 		}
@@ -2221,19 +2224,21 @@ static int fix_sync_read_error(struct r1bio *r1_bio)
 			if (r1_bio->bios[d]->bi_end_io != end_sync_read)
 				continue;
 			rdev = conf->mirrors[d].rdev;
 			if (r1_sync_page_io(rdev, sect, s,
 					    pages[idx],
+					    ===> we are reading into pages of "read_disk"
 					    REQ_OP_READ) != 0)
 				atomic_add(s, &rdev->corrected_errors);
 		}
 		sectors -= s;
 		sect += s;
 		idx ++;
 	}
 	set_bit(R1BIO_Uptodate, &r1_bio->state);
 	bio->bi_status = 0;
+	==> This bio belongs to "read_disk", but the appropriate rdev might have failed to "fix" the error
 	return 1;
 }
 
 static void process_checks(struct r1bio *r1_bio)
 {
-- 
1.9.1