fix_sync_read_error() is called when a SYNC read (resync, repair or check) failes to read from any devices. This funciton retries the reads starting from "read_disk" until it suceeds, otherwise it aborts. If it succeeds to read from a particular device, it attempts to write the same data backwards until it returns to "read_disk". Then it retries the reads backwards, starting from the device that it succeeded to read from, till "read_disk". On any failure of these reads/writes, assuming bad blocks feature is disabled, md_error() is called on a particular rdev, which will kick it out the array, and set MD_RECOVERY_INTR, which will eventually abort the resync/check/repair. The issue that I see, is that this function is oriented on the "read_disk" to not fail. I have marked in the patch some suspicious lines. This function reads and writes always from/into pages of the "read_disk" bio, and eventually it marks the "read_disk" bio as "success". Consider the following scenario: - We have RAID1 going through "repair" with drives A, B; all were selected for reading in raid1_sync_request(). - A was marked as "read_disk". - Reads from both drives failed, so fix_sync_read_error() is called. - fix_sync_read_error() retries the reads starting from A, but A still fails the read, whereas B succeeds. - Now the code writes the same data back to A, which let's assume fails. - md_error() is called, A is marked as Faulty, MD_RECOVERY_INTR is set - r1_bio->bios[A]->bi_end_io = NULL; - Now the code skips reading data from A, because bi_end_io is set to NULL. - bio->bi_status = 0; // this marks the A's bios as success, but bi_end_io is NULL on it - fix_sync_read_error() returns success - process_checks() is called - process_checks() executes the loop to find "primary", but no primary can be selected, because: - A has bi_end_io set to NULL - B has bad bi_status - As a result, "primary" will be set to out of bounds, and r1_bio->bios[primary] will access some invalid memory location. Can you please review the scenario above, and please let me know whether I am missing something? --- drivers/md/raid1.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c index 7b8a71c..2eb1d0b 100644 --- a/drivers/md/raid1.c +++ b/drivers/md/raid1.c @@ -2119,10 +2119,11 @@ static int fix_sync_read_error(struct r1bio *r1_bio) */ struct mddev *mddev = r1_bio->mddev; struct r1conf *conf = mddev->private; struct bio *bio = r1_bio->bios[r1_bio->read_disk]; struct page **pages = get_resync_pages(bio)->pages; + ===> these pages belong to the bio on "read_disk" sector_t sect = r1_bio->sector; int sectors = r1_bio->sectors; int idx = 0; struct md_rdev *rdev; @@ -2153,10 +2154,11 @@ static int fix_sync_read_error(struct r1bio *r1_bio) * active, and resync is currently active */ rdev = conf->mirrors[d].rdev; if (sync_page_io(rdev, sect, s<<9, pages[idx], + ===> we are reading into pages of "read_disk" REQ_OP_READ, false)) { success = 1; break; } } @@ -2206,10 +2208,11 @@ static int fix_sync_read_error(struct r1bio *r1_bio) if (r1_bio->bios[d]->bi_end_io != end_sync_read) continue; rdev = conf->mirrors[d].rdev; if (r1_sync_page_io(rdev, sect, s, pages[idx], + ===> we are writing pages from "read_disk" REQ_OP_WRITE) == 0) { r1_bio->bios[d]->bi_end_io = NULL; rdev_dec_pending(rdev, mddev); } } @@ -2221,19 +2224,21 @@ static int fix_sync_read_error(struct r1bio *r1_bio) if (r1_bio->bios[d]->bi_end_io != end_sync_read) continue; rdev = conf->mirrors[d].rdev; if (r1_sync_page_io(rdev, sect, s, pages[idx], + ===> we are reading into pages of "read_disk" REQ_OP_READ) != 0) atomic_add(s, &rdev->corrected_errors); } sectors -= s; sect += s; idx ++; } set_bit(R1BIO_Uptodate, &r1_bio->state); bio->bi_status = 0; + ==> This bio belongs to "read_disk", but the appropriate rdev might have failed to "fix" the error return 1; } static void process_checks(struct r1bio *r1_bio) { -- 1.9.1