On Sun, 02 Feb 2014 16:24:48 +0400 Michael Tokarev <mjt@xxxxxxxxxx> wrote:

> Hello.
>
> This is a followup for a somewhat old thread, --
> http://thread.gmane.org/gmane.linux.raid/44503
> with the required details.
>
> The initial problem was that, with a raid1 array on a
> few drives where one of them has a bad sector, running
> the `repair' action does not actually fix the error;
> it looks like the raid1 code does not see the error.
>
> This is a production host, so it is very difficult to
> experiment.  When I initially hit this issue there, I
> tried various ways to fix it; one of them was to
> remove the bad drive from the array and add it back.
> This forced all sectors to be re-written, and the
> problem went away.
>
> Now the same issue has happened again - another drive
> developed a bad sector, and again the md `repair' action
> does not fix it.
>
> So I added some debugging as requested in the original
> thread, and re-ran the `repair' action.
>
> Here are the changes I added to the 3.10 raid1.c file:
>
> --- ../linux-3.10/drivers/md/raid1.c	2014-02-02 16:01:55.003119836 +0400
> +++ drivers/md/raid1.c	2014-02-02 16:07:37.913808336 +0400
> @@ -1636,6 +1636,8 @@ static void end_sync_read(struct bio *bi
>  	 */
>  	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
>  		set_bit(R1BIO_Uptodate, &r1_bio->state);
> +else
> +printk("end_sync_read: !BIO_UPTODATE\n");
>
>  	if (atomic_dec_and_test(&r1_bio->remaining))
>  		reschedule_retry(r1_bio);
> @@ -1749,6 +1751,7 @@ static int fix_sync_read_error(struct r1
>  			 * active, and resync is currently active
>  			 */
>  			rdev = conf->mirrors[d].rdev;
> +printk("fix_sync_read_error: calling sync_page_io(%Li, %Li<<9)\n", (uint64_t)sect, (uint64_t)s);
>  			if (sync_page_io(rdev, sect, s<<9,
>  					 bio->bi_io_vec[idx].bv_page,
>  					 READ, false)) {
> @@ -1931,10 +1934,12 @@ static void sync_request_write(struct md
>
>  	bio = r1_bio->bios[r1_bio->read_disk];
>
> -	if (!test_bit(R1BIO_Uptodate, &r1_bio->state))
> +	if (!test_bit(R1BIO_Uptodate, &r1_bio->state)) {
>  		/* ouch - failed to read all of that. */
> +printk("sync_request_write: !R1BIO_Uptodate\n");
>  		if (!fix_sync_read_error(r1_bio))
>  			return;
> +	}
>
>  	if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery))
>  		if (process_checks(r1_bio) < 0)
>
>
> And here is the whole dmesg result from the repair run:
>
> [   74.288227] md: requested-resync of RAID array md1
> [   74.288719] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
> [   74.289329] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for requested-resync.
> [   74.290404] md: using 128k window, over a total of 2096064k.
> [  179.213754] ata6.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0
> [  179.214500] ata6.00: irq_stat 0x40000008
> [  179.214909] ata6.00: failed command: READ FPDMA QUEUED
> [  179.215452] ata6.00: cmd 60/80:00:00:3e:3e/00:00:00:00:00/40 tag 0 ncq 65536 in
> [  179.215452]          res 41/40:00:23:3e:3e/00:00:00:00:00/40 Emask 0x409 (media error) <F>
> [  179.217087] ata6.00: status: { DRDY ERR }
> [  179.217500] ata6.00: error: { UNC }
> [  179.220185] ata6.00: configured for UDMA/133
> [  179.220656] sd 5:0:0:0: [sdd] Unhandled sense code
> [  179.221149] sd 5:0:0:0: [sdd]
> [  179.221476] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [  179.222062] sd 5:0:0:0: [sdd]
> [  179.222398] Sense Key : Medium Error [current] [descriptor]
> [  179.223034] Descriptor sense data with sense descriptors (in hex):
> [  179.223704]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
> [  179.224726]         00 3e 3e 23
> [  179.225169] sd 5:0:0:0: [sdd]
> [  179.225494] Add. Sense: Unrecovered read error - auto reallocate failed
> [  179.226215] sd 5:0:0:0: [sdd] CDB:
> [  179.226577] Read(10): 28 00 00 3e 3e 00 00 00 80 00
> [  179.227344] end_request: I/O error, dev sdd, sector 4079139
> [  179.227926] end_sync_read: !BIO_UPTODATE
> [  179.228359] ata6: EH complete
> [  181.926457] md: md1: requested-resync done.
>
> So the only printk I added that is seen is "end_sync_read: !BIO_UPTODATE",
> and it looks like the rewriting code is never called.
>
> This is the root array of a production machine, so I can only reboot it
> in the late evenings or at weekends.  But this time I finally want to fix
> the bug for real, so I will not try to fix the erroneous drive until we
> are able to fix the code.  Just one thing: the fixing process might be
> a bit slow.
>
> Thanks,
>
> /mjt

Hi,
 thanks for the testing and the report.  I see what the problem is now.

We only try to fix a read error when all reads failed, rather than when
any read fails.  Most of the time there is only one read, so this makes
no difference.  However, for 'check' and 'repair' we read all devices,
so the current code will only try to repair a read error if *every*
device failed, which of course would be pointless.

This patch (against v3.10) should fix it.  It only leaves R1BIO_Uptodate
set if no failure is seen.

NeilBrown

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 6e17f8181c4b..ba38ef6c612b 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1633,8 +1633,8 @@ static void end_sync_read(struct bio *bio, int error)
 	 * or re-read if the read failed.
 	 * We don't do much here, just schedule handling by raid1d
 	 */
-	if (test_bit(BIO_UPTODATE, &bio->bi_flags))
-		set_bit(R1BIO_Uptodate, &r1_bio->state);
+	if (!test_bit(BIO_UPTODATE, &bio->bi_flags))
+		clear_bit(R1BIO_Uptodate, &r1_bio->state);

 	if (atomic_dec_and_test(&r1_bio->remaining))
 		reschedule_retry(r1_bio);
@@ -2609,6 +2609,7 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr, int *skipp
 	/* For a user-requested sync, we read all readable devices and do a
 	 * compare
 	 */
+	set_bit(R1BIO_Uptodate, &r1_bio->state);
 	if (test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
 		atomic_set(&r1_bio->remaining, read_targets);
 		for (i = 0; i < conf->raid_disks * 2 && read_targets; i++) {
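
To see why the direction of the accumulation matters, here is a minimal
user-space sketch (not the kernel code itself; the per-mirror read outcomes
below are invented for illustration) of the two ways R1BIO_Uptodate can be
accumulated across the reads that 'check' and 'repair' issue to every device:

/*
 * Illustrative user-space sketch, not kernel code: models how
 * R1BIO_Uptodate accumulates across the per-device reads issued by
 * 'check'/'repair'.  The read outcomes are made up; the single failure
 * plays the role of sdd's bad sector.
 */
#include <stdbool.h>
#include <stdio.h>

static const bool read_ok[] = { true, true, false, true };
static const int ndevs = sizeof(read_ok) / sizeof(read_ok[0]);

/* Old end_sync_read logic: any successful read sets the flag. */
static bool old_semantics(void)
{
	bool uptodate = false;
	for (int i = 0; i < ndevs; i++)
		if (read_ok[i])
			uptodate = true;	/* set_bit on success */
	return uptodate;
}

/* Patched logic: the flag starts set, and any failed read clears it. */
static bool new_semantics(void)
{
	bool uptodate = true;			/* set_bit(R1BIO_Uptodate) up front */
	for (int i = 0; i < ndevs; i++)
		if (!read_ok[i])
			uptodate = false;	/* clear_bit on failure */
	return uptodate;
}

int main(void)
{
	/* old: 1 (failure hidden by the other successful reads)
	 * new: 0 (any failure is remembered) */
	printf("old: uptodate=%d  new: uptodate=%d\n",
	       old_semantics(), new_semantics());
	return 0;
}

With one simulated failed read out of several, the old accumulation still
reports the r1_bio as up to date, so sync_request_write() never reaches
fix_sync_read_error() - matching the dmesg above, where only the
"end_sync_read: !BIO_UPTODATE" printk fires.  The patched accumulation
remembers the failure, so the bad sector gets re-read and rewritten.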