Re: read errors not corrected when doing check on RAID6

NeilBrown <neilb@xxxxxxx> · Thu, 16 Jan 2014 09:59:00 +1100

On Sun, 12 Jan 2014 07:41:14 +0100 (CET) Mikael Abrahamsson
<swmike@xxxxxxxxx> wrote:

> 
> Hello.
> 
> Running Debian with 3.11-0.bpo.2-amd64 with mdadm 3.2.5 (later upgraded to 
> 3.3 to do the replace function described).
> 
> /dev/sds on the system had 9 pending sectors according to smartctl. I 
> issued a check and got several read during the check. I started the check 
> by taking what debian has in its crontab:
> 
> /usr/share/mdadm/checkarray --cron --all --idle --quiet
> 
> This resulted in this in dmesg (there are two arrays), I don't know why it 
> didn't do the RAID conf printout for md2.
> 
> [  569.034227] RAID conf printout:
> [  569.034233]  --- level:6 rd:10 wd:10
> [  569.034236]  disk 0, o:1, dev:sdn
> [  569.034238]  disk 1, o:1, dev:sdh
> [  569.034241]  disk 2, o:1, dev:sdj
> [  569.034243]  disk 3, o:1, dev:sdd
> [  569.034245]  disk 4, o:1, dev:sds
> [  569.034248]  disk 5, o:1, dev:sdv
> [  569.034250]  disk 6, o:1, dev:sdb
> [  569.034252]  disk 7, o:1, dev:sdc
> [  569.034254]  disk 8, o:1, dev:sdf
> [  569.034257]  disk 9, o:1, dev:sde
> [  569.034292] md: data-check of RAID array md0
> [  569.034337] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> [  569.034384] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
> [  569.034440] md: using 128k window, over a total of 1953512960k.
> [  569.520326] md: data-check of RAID array md2
> [  569.520379] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> [  569.520427] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for data-check.
> [  569.520488] md: using 128k window, over a total of 2930265408k.
> [  662.413059] perf samples too long (2504 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
> [  714.577123] ata13.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [  714.577187] ata13.01: failed command: READ DMA EXT
> [  714.577245] ata13.01: cmd 25/00:00:58:3a:7c/00:04:00:00:00/f0 tag 0 dma 524288 in
> [  714.577245]          res 51/40:28:30:3c:7c/40:02:00:00:00/f0 Emask 0x9 (media error)
> [  714.577330] ata13.01: status: { DRDY ERR }
> [  714.577381] ata13.01: error: { UNC }
> [  714.796272] ata13.00: configured for UDMA/133
> [  714.804347] ata13.01: configured for UDMA/133
> [  714.804767] sd 13:0:1:0: [sds] Unhandled sense code
> [  714.804887] sd 13:0:1:0: [sds]
> [  714.804988] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [  714.805094] sd 13:0:1:0: [sds]
> [  714.805194] Sense Key : Medium Error [current] [descriptor]
> [  714.805491] Descriptor sense data with sense descriptors (in hex):
> [  714.805665]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
> [  714.806845]         00 7c 3c 30
> [  714.807257] sd 13:0:1:0: [sds]
> [  714.807357] Add. Sense: Unrecovered read error - auto reallocate failed
> [  714.807531] sd 13:0:1:0: [sds] CDB:
> [  714.807632] Read(10): 28 00 00 7c 3a 58 00 04 00 00
> [  714.808494] end_request: I/O error, dev sds, sector 8141872
> [  714.808638] ata13: EH complete
> 
> There were even more errors (not included), but md didn't seem to correct 
> them. After the check, the number of pending sectors on /dev/sds was still 
> 9.
> 
> The reason to upgrade fmor kernel 3.2 to 3.11 was to gain the use of the 
> replace function. I then issued:
> 
> mdadm /dev/md0 --replace /dev/sds
> 
> dmesg:
> 
> [74791.213852] RAID conf printout:
> [74791.213858]  --- level:6 rd:10 wd:10
> [74791.213861]  disk 0, o:1, dev:sdn
> [74791.213864]  disk 1, o:1, dev:sdh
> [74791.213866]  disk 2, o:1, dev:sdj
> [74791.213869]  disk 3, o:1, dev:sdd
> [74791.213871]  disk 4, o:1, dev:sds
> [74791.213874]  disk 5, o:1, dev:sdv
> [74791.213876]  disk 6, o:1, dev:sdb
> [74791.213879]  disk 7, o:1, dev:sdc
> [74791.213881]  disk 8, o:1, dev:sdf
> [74791.213884]  disk 9, o:1, dev:sde
> [74791.252331] md: recovery of RAID array md0
> [74791.252371] md: minimum _guaranteed_  speed: 1000 KB/sec/disk.
> [74791.252405] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
> [74791.252466] md: using 128k window, over a total of 1953512960k.
> [74872.590157] ata13.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [74872.590193] ata13.01: failed command: READ DMA EXT
> [74872.590226] ata13.01: cmd 25/00:00:38:fd:7b/00:04:00:00:00/f0 tag 0 dma 524288 in
> [74872.590226]          res 51/40:48:f0:fe:7b/40:02:00:00:00/f0 Emask 0x9 (media error)
> [74872.590315] ata13.01: status: { DRDY ERR }
> [74872.590341] ata13.01: error: { UNC }
> [74872.820239] ata13.00: configured for UDMA/133
> [74872.828258] ata13.01: configured for UDMA/133
> [74872.828571] sd 13:0:1:0: [sds] Unhandled sense code
> [74872.828601] sd 13:0:1:0: [sds]
> [74872.828624] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [74872.828652] sd 13:0:1:0: [sds]
> [74872.828675] Sense Key : Medium Error [current] [descriptor]
> [74872.828710] Descriptor sense data with sense descriptors (in hex):
> [74872.828753]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
> [74872.828816]         00 7b fe f0
> [74872.828850] sd 13:0:1:0: [sds]
> [74872.828874] Add. Sense: Unrecovered read error - auto reallocate failed
> [74872.828904] sd 13:0:1:0: [sds] CDB:
> [74872.828928] Read(10): 28 00 00 7b fd 38 00 04 00 00
> [74872.828979] end_request: I/O error, dev sds, sector 8126192
> [74872.829055] ata13: EH complete
> [74877.699161] ata13.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [74877.699197] ata13.01: failed command: READ DMA
> [74877.699228] ata13.01: cmd c8/00:08:f0:fe:7b/00:00:00:00:00/f0 tag 0 dma 4096 in
> [74877.699228]          res 51/40:08:f0:fe:7b/40:02:00:00:00/f0 Emask 0x9 (media error)
> [74877.699317] ata13.01: status: { DRDY ERR }
> [74877.699342] ata13.01: error: { UNC }
> [74877.928227] ata13.00: configured for UDMA/133
> [74877.936278] ata13.01: configured for UDMA/133
> [74877.936317] sd 13:0:1:0: [sds] Unhandled sense code
> [74877.936343] sd 13:0:1:0: [sds]
> [74877.936367] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [74877.936395] sd 13:0:1:0: [sds]
> [74877.936417] Sense Key : Medium Error [current] [descriptor]
> [74877.936450] Descriptor sense data with sense descriptors (in hex):
> [74877.936479]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
> [74877.936541]         00 7b fe f0
> [74877.936573] sd 13:0:1:0: [sds]
> [74877.936597] Add. Sense: Unrecovered read error - auto reallocate failed
> [74877.936628] sd 13:0:1:0: [sds] CDB:
> [74877.936651] Read(10): 28 00 00 7b fe f0 00 00 08 00
> [74877.936701] end_request: I/O error, dev sds, sector 8126192
> [74877.936730] end_request: I/O error, dev sds, sector 8126192
> [74877.936771] ata13: EH complete
> [74884.058379] ata13.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [74884.058416] ata13.01: failed command: READ DMA EXT
> [74884.058447] ata13.01: cmd 25/00:00:38:39:7c/00:04:00:00:00/f0 tag 0 dma 524288 in
> [74884.058447]          res 51/40:08:30:3c:7c/40:01:00:00:00/f0 Emask 0x9 (media error)
> [74884.058535] ata13.01: status: { DRDY ERR }
> [74884.058561] ata13.01: error: { UNC }
> [74884.288226] ata13.00: configured for UDMA/133
> [74884.296277] ata13.01: configured for UDMA/133
> [74884.296570] sd 13:0:1:0: [sds] Unhandled sense code
> [74884.296598] sd 13:0:1:0: [sds]
> [74884.296622] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [74884.296650] sd 13:0:1:0: [sds]
> [74884.296673] Sense Key : Medium Error [current] [descriptor]
> [74884.296708] Descriptor sense data with sense descriptors (in hex):
> [74884.296739]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
> [74884.296805]         00 7c 3c 30
> [74884.296838] sd 13:0:1:0: [sds]
> [74884.296862] Add. Sense: Unrecovered read error - auto reallocate failed
> [74884.296892] sd 13:0:1:0: [sds] CDB:
> [74884.296916] Read(10): 28 00 00 7c 39 38 00 04 00 00
> [74884.296967] end_request: I/O error, dev sds, sector 8141872
> [74884.297020] ata13: EH complete
> [74889.334006] ata13.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
> [74889.334041] ata13.01: failed command: READ DMA
> [74889.334072] ata13.01: cmd c8/00:08:30:3c:7c/00:00:00:00:00/f0 tag 0 dma 4096 in
> [74889.334072]          res 51/40:08:30:3c:7c/40:01:00:00:00/f0 Emask 0x9 (media error)
> [74889.334159] ata13.01: status: { DRDY ERR }
> [74889.334184] ata13.01: error: { UNC }
> [74889.564227] ata13.00: configured for UDMA/133
> [74889.572286] ata13.01: configured for UDMA/133
> [74889.572322] sd 13:0:1:0: [sds] Unhandled sense code
> [74889.572349] sd 13:0:1:0: [sds]
> [74889.572372] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [74889.572399] sd 13:0:1:0: [sds]
> [74889.572422] Sense Key : Medium Error [current] [descriptor]
> [74889.572455] Descriptor sense data with sense descriptors (in hex):
> [74889.572484]         72 03 11 04 00 00 00 0c 00 0a 80 00 00 00 00 00
> [74889.572546]         00 7c 3c 30
> [74889.572579] sd 13:0:1:0: [sds]
> [74889.572603] Add. Sense: Unrecovered read error - auto reallocate failed
> [74889.572633] sd 13:0:1:0: [sds] CDB:
> [74889.572657] Read(10): 28 00 00 7c 3c 30 00 00 08 00
> [74889.572708] end_request: I/O error, dev sds, sector 8141872
> [74889.572737] end_request: I/O error, dev sds, sector 8141872
> [74889.572774] ata13: EH complete
> [74891.334029] md/raid:md0: read error corrected (8 sectors at 8126192 on sds)
> [74891.353112] md/raid:md0: read error corrected (8 sectors at 8141872 on sds)
> 
> So here now in the replace function, md decided to correct the read errors 
> on /dev/sds. Now the pending sectors are down to 7 on /dev/sds. There were 
> some more errors which were also corrected, and now the pending count is 
> down to 5. Recovery is still ongoing.
> 
> So my question is, what did I do wrong the first time when doing the 
> check? I expected it to correct the read errors on the drive then, but it 
> seems it didn't. Should I have issued a "repair" instead?
> 

Don't be too hasty in assuming you did something wrong.  When working with
computers, it is usually the computer that did something wrongly :-)

A 'check' certainly should correct read errors.  The code for handling read
errors makes no distinction between 'check' or 'repair' or regular IO.

When md/raid[56] gets a read error, the first thing it does is retry the read
with REQ_FLUSH (or in later kernels: REQ_NOMERGE) set.  This ensure that each
1-page read goes down to the drive as a stand-alone request so we know with
more precision which sector had the error (usually multiple blocks are merged
so an error might mean "one block in a 128K region near here failed").
If this second read fails, then that required data is generated from
elsewhere and a write is scheduled.

My only guess is that in your case the second read succeeded when you were
doing a 'check', but it failed when you were doing a 'replace'.

To check this you can looked for "end_request: I/O error" in the kernel logs.
In the fragments you included there is just one error during the 'check', but
3 for each block during the 'replace'.  Given the timestamps, the 2nd and 3rd
are probably for the same request - some logging weirdness maybe.
Obviously you only provided fragments so we can't read too much into what we
see.  Looking at the complete logs could be more helpful.

NeilBrown
Attachment:
signature.asc

Description: PGP signature