MD RAID hot-replace wants to rewrite to the source! (and fails, and kicks)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Dear Neil and other RAID developers,

I came across a problem of want_replacement not working as intended. Or at least not working as I think was intended.

The kernel is this, quite old
    3.4.34 (vanilla)
apologies if the behaviour has been fixed, but I don't recall to having seen this problem mentioned in patches or on the ML so I suppose this report is still valid.

Disk sdl in our storage server had shown one read error in smart check long and was apparently dying rapidly.
I issued want_replacement so to replicate sdl to the spare sdm.
It started correctly, like this:

    md54 : active raid5 sde1[0] sdh1[2] sdm1[5](R) sdf1[1] sdl1[6] sdi1[3]
11603730432 blocks super 1.2 level 5, 1024k chunk, algorithm 2 [5/5] [UUUUU] [>....................] recovery = 0.2% (6609916/2900932608) finish=598.6min speed=80573K/sec
            bitmap: 1/173 pages [4KB], 8192KB chunk

but then it met read errors in sdl (source) and instead of skipping those or rewriting those on just sdm, it tried to rewrite those (only or also) on sdl, i.e. the source. Sdl was dying and was not capable of accepting the rewrites, so it was kicked by MD, the array went to degraded state, and the process restarted (or maybe continued) with reconstruction with one failed disk.

So it went from this:

    md54 : active raid5 sde1[0] sdh1[2] sdm1[5](R) sdf1[1] sdl1[6] sdi1[3]
11603730432 blocks super 1.2 level 5, 1024k chunk, algorithm 2 [5/5] [UUUUU] [>....................] recovery = 0.3% (11584992/2900932608) finish=1279.9min speed=37621K/sec
            bitmap: 1/173 pages [4KB], 8192KB chunk

To this:

md54 : active raid5 sde1[0] sdh1[2] sdm1[5](R) sdf1[1] sdl1[6](F) sdi1[3] 11603730432 blocks super 1.2 level 5, 1024k chunk, algorithm 2 [5/4] [UUUU_] [>....................] recovery = 1.4% (41691236/2900932608) finish=578.6min speed=82352K/sec
            bitmap: 1/173 pages [4KB], 8192KB chunk

From dmesg one cannot detect that MD performed a rewrite on sdl, but I can tell that from smartctl -x /dev/sdl, for which the last recorded error is:

Error 17 [16] occurred at disk power-on lifetime: 15114 hours (629 days + 18 hours) When the command that caused the error occurred, the device was active or idle.

    After command completion occurred, registers were:
    ER -- ST COUNT  LBA_48  LH LM LL DV DC
    -- -- -- == -- == == == -- -- -- -- --
40 -- 51 00 00 00 00 01 62 a8 a0 40 00 Error: WP at LBA = 0x0162a8a0 = 23242912

    Commands leading to the command that caused the error were:
CR FEATR COUNT LBA_48 LH LM LL DV DC Powered_Up_Time Command/Feature_Name -- == -- == -- == == == -- -- -- -- -- --------------- -------------------- 61 04 00 00 40 00 00 01 62 ab c0 40 00 10d+00:06:33.287 WRITE FPDMA QUEUED 61 04 00 00 38 00 00 01 62 9f c0 40 00 10d+00:06:33.287 WRITE FPDMA QUEUED 61 04 00 00 30 00 00 01 62 c3 c0 40 00 10d+00:06:33.285 WRITE FPDMA QUEUED 61 04 00 00 28 00 00 01 62 c7 c0 40 00 10d+00:06:33.276 WRITE FPDMA QUEUED 60 04 00 00 20 00 00 01 62 5f c0 40 00 10d+00:06:33.276 READ FPDMA QUEUED

which is a write error, and was not present before the replacement operation.

I thought the whole point of want_replacement was to NOT attempt to correct read errors onto the source. There is no point in doing that, doing that would just risk to make the source fail.

During a replacement read errors should IMHO remain read errors on the source. Even if it happens afterwards that normal array operation wants to read from exactly such sector which has not been rewritten, it will still receive a read error from the source. No problem because after this it has other 2 places to read such data from: sdm (replacement drive, if it has already passed that point), or reconstruction from the other members of the array. After this, again MD would try to perform a rewrite: also such rewrite (not coming from the reconstruction process) should again go just to the replacement drive and never to the source. Read errors on the source should remain read errors.

Here is an extract of the dmesg:

( LOTS of "Unhandled error code" and "read error corrected" for sdl1 before this )
    [865020.041576] end_request: I/O error, dev sdl, sector 23240640
    [865020.041831] sd 4:0:11:0: [sdl] Unhandled error code
[865020.041834] sd 4:0:11:0: [sdl] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK [865020.041837] sd 4:0:11:0: [sdl] CDB: Read(10): 28 00 01 62 af c0 00 04 00 00
    [865020.041842] end_request: I/O error, dev sdl, sector 23244736
    [865020.041994] sd 4:0:11:0: [sdl] Unhandled error code
[865020.041996] sd 4:0:11:0: [sdl] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK [865020.041998] sd 4:0:11:0: [sdl] CDB: Read(10): 28 00 01 62 b3 c0 00 04 00 00
    [865020.042004] end_request: I/O error, dev sdl, sector 23245760
    [865020.042153] sd 4:0:11:0: [sdl] Unhandled error code
[865020.042155] sd 4:0:11:0: [sdl] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK [865020.042158] sd 4:0:11:0: [sdl] CDB: Read(10): 28 00 01 62 b7 c0 00 04 00 00
    [865020.042163] end_request: I/O error, dev sdl, sector 23246784
    [865020.042315] sd 4:0:11:0: [sdl] Unhandled error code
[865020.042317] sd 4:0:11:0: [sdl] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK [865020.042320] sd 4:0:11:0: [sdl] CDB: Read(10): 28 00 01 62 bb c0 00 04 00 00
    [865020.042325] end_request: I/O error, dev sdl, sector 23247808
    [865020.042476] sd 4:0:11:0: [sdl] Unhandled error code
[865020.042478] sd 4:0:11:0: [sdl] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK [865020.042481] sd 4:0:11:0: [sdl] CDB: Read(10): 28 00 01 62 bf c0 00 04 00 00
    [865020.042486] end_request: I/O error, dev sdl, sector 23248832
    [865020.042636] sd 4:0:11:0: [sdl] Unhandled error code
[865020.042638] sd 4:0:11:0: [sdl] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK [865020.042641] sd 4:0:11:0: [sdl] CDB: Read(10): 28 00 01 62 c7 c0 00 04 00 00
    [865020.042647] end_request: I/O error, dev sdl, sector 23250880
    [865020.042797] sd 4:0:11:0: [sdl] Unhandled error code
[865020.042799] sd 4:0:11:0: [sdl] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK [865020.042801] sd 4:0:11:0: [sdl] CDB: Read(10): 28 00 01 62 cf c0 00 04 00 00
    [865020.042807] end_request: I/O error, dev sdl, sector 23252928
    [865020.042956] sd 4:0:11:0: [sdl] Unhandled error code
[865020.042958] sd 4:0:11:0: [sdl] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK [865020.042961] sd 4:0:11:0: [sdl] CDB: Read(10): 28 00 01 62 d3 c0 00 04 00 00
    [865020.042966] end_request: I/O error, dev sdl, sector 23253952
[865020.100097] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865020.100124] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865020.100133] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865020.100142] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
    [865020.100152] sd 4:0:11:0: [sdl] Unhandled error code
[865020.100157] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865020.100169] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865020.100174] sd 4:0:11:0: [sdl] Result: hostbyte=DID_SOFT_ERROR driverbyte=DRIVER_OK [865020.100181] sd 4:0:11:0: [sdl] CDB: Read(10): 28 00 01 62 d7 c0 00 04 00 00
    [865020.100202] end_request: I/O error, dev sdl, sector 23254976
[865020.106716] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865020.106725] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865020.106734] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865020.106743] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865020.106752] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865020.106761] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865020.106771] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865020.106780] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
    [865020.106902] sd 4:0:11:0: [sdl] Unhandled sense code
[865020.106906] sd 4:0:11:0: [sdl] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [865020.106910] sd 4:0:11:0: [sdl]  Sense Key : Medium Error [current]
    [865020.106915] Info fld=0x162abc0
    [865020.106917] sd 4:0:11:0: [sdl]  Add. Sense: Unrecovered read error
[865020.106923] sd 4:0:11:0: [sdl] CDB: Read(10): 28 00 01 62 ab c0 00 04 00 00 [865020.106932] end_request: critical target error, dev sdl, sector 23243712
    [865022.912537] sd 4:0:11:0: [sdl] Unhandled sense code
[865022.912543] sd 4:0:11:0: [sdl] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [865022.912548] sd 4:0:11:0: [sdl]  Sense Key : Medium Error [current]
    [865022.912554] Info fld=0x162dfc0
    [865022.912556] sd 4:0:11:0: [sdl]  Add. Sense: Unrecovered read error
[865022.912561] sd 4:0:11:0: [sdl] CDB: Read(10): 28 00 01 62 df c0 00 00 08 00 [865022.912571] end_request: critical target error, dev sdl, sector 23257024
    [865022.951721] raid5_end_read_request: 118 callbacks suppressed
[865022.951726] md/raid:md54: read error corrected (8 sectors at 23177152 on sdl1) [865022.951733] md/raid:md54: read error corrected (8 sectors at 23177160 on sdl1) [865022.951737] md/raid:md54: read error corrected (8 sectors at 23177168 on sdl1) [865022.951741] md/raid:md54: read error corrected (8 sectors at 23177176 on sdl1) [865022.951744] md/raid:md54: read error corrected (8 sectors at 23177184 on sdl1) [865022.951748] md/raid:md54: read error corrected (8 sectors at 23177192 on sdl1) [865022.951751] md/raid:md54: read error corrected (8 sectors at 23177200 on sdl1) [865022.951755] md/raid:md54: read error corrected (8 sectors at 23177208 on sdl1) [865022.951758] md/raid:md54: read error corrected (8 sectors at 23177216 on sdl1) [865022.951762] md/raid:md54: read error corrected (8 sectors at 23177224 on sdl1) [865029.982668] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982685] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982694] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982722] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982731] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982740] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982764] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982804] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982812] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982821] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982830] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982839] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982848] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982857] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982866] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982875] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982884] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982893] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982903] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982912] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982921] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982930] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982939] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982948] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865029.982957] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
    [865029.983003] sd 4:0:11:0: [sdl] Unhandled sense code
[865029.983008] sd 4:0:11:0: [sdl] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [865029.983013] sd 4:0:11:0: [sdl]  Sense Key : Medium Error [current]
    [865029.983018] Info fld=0x1625c27
    [865029.983021] sd 4:0:11:0: [sdl]  Add. Sense: Unrecovered read error
[865029.983026] sd 4:0:11:0: [sdl] CDB: Read(10): 28 00 01 62 5b c0 00 04 00 00 [865029.983036] end_request: critical target error, dev sdl, sector 23223232 [865029.990723] md/raid:md54: read error NOT corrected!! (sector 23190464 on sdl1).
    [865029.990728] md/raid:md54: Disk failure on sdl1, disabling device.
    [865029.990729] md/raid:md54: Operation continuing on 4 devices.
[865030.004221] md/raid:md54: read error not correctable (sector 23190472 on sdl1). [865030.004224] md/raid:md54: read error not correctable (sector 23190480 on sdl1). [865030.004227] md/raid:md54: read error not correctable (sector 23190488 on sdl1). [865030.004231] md/raid:md54: read error not correctable (sector 23190496 on sdl1). [865030.004234] md/raid:md54: read error not correctable (sector 23190504 on sdl1). [865030.004237] md/raid:md54: read error not correctable (sector 23190512 on sdl1). [865030.004240] md/raid:md54: read error not correctable (sector 23190520 on sdl1). [865030.004243] md/raid:md54: read error not correctable (sector 23190528 on sdl1). [865030.004246] md/raid:md54: read error not correctable (sector 23190536 on sdl1). [865030.004250] md/raid:md54: read error not correctable (sector 23190544 on sdl1).
    [865030.063169] raid5_end_read_request: 1142 callbacks suppressed
[865030.063174] md/raid:md54: read error corrected (8 sectors at 23200704 on sdl1) [865030.063181] md/raid:md54: read error corrected (8 sectors at 23200712 on sdl1) [865030.063185] md/raid:md54: read error corrected (8 sectors at 23200720 on sdl1) [865030.063189] md/raid:md54: read error corrected (8 sectors at 23200728 on sdl1) [865030.063192] md/raid:md54: read error corrected (8 sectors at 23200736 on sdl1) [865030.063196] md/raid:md54: read error corrected (8 sectors at 23200744 on sdl1) [865030.063199] md/raid:md54: read error corrected (8 sectors at 23200752 on sdl1) [865030.063203] md/raid:md54: read error corrected (8 sectors at 23200760 on sdl1) [865030.063207] md/raid:md54: read error corrected (8 sectors at 23200768 on sdl1) [865030.063210] md/raid:md54: read error corrected (8 sectors at 23200776 on sdl1)
    [865031.472160] md: md54: recovery done.
    [865031.586603] md: recovery of RAID array md54
    [865031.586629] md: minimum _guaranteed_  speed: 20000 KB/sec/disk.
[865031.586633] md: using maximum available idle IO bandwidth (but not more than 80000 KB/sec) for recovery.
    [865031.586647] md: using 128k window, over a total of 2900932608k.
    [865031.586650] md: resuming recovery of md54 from checkpoint.
[865038.511922] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000) [865038.511934] mpt2sas0: log_info(0x31080000): originator(PL), code(0x08), sub_code(0x0000)
    [865038.512018] sd 4:0:11:0: [sdl] Unhandled sense code
    [865038.512032] sd 4:0:11:0: [sdl]
[865038.512037] sd 4:0:11:0: attempting task abort! scmd(ffff880063d16500) [865038.512044] sd 4:0:11:0: [sdl] CDB: Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
    [865038.512050] sd 4:0:11:0: [sdl]  Sense Key : Medium Error [current]
    [865038.512057] ATA command pass through(16)Info fld=0x162a8a0
    [865038.512062] :
[865038.512064] sd 4:0:11:0: [sdl] 85Add. Sense: Unrecovered read error 08 2e 00 d0 00 01 00 00 00 4f 00 c2 00 b0 00
    [865038.512082]
[865038.512085] scsi target4:0:11: handle(0x001a), sas_address(0x443322110a000000), phy(10) [865038.512091] scsi target4:0:11: enclosure_logical_id(0x500062b2000c2f80), slot(10) [865038.512096] sd 4:0:11:0: [sdl] CDB: Read(10): 28 00 01 62 a7 c0 00 04 00 00 [865038.512106] end_request: critical target error, dev sdl, sector 23242688
    [865038.519787] raid5_end_read_request: 117 callbacks suppressed
[865038.519790] md/raid:md54: read error not correctable (sector 23209920 on sdl1). [865038.519796] md/raid:md54: read error not correctable (sector 23209928 on sdl1). [865038.519799] md/raid:md54: read error not correctable (sector 23209936 on sdl1). [865038.519803] md/raid:md54: read error not correctable (sector 23209944 on sdl1). [865038.519806] md/raid:md54: read error not correctable (sector 23209952 on sdl1). [865038.519809] md/raid:md54: read error not correctable (sector 23209960 on sdl1). [865038.519813] md/raid:md54: read error not correctable (sector 23209968 on sdl1). [865038.519816] md/raid:md54: read error not correctable (sector 23209976 on sdl1). [865038.519819] md/raid:md54: read error not correctable (sector 23209984 on sdl1). [865038.519822] md/raid:md54: read error not correctable (sector 23209992 on sdl1).
    [865042.322285] sd 4:0:11:0: task abort: SUCCESS scmd(ffff880063d16500)
    [865043.156822] raid5_end_read_request: 2294 callbacks suppressed
[865043.156828] md/raid:md54: read error corrected (8 sectors at 23211968 on sdm1) [865043.156835] md/raid:md54: read error corrected (8 sectors at 23211976 on sdm1) [865043.156838] md/raid:md54: read error corrected (8 sectors at 23211984 on sdm1) [865043.156842] md/raid:md54: read error corrected (8 sectors at 23211992 on sdm1) [865043.156845] md/raid:md54: read error corrected (8 sectors at 23212000 on sdm1) [865043.156848] md/raid:md54: read error corrected (8 sectors at 23212008 on sdm1) [865043.156852] md/raid:md54: read error corrected (8 sectors at 23212016 on sdm1) [865043.156855] md/raid:md54: read error corrected (8 sectors at 23212024 on sdm1) [865043.156858] md/raid:md54: read error corrected (8 sectors at 23212032 on sdm1) [865043.156861] md/raid:md54: read error corrected (8 sectors at 23212040 on sdm1)
    -- End of dmesg.---

(The recovery is still running. Let's hope I don't have ANY bad sector on the other 4 disks... )

Also you can notice some strange dmesg logs at 865043 which are a bit confusing. I am guessing it is a leftover of the reconstruction process which saw sdm1 suddenly take the place of sdl1 while there were still ongoing rewrites. Note that smartctl -x /dev/sdm does not report any read or write error during operation, so I am quite sure it cannot really be a read error from sdm1.

Thank you
JJ

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux