Hello all,
I am testing a system with a failing disk.
This is an MD RAID5 with a write-intent bitmap; the disks are attached
through an LSI SAS controller (mpt2sas driver). A pretty normal setup.
I have very long dmesg logs in which most read errors are apparently
never corrected. Yet the upper layers do not complain either: the
filesystem does not go read-only, and no read error ever reaches
userspace. So everything actually works, but I can't understand why!?
Here is one dmesg excerpt in which MD reports corrected errors only at
[2204.845894]; before that I see at least three medium errors (at times
729, 1207 and 2071) that are apparently ignored by everything:
[ 289.360928] EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)
[ 729.141449] sd 6:0:33:0: [sdah] Unhandled sense code
[ 729.141460] sd 6:0:33:0: [sdah]
[ 729.141463] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 729.141466] sd 6:0:33:0: [sdah]
[ 729.141467] Sense Key : Medium Error [current]
[ 729.141471] Info fld=0xcba7e3c
[ 729.141473] sd 6:0:33:0: [sdah]
[ 729.141476] Add. Sense: Unrecovered read error
[ 729.141478] sd 6:0:33:0: [sdah] CDB:
[ 729.141480] Read(10): 28 00 0c ba 7e 00 00 00 a0 00
[ 729.141488] end_request: critical medium error, dev sdah, sector 213548604
[ 781.088413] perf samples too long (2510 > 2500), lowering kernel.perf_event_max_sample_rate to 50000
[ 1207.475752] sd 6:0:33:0: [sdah] Unhandled sense code
[ 1207.475761] sd 6:0:33:0: [sdah]
[ 1207.475762] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 1207.475764] sd 6:0:33:0: [sdah]
[ 1207.475765] Sense Key : Medium Error [current]
[ 1207.475767] Info fld=0xd2d89d2
[ 1207.475769] sd 6:0:33:0: [sdah]
[ 1207.475770] Add. Sense: Unrecovered read error
[ 1207.475772] sd 6:0:33:0: [sdah] CDB:
[ 1207.475773] Read(10): 28 00 0d 2d 88 c0 00 01 98 00
[ 1207.475778] end_request: critical medium error, dev sdah, sector 221088210
[ 2071.445584] sd 6:0:33:0: [sdah] Unhandled sense code
[ 2071.445596] sd 6:0:33:0: [sdah]
[ 2071.445599] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 2071.445601] sd 6:0:33:0: [sdah]
[ 2071.445603] Sense Key : Medium Error [current]
[ 2071.445607] Info fld=0xc8fd800
[ 2071.445612] sd 6:0:33:0: [sdah]
[ 2071.445614] Add. Sense: Unrecovered read error
[ 2071.445615] sd 6:0:33:0: [sdah] CDB:
[ 2071.445617] Read(10): 28 00 0c 8f d8 00 00 01 c8 00
[ 2071.445622] end_request: critical medium error, dev sdah, sector 210753536
[ 2201.018508] sd 6:0:33:0: [sdah] Unhandled sense code
[ 2201.018522] sd 6:0:33:0: [sdah]
[ 2201.018525] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 2201.018528] sd 6:0:33:0: [sdah]
[ 2201.018530] Sense Key : Medium Error [current]
[ 2201.018534] Info fld=0xc8fb450
[ 2201.018537] sd 6:0:33:0: [sdah]
[ 2201.018546] Add. Sense: Unrecovered read error
[ 2201.018551] sd 6:0:33:0: [sdah] CDB:
[ 2201.018552] Read(10): 28 00 0c 8f b4 48 00 00 38 00
[ 2201.018561] end_request: critical medium error, dev sdah, sector 210744400
[ 2203.651727] sd 6:0:33:0: [sdah] Unhandled sense code
[ 2203.651740] sd 6:0:33:0: [sdah]
[ 2203.651743] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[ 2203.651745] sd 6:0:33:0: [sdah]
[ 2203.651747] Sense Key : Medium Error [current]
[ 2203.651752] Info fld=0xc8fb450
[ 2203.651754] sd 6:0:33:0: [sdah]
[ 2203.651756] Add. Sense: Unrecovered read error
[ 2203.651759] sd 6:0:33:0: [sdah] CDB:
[ 2203.651761] Read(10): 28 00 0c 8f b4 50 00 00 30 00
[ 2203.651769] end_request: critical medium error, dev sdah, sector 210744400
[ 2204.845894] md/raid:md201: read error corrected (8 sectors at 996432 on sdah2)
[ 2204.845912] md/raid:md201: read error corrected (8 sectors at 996440 on sdah2)
[ 2204.845915] md/raid:md201: read error corrected (8 sectors at 996448 on sdah2)
[ 2204.845918] md/raid:md201: read error corrected (8 sectors at 996456 on sdah2)
[ 2204.845920] md/raid:md201: read error corrected (8 sectors at 996464 on sdah2)
[ 2204.845923] md/raid:md201: read error corrected (8 sectors at 996472 on sdah2)
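A note on reading the sd entries above. In the standard SCSI READ(10) layout, byte 0 is the opcode 0x28, bytes 2-5 hold the starting LBA and bytes 7-8 the transfer length, while the sense Info field carries the first failing LBA in hex. The small Python sketch below (the function names are illustrative, not taken from any tool) decodes a CDB from this log and checks that the reported failing block lies inside the request. It assumes 512-byte logical blocks, which is consistent with Info fld=0xcba7e3c being equal to the end_request sector 213548604.

```python
from typing import Tuple


def decode_read10(cdb_hex: str) -> Tuple[int, int]:
    """Decode a SCSI READ(10) CDB as printed by the kernel.

    Returns (starting LBA, transfer length in logical blocks).
    """
    cdb = bytes.fromhex(cdb_hex)  # the kernel prints the bytes space-separated
    if len(cdb) != 10 or cdb[0] != 0x28:  # 0x28 is the READ(10) opcode
        raise ValueError("not a READ(10) CDB: " + cdb_hex)
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: big-endian LBA
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length
    return lba, blocks


def failing_offset(cdb_hex: str, info_fld: int) -> int:
    """Offset of the failing block (sense Info field) within the request."""
    lba, blocks = decode_read10(cdb_hex)
    if not lba <= info_fld < lba + blocks:
        raise ValueError("Info field lies outside the requested range")
    return info_fld - lba


if __name__ == "__main__":
    # Values copied from the [729.14...] entry above.
    cdb = "28 00 0c ba 7e 00 00 00 a0 00"
    info = 0xCBA7E3C
    print(decode_read10(cdb))         # (213548544, 160)
    print(info)                       # 213548604, same as the end_request sector
    print(failing_offset(cdb, info))  # 60
```

For the [2201.018508] entry (CDB 28 00 0c 8f b4 48 00 00 38 00, Info fld=0xc8fb450) the same check gives an offset of 8 blocks, i.e. the read failed 8 sectors into a 56-sector request.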
Here is a period in which they get corrected more often, but as you can
see some are still skipped:
[97939.727497] sd 6:0:33:0: [sdah] Unhandled sense code
[97939.727512] sd 6:0:33:0: [sdah]
[97939.727515] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[97939.727518] sd 6:0:33:0: [sdah]
[97939.727520] Sense Key : Medium Error [current]
[97939.727524] Info fld=0xd439400
[97939.727526] sd 6:0:33:0: [sdah]
[97939.727529] Add. Sense: Unrecovered read error
[97939.727531] sd 6:0:33:0: [sdah] CDB:
[97939.727533] Read(10): 28 00 0d 43 94 00 00 00 28 00
[97939.727541] end_request: critical medium error, dev sdah, sector 222532608
[97942.216365] sd 6:0:33:0: [sdah] Unhandled sense code
[97942.216378] sd 6:0:33:0: [sdah]
[97942.216381] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[97942.216382] sd 6:0:33:0: [sdah]
[97942.216384] Sense Key : Medium Error [current]
[97942.216387] Info fld=0xd439400
[97942.216388] sd 6:0:33:0: [sdah]
[97942.216390] Add. Sense: Unrecovered read error
[97942.216391] sd 6:0:33:0: [sdah] CDB:
[97942.216393] Read(10): 28 00 0d 43 94 00 00 00 28 00
[97942.216398] end_request: critical medium error, dev sdah, sector 222532608
[97942.625805] md/raid:md201: read error corrected (8 sectors at 12784640 on sdah2)
[97942.625884] md/raid:md201: read error corrected (8 sectors at 12784648 on sdah2)
[97942.625887] md/raid:md201: read error corrected (8 sectors at 12784656 on sdah2)
[97942.625888] md/raid:md201: read error corrected (8 sectors at 12784664 on sdah2)
[97942.625890] md/raid:md201: read error corrected (8 sectors at 12784672 on sdah2)
[98112.230660] sd 6:0:33:0: [sdah] Unhandled sense code
[98112.230687] sd 6:0:33:0: [sdah]
[98112.230690] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[98112.230692] sd 6:0:33:0: [sdah]
[98112.230694] Sense Key : Medium Error [current]
[98112.230698] Info fld=0xcbaca40
[98112.230700] sd 6:0:33:0: [sdah]
[98112.230703] Add. Sense: Unrecovered read error
[98112.230705] sd 6:0:33:0: [sdah] CDB:
[98112.230707] Read(10): 28 00 0c ba ca 40 00 00 08 00
[98112.230715] end_request: critical medium error, dev sdah, sector 213568064
[99107.714394] sd 6:0:33:0: [sdah] Unhandled sense code
[99107.714443] sd 6:0:33:0: [sdah]
[99107.714444] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[99107.714446] sd 6:0:33:0: [sdah]
[99107.714447] Sense Key : Medium Error [current]
[99107.714450] Info fld=0xcba46c8
[99107.714451] sd 6:0:33:0: [sdah]
[99107.714453] Add. Sense: Unrecovered read error
[99107.714455] sd 6:0:33:0: [sdah] CDB:
[99107.714456] Read(10): 28 00 0c ba 46 c0 00 00 20 00
[99107.714461] end_request: critical medium error, dev sdah, sector 213534408
[99110.123110] sd 6:0:33:0: [sdah] Unhandled sense code
[99110.123167] sd 6:0:33:0: [sdah]
[99110.123170] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[99110.123173] sd 6:0:33:0: [sdah]
[99110.123175] Sense Key : Medium Error [current]
[99110.123179] Info fld=0xcba46c8
[99110.123181] sd 6:0:33:0: [sdah]
[99110.123184] Add. Sense: Unrecovered read error
[99110.123187] sd 6:0:33:0: [sdah] CDB:
[99110.123189] Read(10): 28 00 0c ba 46 c0 00 00 20 00
[99110.123197] end_request: critical medium error, dev sdah, sector 213534408
[99111.169398] md/raid:md201: read error corrected (8 sectors at 3786440 on sdah2)
[99111.169404] md/raid:md201: read error corrected (8 sectors at 3786448 on sdah2)
[99111.169406] md/raid:md201: read error corrected (8 sectors at 3786456 on sdah2)
[101221.285568] mpt2sas0: _scsih_sas_broadcast_primitive_event: enter: phy number(1), width(16)
[101221.288095] mpt2sas0: _scsih_sas_broadcast_primitive_event: enter: phy number(1), width(16)
[101221.290937] mpt2sas0: _scsih_sas_broadcast_primitive_event: enter: phy number(1), width(16)
[101221.293768] mpt2sas0: _scsih_sas_broadcast_primitive_event: enter: phy number(1), width(16)
[101491.327771] sd 6:0:33:0: [sdah] Unhandled sense code
[101491.327813] sd 6:0:33:0: [sdah]
[101491.327815] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[101491.327817] sd 6:0:33:0: [sdah]
[101491.327819] Sense Key : Medium Error [current]
[101491.327822] Info fld=0xd2d7c1c
[101491.327824] sd 6:0:33:0: [sdah]
[101491.327826] Add. Sense: Unrecovered read error
[101491.327828] sd 6:0:33:0: [sdah] CDB:
[101491.327830] Read(10): 28 00 0d 2d 7c 18 00 00 08 00
[101491.327836] end_request: critical medium error, dev sdah, sector 221084700
[112965.864443] sd 6:0:33:0: [sdah] Unhandled sense code
[112965.864469] sd 6:0:33:0: [sdah]
[112965.864471] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[112965.864474] sd 6:0:33:0: [sdah]
[112965.864476] Sense Key : Medium Error [current]
[112965.864480] Info fld=0xc8e1cb1
[112968.322232] sd 6:0:33:0: [sdah]
[112968.322233] Add. Sense: Unrecovered read error
[112968.322235] sd 6:0:33:0: [sdah] CDB:
[112968.322236] Read(10): 28 00 0c 8e 1c 00 00 00 d8 00
[112968.322241] end_request: critical medium error, dev sdah, sector 210640049
[112969.127941] md/raid:md201: read error corrected (8 sectors at 892080 on sdah2)
[112969.127952] md/raid:md201: read error corrected (8 sectors at 892088 on sdah2)
[112969.127954] md/raid:md201: read error corrected (8 sectors at 892096 on sdah2)
[112969.127955] md/raid:md201: read error corrected (8 sectors at 892104 on sdah2)
[112969.127957] md/raid:md201: read error corrected (8 sectors at 892112 on sdah2)
[113352.100011] sd 6:0:33:0: [sdah] Unhandled sense code
[113352.100068] sd 6:0:33:0: [sdah]
[113352.100071] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[113352.100074] sd 6:0:33:0: [sdah]
[113352.100076] Sense Key : Medium Error [current]
[113352.100080] Info fld=0xc8e8448
[113352.100083] sd 6:0:33:0: [sdah]
[113352.100086] Add. Sense: Unrecovered read error
[113352.100088] sd 6:0:33:0: [sdah] CDB:
[113352.100090] Read(10): 28 00 0c 8e 84 30 00 00 38 00
[113352.100099] end_request: critical medium error, dev sdah, sector 210666568
[113354.850395] sd 6:0:33:0: [sdah] Unhandled sense code
[113354.850404] sd 6:0:33:0: [sdah]
[113354.850406] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[113354.850408] sd 6:0:33:0: [sdah]
[113354.850409] Sense Key : Medium Error [current]
[113354.850412] Info fld=0xc8e8448
[113354.850414] sd 6:0:33:0: [sdah]
[113354.850416] Add. Sense: Unrecovered read error
[113354.850417] sd 6:0:33:0: [sdah] CDB:
[113354.850419] Read(10): 28 00 0c 8e 84 30 00 00 38 00
[113354.850424] end_request: critical medium error, dev sdah, sector 210666568
[113355.387298] md/raid:md201: read error corrected (8 sectors at 918600 on sdah2)
[113355.387303] md/raid:md201: read error corrected (8 sectors at 918608 on sdah2)
[113355.387305] md/raid:md201: read error corrected (8 sectors at 918616 on sdah2)
[113355.387307] md/raid:md201: read error corrected (8 sectors at 918624 on sdah2)
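The sd lines count sectors from the start of the whole disk (sdah), while the md lines count from the start of the member partition (sdah2). In every pair that lines up, the difference is the same constant, e.g. 210744400 - 996432 = 222532608 - 12784640 = 209747968, so that value appears to be where sdah2 starts; `PART_START` below is this inference from the logs, not a value read from the partition table. A minimal Python sketch under that assumption (`uncorrected` is a hypothetical helper name) lists the medium-error sectors that no "read error corrected" range ever covers. The regexes assume each message sits on a single line, as dmesg prints it.

```python
import re
from typing import Iterable, List, Set, Tuple

# ASSUMPTION: start sector of sdah2 on sdah, inferred from the logs
# (e.g. 210744400 - 996432); not read from the partition table.
PART_START = 209747968

# "end_request: critical medium error, dev sdah, sector 213548604"
ERR_RE = re.compile(r"critical medium error, dev (\w+), sector (\d+)")
# "md/raid:md201: read error corrected (8 sectors at 996432 on sdah2)"
FIX_RE = re.compile(r"read error corrected \((\d+) sectors at (\d+) on (\w+)\)")


def uncorrected(lines: Iterable[str], disk: str = "sdah",
                member: str = "sdah2") -> List[int]:
    """Whole-disk sectors with a medium error that no md correction covers."""
    failed: Set[int] = set()
    fixed: List[Tuple[int, int]] = []  # [start, end) in whole-disk sectors
    for line in lines:
        m = ERR_RE.search(line)
        if m and m.group(1) == disk:
            failed.add(int(m.group(2)))
            continue
        m = FIX_RE.search(line)
        if m and m.group(3) == member:
            # md reports member-relative sectors; shift to whole-disk sectors.
            start = PART_START + int(m.group(2))
            fixed.append((start, start + int(m.group(1))))
    return sorted(s for s in failed
                  if not any(lo <= s < hi for lo, hi in fixed))


SAMPLE = [
    "[ 729.141488] end_request: critical medium error, dev sdah, sector 213548604",
    "[ 2201.018561] end_request: critical medium error, dev sdah, sector 210744400",
    "[ 2203.651769] end_request: critical medium error, dev sdah, sector 210744400",
    "[ 2204.845894] md/raid:md201: read error corrected (8 sectors at 996432 on sdah2)",
]

if __name__ == "__main__":
    print(uncorrected(SAMPLE))  # prints [213548604]
```

Applied to all medium-error and md lines of both excerpts above, it reports 210753536, 213548604, 213568064, 221084700 and 221088210 as sectors never covered by a corrected range.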
As I wrote above, userspace never notices any error, so it actually
works, but I don't know why!?
Thanks for any info
EW