On 2018-06-11 12:07 PM, Ted Cabeen wrote:
I'm seeing similar behavior on my system, but across multiple devices in a SAS
drive array (front bays on a Supermicro-based system with an onboard mpt3sas
card). The Sense Key here doesn't show a medium error, and the multiple-drive
behavior makes me think a controller or cable problem is more likely.
Interestingly, the issue only shows up under heavy load (specifically a ZFS
scrub). During my next downtime window, I'm going to try to re-create the
problem while capturing a blktrace. Is there anything else I should try at that
time, or a filter mask I should apply?
[Wed Jun 6 14:30:19 2018] blk_update_request: I/O error, dev sdn, sector 1757633640
[Wed Jun 6 14:37:10 2018] sd 15:0:5:0: unaligned partial completion avoided (xfer_cnt=3072, sector_sz=4096)
[Wed Jun 6 14:37:10 2018] sd 15:0:5:0: [sdr] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Wed Jun 6 14:37:10 2018] sd 15:0:5:0: [sdr] Sense Key : Aborted Command [current] [descriptor]
[Wed Jun 6 14:37:10 2018] sd 15:0:5:0: [sdr] Add. Sense: Nak received
[Wed Jun 6 14:37:10 2018] sd 15:0:5:0: [sdr] CDB: Read(10) 28 00 07 8a c1 ca 00 00 01 00
[Wed Jun 6 14:37:10 2018] blk_update_request: I/O error, dev sdr, sector 1012272720
[Wed Jun 6 15:20:43 2018] sd 15:0:8:0: unaligned partial completion avoided (xfer_cnt=52224, sector_sz=4096)
[Wed Jun 6 15:20:43 2018] sd 15:0:8:0: [sdu] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Wed Jun 6 15:20:43 2018] sd 15:0:8:0: [sdu] Sense Key : Aborted Command [current] [descriptor]
[Wed Jun 6 15:20:43 2018] sd 15:0:8:0: [sdu] Add. Sense: Nak received
[Wed Jun 6 15:20:43 2018] sd 15:0:8:0: [sdu] CDB: Read(10) 28 00 12 ab dc 52 00 00 19 00
[Wed Jun 6 15:20:43 2018] blk_update_request: I/O error, dev sdu, sector 2506023568
[Wed Jun 6 15:46:20 2018] sd 15:0:2:0: unaligned partial completion avoided (xfer_cnt=11264, sector_sz=4096)
[Wed Jun 6 15:46:20 2018] sd 15:0:2:0: [sdo] FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[Wed Jun 6 15:46:20 2018] sd 15:0:2:0: [sdo] Sense Key : Aborted Command [current] [descriptor]
[Wed Jun 6 15:46:20 2018] sd 15:0:2:0: [sdo] Add. Sense: Nak received
[Wed Jun 6 15:46:20 2018] sd 15:0:2:0: [sdo] CDB: Read(10) 28 00 40 a8 ef b5 00 00 03 00
[Wed Jun 6 15:46:20 2018] blk_update_request: I/O error, dev sdo, sector 8678505896
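As a cross-check on the log above, the LBA in each Read(10) CDB can be converted
to the 512-byte sector number that blk_update_request reports. The following is
a minimal Python sketch, assuming the standard Read(10) layout (opcode 0x28,
big-endian LBA in bytes 2-5, big-endian transfer length in bytes 7-8) and the
4096-byte logical sector size shown in the log. The function name
decode_read10 is made up for illustration.

```python
def decode_read10(cdb_hex, logical_block_size=4096):
    """Decode a Read(10) CDB given as a hex string, e.g. '28 00 07 8a ...'."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a Read(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # logical block address
    blocks = int.from_bytes(cdb[7:9], "big")   # transfer length in blocks
    return {
        "lba": lba,
        "blocks": blocks,
        "bytes": blocks * logical_block_size,
        # The kernel's block layer reports sectors in 512-byte units.
        "sector_512": lba * (logical_block_size // 512),
    }


# The three CDBs from the log excerpt (sdr, sdu, sdo).
for cdb in (
    "28 00 07 8a c1 ca 00 00 01 00",
    "28 00 12 ab dc 52 00 00 19 00",
    "28 00 40 a8 ef b5 00 00 03 00",
):
    print(decode_read10(cdb))
```

For these three CDBs the computed sectors are 1012272720, 2506023568 and
8678505896, matching the blk_update_request lines, and the requested lengths
(4096, 102400 and 12288 bytes) are each larger than the logged xfer_cnt.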
I have also seen the Aborted Command sense key when doing heavy testing on one
or more SAS disks behind a SAS expander. I put it down to a temporary shortage
of available paths (on the link between the host's HBA and the expander) at the
moment one of those SAS disks tries to get a connection back to the host to
deliver the data (data-in transfer) for an earlier READ command.
In my code (ddpt and sg_dd) I treat it as a "retry" type error, and in my
experience that works: a follow-up READ with the same parameters is
successful.
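A minimal Python sketch of that retry-on-abort idea follows. It is not the ddpt
or sg_dd code (both are written in C). Here do_read is a hypothetical callable
standing in for whatever issues the READ and returns (sense_key, data), and the
retry count and delay are arbitrary assumptions. The only fixed fact used is
that SCSI sense key 0xB means Aborted Command.

```python
import time

SENSE_KEY_NO_SENSE = 0x0
SENSE_KEY_ABORTED_COMMAND = 0xB


def read_with_retry(do_read, lba, blocks, max_retries=3, delay_s=0.0):
    """Issue a READ via do_read, retrying only on Aborted Command.

    do_read(lba, blocks) is a hypothetical callable returning
    (sense_key, data). Returns (data, attempts) on success.
    """
    attempts = 0
    while True:
        sense_key, data = do_read(lba, blocks)
        attempts += 1
        if sense_key == SENSE_KEY_NO_SENSE:
            return data, attempts
        # Anything other than Aborted Command is not treated as transient.
        if sense_key != SENSE_KEY_ABORTED_COMMAND:
            raise IOError("READ failed, sense key 0x%X" % sense_key)
        if attempts > max_retries:
            raise IOError("READ still aborted after %d attempts" % attempts)
        time.sleep(delay_s)  # optional pause before re-issuing the same READ
```

The retry re-issues the READ with the same LBA and length. Other sense keys,
such as Medium Error, are passed up as failures rather than retried.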
Doug Gilbert