On 2021-09-15 10:43 a.m., James Bottomley wrote:
On Wed, 2021-09-15 at 08:04 -0500, wenxiong@xxxxxxxxxxxxxxxxxx wrote:
From: Wen Xiong <wenxiong@xxxxxxxxxxxxxxxxxx>
With the SCSI logging level set to error=3, we saw some errors from
enclosures:
[108017.360833] ses 0:0:9:0: tag#641 Done: NEEDS_RETRY Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK cmd_age=0s
[108017.360838] ses 0:0:9:0: tag#641 CDB: Receive Diagnostic 1c 01 01 00 20 00
[108017.427778] ses 0:0:9:0: Power-on or device reset occurred
[108017.427784] ses 0:0:9:0: tag#641 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=0s
[108017.427788] ses 0:0:9:0: tag#641 CDB: Receive Diagnostic 1c 01 01 00 20 00
[108017.427791] ses 0:0:9:0: tag#641 Sense Key : Unit Attention [current]
[108017.427793] ses 0:0:9:0: tag#641 Add. Sense: Bus device reset function occurred
[108017.427801] ses 0:0:9:0: Failed to get diagnostic page 0x1
[108017.427804] ses 0:0:9:0: Failed to bind enclosure -19
[108017.427895] ses 0:0:10:0: Attached Enclosure device
[108017.427942] ses 0:0:10:0: Attached scsi generic sg18 type 13
As Martin suggested, the patch retries on NOT_READY as well as on
UNIT_ATTENTION with ASC 0x29.
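
For readers following along, the retry condition described above can be sketched in plain user-space C. This is only an illustration, not the patch itself: struct sense_info, issue_fn, sense_is_retryable() and issue_with_retry() are hypothetical names, and the sketch assumes the ASC 0x29 check applies only to the UNIT_ATTENTION case. The sense key values (0x02 and 0x06) are the standard SCSI ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Standard SCSI sense key values. */
#define SK_NOT_READY      0x02
#define SK_UNIT_ATTENTION 0x06

/* Decoded sense data; a hypothetical stand-in for the kernel's sense header. */
struct sense_info {
    bool    valid;      /* true if the device returned usable sense data */
    uint8_t sense_key;
    uint8_t asc;
    uint8_t ascq;
};

/*
 * Hypothetical callback: issues the command once, fills *sense, and
 * returns 0 on success or nonzero on failure.
 */
typedef int (*issue_fn)(void *ctx, struct sense_info *sense);

/* Retry on NOT_READY, or on UNIT_ATTENTION with ASC 0x29 (reset occurred). */
bool sense_is_retryable(const struct sense_info *s)
{
    if (!s->valid)
        return false;
    if (s->sense_key == SK_NOT_READY)
        return true;
    return s->sense_key == SK_UNIT_ATTENTION && s->asc == 0x29;
}

/* Issue the command up to max_tries times while the failure is retryable. */
int issue_with_retry(issue_fn issue, void *ctx, int max_tries)
{
    struct sense_info sense;
    int ret;

    assert(issue != NULL);
    do {
        sense = (struct sense_info){0};
        ret = issue(ctx, &sense);
    } while (ret != 0 && --max_tries > 0 && sense_is_retryable(&sense));

    return ret;
}
```

Bounding the loop with max_tries matters here because an enclosure that keeps resetting would otherwise be retried forever.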
This looks fine to me. I think the reason expecting_cc_ua doesn't work
for you is that you're getting > 1 reset per command. expecting_cc_ua
automatically resets after eating the first unit attention.
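
To illustrate why a one-shot flag falls short when more than one reset arrives per command, here is a deliberately simplified model. It is not the kernel's code: struct dev_state, enum disposition and handle_unit_attention() are hypothetical names, and the real handling has further conditions that are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* What the (hypothetical) handler decides to do with a unit attention. */
enum disposition {
    DISP_RETRY,   /* UA was expected: swallow it and retry the command */
    DISP_REPORT   /* UA was not expected: hand it back to the caller */
};

struct dev_state {
    bool expecting_cc_ua;   /* one-shot flag, set when a reset is anticipated */
};

/* Called once for each unit attention the device reports. */
enum disposition handle_unit_attention(struct dev_state *dev)
{
    assert(dev != NULL);
    if (dev->expecting_cc_ua) {
        dev->expecting_cc_ua = false;   /* consumed by the first UA */
        return DISP_RETRY;
    }
    return DISP_REPORT;
}
```

In this model the first UA is absorbed and the second one, arriving with the flag already cleared, is reported to the caller.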
Rather than simply consuming UAs, what do you think of a fixed-length
FIFO, say 8 entries, that holds the asc,ascq of the last 8 UAs together
with a timestamp of when each was received (with a boot time epoch)?
Then allow the user space to see that FIFO via sysfs (e.g. under
/sys/class/scsi_device/<hctl>). Remembering the previous UA may also be
useful for the mid-level UA processing. For example after a firmware
upgrade, there may be UAs for both INQUIRY data change and device reset.
Also, the first device reset after a reboot (power cycle) is expected;
having one later, for example when part of a disk is mounted, is a bit
more suspicious.
Doug Gilbert
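
A minimal user-space sketch of the fixed-length UA FIFO proposed above, for illustration only. All names (struct ua_fifo, ua_fifo_push(), ua_fifo_peek(), UA_FIFO_DEPTH) are hypothetical, the timestamp is assumed to be nanoseconds since boot supplied by the caller, and the sysfs export and any locking are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UA_FIFO_DEPTH 8   /* "say 8 entries" */

/* One recorded unit attention. */
struct ua_entry {
    uint8_t  asc;
    uint8_t  ascq;
    uint64_t boot_ns;     /* assumed: time since boot, in nanoseconds */
};

/* Ring buffer that keeps only the most recent UA_FIFO_DEPTH entries. */
struct ua_fifo {
    struct ua_entry slot[UA_FIFO_DEPTH];
    unsigned int head;    /* index of the next slot to write */
    unsigned int count;   /* number of valid entries, at most UA_FIFO_DEPTH */
};

void ua_fifo_init(struct ua_fifo *f)
{
    assert(f != NULL);
    f->head = 0;
    f->count = 0;
}

/* Record a UA, overwriting the oldest entry once the FIFO is full. */
void ua_fifo_push(struct ua_fifo *f, uint8_t asc, uint8_t ascq,
                  uint64_t boot_ns)
{
    assert(f != NULL);
    f->slot[f->head] = (struct ua_entry){ asc, ascq, boot_ns };
    f->head = (f->head + 1) % UA_FIFO_DEPTH;
    if (f->count < UA_FIFO_DEPTH)
        f->count++;
}

/*
 * Look at a recorded UA without removing it. age 0 is the most recent
 * entry; returns NULL if fewer than age + 1 entries have been recorded.
 */
const struct ua_entry *ua_fifo_peek(const struct ua_fifo *f, unsigned int age)
{
    unsigned int idx;

    assert(f != NULL);
    if (age >= f->count)
        return NULL;
    idx = (f->head + UA_FIFO_DEPTH - 1 - age) % UA_FIFO_DEPTH;
    return &f->slot[idx];
}
```

With something like this, the mid-level could compare the entries at age 0 and age 1 to spot an INQUIRY data change followed by a device reset, and the timestamps would help tell an expected post-boot reset from a later one.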