Re: mpt2sas driver behaving strange with a failed SATA disk behind SAS expander.

Ravi Shankar <ravi.v.shankar@xxxxxxxxxx> · Wed, 17 Aug 2011 11:35:00 -0700

On 08/17/11 07:25, Fredrik Lindgren wrote:
Hello,

I'm seeing something strange on a Supermicro 847E16-R1400. It has SAS
expanders with SATA disks behind them (Seagate Barracuda XT). The SAS
card is an LSI SAS9211-8i.

When doing disk IO on the disks (they are all configured in MD raids),
IO suddenly stops and these messages are printed on the console about
once every second:

mpt2sas0: log_info(0x31110610): originator(PL), code(0x11), sub_code(0x0610)

From what I understand this means:

PL_LOGINFO_CODE_RESET (0x00110000)
PL_LOGINFO_SUB_CODE_SATA_NON_NCQ_RW_ERR_BIT_SET (0x00000600)
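
The way the 32-bit log_info value splits into the fields printed by the
driver can be sketched as below. This is only an illustration: the code
and sub_code positions follow from the console message itself, while the
bus-type nibble in the top four bits and the originator names other than
PL are assumptions about the field layout. The function and table names
are made up for this example.

```python
# Hypothetical helper: split an mpt2sas log_info word into its fields.
# Assumed layout: bits 31-28 bus type, 27-24 originator,
# 23-16 code, 15-0 sub_code.
ORIGINATORS = {0: "IOP", 1: "PL", 2: "IR"}  # names other than PL are assumed


def decode_log_info(value):
    """Return the fields of a 32-bit log_info value as a dict."""
    originator = (value >> 24) & 0xF
    return {
        "bus_type": (value >> 28) & 0xF,
        "originator": ORIGINATORS.get(originator, hex(originator)),
        "code": (value >> 16) & 0xFF,
        "sub_code": value & 0xFFFF,
    }


if __name__ == "__main__":
    for raw in (0x31110610, 0x31111000):
        fields = decode_log_info(raw)
        print("log_info(0x%08x): originator(%s), code(0x%02x), sub_code(0x%04x)"
              % (raw, fields["originator"], fields["code"], fields["sub_code"]))
```

Masking the sub_code with 0xFF00 gives 0x0600, which is the value matched
against PL_LOGINFO_SUB_CODE_SATA_NON_NCQ_RW_ERR_BIT_SET above; the low
byte 0x10 is what remains.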

So a disk is acting up, generating errors? What does the last "10" in
the sub_code mean? Is that an identifier for which disk it is?

After some time, the message changed:

mpt2sas0: log_info(0x31111000): originator(PL), code(0x11), sub_code(0x1000)

Now the disk seems to have died completely?

PL_LOGINFO_CODE_RESET (0x00110000)
PL_LOGINFO_SUB_CODE_DSCVRY_SATA_INIT_TIMEOUT (0x00001000)

I think sub_code 0x0610 indicates "Error in SATA ReadLogExt SATA
command", and the disk drive subsequently failed to initialize (SATA
initialization timeout). Since the disk is connected through an
expander, the link between the disk and the expander should be actively
transmitting FIS frames. You can verify whether the disk link is up by
checking the expander routing tables.

Try reducing the link speed (from 6 to 3 Gb/s) along the
HBA-expander-disk path and disabling Native Command Queuing (NCQ), and
see whether that helps.
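
One possible way to approximate the NCQ part from the Linux side is to
lower the SCSI queue depth of the affected disk to 1 through sysfs, so
that only one command is outstanding at a time. The sketch below assumes
the disk shows up as a block device with a writable
device/queue_depth attribute; whether this actually turns NCQ off for a
SATA disk behind the HBA firmware is an assumption, not something
confirmed here. The function name and the "sdX" device name are
hypothetical, and it needs to run as root.

```python
from pathlib import Path


def set_queue_depth(disk, depth=1, sysfs_root="/sys/block"):
    """Write a new queue depth for `disk` and return the previous value.

    `disk` is a block device name such as "sdX" (placeholder).
    `sysfs_root` is a parameter only so the function can be pointed at a
    scratch directory for testing.
    """
    path = Path(sysfs_root) / disk / "device" / "queue_depth"
    previous = int(path.read_text().strip())
    path.write_text("%d\n" % depth)
    return previous


if __name__ == "__main__":
    import sys

    # Example: python set_queue_depth.py sdX
    old = set_queue_depth(sys.argv[1], depth=1)
    print("queue_depth for %s: %d -> 1" % (sys.argv[1], old))
```

Returning the previous value makes it easy to restore the original depth
once the test is done.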