On 10/30/2016 01:43 PM, Andrey Grodzovsky wrote:
Problem:
This is a workaround for a bug with LSI Fusion MPT SAS2 when
performing secure erase. Due to the very long time the operation
takes, commands issued during the erase will time out and trigger
execution of the abort hook. Even though the abort hook is called for
the specific command which timed out, this leads to an entire device halt
(scsi_state terminated) and premature termination of the secure erase.
Actually, it is _not_ the erase command which times out, it's the
successive commands which time out, as the controller is unable to
process them while erase is running.
I suspect a bug in the SAT-layer from the mpt3sas firmware, which simply
does not return 'busy' for additional commands when erase is in progress.
That being said, this issue was obscured prior to implementing
asynchronous aborts, as originally a timeout would invoke SCSI EH,
which would wait for all outstanding commands to complete.
So by the time SCSI EH was invoked the erase command was already
completed, allowing for a successful retry of the failing command.
With asynchronous aborts we don't have this option, as the abort will
succeed, but the command cannot be retried as the original erase command
is still running.
In light of the above I guess we need something like the attached
patch. I'm not utterly proud of it, but I guess it's the best we can do
for the moment.
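
To make the idea easier to follow outside the driver, here is a minimal
standalone sketch of the gating logic, not the driver code itself. The
names toy_device, toy_queue and toy_complete are hypothetical stand-ins
invented for illustration; only the two opcodes (0xa1 for ATA_12, 0x85
for ATA_16) are taken from the patch. It assumes a single "blocked" flag
per device is enough to model the block/unblock calls.

```c
#include <stdbool.h>
#include <stdint.h>

/* SCSI opcodes for ATA pass-through, the same values the patch tests for. */
#define ATA_12 0xa1
#define ATA_16 0x85

/* Hypothetical stand-in for the per-device private data (not the real struct). */
struct toy_device {
    bool blocked;   /* true while an ATA pass-through command is in flight */
};

enum toy_result { TOY_QUEUED, TOY_DEVICE_BUSY };

/* The first CDB byte is the opcode. */
static bool is_ata_passthrough(const uint8_t *cdb)
{
    return cdb[0] == ATA_12 || cdb[0] == ATA_16;
}

/*
 * Submission path: refuse new commands while the device is blocked,
 * and block the device as soon as an ATA pass-through command is accepted.
 */
static enum toy_result toy_queue(struct toy_device *dev, const uint8_t *cdb)
{
    if (dev->blocked)
        return TOY_DEVICE_BUSY;
    if (is_ata_passthrough(cdb))
        dev->blocked = true;
    return TOY_QUEUED;
}

/* Completion path: unblock once the ATA pass-through command has finished. */
static void toy_complete(struct toy_device *dev, const uint8_t *cdb)
{
    if (is_ata_passthrough(cdb))
        dev->blocked = false;
}
```

With this model, a READ(10) submitted while an ATA_16 erase is
outstanding gets TOY_DEVICE_BUSY instead of being sent to the controller
and timing out; it is accepted again once toy_complete() has run for the
erase.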
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@xxxxxxx +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
From 1556746987c3b4c1a1a4705625280b1136554f89 Mon Sep 17 00:00:00 2001
From: Hannes Reinecke <hare@xxxxxxx>
Date: Sun, 30 Oct 2016 14:24:44 +0100
Subject: [PATCH] mpt3sas: hack: disable concurrent commands for ATA_16/ATA_12
There's a bug in the mpt3sas driver/firmware: it does not return
BUSY while it is busy processing a request (e.g. 'erase') and cannot
respond to other commands. Hence these commands will time out
and eventually start the error handler.
This patch disallows request processing whenever an ATA_12 or
ATA_16 command is received, thereby avoiding this problem.
Signed-off-by: Hannes Reinecke <hare@xxxxxxxx>
---
drivers/scsi/mpt3sas/mpt3sas_scsih.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/drivers/scsi/mpt3sas/mpt3sas_scsih.c b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
index 97987e7..18b9f09 100644
--- a/drivers/scsi/mpt3sas/mpt3sas_scsih.c
+++ b/drivers/scsi/mpt3sas/mpt3sas_scsih.c
@@ -4096,6 +4096,13 @@ scsih_qcmd(struct Scsi_Host *shost, struct scsi_cmnd *scmd)
sas_device_priv_data->block)
return SCSI_MLQUEUE_DEVICE_BUSY;
+ /*
+ * Hack: block the device for any ATA_12/ATA_16 command
+ */
+ if (scmd->cmnd[0] == 0xa1 || scmd->cmnd[0] == 0x85) {
+ sas_device_priv_data = scmd->device->hostdata;
+ _scsih_internal_device_block(scmd->device, sas_device_priv_data);
+ }
if (scmd->sc_data_direction == DMA_FROM_DEVICE)
mpi_control = MPI2_SCSIIO_CONTROL_READ;
else if (scmd->sc_data_direction == DMA_TO_DEVICE)
@@ -4835,6 +4842,10 @@ _scsih_io_done(struct MPT3SAS_ADAPTER *ioc, u16 smid, u8 msix_index, u32 reply)
out:
+ if (scmd->cmnd[0] == 0xa1 || scmd->cmnd[0] == 0x85) {
+ sas_device_priv_data = scmd->device->hostdata;
+ _scsih_internal_device_unblock(scmd->device, sas_device_priv_data);
+ }
scsi_dma_unmap(scmd);
scmd->scsi_done(scmd);
--
2.6.6