Re: [PATCH] [SCSI] mpt3sas: Fix secure erase premature termination (v2)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Nov 5, 2016 at 6:47 PM, Andrey Grodzovsky <andrey2805@xxxxxxxxx> wrote:
> On Fri, Nov 4, 2016 at 10:51 AM, Hannes Reinecke <hare@xxxxxxx> wrote:
>> On 11/04/2016 01:45 PM, Sreekanth Reddy wrote:
>>>
>>> Hi All,
>>>
>>> From last two days, I was working with my firmware team to get the
>>> required info over this issue. Here is my firmware team response
>>>
>>> "For ATA PASSTHROUGH commands, the IOC SATL will not check for the
>>> opcode and will direct it to the drive. So even though ATA PASSTHOUGH
>>> has ATA erase to the drive, IOC SATL FW will not know that and as a
>>> general logic for all ATA PASSTHOGH commands, IOC FW will pend the
>>> upcoming IOs untill the previous ATA PASSTHORUGH completes. This is as
>>> per the SAT specification for SAS controllers and we can't compare it
>>> with the SATA controllers in the on board that have full fledge SATA
>>> implementation".
>>>
>>> So this is an expected behavior from our HBA firmware. i.e. it will
>>> pend the subsequent commands if any ATA PASSTHROUGH command is going
>>> on. So their is no issue with the FW.
>>>
>> But is there a way to figure out if the firmware / SATL layer is busy
>> processing requests?
>>
>> With 'real' ATA HBAs these issue doesn't occur, as the ATA erase command is
>> a non-queued command, and hence the next command automatically has to wait
>> for the erase command to complete.
>> But this wait happens as the ATA HBA returns 'BUSY', and the linux I/O stack
>> will then reset the timeout for all consecutive commands.
>>
>> With mpt3sas _all_ commands are queued, so if there is a long-running I/O
>> command all other commands already in the queue will time out.
>>
>> Which is at least a very awkward behaviour.
>>
>> Checking with SAT-3 (section 6.2.4: Commands the SATL queues internally) the
>> implemented behaviour is standards conformant, although the standard also
>> allows for returning 'TASK SET FULL' or 'BUSY' in these cases.
>> Doing so would nicely solve this issue.
>>
>>> Today I have tried the same test case on my local setup. i.e. I have
>>> issued a secure erase command using hdparm utility and observed the
>>> same issue on 4.2.3-300.fc23.x86_64 kernel.
>>>
>>> Then after browsing over this issue, I found that some people are
>>> recommending to enable 'CONFIG_IDE_TASK_IOCTL' Kconfig flag. I had a
>>> compiled 4.4.0 kernel, so I have enabled this CONFIG_IDE_TASK_IOCTL
>>> and recompiled this 4.4.0 kernel and booted in to this kernel. Then I
>>> tried same test case and I haven't observed this issue and secure
>>> erase operation was completed successfully.
>>>
>>> So, can you please try once with CONFIG_IDE_TASK_IOCTL enabled.
>>>
>> Errm.
>> CONFIG_IDE_TASK_IOCTL is for the old IDE subsystem, which isn't in use here.
>> So this option does not make a difference when using mpt3sas, as this is a
>> 'real' SCSI driver which never calls out into any of these subsystems.
>>
>> I would be _VERY_ much surprised if that would make a difference.
>>
>> The reason why this behaviour did go unnoticed with older kernels was that a
>> command timeout would trigger SCSI EH to engage, and that in turn required
>> all outstanding commands to complete.
>> So by the time SCSI EH started the ERASE command was complete, and a retry
>> of the timed-out commands would work.
>
> Indeed, when retesting with CONFIG_IDE_TASK_IOCTL=y and. reverting the
> fix the bug is back.
>
> Thanks,
> Andrey

Hi Andrey,

We are fine with this patch with below few changes,

1. Please remove below comment. it not a bug in firmware, it is
designed like that,

/* This is a work around for a bug with LSI Fusion MPT SAS2 when
* pefroming secure erase. Due to the verly long time the operation
* takes commands issued during the erase will time out and will trigger
* execution of abort hook. This leads to device reset and premature
* termination of the secured erase.
*/

2. Use SCSI commands opcodes definitions instead of value, so replace below line

return (scmd->cmnd[0] == 0xa1 || scmd->cmnd[0] == 0x85);
as
return (scmd->cmnd[0] == ATA_12 || scmd->cmnd[0] == ATA_16);

3.  Please correct alignment for the below comment,

 /**
     * Lock the device for any subsequent command until
     * command is done.
     */

Thanks,
Sreekanth

>>
>>
>> Cheers,
>>
>> Hannes
>> --
>> Dr. Hannes Reinecke                   zSeries & Storage
>> hare@xxxxxxx                          +49 911 74053 688
>> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
>> GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]