Re: [PATCH 1/6] libata: Do not retry commands with valid autosense

Hannes Reinecke <hare@xxxxxxx> · Mon, 03 Aug 2015 20:21:19 +0200



On 08/03/2015 07:01 PM, Tejun Heo wrote:
> Hello, Hannes.
> 
> On Mon, Aug 03, 2015 at 06:47:46PM +0200, Hannes Reinecke wrote:
>> At the moment NCQ autosense is mostly used to provide the host with more
>> details for a failed I/O. The typical case here is (no small surprise)
>> ZAC disks, which use autosense to inform the host about
>> a malformed I/O.
>>
>> It is _not_ being used as a replacement for existing error behaviour,
>> (ie link errors are not being signalled with that; how could they
>> if there is no link?); in fact, during testing I"ve seen both, autosense
> 
> Hmmm?  Devices can report link error via TF bits and you're bypassing
> TF analysis completley if sense data is present.
> 
But sense data will never be present for a link error ... and I'm not
disabling that.

>> I/O failures and normal I/O failures for which autosense is
>> not set, and the normal error handling kicks in.
>>
>> It's not that I've disable the original error handler completely,
>> it's only bypassed for I/O failure where a sense code is provided.
>> And the drive surely knows which error occurs, so we'd be daft not be
>> using that.
> 
> The patches are altering EH actions in a very subtle way depending on
> *how* an error is reported, not *what* is reported, which is a pretty
> silly thing to do.  It makes things a lot more confusing to follow and
> predict.  I really don't think this is an acceptable behavior.
> 
But if we were judging on _what_ is being reported we would have to know
which  sense code the drive will be reporting, in effect carrying
a massive table of possible sense codes, and map this to the actions.
Do you really want to go that way?

>> So I think disabling autosense completely is a bit extreme...
> 
> Please restructure the feature so that it doesn't interfere with the
> usual EH behavior.  e.g. leave the EH actions alone unless explicitly
> necessary but report detailed error information upwards.  If the extra
> error information can be helpful in determining what EH actions to
> take, factoring in that information can be helpful too but I'm not too
> convinced that'd make a huge difference.
> 
It does if we can avoid the retry on the libata layer.

> Also, please consider that ATA_QCFLAG_SENSE_VALID handling assumes
> that the reporting device is an ATAPI device and the command in
> question is not a regular IO one.  That's why EH ignores AC_ERR_DEV or
> AC_ERR_OTHER if ATA_QCFLAG_SENSE_VALID is set.  This doesn't work out
> for the new ATA usage at all.
> 
I've just mapped onto ATA_QCFLAGS_SENSE_VALID as I've found this to
reasonably close to what I've been needing.
If that's the wrong choice of course I can modify this.

> For now, I've reverted the changes as this is actively detrimental.
> 
Oh. So you've tested this on a device with autosense enabled?
I haven't seen any negative effects with this patchset, autosense
enabled or not.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@xxxxxxx			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html