On 08/03/2015 07:01 PM, Tejun Heo wrote: > Hello, Hannes. > > On Mon, Aug 03, 2015 at 06:47:46PM +0200, Hannes Reinecke wrote: >> At the moment NCQ autosense is mostly used to provide the host with more >> details for a failed I/O. The typical case here is (no small surprise) >> ZAC disks, which use autosense to inform the host about >> a malformed I/O. >> >> It is _not_ being used as a replacement for existing error behaviour, >> (ie link errors are not being signalled with that; how could they >> if there is no link?); in fact, during testing I"ve seen both, autosense > > Hmmm? Devices can report link error via TF bits and you're bypassing > TF analysis completley if sense data is present. > But sense data will never be present for a link error ... and I'm not disabling that. >> I/O failures and normal I/O failures for which autosense is >> not set, and the normal error handling kicks in. >> >> It's not that I've disable the original error handler completely, >> it's only bypassed for I/O failure where a sense code is provided. >> And the drive surely knows which error occurs, so we'd be daft not be >> using that. > > The patches are altering EH actions in a very subtle way depending on > *how* an error is reported, not *what* is reported, which is a pretty > silly thing to do. It makes things a lot more confusing to follow and > predict. I really don't think this is an acceptable behavior. > But if we were judging on _what_ is being reported we would have to know which sense code the drive will be reporting, in effect carrying a massive table of possible sense codes, and map this to the actions. Do you really want to go that way? >> So I think disabling autosense completely is a bit extreme... > > Please restructure the feature so that it doesn't interfere with the > usual EH behavior. e.g. leave the EH actions alone unless explicitly > necessary but report detailed error information upwards. If the extra > error information can be helpful in determining what EH actions to > take, factoring in that information can be helpful too but I'm not too > convinced that'd make a huge difference. > It does if we can avoid the retry on the libata layer. > Also, please consider that ATA_QCFLAG_SENSE_VALID handling assumes > that the reporting device is an ATAPI device and the command in > question is not a regular IO one. That's why EH ignores AC_ERR_DEV or > AC_ERR_OTHER if ATA_QCFLAG_SENSE_VALID is set. This doesn't work out > for the new ATA usage at all. > I've just mapped onto ATA_QCFLAGS_SENSE_VALID as I've found this to reasonably close to what I've been needing. If that's the wrong choice of course I can modify this. > For now, I've reverted the changes as this is actively detrimental. > Oh. So you've tested this on a device with autosense enabled? I haven't seen any negative effects with this patchset, autosense enabled or not. Cheers, Hannes -- Dr. Hannes Reinecke zSeries & Storage hare@xxxxxxx +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: J. Hawn, J. Guild, F. Imendörffer, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line "unsubscribe linux-scsi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html