On 2019/06/25 1:33, Tejun Heo wrote: > ZAC support added sense data requesting on error for both ZAC and ATA > devices. This seems to cause erratic error handling behaviors on some > SSDs where the device reports sense data availability and then > delivers the wrong content making EH take the wrong actions. The > failure mode was sporadic on a LITE-ON ssd and couldn't be reliably > reproduced. > > There is no value in requesting sense data from non-ZAC ATA devices > while there's a significant risk of introducing EH misbehaviors which > are difficult to reproduce and fix. Let's do the sense data dancing > only for ZAC devices. > > Signed-off-by: Tejun Heo <tj@xxxxxxxxxx> > Cc: Hannes Reinecke <hare@xxxxxxxxxx> > --- > drivers/ata/libata-eh.c | 8 +++++--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c > index 9d687e1d4325..3bfd9da58473 100644 > --- a/drivers/ata/libata-eh.c > +++ b/drivers/ata/libata-eh.c > @@ -1469,7 +1469,7 @@ static int ata_eh_read_log_10h(struct ata_device *dev, > tf->hob_lbah = buf[10]; > tf->nsect = buf[12]; > tf->hob_nsect = buf[13]; > - if (ata_id_has_ncq_autosense(dev->id)) > + if (dev->class == ATA_DEV_ZAC && ata_id_has_ncq_autosense(dev->id)) > tf->auxiliary = buf[14] << 16 | buf[15] << 8 | buf[16]; > > return 0; > @@ -1716,7 +1716,8 @@ void ata_eh_analyze_ncq_error(struct ata_link *link) > memcpy(&qc->result_tf, &tf, sizeof(tf)); > qc->result_tf.flags = ATA_TFLAG_ISADDR | ATA_TFLAG_LBA | ATA_TFLAG_LBA48; > qc->err_mask |= AC_ERR_DEV | AC_ERR_NCQ; > - if ((qc->result_tf.command & ATA_SENSE) || qc->result_tf.auxiliary) { > + if (dev->class == ATA_DEV_ZAC && > + ((qc->result_tf.command & ATA_SENSE) || qc->result_tf.auxiliary)) { > char sense_key, asc, ascq; > > sense_key = (qc->result_tf.auxiliary >> 16) & 0xff; > @@ -1770,10 +1771,11 @@ static unsigned int ata_eh_analyze_tf(struct ata_queued_cmd *qc, > } > > switch (qc->dev->class) { > - case ATA_DEV_ATA: > case ATA_DEV_ZAC: > if (stat & ATA_SENSE) > ata_eh_request_sense(qc, qc->scsicmd); > + /* fall through */ > + case ATA_DEV_ATA: > if (err & ATA_ICRC) > qc->err_mask |= AC_ERR_ATA_BUS; > if (err & (ATA_UNC | ATA_AMNF)) > For NCQ commands, I believe it is mandatory to request sense data for the failed command to get the device out of error mode. So isn't this approach breaking anything for well behaving drives ? Wouldn't it be better to blacklist the misbehaving SSD you observed the problem with ? -- Damien Le Moal Western Digital Research