On Fri, Jan 13, 2023 at 08:57:37AM +0100, Hannes Reinecke wrote: > On 1/12/23 15:03, Niklas Cassel wrote: > > In SCSI, we get the sense data as part of the completion, for ATA > > however, we need to fetch the sense data as an extra step. For an > > aborted ATA command the sense data is fetched via libata's > > ->eh_strategy_handler(). > > > > For Command Duration Limits policy 0xD: > > The device shall complete the command without error with the additional > > sense code set to DATA CURRENTLY UNAVAILABLE. > > > > In order to handle this policy in libata, we intend to send a successful > > command via SCSI EH, and let libata's ->eh_strategy_handler() fetch the > > sense data for the good command. This is similar to how we handle an > > aborted ATA command, just that we need to read the Successful NCQ > > Commands log instead of the NCQ Command Error log. > > > > When we get a SATA completion with successful commands, ATA_SENSE will > > be set, indicating that some commands in the completion have sense data. > > > > The sense_valid bitmask in the Sense Data for Successful NCQ Commands > > log will inform exactly which commands that had sense data, which might > > be a subset of all the commands that was completed in the same > > completion. (Yet all will have ATA_SENSE set, since the status is per > > completion.) > > > > The successful commands that have e.g. a "DATA CURRENTLY UNAVAILABLE" > > sense data will have a SCSI ML byte set, so scsi_eh_flush_done_q() will > > not set the scmd->result to DID_TIME_OUT for these commands. However, > > the successful commands that did not have sense data, must not get their > > result marked as DID_TIME_OUT by SCSI EH. > > > > Add a new flag SCMD_EH_SUCCESS_CMD, which tells SCSI EH to not mark a > > command as DID_TIME_OUT, even if it has scmd->result == SAM_STAT_GOOD. > > > > This will be used by libata in a follow-up patch. > > > > Signed-off-by: Niklas Cassel <niklas.cassel@xxxxxxx> > > --- > > drivers/scsi/scsi_error.c | 3 ++- > > include/scsi/scsi_cmnd.h | 5 +++++ > > 2 files changed, 7 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c > > index 2aa2c2aee6e7..51aa5c1e31b5 100644 > > --- a/drivers/scsi/scsi_error.c > > +++ b/drivers/scsi/scsi_error.c > > @@ -2165,7 +2165,8 @@ void scsi_eh_flush_done_q(struct list_head *done_q) > > * scsi_eh_get_sense), scmd->result is already > > * set, do not set DID_TIME_OUT. > > */ > > - if (!scmd->result) > > + if (!scmd->result && > > + !(scmd->flags & SCMD_EH_SUCCESS_CMD)) > > scmd->result |= (DID_TIME_OUT << 16); > > SCSI_LOG_ERROR_RECOVERY(3, > > scmd_printk(KERN_INFO, scmd, > Wouldn't it be better to use '!scmd->result && !scsi_sense_valid(scmd)' > instead of a new flag? > After all, if we have a valid sense code we _have_ been able to communicate > with the device. And as we did so it's questionable whether it should count > as a command time out ... Hello Hannes, Thanks a lot for helping out reviewing this series! Unfortunately, your suggestion won't work. Let me explain: When you get a FIS, the ACT register will have a bit set for each command that finished, however, all the commands will share a single STATUS value (since there is just a shared STATUS field in the FIS). So let's say that tags 0-3 got finished (i.e. bits 0-3 are set in the ACT field) and the STATUS field has the "Sense Data Available" bit set. This just tells us that at least one of tags 0-3 has sense data. In order to know which of these tags that actually has sense data, we need to read the "Sense Data for Successful NCQ Commands log", which contains a sense_valid bitmask (which contains one bit for each of the 32 tags). So reading the "Sense Data for Successful NCQ Commands log" might tell us that just tag 0-1 have sense data. So, libata calls ata_qc_schedule_eh() on tags 0-3, wait until SCSI calls libata .eh_strategy_handler(). libata .eh_strategy_handler() will read the "Sense Data for Successful NCQ Commands log", which will see that there is sense data for tags 0-1, and will add sense data for those commands, and call scsi_check_sense() for tags 0-1. ata_eh_finish() will finally be called, to determine what to do with the commands that belonged to EH. The code looks like this: if (qc->flags & ATA_QCFLAG_SENSE_VALID || qc->flags & ATA_QCFLAG_EH_SUCCESS_CMD) { ata_eh_qc_complete(qc); } So it will call complete for all 4 tags, regardless is they had sense data or not. scsi_eh_flush_done_q() will soon be called, and since ata_eh_qc_complete() sets scmd->retries == scmd->allowed, none of the four commands will be retired. if (!scmd->result && !(scmd->flags & SCMD_EH_SUCCESS_CMD)) scmd->result |= (DID_TIME_OUT << 16); The 2 commands with sense data will not get DID_TIMEOUT, because scmd->result has the SCSI ML byte set (which is set by scsi_check_sense()). The 2 commands without sense data will have scmd->result == 0, so they will get DID_TIME_OUT set if we don't have the extra !(scmd->flags & SCMD_EH_SUCCESS_CMD)) condition. SCSI could add an additional check for: if (!scmd->result && !scsi_sense_valid(scmd) && !(scmd->flags & SCMD_EH_SUCCESS_CMD)) scmd->result |= (DID_TIME_OUT << 16); so that a command with does have sense data, but scsi_check_sense() did not set any SCSI ML byte, does not get DID_TIME_OUT set. However, for CDL policy 0xD, which this patch cares about, we would still need the "&& !(scmd->flags & SCMD_EH_SUCCESS_CMD))", so at least from a CDL perspective, I don't see how any benefit of also adding a check for "&& !scsi_sense_valid(scmd)". Kind regards, Niklas