On Thu, Sep 05, 2024 at 10:33:38AM +0200, Niklas Cassel wrote: Hi Niklas, Thank you so much for a thorough review and coming up with a better patch! > Hello Igor, > > On Wed, Sep 04, 2024 at 10:37:27PM +0000, Igor Pylypiv wrote: > > commit 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to > > not set CHECK_CONDITION") changed the way how SAM_STAT_CHECK_CONDITION is > > set. Assignment "scmd->result = SAM_STAT_CHECK_CONDITION;" was replaced by > > set_status_byte() which does not clear the scsicmd->result. > > "which does not clear the scsicmd->result" > > scsicmd->result is a combination of: > -SCSI status byte > -SCSI ML byte > -host byte > > Please be more specific of which byte(s) that you want to clear, > both here and in other places in the commit message. > > > > > > By not clearing the scsicmd->result we end up in a state where both > > the DID_TIME_OUT host byte and the SAM_STAT_CHECK_CONDITION status > > bytes are set. > > > > > > The DID_TIME_OUT host byte is getting set a part of error handling: > > > > ata_qc_complete() > > ata_qc_schedule_eh() > > blk_abort_request() > > WRITE_ONCE(req->deadline, jiffies); > > > > blk_mq_timeout_work() > > blk_mq_check_expired() > > blk_mq_rq_timed_out() > > req->q->mq_ops->timeout() / scsi_timeout() > > set_host_byte(scmd, DID_TIME_OUT); > > I would have reorder your commit log and have this first in the commit log. > > > > > > Having the host byte set to DID_TIME_OUT for a command that didn't timeout > > is confusing. Let's bring the old behavior back by setting scmd->result to > > SAM_STAT_CHECK_CONDITION. > > I think you are missing something very important in the commit log here: > What is the user visible change before and after your change? > For ATA PT commands the 'result' value is being returned to user space so not clearing the DID_TIME_OUT is user visible. > Is there a difference is the error message in dmesg? If not, that should > be mentioned as well. > I haven't seen any dmesg errors reported for ATA PT hence I don't think there is any difference in dmesg. > > > > > Fixes: 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to not set CHECK_CONDITION") > > Signed-off-by: Igor Pylypiv <ipylypiv@xxxxxxxxxx> > > --- > > drivers/ata/libata-eh.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c > > index 214b935c2ced..4927b40e782f 100644 > > --- a/drivers/ata/libata-eh.c > > +++ b/drivers/ata/libata-eh.c > > @@ -1605,7 +1605,7 @@ static unsigned int ata_eh_analyze_tf(struct ata_queued_cmd *qc) > > */ > > if (!(qc->flags & ATA_QCFLAG_SENSE_VALID) && > > (stat & ATA_SENSE) && ata_eh_request_sense(qc)) > > - set_status_byte(qc->scsicmd, SAM_STAT_CHECK_CONDITION); > > + qc->scsicmd->result = SAM_STAT_CHECK_CONDITION; > > ata_eh_analyze_tf() will only be called on commands that are owned by EH > (has ATA_QCFLAG_EH set). > > Thus this command will end up in: > ata_eh_finish() -> ata_eh_qc_complete() -> __ata_eh_qc_complete() -> > -> __ata_qc_complete() -> qc->complete_fn(). > > complete_fn will be (except in special cases): ata_scsi_qc_complete() > If you look at ata_scsi_qc_complete(), it already clears the host byte: > https://github.com/torvalds/linux/blob/v6.11-rc6/drivers/ata/libata-scsi.c#L1695-L1696 > > So could you please be more specific of what problem this change is fixing? > Is it just that you think that it makes sense to clear the host byte earlier > in the call chain? I should have mentioned that the issue is specific to ATA PT commands. The problem is that ata_scsi_qc_complete() is not clearing the host byte for ATA PT commands causing the DID_TIME_OUT to be returned. > > There are many different paths a QC can take via EH, e.g. policy 0xD NCQ > commands will not fetch sense data via ata_eh_request_sense(), so clearing > the host byte in ata_scsi_qc_complete() should be fine, otherwise we need > to do a similar change to yours in all the different code paths that sets > sense data ...which might actually be something that makes sense, but then > I would expect a patch series that changes all the locations where we set > sense data, not just in ata_eh_analyze_tf(), and then drops the clearing in > ata_scsi_qc_complete() (which was introduced in commit 7574a8377c7a ("ata: > libata-scsi: do not overwrite SCSI ML and status bytes")). > > > See this patch to see all the places where we set sense data: > https://git.kernel.org/pub/scm/linux/kernel/git/libata/linux.git/commit/?h=for-6.12&id=9526dec226f0779d72f798e7a18375bf8d414775 > > > > > Side note: > You might also be interested to know that a command that was sent via EH will > be finished using scsi_eh_finish_cmd() (called by __ata_eh_qc_complete()), > and will thus end up in scsi_eh_flush_done_q(). A command that has sense data > will have scmd->retries == scmd->allowed (set by ata_eh_qc_complete()), so you > will end up in this code path: > https://github.com/torvalds/linux/blob/v6.11-rc6/drivers/scsi/scsi_error.c#L2213-L2227 > > Which means that SCSI EH will set DID_TIME_OUT for any command that does > not have (SCSI ML byte || SCSI status byte || host byte) set. > > A command with sense data will have most often have CHECK_CONDITION set, but > there is also CDL policy 0xD commands (which will have the SCSI ML byte set). > So this special flag SCMD_FORCE_EH_SUCCESS is for commands that were completed > successfully without any SK/ASC/ASCQ in the same interrupt as other policy 0xD > commands which did have SK/ASC/ASCQ set. > For details, see 3d848ca1ebc8 ("scsi: core: Allow libata to complete successful > commands via EH"). > > > Kind regards, > Niklas