Re: [PATCH v2] ata: libata: Clear DID_TIME_OUT for ATA PT commands with sense data

On 9/9/24 22:08, Niklas Cassel wrote:
> On Mon, Sep 09, 2024 at 09:52:53PM +0900, Damien Le Moal wrote:
>> On 9/9/24 17:47, Niklas Cassel wrote:
>>> When ata_qc_complete() schedules a command for EH using
>>> ata_qc_schedule_eh(), blk_abort_request() will be called, which leads to
>>> req->q->mq_ops->timeout() / scsi_timeout() being called.
>>>
>>> scsi_timeout(), if the LLDD has no abort handler (libata has no abort
>>> handler), will set host byte to DID_TIME_OUT, and then call
>>> scsi_eh_scmd_add() to add the command to EH.
>>>
>>> Thus, when commands first enter libata's EH strategy_handler, all the
>>> commands that have been added to EH will have DID_TIME_OUT set.
>>>
>>> libata has its own flag (AC_ERR_TIMEOUT), that it sets for commands that
>>> have not received a completion at the time of entering EH.
>>>
>>> Thus, libata doesn't really care about DID_TIME_OUT at all, and currently
>>> clears the host byte at the end of EH, in ata_scsi_qc_complete(), before
>>> scsi_eh_finish_cmd() is called.
>>>
>>> However, this clearing in ata_scsi_qc_complete() is currently only done
>>> for commands that are not ATA passthrough commands.
>>>
>>> Since the host byte is visible in the completion that we return to user
>>> space for ATA passthrough commands, for ATA passthrough commands that got
>>> completed via EH (commands with sense data), the user will incorrectly see:
>>> ATA pass-through(16): transport error: Host_status=0x03 [DID_TIME_OUT]
>>>
>>> Fix this by moving the clearing of the host byte (which is currently only
>>> done for commands that are not ATA passthrough commands) from
>>> ata_scsi_qc_complete() to the start of EH (regardless of whether the
>>> command is ATA passthrough or not).
>>>
>>> This will make sure that we:
>>> -Correctly clear DID_TIME_OUT for both ATA passthrough commands and
>>>  commands that are not ATA passthrough commands.
>>> -Do not needlessly clear the host byte for commands that did not go via EH.
>>>  ata_scsi_qc_complete() is called both for commands that are completed
>>>  normally (without going via EH), and for commands that went via EH,
>>>  however, only commands that went via EH will have DID_TIME_OUT set.
>>>
>>> Fixes: 24aeebbf8ea9 ("scsi: ata: libata: Change ata_eh_request_sense() to not set CHECK_CONDITION")
>>> Reported-by: Igor Pylypiv <ipylypiv@xxxxxxxxxx>
>>> Closes: https://lore.kernel.org/linux-ide/ZttIN8He8TOZ7Lct@xxxxxxxxxx/
>>> Tested-by: Igor Pylypiv <ipylypiv@xxxxxxxxxx>
>>> Signed-off-by: Niklas Cassel <cassel@xxxxxxxxxx>
>>> ---
>>> Changes since v1:
>>> -Picked up tags from Igor.
>>> -Added Fixes tag.
>>> -Improved the commit message to clearly state that this is currently a
>>>  real bug for ATA PT commands with sense data.
>>>
>>>  drivers/ata/libata-eh.c   | 9 +++++++++
>>>  drivers/ata/libata-scsi.c | 3 ---
>>>  2 files changed, 9 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
>>> index 7de97ee8e78b..450e9bd96c97 100644
>>> --- a/drivers/ata/libata-eh.c
>>> +++ b/drivers/ata/libata-eh.c
>>> @@ -630,6 +630,15 @@ void ata_scsi_cmd_error_handler(struct Scsi_Host *host, struct ata_port *ap,
>>>  	list_for_each_entry_safe(scmd, tmp, eh_work_q, eh_entry) {
>>>  		struct ata_queued_cmd *qc;
>>>  
>>> +		/*
>>> +		 * If the scmd was added to EH, via ata_qc_schedule_eh() ->
>>> +		 * scsi_timeout() -> scsi_eh_scmd_add(), scsi_timeout() will
>>> +		 * have set DID_TIME_OUT (since libata does not have an abort
>>> +		 * handler). Thus to clear the DID_TIME_OUT, we clear the host
>>> +		 * byte (but keep the SCSI ML and status byte).
>>> +		 */
>>> +		scmd->result &= 0x0000ffff;
>>
>> I know it was like that, but why not:
>>
>> 		set_host_byte(scmd, 0);
>> or
>> 		set_host_byte(scmd, DID_OK);
>>
>> ?
> 
> No particular reason. Since we are basically just moving the code,
> it made sense to keep it similar to the original code, but I
> can submit a v3 that instead does:
> set_host_byte(scmd, DID_OK);
> 
> if you prefer that.

I do prefer it. The magic 0x0000ffff mask is not exactly clear (the comment
helps though)...
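
To illustrate the two forms being compared, here is a minimal user-space
sketch. It assumes a stand-in struct with only a result word; the
sketch_* names are hypothetical, and sketch_set_host_byte() is a local
re-implementation that mirrors the 0xff00ffff mask quoted further down in
this thread, not the kernel helper itself. The constant values are
redefined locally so the sketch builds outside the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Redefined locally so the sketch builds in user space. */
#define DID_OK                   0x00
#define DID_TIME_OUT             0x03
#define SAM_STAT_CHECK_CONDITION 0x02

/* Hypothetical stand-in for struct scsi_cmnd: only the result word. */
struct sketch_cmnd {
	uint32_t result;
};

/* Replace only bits [16..23]; bits [24..31] are left untouched. */
static inline void sketch_set_host_byte(struct sketch_cmnd *cmd, uint8_t host)
{
	cmd->result = (cmd->result & 0xff00ffffu) | ((uint32_t)host << 16);
}

static inline uint8_t sketch_host_byte(const struct sketch_cmnd *cmd)
{
	return (cmd->result >> 16) & 0xff;
}

static inline uint8_t sketch_status_byte(const struct sketch_cmnd *cmd)
{
	return cmd->result & 0xff;
}

/* Both forms clear DID_TIME_OUT and keep the status byte. */
static inline void sketch_demo(void)
{
	struct sketch_cmnd a = {
		.result = ((uint32_t)DID_TIME_OUT << 16) | SAM_STAT_CHECK_CONDITION,
	};
	struct sketch_cmnd b = a;

	a.result &= 0x0000ffff;            /* raw mask */
	sketch_set_host_byte(&b, DID_OK);  /* named helper */

	assert(a.result == b.result);
	assert(sketch_status_byte(&b) == SAM_STAT_CHECK_CONDITION);
}
```

The only difference in this sketch is that the raw mask also zeroes bits
[24..31], while the helper-style form leaves them alone; with those bits
already zero, the two results are identical.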

Side note: patching scsi to define macros for status, ML and host byte masks and
shifts would be very nice :)
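
A rough sketch of what such definitions could look like, purely for
illustration: every SKETCH_* and sketch_* name below is hypothetical and
does not exist in the kernel. The layout assumed is the one discussed in
this thread (status byte in bits [0..7], SCSI ML byte in bits [8..15],
host byte in bits [16..23]).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shift and mask names for the fields of the result word. */
#define SKETCH_STATUS_SHIFT 0
#define SKETCH_ML_SHIFT     8
#define SKETCH_HOST_SHIFT   16

#define SKETCH_BYTE_MASK    0xffu
#define SKETCH_STATUS_MASK  (SKETCH_BYTE_MASK << SKETCH_STATUS_SHIFT)
#define SKETCH_ML_MASK      (SKETCH_BYTE_MASK << SKETCH_ML_SHIFT)
#define SKETCH_HOST_MASK    (SKETCH_BYTE_MASK << SKETCH_HOST_SHIFT)

/* Return result with the byte at the given shift replaced by val. */
static inline uint32_t sketch_result_set(uint32_t result, unsigned int shift,
					 uint8_t val)
{
	return (result & ~(SKETCH_BYTE_MASK << shift)) | ((uint32_t)val << shift);
}

/* Extract the byte at the given shift. */
static inline uint8_t sketch_result_get(uint32_t result, unsigned int shift)
{
	return (result >> shift) & SKETCH_BYTE_MASK;
}

/* Clearing the host byte then no longer needs a magic constant. */
static inline void sketch_macro_demo(void)
{
	uint32_t result = 0x00030002u; /* host byte 0x03, status byte 0x02 */

	result = sketch_result_set(result, SKETCH_HOST_SHIFT, 0x00);
	assert(sketch_result_get(result, SKETCH_HOST_SHIFT) == 0x00);
	assert(sketch_result_get(result, SKETCH_STATUS_SHIFT) == 0x02);
}
```

With names like these, the intent of an open-coded mask would be visible
at the call site without needing a comment.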

> 
> Strictly speaking, that would probably require us to drop Igor's
> Tested-by though (even if an optimizing compiler ought to generate
> the same code).
> 
> 
>>
>> set_host_byte() uses the mask 0xff00ffff, since the upper 8 bits seem to be
>> ignored: bits [0..7] are the status byte, [16..23] are the host byte and bits
>> [8..15] are the message byte but that is unused.
> 
> Nit: bits 8..15 are the SCSI midlayer byte, not the message byte; see
> 36ebf1e2aa14 ("scsi: core: Add error codes for internal SCSI midlayer use")
> 
> 
> Kind regards,
> Niklas

-- 
Damien Le Moal
Western Digital Research




