Re: [PATCH v3 27/31] scsi: pm8001: Cleanup pm8001_queue_command()

On 2/17/22 21:49, John Garry wrote:
>>>>
>>>
>>> I figured out what is happening here and it does not help solve the
>>> mystery of my hang.
>>>
>>> Here's the steps:
>>> a. scsi_cmnd times out
>>> b. scsi error handling kicks in
>>> c. libsas attempts to abort the task, which fails
>>> d. libsas then tries IT nexus reset, which passes
>>>    - libsas assumes the scsi_cmnd has completed with failure
>>> e. error handling concludes
>>> f. scsi midlayer then retries the same scsi_cmnd
>>> g. since we did not "free" associated ccb earlier or dma unmap at d.,
>>> the dma unmap on the same scsi_cmnd causes the warn
>>>
>>> So the LLD should really free resources and dma unmap at point IT nexus
>>> reset completes, but it doesn't. I think in certain conditions dma map
>>> should not be done twice.
>>>
>>> Anyway, that can be fixed, but I still have the hang :(
>>
>> I guess (a) (cmd timeout) is only the symptom of the hang ? That is, the
>> hang is causing the timeout ?
> 
> Right
> 
>> It may be good to turn on scsi trace to see if the command was only
>> partially done, or not at all, or if it is a non-data command.
>>
> 
> I could do that. But I think that the command just does not complete. Or 
> maybe it is missed.
> 
>> And speaking of errors, I am currently testing v4 of my series and
>> noticed some weird things in the error handling. E.g., with one of the
>> tests executing a report zones command with an LBA out of range, I see this:
>>
>> [23962.027105] pm80xx0:: mpi_sata_event  2788:SATA EVENT 0x23
>> [23962.036099] pm80xx0:: pm80xx_send_read_log  1863:Executing read log end
>>
> 
> I don't know why the driver even does this, but the implementation of 
> pm80xx_send_read_log() is questionable. It would be nice to not see ATA 
> code in the driver like this.

I have been thinking about this one. We should be able to avoid this
read log in the driver and rely on libata-eh to do it. All we should
need to do is an internal abort of all commands, without completing
them. libata will then do the read log, retry the failed command (if
appropriate) and resend all the other aborted NCQ commands.

I need to look at how other libsas drivers handle this. But the
above should work, I think.
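
To make the intended flow concrete, here is a toy model in plain C. It
is only a sketch of the idea, not driver code: every name in it
(toy_cmd, toy_abort_all, toy_eh_recover, and so on) is hypothetical and
does not correspond to any pm8001, libsas or libata function. The
failed tag is passed in as a parameter, standing in for what the read
log would report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical command states for this toy model only. */
enum toy_state { TOY_IDLE, TOY_ISSUED, TOY_ABORTED, TOY_DONE_ERR };

struct toy_cmd {
	enum toy_state state;
	int issue_count;	/* how many times the command was sent */
};

/* Send (or resend) one command. */
static void toy_issue(struct toy_cmd *c)
{
	c->state = TOY_ISSUED;
	c->issue_count++;
}

/*
 * Driver side of the idea: on an NCQ error, abort everything in
 * flight but do NOT complete the commands. They stay owned by
 * error handling. Returns the number of commands aborted.
 */
static int toy_abort_all(struct toy_cmd *cmds, int n)
{
	int aborted = 0;

	for (int i = 0; i < n; i++) {
		if (cmds[i].state == TOY_ISSUED) {
			cmds[i].state = TOY_ABORTED;
			aborted++;
		}
	}
	return aborted;
}

/*
 * Error handling side: failed_tag stands in for the result of the
 * read log. The failed command is retried only if retry_failed is
 * set; all other aborted commands are resent. Returns the number
 * of commands reissued.
 */
static int toy_eh_recover(struct toy_cmd *cmds, int n, int failed_tag,
			  bool retry_failed)
{
	int reissued = 0;

	assert(failed_tag >= 0 && failed_tag < n);

	for (int i = 0; i < n; i++) {
		if (cmds[i].state != TOY_ABORTED)
			continue;
		if (i == failed_tag && !retry_failed) {
			/* Complete with an error instead of retrying. */
			cmds[i].state = TOY_DONE_ERR;
			continue;
		}
		toy_issue(&cmds[i]);
		reissued++;
	}
	return reissued;
}
```

The point of the split is that the abort step never completes a
command itself, so the decision to retry or fail each one is made in a
single place, after the failed tag is known.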

Not adding this to the current series though :) That will be for another
patch series.

-- 
Damien Le Moal
Western Digital Research


