Re: [PATCH v3 27/31] scsi: pm8001: Cleanup pm8001_queue_command()

Damien Le Moal <damien.lemoal@xxxxxxxxxxxxxxxxxx> · Thu, 17 Feb 2022 20:47:29 +0900

On 2/17/22 18:23, John Garry wrote:
> On 17/02/2022 00:12, Damien Le Moal wrote:
>>>>> I'll have a look at it. And that is on mainline or mkp-scsi staging, and
>>>>> not your patchset.
>>>> Are you saying that my patches suppresses the above ? This is submission
>>>> path and the dma code seems to complain about alignment... So bad buffer
>>>> addresses ?
>>> Your series does not suppress it. It doesn't occur often, so I need to
>>> check more.
>>>
>>> I think the issue is that we call dma_map_sg() twice, i.e. ccb never
>>> unmapped.
>> That would be a big issue indeed. We could add a flag to CCBs to track
>> the buf_prd DMA mapping state and BUG_ON() when ccb free function is
>> called with the buffer still mapped. That should allow catching this
>> infrequent problem ?
>>
> 
> I figured out what is happening here and it does not help solve the 
> mystery of my hang.
> 
> Here's the steps:
> a. scsi_cmnd times out
> b. scsi error handling kicks in
> c. libsas attempts to abort the task, which fails
> d. libsas then tries IT nexus reset, which passes
>   - libsas assumes the scsi_cmnd has completed with failure
> e. error handling concludes
> f. scsi midlayer then retries the same scsi_cmnd
> g. since we did not "free" associated ccb earlier or dma unmap at d., 
> the dma unmap on the same scsi_cmnd causes the warn
> 
> So the LLD should really free resources and dma unmap at point IT nexus 
> reset completes, but it doesn't. I think in certain conditions dma map 
> should not be done twice.
> 
> Anyway, that can be fixed, but I still have the hang :(

One thought: could it be bug with the DMA engine of your platform ?
What if you simply run an fio workload on the disk directly (no FS),
hang happens too ?

For the bugs I fixed with my series, it was the reverse: fio worked
great but everything broke down when I ran libzbc tests...

> 
> Thanks,
> John

-- 
Damien Le Moal
Western Digital Research