On 17/02/2022 00:12, Damien Le Moal wrote:
I'll have a look at it. And that is on mainline or mkp-scsi staging, and
not your patchset.
Are you saying that my patches suppress the above? This is the submission
path and the dma code seems to complain about alignment... So bad buffer
addresses?
Your series does not suppress it. It doesn't occur often, so I need to
check more.
I think the issue is that we call dma_map_sg() twice, i.e. the ccb is
never unmapped.
That would be a big issue indeed. We could add a flag to CCBs to track
the buf_prd DMA mapping state and BUG_ON() when the ccb free function is
called with the buffer still mapped. That should allow catching this
infrequent problem?
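
A minimal userspace sketch of that flag idea, for illustration only. The
struct and the model_ccb_*() helpers are hypothetical stand-ins, not the
driver's real CCB code, and the free-time check returns an error here
where the kernel version would use BUG_ON() or WARN_ON():

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a driver CCB; only the tracking flag is modelled. */
struct model_ccb {
	bool buf_prd_mapped;	/* true while the buffer is DMA mapped */
};

/* Stand-in for the map step: refuses a second map of the same ccb. */
static int model_ccb_map(struct model_ccb *ccb)
{
	if (ccb->buf_prd_mapped)
		return -1;	/* double map: the case being hunted */
	ccb->buf_prd_mapped = true;
	return 0;
}

/* Stand-in for the unmap step: unmapping an unmapped ccb is a caller bug. */
static void model_ccb_unmap(struct model_ccb *ccb)
{
	assert(ccb->buf_prd_mapped);
	ccb->buf_prd_mapped = false;
}

/* Stand-in for the ccb free function: flags a buffer that is still mapped. */
static int model_ccb_free(struct model_ccb *ccb)
{
	if (ccb->buf_prd_mapped) {
		fprintf(stderr, "ccb freed with buffer still mapped\n");
		return -1;
	}
	return 0;
}
```

With such a flag, a free without a prior unmap is reported at the point
of the free, rather than later when the same command is mapped again.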
I figured out what is happening here and it does not help solve the
mystery of my hang.
Here are the steps:
a. scsi_cmnd times out
b. scsi error handling kicks in
c. libsas attempts to abort the task, which fails
d. libsas then tries IT nexus reset, which passes
- libsas assumes the scsi_cmnd has completed with failure
e. error handling concludes
f. scsi midlayer then retries the same scsi_cmnd
g. since we did not "free" the associated ccb or dma unmap at d.,
the second dma map on the same scsi_cmnd causes the warn
So the LLD should really free resources and dma unmap at the point the IT
nexus reset completes, but it doesn't. I think in certain conditions dma
map should not be done twice.
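
The sequence a. to g. can be modelled in a few lines of userspace C. This
is only a sketch under assumptions: struct model_cmd and the model_*()
functions are made-up names, and the mapping is reduced to a counter
rather than real dma_map_sg()/dma_unmap_sg() calls:

```c
#include <stdbool.h>

/* Hypothetical stand-in for a scsi_cmnd and its DMA mapping state. */
struct model_cmd {
	int sg_map_count;	/* outstanding mappings for this command */
};

/* Submission path: maps the command; returns true if it was already mapped. */
static bool model_submit(struct model_cmd *cmd)
{
	bool already_mapped = cmd->sg_map_count > 0;

	cmd->sg_map_count++;
	return already_mapped;
}

/* Step d: IT nexus reset completes; optionally release the mapping there. */
static void model_nexus_reset_done(struct model_cmd *cmd, bool release)
{
	if (release)
		cmd->sg_map_count = 0;
}

/* Runs steps a. to g.; returns true if the retry hits a double map. */
static bool model_timeout_and_retry(bool release_on_reset)
{
	struct model_cmd cmd = { .sg_map_count = 0 };

	model_submit(&cmd);	/* initial submission maps the buffer */
	/* a-c: timeout, error handling starts, abort fails: nothing unmapped */
	model_nexus_reset_done(&cmd, release_on_reset);	/* d */
	/* e-f: error handling concludes, midlayer retries the same command */
	return model_submit(&cmd);	/* g */
}
```

In this model, model_timeout_and_retry(false) reports the double map and
model_timeout_and_retry(true) does not, which matches releasing resources
when the IT nexus reset completes.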
Anyway, that can be fixed, but I still have the hang :(
Thanks,
John