On 5/22/23 23:22, Helge Deller wrote:
It hangs in fs/aio.c:1128, function aio_complete(), in this call:
spin_lock_irqsave(&ctx->completion_lock, flags);
All code that I found and that obtains ctx->completion_lock disables IRQs.
It is not clear to me how this spinlock can be locked recursively? Is it
sure that the "spinlock recursion" report is correct?
Yes, it seems correct.
[...]
Bart, thanks to your suggestions I was able to narrow down the problem!
I got LOCKDEP working on parisc, which then reports:
raw_local_irq_restore() called with IRQs enabled
for the spin_unlock_irqrestore() in function aio_complete(), which shouldn't happen.
Finally, I found that parisc's flush_dcache_page() re-enables IRQs,
which leads to the spinlock hang in aio_complete().
So, this is NOT a bug in aio or scsi, but we need a fix in the arch code.
While checking why flush_dcache_page() re-enables IRQs, I see that on parisc and ARM(32):
flush_dcache_page() calls:
-> flush_dcache_mmap_lock() / flush_dcache_mmap_unlock()
which uses: xa_lock_irq() / xa_unlock_irq()
So, the call to xa_unlock_irq() re-enables the IRQs unconditionally
and triggers the hang in aio_complete().
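To illustrate the nesting problem described above, here is a minimal
userspace model, not kernel code. A single bool stands in for the CPU's
interrupt-enable state, and every name in it (model_irq_save(),
model_lock_irq(), inner_unconditional(), ...) is hypothetical and made up
for this sketch. It only shows why an unconditional "unlock + enable IRQs"
inside an outer irqsave section leaks enabled IRQs into that section, while
a save/restore pair does not. It is not meant to suggest which fix is best.

```c
#include <stdbool.h>

/* Userspace model: one flag stands in for the CPU's IRQ-enable state. */
static bool irqs_enabled = true;

/* Modeled loosely on the irqsave/irqrestore pattern: remember the
 * previous state, disable, and later restore exactly that state. */
static bool model_irq_save(void)
{
	bool flags = irqs_enabled;

	irqs_enabled = false;
	return flags;
}

static void model_irq_restore(bool flags)
{
	irqs_enabled = flags;
}

/* Modeled loosely on the lock_irq/unlock_irq pattern: disable and
 * re-enable unconditionally, ignoring the state on entry. */
static void model_lock_irq(void)
{
	irqs_enabled = false;
}

static void model_unlock_irq(void)
{
	irqs_enabled = true;
}

/* Inner critical section using the unconditional variant. */
static void inner_unconditional(void)
{
	model_lock_irq();
	model_unlock_irq();
}

/* Inner critical section using the save/restore variant. */
static void inner_saving(void)
{
	bool flags = model_irq_save();

	model_irq_restore(flags);
}

/* Outer section that disabled IRQs itself (as an irqsave caller would)
 * and then calls an inner helper. Returns true if IRQs are enabled
 * again while the outer section still expects them to be off. */
static bool outer_sees_irqs_enabled(void (*inner)(void))
{
	bool flags, leaked;

	irqs_enabled = true;
	flags = model_irq_save();
	inner();
	leaked = irqs_enabled;
	model_irq_restore(flags);
	return leaked;
}
```

In this model, outer_sees_irqs_enabled(inner_unconditional) returns true
(the analogue of the LOCKDEP warning above), while
outer_sees_irqs_enabled(inner_saving) returns false.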
I temporarily #defined flush_dcache_mmap_lock() to a NOP and the kernel booted nicely.
Not sure yet what the best fix is...
Helge