On Tue, May 23, 2023 at 12:24:04PM +0200, Helge Deller wrote:
> On 5/22/23 23:22, Helge Deller wrote:
> > > > It hangs in fs/aio.c:1128, function aio_complete(), in this call:
> > > >         spin_lock_irqsave(&ctx->completion_lock, flags);
> > >
> > > All code that I found and that obtains ctx->completion_lock disables IRQs.
> > > It is not clear to me how this spinlock can be locked recursively? Is it
> > > sure that the "spinlock recursion" report is correct?
> >
> > Yes, it seems correct.
> > [...]
>
> Bart, thanks to your suggestions I was able to narrow down the problem!
>
> I got LOCKDEP working on parisc, which then reports:
>         raw_local_irq_restore() called with IRQs enabled
> for the spin_unlock_irqrestore() in function aio_complete(), which shouldn't happen.
>
> Finally, I found that parisc's flush_dcache_page() re-enables the IRQs
> which leads to the spinlock hang in aio_complete().
>
> So, this is NOT a bug in aio or scsi, but we need a fix in the arch code.

You can find some of the background to this at:

https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git/commit/?id=16ceff2d5dc9f0347ab5a08abff3f4647c2fee04

which introduced flush_dcache_mmap_lock(). It looks like Hugh had questions
over whether this should be _irqsave() rather than _irq(), but I guess at
the time all callers had interrupts enabled, and it's only recently that
someone came up with the idea of calling flush_dcache_page() with
interrupts disabled.

Adding another arg to flush_dcache_mmap_lock() to save the flags may be
doable, but requires a patch that touches not only architectures that
have a private implementation, but also various code in mm/.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!