On Wed, Aug 12, 2020 at 08:48:13AM -0700, Guenter Roeck wrote: > Hi, > > crbug.com/1114800 reports a hard lockup due to circular locking in the > 8250 console driver. This is seen if CONFIG_PROVE_LOCKING is enabled. > > Problem is as follows: > - serial8250_do_startup() locks the serial (console) port. > - serial8250_do_startup() then disables interrupts if interrupts are > shared, by calling disable_irq_nosync(). > - disable_irq_nosync() calls __irq_get_desc_lock() to lock the interrupt > descriptor. > - __irq_get_desc_lock() calls lock_acquire() > - If CONFIG_PROVE_LOCKING is enabled, validate_chain() and check_noncircular() > are called and identify a potential locking error. > - This locking error is reported via printk, which ultimately calls > serial8250_console_write(). > - serial8250_console_write() tries to lock the serial console port. > Since it is already locked, the system hangs and ultimately reports > a hard lockup. > > I understand we'll need to figure out and fix what lockdep complains about, > and I am working on that. However, even if that is fixed, we'll need a > solution for the recursive lock: Fixing the lockdep problem doesn't > guarantee that a similar problem (or some other log message) won't be > detected and reported sometime in the future while serial8250_do_startup() > holds the console port lock. > > Ideas, anyone ? Everything I came up with so far seems clumsy and hackish. > Turns out the situation is a bit worse than I thought. disable_irq_nosync(), when called from serial8250_do_startup(), locks the interrupt descriptor. The order of locking is serial port lock interrupt descriptor lock At the same time, __setup_irq() locks the interrupt descriptor as well. With the descriptor locked, it may report an error message using pr_err(). This in turn may call serial8250_console_write(), which will try to lock the console serial port. The lock sequence is interrupt descriptor lock serial port lock I added the lockdep splat to the bug log at crbug.com/1114800. Effectively, I think, this means we can't call disable_irq_nosync() while holding a serial port lock, or at least not while holding a serial port lock that is associated with a console. The problem was introduced (or, rather, exposed) with upstream commit 7febbcbc48fc ("serial: 8250: Check UPF_IRQ_SHARED in advance"). Guenter