On Mon 2019-05-06 09:11:37, Daniel Vetter wrote:
> On Fri, May 3, 2019 at 5:14 PM Petr Mladek <pmladek@xxxxxxxx> wrote:
> > On Thu 2019-05-02 16:16:43, Daniel Vetter wrote:
> > > console_trylock, called from within printk, can be called from pretty
> > > much anywhere. Including try_to_wake_up. Note that this isn't common,
> > > usually the box is in pretty bad shape at that point already. But it
> > > really doesn't help when then lockdep jumps in and spams the logs,
> > > potentially obscuring the real backtrace we're really interested in.
> > > One case I've seen (slightly simplified backtrace):
> > >
> > > Call Trace:
> > >  <IRQ>
> > >  console_trylock+0xe/0x60
> > >  vprintk_emit+0xf1/0x320
> > >  printk+0x4d/0x69
> > >  __warn_printk+0x46/0x90
> > >  native_smp_send_reschedule+0x2f/0x40
> > >  check_preempt_curr+0x81/0xa0
> > >  ttwu_do_wakeup+0x14/0x220
> > >  try_to_wake_up+0x218/0x5f0
> > >  pollwake+0x6f/0x90
> > >  credit_entropy_bits+0x204/0x310
> > >  add_interrupt_randomness+0x18f/0x210
> > >  handle_irq+0x67/0x160
> > >  do_IRQ+0x5e/0x130
> > >  common_interrupt+0xf/0xf
> > >  </IRQ>
> > >
> > > This alone isn't a problem, but the spinlock in the semaphore is also
> > > still held while waking up waiters (up() -> __up() -> try_to_wake_up()
> > > callchain), which then closes the runqueue vs. semaphore.lock loop,
> > > and upsets lockdep, which issues a circular locking splat to dmesg.
> > > Worse it upsets developers, since we don't want to spam dmesg with
> > > clutter when the machine is dying already.
> > >
> > > Fix this by creating a __down_trylock which only trylocks the
> > > semaphore.lock. This isn't correct in full generality, but good enough
> > > for console_lock:
> > >
> > > - there's only ever one console_lock holder, we won't fail spuriously
> > >   because someone is doing a down() or up() while there's still room
> > >   (unlike other semaphores with count > 1).
> > >
> > > - console_unlock() has one massive retry loop, which will catch anyone
> > >   who races the trylock against the up(). This makes sure that no
> > >   printk lines will get lost. Making the trylock more racy therefore
> > >   has no further impact.
> >
> > To be honest, I do not see how this could solve the problem.
> >
> > The circular dependency is still there. If the new __down_trylock()
> > succeeds then console_unlock() will get called in the same context
> > and it will still need to call up() -> try_to_wake_up().
> >
> > Note that there are many other console_lock() callers that might
> > happen in parallel and might appear in the wait queue.
>
> Hm right. It's very rare we hit this in our CI and I don't know how to
> repro otherwise, so just threw this out at the wall to see if it
> sticks. I'll try and come up with a new trick then.

Single messages are printed from the scheduler via printk_deferred().
WARN() might be solved by introducing a printk deferred context,
see the per-cpu variable printk_context.

Best Regards,
Petr
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
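
For illustration only, here is a rough userspace sketch of what a "deferred
context" could look like: while a counter is raised, messages are only
queued, and they reach the console once the counter drops back to zero.
This is not kernel code and not a proposed patch. All names in it
(deferred_depth, log_msg(), deferred_enter(), deferred_exit(), the pending
buffer) are hypothetical stand-ins; the real per-cpu printk_context variable
and its layout are not reproduced here, and a single global counter is
assumed in place of per-CPU state.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_PENDING 8
#define MSG_LEN     64

/* Hypothetical stand-in for a per-CPU "deferred context" counter. */
static int deferred_depth;

/* Messages queued while the deferred context is active. */
static char pending[MAX_PENDING][MSG_LEN];
static int pending_count;

/* Counts messages that actually reached the "console" (stdout here). */
static int console_writes;

static void console_write(const char *msg)
{
	console_writes++;
	fputs(msg, stdout);
}

/* Queue the message in deferred context, otherwise write it directly. */
static void log_msg(const char *msg)
{
	if (deferred_depth > 0) {
		if (pending_count < MAX_PENDING) {
			snprintf(pending[pending_count], MSG_LEN, "%s", msg);
			pending_count++;
		}
		/* Messages beyond MAX_PENDING are dropped in this sketch. */
		return;
	}
	console_write(msg);
}

static void deferred_enter(void)
{
	deferred_depth++;
}

/* Leave the deferred context; flush only when the outermost level exits. */
static void deferred_exit(void)
{
	assert(deferred_depth > 0);
	if (--deferred_depth > 0)
		return;

	for (int i = 0; i < pending_count; i++)
		console_write(pending[i]);
	pending_count = 0;
}
```

A caller would wrap the code that must not touch the console in
deferred_enter() / deferred_exit(); anything logged through log_msg() in
between is written out only after the outermost deferred_exit().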