On Fri, May 3, 2019 at 5:14 PM Petr Mladek <pmladek@xxxxxxxx> wrote:
> On Thu 2019-05-02 16:16:43, Daniel Vetter wrote:
> > console_trylock, called from within printk, can be called from pretty
> > much anywhere. Including try_to_wake_up. Note that this isn't common,
> > usually the box is in pretty bad shape at that point already. But it
> > really doesn't help when then lockdep jumps in and spams the logs,
> > potentially obscuring the real backtrace we're really interested in.
> > One case I've seen (slightly simplified backtrace):
> >
> >  Call Trace:
> >   <IRQ>
> >   console_trylock+0xe/0x60
> >   vprintk_emit+0xf1/0x320
> >   printk+0x4d/0x69
> >   __warn_printk+0x46/0x90
> >   native_smp_send_reschedule+0x2f/0x40
> >   check_preempt_curr+0x81/0xa0
> >   ttwu_do_wakeup+0x14/0x220
> >   try_to_wake_up+0x218/0x5f0
> >   pollwake+0x6f/0x90
> >   credit_entropy_bits+0x204/0x310
> >   add_interrupt_randomness+0x18f/0x210
> >   handle_irq+0x67/0x160
> >   do_IRQ+0x5e/0x130
> >   common_interrupt+0xf/0xf
> >   </IRQ>
> >
> > This alone isn't a problem, but the spinlock in the semaphore is also
> > still held while waking up waiters (up() -> __up() -> try_to_wake_up()
> > callchain), which then closes the runqueue vs. semaphore.lock loop,
> > and upsets lockdep, which issues a circular locking splat to dmesg.
> > Worse it upsets developers, since we don't want to spam dmesg with
> > clutter when the machine is dying already.
> >
> > Fix this by creating a __down_trylock which only trylocks the
> > semaphore.lock. This isn't correct in full generality, but good enough
> > for console_lock:
> >
> > - there's only ever one console_lock holder, we won't fail spuriously
> >   because someone is doing a down() or up() while there's still room
> >   (unlike other semaphores with count > 1).
> >
> > - console_unlock() has one massive retry loop, which will catch anyone
> >   who races the trylock against the up(). This makes sure that no
> >   printk lines will get lost. Making the trylock more racy therefore
> >   has no further impact.
>
> To be honest, I do not see how this could solve the problem.
>
> The circular dependency is still there. If the new __down_trylock()
> succeeds then console_unlock() will get called in the same context
> and it will still need to call up() -> try_to_wake_up().
>
> Note that there are many other console_lock() callers that might
> happen in parallel and might appear in the wait queue.

Hm right. It's very rare we hit this in our CI and I don't know how to
repro otherwise, so just threw this out at the wall to see if it
sticks. I'll try and come up with a new trick then.

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
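
For readers following the thread, the trylock-only idea from the quoted
patch can be sketched in userspace C roughly as below. This is an
illustration only, not the kernel patch: struct sketch_sem,
sketch_down_trylock() and sketch_up() are hypothetical names, and a
pthread mutex stands in for semaphore.lock. It also shows why the
objection above holds: the acquire side no longer waits on the inner
lock, but the release side still takes it unconditionally.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the kernel's struct semaphore. */
struct sketch_sem {
	pthread_mutex_t lock;	/* plays the role of semaphore.lock */
	unsigned int count;	/* 1 for a console_lock-like binary semaphore */
};

/*
 * Trylock-only acquire: never waits on the inner lock.
 * Returns true if the semaphore was taken, false otherwise
 * (either the inner lock was busy or the count was already 0).
 */
static bool sketch_down_trylock(struct sketch_sem *sem)
{
	bool acquired = false;

	if (pthread_mutex_trylock(&sem->lock) != 0)
		return false;	/* inner lock busy: report failure instead of waiting */

	if (sem->count > 0) {
		sem->count--;
		acquired = true;
	}
	pthread_mutex_unlock(&sem->lock);
	return acquired;
}

/*
 * Release path. It still takes the inner lock unconditionally; in the
 * kernel this is where up() would wake a waiter while holding
 * semaphore.lock, so a trylock-only acquire does not change it.
 */
static void sketch_up(struct sketch_sem *sem)
{
	pthread_mutex_lock(&sem->lock);
	sem->count++;
	/* a real implementation would wake one waiter here, under the lock */
	pthread_mutex_unlock(&sem->lock);
}
```

With count initialised to 1, a second sketch_down_trylock() fails until
sketch_up() runs, and any trylock attempted while the inner lock is held
fails immediately rather than blocking.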