On Mon, May 6, 2019 at 9:48 AM Petr Mladek <pmladek@xxxxxxxx> wrote: > On Mon 2019-05-06 09:11:37, Daniel Vetter wrote: > > On Fri, May 3, 2019 at 5:14 PM Petr Mladek <pmladek@xxxxxxxx> wrote: > > > On Thu 2019-05-02 16:16:43, Daniel Vetter wrote: > > > > console_trylock, called from within printk, can be called from pretty > > > > much anywhere. Including try_to_wake_up. Note that this isn't common, > > > > usually the box is in pretty bad shape at that point already. But it > > > > really doesn't help when then lockdep jumps in and spams the logs, > > > > potentially obscuring the real backtrace we're really interested in. > > > > One case I've seen (slightly simplified backtrace): > > > > > > > > Call Trace: > > > > <IRQ> > > > > console_trylock+0xe/0x60 > > > > vprintk_emit+0xf1/0x320 > > > > printk+0x4d/0x69 > > > > __warn_printk+0x46/0x90 > > > > native_smp_send_reschedule+0x2f/0x40 > > > > check_preempt_curr+0x81/0xa0 > > > > ttwu_do_wakeup+0x14/0x220 > > > > try_to_wake_up+0x218/0x5f0 > > > > pollwake+0x6f/0x90 > > > > credit_entropy_bits+0x204/0x310 > > > > add_interrupt_randomness+0x18f/0x210 > > > > handle_irq+0x67/0x160 > > > > do_IRQ+0x5e/0x130 > > > > common_interrupt+0xf/0xf > > > > </IRQ> > > > > > > > > This alone isn't a problem, but the spinlock in the semaphore is also > > > > still held while waking up waiters (up() -> __up() -> try_to_wake_up() > > > > callchain), which then closes the runqueue vs. semaphore.lock loop, > > > > and upsets lockdep, which issues a circular locking splat to dmesg. > > > > Worse it upsets developers, since we don't want to spam dmesg with > > > > clutter when the machine is dying already. > > > > > > > > Fix this by creating a __down_trylock which only trylocks the > > > > semaphore.lock. This isn't correct in full generality, but good enough > > > > for console_lock: > > > > > > > > - there's only ever one console_lock holder, we won't fail spuriously > > > > because someone is doing a down() or up() while there's still room > > > > (unlike other semaphores with count > 1). > > > > > > > > - console_unlock() has one massive retry loop, which will catch anyone > > > > who races the trylock against the up(). This makes sure that no > > > > printk lines will get lost. Making the trylock more racy therefore > > > > has no further impact. > > > > > > To be honest, I do not see how this could solve the problem. > > > > > > The circular dependency is still there. If the new __down_trylock() > > > succeeds then console_unlock() will get called in the same context > > > and it will still need to call up() -> try_to_wake_up(). > > > > > > Note that there are many other console_lock() callers that might > > > happen in parallel and might appear in the wait queue. > > > > Hm right. It's very rare we hit this in our CI and I don't know how to > > repro otherwise, so just threw this out at the wall to see if it > > sticks. I'll try and come up with a new trick then. > > Single messages are printed from scheduler via printk_deferred(). > WARN() might be solved by introducing printk deferred context, > see the per-cpu variable printk_context. I convinced myself that I can take the wake_up_process out from under the spinlock, for the limited case of the console lock. I think that's a cleaner and more robust fix than leaking printk_context trickery into the console_unlock code. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel