On Sat, Sep 22, 2012 at 01:06:29PM -0700, Greg KH wrote:
> On Sat, Sep 22, 2012 at 07:52:11PM +0200, Daniel Vetter wrote:
> > Dave Airlie recently discovered a locking bug in the fbcon layer,
> > where a del_timer_sync (for the blinking cursor) deadlocks with the
> > timer itself, since both (want to) hold the console_lock:
> >
> > https://lkml.org/lkml/2012/8/21/36
> >
> > Unfortunately the console_lock isn't a plain mutex and hence has no
> > lockdep support. This resulted in a few wasted days of tracking down
> > this bug (complicated by the fact that printk doesn't show anything
> > when the console is locked) instead of noticing the bug much earlier
> > with the lockdep splat.
> >
> > Hence I've figured I need to fix that for the next deadlock involving
> > console_lock - and with kms/drm growing ever more complex locking
> > that'll eventually happen.
> >
> > Now the console_lock has rather funky semantics, so after a quick irc
> > discussion with Thomas Gleixner and Dave Airlie I've quickly ditched
> > the original idea of switching to a real mutex (since it won't work)
> > and instead opted to annotate the console_lock with lockdep
> > information manually.
> >
> > There are a few special cases:
> > - The console_lock state is protected by the console_sem, and usually
> >   grabbed/dropped at _lock/_unlock time. But the suspend/resume code
> >   drops the semaphore without dropping the console_lock (see
> >   suspend_console/resume_console). But since the same thread that did
> >   the suspend will do the resume, we don't need to fix up anything.
> >
> > - In the printk code there's a special trylock, only used to kick off
> >   the logbuffer printk'ing in console_unlock. But all that happens
> >   while lockdep is disabled (since printk does a few other evil
> >   tricks). So no issue there, either.
> >
> > - The console_lock can also be acquired from irq context (but only
> >   with a trylock). lockdep already handles that.
> >
> > This all leaves us with annotating the normal console_lock, _unlock
> > and _trylock functions.
> >
> > And yes, it works - simply unloading a drm kms driver resulted in
> > lockdep complaining about the deadlock in fbcon_deinit:
> >
> > ======================================================
> > [ INFO: possible circular locking dependency detected ]
> > 3.6.0-rc2+ #552 Not tainted
> > -------------------------------------------------------
> > kms-reload/3577 is trying to acquire lock:
> >  ((&info->queue)){+.+...}, at: [<ffffffff81058c70>] wait_on_work+0x0/0xa7
> >
> > but task is already holding lock:
> >  (console_lock){+.+.+.}, at: [<ffffffff81264686>] bind_con_driver+0x38/0x263
> >
> > which lock already depends on the new lock.
> >
> > the existing dependency chain (in reverse order) is:
> >
> > -> #1 (console_lock){+.+.+.}:
> >        [<ffffffff81087440>] lock_acquire+0x95/0x105
> >        [<ffffffff81040190>] console_lock+0x59/0x5b
> >        [<ffffffff81209cb6>] fb_flashcursor+0x2e/0x12c
> >        [<ffffffff81057c3e>] process_one_work+0x1d9/0x3b4
> >        [<ffffffff810584a2>] worker_thread+0x1a7/0x24b
> >        [<ffffffff8105ca29>] kthread+0x7f/0x87
> >        [<ffffffff813b1204>] kernel_thread_helper+0x4/0x10
> >
> > -> #0 ((&info->queue)){+.+...}:
> >        [<ffffffff81086cb3>] __lock_acquire+0x999/0xcf6
> >        [<ffffffff81087440>] lock_acquire+0x95/0x105
> >        [<ffffffff81058cab>] wait_on_work+0x3b/0xa7
> >        [<ffffffff81058dd6>] __cancel_work_timer+0xbf/0x102
> >        [<ffffffff81058e33>] cancel_work_sync+0xb/0xd
> >        [<ffffffff8120a3b3>] fbcon_deinit+0x11c/0x1dc
> >        [<ffffffff81264793>] bind_con_driver+0x145/0x263
> >        [<ffffffff81264a45>] unbind_con_driver+0x14f/0x195
> >        [<ffffffff8126540c>] store_bind+0x1ad/0x1c1
> >        [<ffffffff8127cbb7>] dev_attr_store+0x13/0x1f
> >        [<ffffffff8116d884>] sysfs_write_file+0xe9/0x121
> >        [<ffffffff811145b2>] vfs_write+0x9b/0xfd
> >        [<ffffffff811147b7>] sys_write+0x3e/0x6b
> >        [<ffffffff813b0039>] system_call_fastpath+0x16/0x1b
> >
> > other info that might help us debug this:
> >
> >  Possible unsafe locking scenario:
> >
> >        CPU0                    CPU1
> >        ----                    ----
> >   lock(console_lock);
> >                                lock((&info->queue));
> >                                lock(console_lock);
> >   lock((&info->queue));
> >
> >  *** DEADLOCK ***
> >
> > v2: Mark the lockdep_map static, noticed by Jani Nikula.
> >
> > Cc: Dave Airlie <airlied at gmail.com>
> > Cc: Thomas Gleixner <tglx at linutronix.de>
> > Cc: Alan Cox <alan at lxorguk.ukuu.org.uk>
> > Cc: Peter Zijlstra <a.p.zijlstra at chello.nl>
> > Signed-off-by: Daniel Vetter <daniel.vetter at ffwll.ch>
> > ---
> >  kernel/printk.c | 9 +++++++++
> >  1 file changed, 9 insertions(+)
>
> So I'm guessing I should take this through the tty tree, right? Any
> objections to that for 3.7?

I didn't know who would be the relevant maintainer, so just spammed a few
people. Would be awesome if you could merge these patches for 3.7, and at
least Alan Cox seems to like them:

http://marc.info/?l=linux-fbdev&m=134564125601147&w=1

Thanks, Daniel
>
> thanks,
>
> greg k-h

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch