On Wed, May 2, 2018 at 11:06 AM Daniel Vetter <daniel.vetter@xxxxxxxx> wrote: > > On Mon, Apr 30, 2018 at 11:17 PM, Dmitry Torokhov > <dmitry.torokhov@xxxxxxxxx> wrote: > > Hi Daniel, > > > > On Mon, Apr 30, 2018 at 09:56:49PM +0200, Daniel Vetter wrote: > >> At least trackpoint_disconnect wants to remove some sysfs files, and > >> we can't remove sysfs files while holding psmouse_mutex: > >> > >> ====================================================== > >> WARNING: possible circular locking dependency detected > >> 4.16.0-rc5-g613eb885b69e-drmtip_1+ #1 Tainted: G U > >> ------------------------------------------------------ > >> kworker/0:3/102 is trying to acquire lock: > >> (kn->count#130){++++}, at: [<000000009679748b>] kernfs_remove_by_name_ns+0x3b/0x80 > >> > >> but task is already holding lock: > >> (psmouse_mutex){+.+.}, at: [<0000000014f44bcc>] psmouse_disconnect+0x62/0x160 > >> > >> which lock already depends on the new lock. > >> > >> the existing dependency chain (in reverse order) is: > >> > >> -> #1 (psmouse_mutex){+.+.}: > >> psmouse_attr_set_helper+0x28/0x140 > >> kernfs_fop_write+0xfe/0x180 > >> __vfs_write+0x1e/0x130 > >> vfs_write+0xbd/0x1b0 > >> SyS_write+0x40/0xa0 > >> do_syscall_64+0x65/0x1a0 > >> entry_SYSCALL_64_after_hwframe+0x42/0xb7 > >> > >> -> #0 (kn->count#130){++++}: > >> __kernfs_remove+0x243/0x2b0 > >> kernfs_remove_by_name_ns+0x3b/0x80 > >> remove_files.isra.0+0x2b/0x60 > >> sysfs_remove_group+0x38/0x80 > >> sysfs_remove_groups+0x24/0x40 > >> trackpoint_disconnect+0x2c/0x50 > >> psmouse_disconnect+0x8f/0x160 > >> serio_disconnect_driver+0x28/0x40 > >> serio_driver_remove+0xc/0x10 > >> device_release_driver_internal+0x15b/0x230 > >> serio_handle_event+0x1c8/0x260 > >> process_one_work+0x215/0x620 > >> worker_thread+0x48/0x3a0 > >> kthread+0xfb/0x130 > >> ret_from_fork+0x3a/0x50 > >> > >> other info that might help us debug this: > >> > >> Possible unsafe locking scenario: > >> > >> CPU0 CPU1 > >> ---- ---- > >> lock(psmouse_mutex); > >> lock(kn->count#130); > >> lock(psmouse_mutex); > >> lock(kn->count#130); > >> > >> *** DEADLOCK *** > >> > >> 6 locks held by kworker/0:3/102: > >> #0: ((wq_completion)"events_long"){+.+.}, at: [<000000002e408bfa>] process_one_work+0x191/0x620 > >> #1: (serio_event_work){+.+.}, at: [<000000002e408bfa>] process_one_work+0x191/0x620 > >> #2: (serio_mutex){+.+.}, at: [<00000000c8a49847>] serio_handle_event+0x23/0x260 > >> #3: (&dev->mutex){....}, at: [<00000000b55eee75>] device_release_driver_internal+0x2f/0x230 > >> #4: (&serio->drv_mutex){+.+.}, at: [<000000009719f997>] serio_disconnect_driver+0x16/0x40 > >> #5: (psmouse_mutex){+.+.}, at: [<0000000014f44bcc>] psmouse_disconnect+0x62/0x160 > >> > >> stack backtrace: > >> CPU: 0 PID: 102 Comm: kworker/0:3 Tainted: G U 4.16.0-rc5-g613eb885b69e-drmtip_1+ #1 > >> Hardware name: LENOVO 74591P0/74591P0, BIOS 6DET28WW (1.05 ) 07/30/2008 > >> Workqueue: events_long serio_handle_event > >> Call Trace: > >> dump_stack+0x5f/0x86 > >> print_circular_bug.isra.18+0x1d0/0x2c0 > >> __lock_acquire+0x14ae/0x1b60 > >> ? kernfs_remove_by_name_ns+0x20/0x80 > >> ? lock_acquire+0xaf/0x200 > >> lock_acquire+0xaf/0x200 > >> ? kernfs_remove_by_name_ns+0x3b/0x80 > >> __kernfs_remove+0x243/0x2b0 > >> ? kernfs_remove_by_name_ns+0x3b/0x80 > >> ? kernfs_name_hash+0xd/0x70 > >> ? kernfs_find_ns+0x7e/0x100 > >> kernfs_remove_by_name_ns+0x3b/0x80 > >> remove_files.isra.0+0x2b/0x60 > >> sysfs_remove_group+0x38/0x80 > >> sysfs_remove_groups+0x24/0x40 > >> trackpoint_disconnect+0x2c/0x50 > >> psmouse_disconnect+0x8f/0x160 > >> serio_disconnect_driver+0x28/0x40 > >> serio_driver_remove+0xc/0x10 > >> device_release_driver_internal+0x15b/0x230 > >> serio_handle_event+0x1c8/0x260 > >> process_one_work+0x215/0x620 > >> worker_thread+0x48/0x3a0 > >> ? _raw_spin_unlock_irqrestore+0x4c/0x60 > >> kthread+0xfb/0x130 > >> ? process_one_work+0x620/0x620 > >> ? _kthread_create_on_node+0x30/0x30 > >> ret_from_fork+0x3a/0x50 > >> > >> Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx> > >> Cc: Dmitry Torokhov <dmitry.torokhov@xxxxxxxxx> > >> Cc: Benjamin Tissoires <benjamin.tissoires@xxxxxxxxxx> > >> Cc: Daniel Vetter <daniel.vetter@xxxxxxxx> > >> Cc: Arvind Yadav <arvind.yadav.cs@xxxxxxxxx> > >> Cc: Stephen Lyons <slysven@xxxxxxxxxxxxxxx> > >> Cc: linux-input@xxxxxxxxxxxxxxx > >> --- > >> drivers/input/mouse/psmouse-base.c | 2 ++ > >> 1 file changed, 2 insertions(+) > >> > >> diff --git a/drivers/input/mouse/psmouse-base.c b/drivers/input/mouse/psmouse-base.c > >> index 8900c3166ebf..06ccd8e22f3c 100644 > >> --- a/drivers/input/mouse/psmouse-base.c > >> +++ b/drivers/input/mouse/psmouse-base.c > >> @@ -1484,8 +1484,10 @@ static void psmouse_disconnect(struct serio *serio) > >> psmouse_deactivate(parent); > >> } > >> > >> + mutex_unlock(&psmouse_mutex); > >> if (psmouse->disconnect) > >> psmouse->disconnect(psmouse); > >> + mutex_lock(&psmouse_mutex); > > > > Why do you think it is proper to drop this mutex? It is introduced for a > > reason. > > > > I think the trace you are seeing is due to: > > > > commit 988cd7afb3f37598891ca70b4c6eb914c338c46a > > Author: Tejun Heo <tj@xxxxxxxxxx> > > Date: Mon Feb 3 14:02:58 2014 -0500 > > > > kernfs: remove kernfs_addrm_cxt > > > > where we started taking kernfs_mutex when adding/removing sysfs files. > > Ultimately I think we may have to change switching protocol to a work > > item to be done asynchronously from writing to sysfs attribute. > > Well "holding a lock while adding/removing sysfs files and holding the > same lock from sysfs read/write callbacks" is a classic deadlock. I've > made lockdep shut up about it, I have no idea how to fix it properly. > kernfs switching it's locking doesn't change anything here. > > Generally the fix is to untangle the locking: You need 1 set of locks > to protect the datastructures exposed through sysfs, and another thing > (maybe that's provided by serio already, I have no idea) to make sure > you're ordering connect/disconnect handling correct and there's not > concurrent calls to this hooks. Assuming serio does give that > guarantee then there's no need to hold the lock while unregistering > the sysfs files (we could perhaps push the lock dropping down into > trackpoint_disconnect, but that's less maintainable imo since it's not > obvious in psmouse_disconnect what's going on), and my patch is > correct. > > But I didn't do a full locking audit of what exactly serio guarantees, > and what exactly psmouse_mutex needs to protect (and where > psmouse_mutex is only held because it's convenient). Ping? Just noticed that I still have this bugfix hanging around. bugfix as in "make lockdep more useful", not necessarily fixing the locking here properly. I guess I could respin to add a FIXME comment, but not sure how useful that would be. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx