On Mon, Dec 10, 2018 at 11:20:58AM +0100, Daniel Vetter wrote:
> On Mon, Dec 10, 2018 at 11:18:32AM +0100, Daniel Vetter wrote:
> > On Mon, Dec 10, 2018 at 11:06:34AM +0100, Greg Kroah-Hartman wrote:
> > > On Mon, Dec 10, 2018 at 09:46:53AM +0100, Daniel Vetter wrote:
> > > > Drivers might want to remove some sysfs files, which needs the same
> > > > locks and ends up angering lockdep. Relevant snippet of the stack
> > > > trace:
> > > >
> > > >   kernfs_remove_by_name_ns+0x3b/0x80
> > > >   bus_remove_driver+0x92/0xa0
> > > >   acpi_video_unregister+0x24/0x40
> > > >   i915_driver_unload+0x42/0x130 [i915]
> > > >   i915_pci_remove+0x19/0x30 [i915]
> > > >   pci_device_remove+0x36/0xb0
> > > >   device_release_driver_internal+0x185/0x250
> > > >   unbind_store+0xaf/0x180
> > > >   kernfs_fop_write+0x104/0x190
> > > >
> > > > I've stumbled over this because some new patches by Ram connect the
> > > > snd-hda-intel unload (where we do use sysfs unbind) with the locking
> > > > chains in the i915 unload code (but without creating a new loop),
> > > > which upset our CI. But the bug is already there and can be easily
> > > > reproduced by unbind i915 directly.
> > >
> > > This is odd, why wouldn't any driver hit this issue? And why now since
> > > you say this is triggerable today?
> >
> > The above backtrace is triggered by unbinding i915 on current upstream
> > kernels. Note: Will crash later on rather badly in the
> > fbdev/fbcon/vtconsole hell, but that's separate issue (which can be worked
> > around by first unbinding fbcon manually through sysfs).
> >
> > > I know scsi was doing some strange things like trying to remove the
> > > device itself from a sysfs callback on the device, which requires it to
> > > just call a different kobject function created just for that type of
> > > thing. Would that also make sense to do here instead of your workqueue?
> >
> > Note how we blow up on unregistering sw device instances supported by i915
> > in entirely different subsystems. I guess most drivers just have sysfs
> > files for their own stuff, where this is done as you describe. The problem
> > is that there's an awful lot of unrelated stuff hanging off i915.
> >
> > Or maybe acpi_video is busted, and should be using a different function.
> > You haven't said which one, and I have no idea which one it is ...
> >
> > And in case the context wasn't clear: This is unbinding the i915 pci
> > driver which triggers the above lockdep splat recursion.
>
> btw another option for "fixing" this would be to annotate the mutex_lock
> in kernfs_remove_by_name_ns as recursive. Which just shuts up lockdep (and
> might hide some real bugs), but would get the job done since there's not
> actually a deadlock here. Just lockdep being annoyed.

So what's the pick? I can do the typing, but I don't understand all the
driver core interactions to know what we should be doing here best.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
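
For readers following the "annotate as recursive" option above, here is a minimal user-space sketch of why such an annotation silences the report. It is a toy model, not kernel code: `toy_task`, `toy_acquire`, `toy_release`, `toy_unbind_scenario` and `TOY_CLASS_KERNFS` are all hypothetical names made up for illustration. The assumption it encodes is that lockdep tracks locks by class plus subclass rather than by instance, so taking a second lock of the same class while one is held looks like recursion unless the inner acquisition is given a different subclass (in the kernel that is what `mutex_lock_nested()` is for). Whether that annotation is appropriate for kernfs is exactly the open question in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_HELD 8
#define TOY_CLASS_KERNFS 1	/* hypothetical class id for this sketch */

/* One held lock, identified only by class and subclass (never by instance). */
struct toy_held {
	int class_id;
	int subclass;
};

/* Per-task record of held locks, loosely modelled on lockdep's held-lock stack. */
struct toy_task {
	struct toy_held held[TOY_MAX_HELD];
	size_t depth;
	int recursion_reports;
};

/*
 * Record an acquisition. Returns false and counts a report when a lock of
 * the same class AND subclass is already held by this task.
 */
bool toy_acquire(struct toy_task *t, int class_id, int subclass)
{
	bool clean = true;

	for (size_t i = 0; i < t->depth; i++) {
		if (t->held[i].class_id == class_id &&
		    t->held[i].subclass == subclass) {
			t->recursion_reports++;
			clean = false;
			break;
		}
	}

	assert(t->depth < TOY_MAX_HELD);
	t->held[t->depth].class_id = class_id;
	t->held[t->depth].subclass = subclass;
	t->depth++;
	return clean;
}

/* Drop the most recently acquired lock. */
void toy_release(struct toy_task *t)
{
	assert(t->depth > 0);
	t->depth--;
}

/*
 * Toy version of the unbind path: the outer acquisition stands in for the
 * sysfs write, the inner one for the sysfs file removal further down the
 * call chain. Returns how many recursion reports were raised.
 */
int toy_unbind_scenario(int inner_subclass)
{
	struct toy_task task = { .depth = 0, .recursion_reports = 0 };

	toy_acquire(&task, TOY_CLASS_KERNFS, 0);		/* write path */
	toy_acquire(&task, TOY_CLASS_KERNFS, inner_subclass);	/* removal */
	toy_release(&task);
	toy_release(&task);

	return task.recursion_reports;
}
```

Calling `toy_unbind_scenario(0)` raises one report, while `toy_unbind_scenario(1)` raises none. Nothing about the actual locking changes between the two calls, which is why the annotation can also hide a real deadlock.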