On Thu, Jun 01, 2017 at 01:14:56PM +0100, Chris Wilson wrote: > On Thu, Jun 01, 2017 at 01:54:30PM +0200, Hans de Goede wrote: > > commit a39be606f99d ("drm: Do a full device unregister when unplugging") > > causes backtraces like this one when unplugging an usb drm device while > > it is in use: > > > > usb 2-3: USB disconnect, device number 25 > > ------------[ cut here ]------------ > > WARNING: CPU: 0 PID: 242 at drivers/gpu/drm/drm_mode_config.c:424 > > drm_mode_config_cleanup+0x220/0x280 [drm] > > ... > > RIP: 0010:drm_mode_config_cleanup+0x220/0x280 [drm] > > ... > > Call Trace: > > gm12u320_modeset_cleanup+0xe/0x10 [gm12u320] > > gm12u320_driver_unload+0x35/0x70 [gm12u320] > > drm_dev_unregister+0x3c/0xe0 [drm] > > drm_unplug_dev+0x12/0x60 [drm] > > gm12u320_usb_disconnect+0x36/0x40 [gm12u320] > > usb_unbind_interface+0x72/0x280 > > device_release_driver_internal+0x158/0x210 > > device_release_driver+0x12/0x20 > > bus_remove_device+0x104/0x180 > > device_del+0x1d2/0x350 > > usb_disable_device+0x9f/0x270 > > usb_disconnect+0xc6/0x260 > > ... > > [drm:drm_mode_config_cleanup [drm]] *ERROR* connector Unknown-1 leaked! > > ------------[ cut here ]------------ > > WARNING: CPU: 0 PID: 242 at drivers/gpu/drm/drm_mode_config.c:458 > > drm_mode_config_cleanup+0x268/0x280 [drm] > > ... > > <same Call Trace> > > ---[ end trace 80df975dae439ed6 ]--- > > general protection fault: 0000 [#1] SMP > > ... > > Call Trace: > > ? __switch_to+0x225/0x450 > > drm_mode_rmfb_work_fn+0x55/0x70 [drm] > > process_one_work+0x193/0x3c0 > > worker_thread+0x4a/0x3a0 > > ... > > RIP: drm_framebuffer_remove+0x62/0x3f0 [drm] RSP: ffffb776c39dfd98 > > ---[ end trace 80df975dae439ed7 ]--- > > > > After which the system is unusable this is caused by drm_dev_unregister > > getting called immediately on unplug, which calls the drivers unload > > function which calls drm_mode_config_cleanup which removes the framebuffer > > object while userspace is still holding a reference to it. > > > > Reverting commit a39be606f99d ("drm: Do a full device unregister > > when unplugging") leads to the following oops on unplug instead, > > when userspace closes the last fd referencing the drm_dev: > > > > sysfs group 'power' not found for kobject 'card1-Unknown-1' > > ------------[ cut here ]------------ > > WARNING: CPU: 0 PID: 2459 at fs/sysfs/group.c:237 > > sysfs_remove_group+0x80/0x90 > > ... > > RIP: 0010:sysfs_remove_group+0x80/0x90 > > ... > > Call Trace: > > dpm_sysfs_remove+0x57/0x60 > > device_del+0xfd/0x350 > > device_unregister+0x1a/0x60 > > drm_sysfs_connector_remove+0x39/0x50 [drm] > > drm_connector_unregister+0x5a/0x70 [drm] > > drm_connector_unregister_all+0x45/0xa0 [drm] > > drm_modeset_unregister_all+0x12/0x30 [drm] > > drm_dev_unregister+0xca/0xe0 [drm] > > drm_put_dev+0x32/0x60 [drm] > > drm_release+0x2f3/0x380 [drm] > > __fput+0xdf/0x1e0 > > ... > > ---[ end trace ecfb91ac85688bbe ]--- > > BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 > > IP: down_write+0x1f/0x40 > > ... > > Call Trace: > > debugfs_remove_recursive+0x55/0x1b0 > > drm_debugfs_connector_remove+0x21/0x40 [drm] > > drm_connector_unregister+0x62/0x70 [drm] > > drm_connector_unregister_all+0x45/0xa0 [drm] > > drm_modeset_unregister_all+0x12/0x30 [drm] > > drm_dev_unregister+0xca/0xe0 [drm] > > drm_put_dev+0x32/0x60 [drm] > > drm_release+0x2f3/0x380 [drm] > > __fput+0xdf/0x1e0 > > ... > > ---[ end trace ecfb91ac85688bbf ]--- > > > > This is caused by the revert moving back to drm_unplug_dev calling > > drm_minor_unregister which does: > > > > device_del(minor->kdev); > > dev_set_drvdata(minor->kdev, NULL); /* safety belt */ > > drm_debugfs_cleanup(minor); > > > > Causing the sysfs entries to already be removed even though we still > > have references to them in e.g. drm_connector. > > > > Note we must call drm_minor_unregister to notify userspace of the unplug > > of the device, so calling drm_dev_unregister is not completely wrong the > > problem is that drm_dev_unregister does too much. > > > > This commit fixes drm_unplug_dev by not only reverting > > commit a39be606f99d ("drm: Do a full device unregister when unplugging") > > but by also adding a call to drm_modeset_unregister_all before the > > drm_minor_unregister calls to make sure all sysfs entries are removed > > before calling device_del(minor->kdev) thereby also fixing the second > > set of oopses caused by just reverting the commit. > > That really does sound like a step in the wrong direction though. But > that seems to because of the call to driver->unload() from the middle of > unregister that is catching me by surprise. > > Regression fix trumps niceties, so > Reviewed-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> Indeed it does seem like we'll need to sort this out, but agreed let's put out the fire before renovating the house. Applied to -misc-fixes Sean > > Looks like about 50% of the remaining driver->unload callbacks can be > replaced by a driver->release callback. The rest need care to finish > demidlayering driver->load/unload. And I still think unplug needs some > polish. > -Chris > > -- > Chris Wilson, Intel Open Source Technology Centre -- Sean Paul, Software Engineer, Google / Chromium OS