On Thu, Jun 01, 2017 at 01:54:30PM +0200, Hans de Goede wrote: > commit a39be606f99d ("drm: Do a full device unregister when unplugging") > causes backtraces like this one when unplugging an usb drm device while > it is in use: > > usb 2-3: USB disconnect, device number 25 > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 242 at drivers/gpu/drm/drm_mode_config.c:424 > drm_mode_config_cleanup+0x220/0x280 [drm] > ... > RIP: 0010:drm_mode_config_cleanup+0x220/0x280 [drm] > ... > Call Trace: > gm12u320_modeset_cleanup+0xe/0x10 [gm12u320] > gm12u320_driver_unload+0x35/0x70 [gm12u320] > drm_dev_unregister+0x3c/0xe0 [drm] > drm_unplug_dev+0x12/0x60 [drm] > gm12u320_usb_disconnect+0x36/0x40 [gm12u320] > usb_unbind_interface+0x72/0x280 > device_release_driver_internal+0x158/0x210 > device_release_driver+0x12/0x20 > bus_remove_device+0x104/0x180 > device_del+0x1d2/0x350 > usb_disable_device+0x9f/0x270 > usb_disconnect+0xc6/0x260 > ... > [drm:drm_mode_config_cleanup [drm]] *ERROR* connector Unknown-1 leaked! > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 242 at drivers/gpu/drm/drm_mode_config.c:458 > drm_mode_config_cleanup+0x268/0x280 [drm] > ... > <same Call Trace> > ---[ end trace 80df975dae439ed6 ]--- > general protection fault: 0000 [#1] SMP > ... > Call Trace: > ? __switch_to+0x225/0x450 > drm_mode_rmfb_work_fn+0x55/0x70 [drm] > process_one_work+0x193/0x3c0 > worker_thread+0x4a/0x3a0 > ... > RIP: drm_framebuffer_remove+0x62/0x3f0 [drm] RSP: ffffb776c39dfd98 > ---[ end trace 80df975dae439ed7 ]--- > > After which the system is unusable this is caused by drm_dev_unregister > getting called immediately on unplug, which calls the drivers unload > function which calls drm_mode_config_cleanup which removes the framebuffer > object while userspace is still holding a reference to it. > > Reverting commit a39be606f99d ("drm: Do a full device unregister > when unplugging") leads to the following oops on unplug instead, > when userspace closes the last fd referencing the drm_dev: > > sysfs group 'power' not found for kobject 'card1-Unknown-1' > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 2459 at fs/sysfs/group.c:237 > sysfs_remove_group+0x80/0x90 > ... > RIP: 0010:sysfs_remove_group+0x80/0x90 > ... > Call Trace: > dpm_sysfs_remove+0x57/0x60 > device_del+0xfd/0x350 > device_unregister+0x1a/0x60 > drm_sysfs_connector_remove+0x39/0x50 [drm] > drm_connector_unregister+0x5a/0x70 [drm] > drm_connector_unregister_all+0x45/0xa0 [drm] > drm_modeset_unregister_all+0x12/0x30 [drm] > drm_dev_unregister+0xca/0xe0 [drm] > drm_put_dev+0x32/0x60 [drm] > drm_release+0x2f3/0x380 [drm] > __fput+0xdf/0x1e0 > ... > ---[ end trace ecfb91ac85688bbe ]--- > BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8 > IP: down_write+0x1f/0x40 > ... > Call Trace: > debugfs_remove_recursive+0x55/0x1b0 > drm_debugfs_connector_remove+0x21/0x40 [drm] > drm_connector_unregister+0x62/0x70 [drm] > drm_connector_unregister_all+0x45/0xa0 [drm] > drm_modeset_unregister_all+0x12/0x30 [drm] > drm_dev_unregister+0xca/0xe0 [drm] > drm_put_dev+0x32/0x60 [drm] > drm_release+0x2f3/0x380 [drm] > __fput+0xdf/0x1e0 > ... > ---[ end trace ecfb91ac85688bbf ]--- > > This is caused by the revert moving back to drm_unplug_dev calling > drm_minor_unregister which does: > > device_del(minor->kdev); > dev_set_drvdata(minor->kdev, NULL); /* safety belt */ > drm_debugfs_cleanup(minor); > > Causing the sysfs entries to already be removed even though we still > have references to them in e.g. drm_connector. > > Note we must call drm_minor_unregister to notify userspace of the unplug > of the device, so calling drm_dev_unregister is not completely wrong the > problem is that drm_dev_unregister does too much. > > This commit fixes drm_unplug_dev by not only reverting > commit a39be606f99d ("drm: Do a full device unregister when unplugging") > but by also adding a call to drm_modeset_unregister_all before the > drm_minor_unregister calls to make sure all sysfs entries are removed > before calling device_del(minor->kdev) thereby also fixing the second > set of oopses caused by just reverting the commit. That really does sound like a step in the wrong direction though. But that seems to because of the call to driver->unload() from the middle of unregister that is catching me by surprise. Regression fix trumps niceties, so Reviewed-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> Looks like about 50% of the remaining driver->unload callbacks can be replaced by a driver->release callback. The rest need care to finish demidlayering driver->load/unload. And I still think unplug needs some polish. -Chris -- Chris Wilson, Intel Open Source Technology Centre