Re: [PATCH] drm: Fix oops + Xserver hang when unplugging USB drm devices

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On 01-06-17 14:14, Chris Wilson wrote:
On Thu, Jun 01, 2017 at 01:54:30PM +0200, Hans de Goede wrote:
commit a39be606f99d ("drm: Do a full device unregister when unplugging")
causes backtraces like this one when unplugging an usb drm device while
it is in use:

usb 2-3: USB disconnect, device number 25
------------[ cut here ]------------
WARNING: CPU: 0 PID: 242 at drivers/gpu/drm/drm_mode_config.c:424
    drm_mode_config_cleanup+0x220/0x280 [drm]
...
RIP: 0010:drm_mode_config_cleanup+0x220/0x280 [drm]
...
Call Trace:
  gm12u320_modeset_cleanup+0xe/0x10 [gm12u320]
  gm12u320_driver_unload+0x35/0x70 [gm12u320]
  drm_dev_unregister+0x3c/0xe0 [drm]
  drm_unplug_dev+0x12/0x60 [drm]
  gm12u320_usb_disconnect+0x36/0x40 [gm12u320]
  usb_unbind_interface+0x72/0x280
  device_release_driver_internal+0x158/0x210
  device_release_driver+0x12/0x20
  bus_remove_device+0x104/0x180
  device_del+0x1d2/0x350
  usb_disable_device+0x9f/0x270
  usb_disconnect+0xc6/0x260
...
[drm:drm_mode_config_cleanup [drm]] *ERROR* connector Unknown-1 leaked!
------------[ cut here ]------------
WARNING: CPU: 0 PID: 242 at drivers/gpu/drm/drm_mode_config.c:458
    drm_mode_config_cleanup+0x268/0x280 [drm]
...
<same Call Trace>
---[ end trace 80df975dae439ed6 ]---
general protection fault: 0000 [#1] SMP
...
Call Trace:
  ? __switch_to+0x225/0x450
  drm_mode_rmfb_work_fn+0x55/0x70 [drm]
  process_one_work+0x193/0x3c0
  worker_thread+0x4a/0x3a0
...
RIP: drm_framebuffer_remove+0x62/0x3f0 [drm] RSP: ffffb776c39dfd98
---[ end trace 80df975dae439ed7 ]---

After which the system is unusable this is caused by drm_dev_unregister
getting called immediately on unplug, which calls the drivers unload
function which calls drm_mode_config_cleanup which removes the framebuffer
object while userspace is still holding a reference to it.

Reverting commit a39be606f99d ("drm: Do a full device unregister
when unplugging") leads to the following oops on unplug instead,
when userspace closes the last fd referencing the drm_dev:

sysfs group 'power' not found for kobject 'card1-Unknown-1'
------------[ cut here ]------------
WARNING: CPU: 0 PID: 2459 at fs/sysfs/group.c:237
    sysfs_remove_group+0x80/0x90
...
RIP: 0010:sysfs_remove_group+0x80/0x90
...
Call Trace:
  dpm_sysfs_remove+0x57/0x60
  device_del+0xfd/0x350
  device_unregister+0x1a/0x60
  drm_sysfs_connector_remove+0x39/0x50 [drm]
  drm_connector_unregister+0x5a/0x70 [drm]
  drm_connector_unregister_all+0x45/0xa0 [drm]
  drm_modeset_unregister_all+0x12/0x30 [drm]
  drm_dev_unregister+0xca/0xe0 [drm]
  drm_put_dev+0x32/0x60 [drm]
  drm_release+0x2f3/0x380 [drm]
  __fput+0xdf/0x1e0
...
---[ end trace ecfb91ac85688bbe ]---
BUG: unable to handle kernel NULL pointer dereference at 00000000000000a8
IP: down_write+0x1f/0x40
...
Call Trace:
  debugfs_remove_recursive+0x55/0x1b0
  drm_debugfs_connector_remove+0x21/0x40 [drm]
  drm_connector_unregister+0x62/0x70 [drm]
  drm_connector_unregister_all+0x45/0xa0 [drm]
  drm_modeset_unregister_all+0x12/0x30 [drm]
  drm_dev_unregister+0xca/0xe0 [drm]
  drm_put_dev+0x32/0x60 [drm]
  drm_release+0x2f3/0x380 [drm]
  __fput+0xdf/0x1e0
...
---[ end trace ecfb91ac85688bbf ]---

This is caused by the revert moving back to drm_unplug_dev calling
drm_minor_unregister which does:

         device_del(minor->kdev);
         dev_set_drvdata(minor->kdev, NULL); /* safety belt */
         drm_debugfs_cleanup(minor);

Causing the sysfs entries to already be removed even though we still
have references to them in e.g. drm_connector.

Note we must call drm_minor_unregister to notify userspace of the unplug
of the device, so calling drm_dev_unregister is not completely wrong the
problem is that drm_dev_unregister does too much.

This commit fixes drm_unplug_dev by not only reverting
commit a39be606f99d ("drm: Do a full device unregister when unplugging")
but by also adding a call to drm_modeset_unregister_all before the
drm_minor_unregister calls to make sure all sysfs entries are removed
before calling device_del(minor->kdev) thereby also fixing the second
set of oopses caused by just reverting the commit.

That really does sound like a step in the wrong direction though. But
that seems to because of the call to driver->unload() from the middle of
unregister that is catching me by surprise. >
Regression fix trumps niceties, so

Right, I'm not entirely happy with this fix too, but I decided to go for
the simplest fix possible to get the regression fixed.

As you've seen in my reply in the "[PATCH v11] drm: Unplug drm device when
unregistering it (v8)" thread I have been thinking about nicer ways
to fix this too. Lets continue the discussion about doing a better fix
for a future kernel release there.

Reviewed-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>

Thank you.

Looks like about 50% of the remaining driver->unload callbacks can be
replaced by a driver->release callback. The rest need care to finish
demidlayering driver->load/unload. And I still think unplug needs some
polish.

Regards,

Hans



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]