Thought this was going to be an easy review until I realized that there's multiple problems in nouveau this would cause issues with, even if we didn't pay attention to the -EINVAL that gets returned. The suspend/resume order in nouveau needs some fixing up to prevent this patch from causing timeouts, and then also there's a hidden surprise [ 972.294437] ============================================ [ 972.295225] WARNING: possible recursive locking detected [ 972.296042] 4.19.0-rc8mst-reprobe+ #2 Not tainted [ 972.296842] -------------------------------------------- [ 972.297614] zsh/6840 is trying to acquire lock: [ 972.298441] 0000000072e521f7 (&dev->mode_config.mutex){+.+.}, at: drm_dp_mst_edids_changed+0xc0/0x180 [drm_kms_helper] [ 972.299299] but task is already holding lock: [ 972.300991] 0000000072e521f7 (&dev->mode_config.mutex){+.+.}, at: status_store+0x34/0x180 [drm] [ 972.301875] other info that might help us debug this: [ 972.303563] Possible unsafe locking scenario: [ 972.305315] CPU0 [ 972.306182] ---- [ 972.307061] lock(&dev->mode_config.mutex); [ 972.307943] lock(&dev->mode_config.mutex); [ 972.308819] *** DEADLOCK *** [ 972.311446] May be due to missing lock nesting notation [ 972.313234] 6 locks held by zsh/6840: [ 972.314135] #0: 0000000092b5ba9d (sb_writers#4){.+.+}, at: vfs_write+0x13e/0x1a0 [ 972.315049] #1: 00000000e628e6e9 (&of->mutex){+.+.}, at: kernfs_fop_write+0xbd/0x1a0 [ 972.315980] #2: 000000000fb65e6c (kn->count#240){.+.+}, at: kernfs_fop_write+0xc5/0x1a0 [ 972.316928] #3: 0000000072e521f7 (&dev->mode_config.mutex){+.+.}, at: status_store+0x34/0x180 [drm] [ 972.317892] #4: 00000000344224c2 (crtc_ww_class_acquire){+.+.}, at: drm_helper_probe_single_connector_modes+0x66/0x6c0 [drm_kms_helper] [ 972.318863] #5: 00000000d65352e2 (crtc_ww_class_mutex){+.+.}, at: drm_modeset_lock+0x44/0x110 [drm] [ 972.319860] stack backtrace: [ 972.321800] CPU: 5 PID: 6840 Comm: zsh Kdump: loaded Not tainted 4.19.0- rc8mst-reprobe+ #2 [ 972.322772] Hardware name: LENOVO 20EQZ1J7US/20EQZ1J7US, BIOS N1EET80W (1.53 ) 09/14/2018 [ 972.323792] Call Trace: [ 972.324821] dump_stack+0x85/0xc0 [ 972.325832] __lock_acquire.cold.59+0x158/0x227 [ 972.326887] ? retint_kernel+0x10/0x10 [ 972.327918] lock_acquire+0x9e/0x170 [ 972.328948] ? drm_dp_mst_edids_changed+0xc0/0x180 [drm_kms_helper] [ 972.329986] ? drm_dp_mst_edids_changed+0xc0/0x180 [drm_kms_helper] [ 972.331066] __mutex_lock+0x68/0x900 [ 972.332094] ? drm_dp_mst_edids_changed+0xc0/0x180 [drm_kms_helper] [ 972.333146] ? __mutex_unlock_slowpath+0x4b/0x2b0 [ 972.334221] ? drm_dp_mst_edids_changed+0xc0/0x180 [drm_kms_helper] [ 972.335255] drm_dp_mst_edids_changed+0xc0/0x180 [drm_kms_helper] [ 972.336318] drm_dp_mst_topology_mgr_resume+0x104/0x190 [drm_kms_helper] [ 972.337397] nv50_display_init+0x73/0xd0 [nouveau] [ 972.338483] nouveau_display_init+0x8e/0xe0 [nouveau] [ 972.339547] nouveau_display_resume+0x39/0x250 [nouveau] [ 972.340634] ? pci_restore_standard_config+0x40/0x40 [ 972.341762] nouveau_do_resume+0x79/0x150 [nouveau] [ 972.342850] nouveau_pmops_runtime_resume+0x88/0x150 [nouveau] [ 972.343915] pci_pm_runtime_resume+0x78/0xb0 [ 972.345004] __rpm_callback+0x75/0x1b0 [ 972.346084] ? pci_restore_standard_config+0x40/0x40 [ 972.347148] rpm_callback+0x1f/0x70 [ 972.348256] ? pci_restore_standard_config+0x40/0x40 [ 972.349351] rpm_resume+0x5d4/0x810 [ 972.350446] ? drm_modeset_lock+0x44/0x110 [drm] [ 972.351573] __pm_runtime_resume+0x47/0x70 [ 972.352737] nouveau_connector_detect+0x373/0x470 [nouveau] [ 972.353841] ? _cond_resched+0x15/0x30 [ 972.354945] ? ww_mutex_lock+0x30/0x60 [ 972.356044] ? drm_modeset_lock+0x44/0x110 [drm] [ 972.357146] drm_helper_probe_single_connector_modes+0xd9/0x6c0 [drm_kms_helper] [ 972.358274] ? printk+0x58/0x6f [ 972.359400] status_store+0xb2/0x180 [drm] [ 972.360519] kernfs_fop_write+0xf0/0x1a0 [ 972.361711] __vfs_write+0x36/0x180 [ 972.362842] ? rcu_read_lock_sched_held+0x55/0x60 [ 972.363974] ? rcu_sync_lockdep_assert+0x2e/0x60 [ 972.365103] ? __sb_start_write+0x115/0x170 [ 972.366241] ? vfs_write+0x13e/0x1a0 [ 972.367365] vfs_write+0xa5/0x1a0 [ 972.368486] ksys_write+0x52/0xc0 [ 972.369596] do_syscall_64+0x60/0x1a0 [ 972.370782] entry_SYSCALL_64_after_hwframe+0x49/0xbe [ 972.371890] RIP: 0033:0x7fc93e169ef4 [ 972.373012] Code: 89 02 48 c7 c0 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 8d 05 f1 3a 2d 00 8b 00 85 c0 75 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 41 54 49 89 d4 55 48 89 f5 53 [ 972.374214] RSP: 002b:00007ffcd7ad1918 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 972.375421] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007fc93e169ef4 [ 972.376628] RDX: 0000000000000007 RSI: 000055aefe5c30c0 RDI: 0000000000000001 [ 972.377872] RBP: 000055aefe5c30c0 R08: 00007fc93e43a8c0 R09: 00007fc93f6cdb80 [ 972.379082] R10: 000000000000000a R11: 0000000000000246 R12: 00007fc93e439760 [ 972.380311] R13: 0000000000000007 R14: 00007fc93e434760 R15: 0000000000000007 I must not have noticed that back when I wrote these patches! Whoops. Since there's going to be quite a number of changes I need to make to this I'm just going to make the changes myself! I'll make sure to Cc you with the respin On Tue, 2018-10-23 at 19:19 -0700, Juston Li wrote: > From: Lyude <cpaul@xxxxxxxxxx> > > As observed with the latest ThinkPad docks, we unfortunately can't rely > on docks keeping us updated with hotplug events that happened while we > were suspended. On top of that, even if the number of connectors remains > the same between suspend and resume it's still not safe to assume that > there were no hotplugs, since a different monitor might have been > plugged into a port another monitor previously occupied. As such, we > need to go through all of the MST ports and check whether or not their > EDIDs have changed. > > In addition to that, we also now return -EINVAL from > drm_dp_mst_topology_mgr_resume to indicate to callers that they need to > reset the MST connection, and that they can't rely on any other method > of reprobing. > > Cc: stable@xxxxxxxxxxxxxxx > Signed-off-by: Lyude <cpaul@xxxxxxxxxx> > Signed-off-by: Juston Li <juston.li@xxxxxxxxx> > --- > drivers/gpu/drm/drm_dp_mst_topology.c | 94 ++++++++++++++++++++++++++- > 1 file changed, 93 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/drm_dp_mst_topology.c > b/drivers/gpu/drm/drm_dp_mst_topology.c > index 5ff1d79b86c4..88abebe52021 100644 > --- a/drivers/gpu/drm/drm_dp_mst_topology.c > +++ b/drivers/gpu/drm/drm_dp_mst_topology.c > @@ -29,6 +29,7 @@ > #include <linux/i2c.h> > #include <drm/drm_dp_mst_helper.h> > #include <drm/drmP.h> > +#include <drm/drm_edid.h> > > #include <drm/drm_fixed.h> > #include <drm/drm_atomic.h> > @@ -2201,6 +2202,64 @@ void drm_dp_mst_topology_mgr_suspend(struct > drm_dp_mst_topology_mgr *mgr) > } > EXPORT_SYMBOL(drm_dp_mst_topology_mgr_suspend); > > +static bool drm_dp_mst_edids_changed(struct drm_dp_mst_topology_mgr *mgr, > + struct drm_dp_mst_port *port) > +{ > + struct drm_device *dev; > + struct drm_connector *connector; > + struct drm_dp_mst_port *dport; > + struct drm_dp_mst_branch *mstb; > + struct edid *current_edid, *cached_edid; > + bool ret = false; > + > + port = drm_dp_get_validated_port_ref(mgr, port); > + if (!port) > + return false; > + > + mstb = drm_dp_get_validated_mstb_ref(mgr, port->mstb); > + if (mstb) { > + list_for_each_entry(dport, &port->mstb->ports, next) { > + ret = drm_dp_mst_edids_changed(mgr, dport); > + if (ret) > + break; > + } > + > + drm_dp_put_mst_branch_device(mstb); > + if (ret) > + goto out; > + } > + > + connector = port->connector; > + if (!connector || !port->aux.ddc.algo) > + goto out; > + > + dev = connector->dev; > + mutex_lock(&dev->mode_config.mutex); > + > + current_edid = drm_get_edid(connector, &port->aux.ddc); > + if (connector->edid_blob_ptr) > + cached_edid = (void *)connector->edid_blob_ptr->data; > + else > + return false; > + > + if ((current_edid && cached_edid && memcmp(current_edid, cached_edid, > + sizeof(struct edid)) != 0) > || > + (!current_edid && cached_edid) || (current_edid && !cached_edid)) > { > + ret = true; > + DRM_DEBUG_KMS("EDID on %s changed, reprobing connectors\n", > + connector->name); > + } > + > + mutex_unlock(&dev->mode_config.mutex); > + > + kfree(current_edid); > + > +out: > + drm_dp_put_port(port); > + > + return ret; > +} > + > /** > * drm_dp_mst_topology_mgr_resume() - resume the MST manager > * @mgr: manager to resume > @@ -2210,9 +2269,15 @@ EXPORT_SYMBOL(drm_dp_mst_topology_mgr_suspend); > * > * if the device fails this returns -1, and the driver should do > * a full MST reprobe, in case we were undocked. > + * > + * if the device can no longer be trusted, this returns -EINVAL > + * and the driver should unconditionally disconnect and reconnect > + * the dock. > */ > int drm_dp_mst_topology_mgr_resume(struct drm_dp_mst_topology_mgr *mgr) > { > + struct drm_dp_mst_branch *mstb; > + struct drm_dp_mst_port *port; > int ret = 0; > > mutex_lock(&mgr->lock); > @@ -2246,8 +2311,35 @@ int drm_dp_mst_topology_mgr_resume(struct > drm_dp_mst_topology_mgr *mgr) > drm_dp_check_mstb_guid(mgr->mst_primary, guid); > > ret = 0; > - } else > + > + /* > + * Some hubs also forget to notify us of hotplugs that > happened > + * while we were in suspend, so we need to verify that the > edid > + * hasn't changed for any of the connectors. If it has been, > + * we unfortunately can't rely on the dock updating us with > + * hotplug events, so indicate we need a full reconnect. > + */ > + > + /* MST's I2C helpers can't be used while holding this lock */ > + mutex_unlock(&mgr->lock); > + > + mstb = drm_dp_get_validated_mstb_ref(mgr, mgr->mst_primary); > + if (mstb) { > + list_for_each_entry(port, &mstb->ports, next) { > + if (drm_dp_mst_edids_changed(mgr, port)) { > + ret = -EINVAL; > + break; > + } > + } > + > + drm_dp_put_mst_branch_device(mstb); > + } > + } else { > ret = -1; > + mutex_unlock(&mgr->lock); > + } > + > + return ret; > > out_unlock: > mutex_unlock(&mgr->lock); -- Cheers, Lyude Paul