Quoting Lyude Paul (2019-01-29 19:09:59) > When resuming, we check whether or not any previously connected > MST topologies are still present and if so, attempt to resume them. If > this fails, we disable said MST topologies and fire off a hotplug event > so that userspace knows to reprobe. > > However, sending a hotplug event involves calling > drm_fb_helper_hotplug_event(), which in turn results in fbcon doing a > connector reprobe in the caller's thread - something we can't do at the > point in which i915 calls drm_dp_mst_topology_mgr_resume() since > hotplugging hasn't been fully initialized yet. > > This currently causes some rather subtle but fatal issues. For example, > on my T480s the laptop dock connected to it usually disappears during a > suspend cycle, and comes back up a short while after the system has been > resumed. This guarantees pretty much every suspend and resume cycle, > drm_dp_mst_topology_mgr_set_mst(mgr, false); will be caused and in turn, > a connector hotplug will occur. Now it's Rute Goldberg time: when the > connector hotplug occurs, i915 reprobes /all/ of the connectors, > including eDP. However, eDP probing requires that we power on the panel > VDD which in turn, grabs a wakeref to the appropriate power domain on > the GPU (on my T480s, this is the PORT_DDI_A_IO domain). This is where > things start breaking, since this all happens before > intel_power_domains_enable() is called we end up leaking the wakeref > that was acquired and never releasing it later. Come next suspend/resume > cycle, this causes us to fail to shut down the GPU properly, which > causes it not to resume properly and die a horrible complicated death. > > (as a note: this only happens when there's both an eDP panel and MST > topology connected which is removed mid-suspend. One or the other seems > to always be OK). > > We could try to fix the VDD wakeref leak, but this doesn't seem like > it's worth it at all since we aren't able to handle hotplug detection > while resuming anyway. So, let's go with a more robust solution inspired > by nouveau: block fbdev from handling hotplug events until we resume > fbdev. This allows us to still send sysfs hotplug events to be handled > later by user space while we're resuming, while also preventing us from > actually processing any hotplug events we receive until it's safe. > > This fixes the wakeref leak observed on the T480s and as such, also > fixes suspend/resume with MST topologies connected on this machine. > > Changes since v2: > * Don't call drm_fb_helper_hotplug_event() under lock, do it after lock > (Chris Wilson) > * Don't call drm_fb_helper_hotplug_event() in > intel_fbdev_output_poll_changed() under lock (Chris Wilson) > * Always set ifbdev->hpd_waiting (Chris Wilson) > > Signed-off-by: Lyude Paul <lyude@xxxxxxxxxx> > Fixes: 0e32b39ceed6 ("drm/i915: add DP 1.2 MST support (v0.7)") > Cc: Todd Previte <tprevite@xxxxxxxxx> > Cc: Dave Airlie <airlied@xxxxxxxxxx> > Cc: Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx> > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> > Cc: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx> > Cc: Imre Deak <imre.deak@xxxxxxxxx> > Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx > Cc: <stable@xxxxxxxxxxxxxxx> # v3.17+ I suspect the locking is overkill, but certainly easier to reason than trying to remember all the ins and outs of fbdev with its dubious async initialisation. Reviewed-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> -Chris