Re: [Intel-gfx] [PATCH v2 1/3] drm/i915: Block fbdev HPD processing during suspend

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Tue, 29 Jan 2019 11:49:43 +0000

Quoting Lyude Paul (2019-01-28 20:56:01)
> When resuming, we check whether or not any previously connected
> MST topologies are still present and if so, attempt to resume them. If
> this fails, we disable said MST topologies and fire off a hotplug event
> so that userspace knows to reprobe.
> 
> However, sending a hotplug event involves calling
> drm_fb_helper_hotplug_event(), which in turn results in fbcon doing a
> connector reprobe in the caller's thread - something we can't do at the
> point in which i915 calls drm_dp_mst_topology_mgr_resume() since
> hotplugging hasn't been fully initialized yet.
> 
> This currently causes some rather subtle but fatal issues. For example,
> on my T480s the laptop dock connected to it usually disappears during a
> suspend cycle, and comes back up a short while after the system has been
> resumed. This guarantees pretty much every suspend and resume cycle,
> drm_dp_mst_topology_mgr_set_mst(mgr, false); will be caused and in turn,
> a connector hotplug will occur. Now it's Rute Goldberg time: when the
> connector hotplug occurs, i915 reprobes /all/ of the connectors,
> including eDP. However, eDP probing requires that we power on the panel
> VDD which in turn, grabs a wakeref to the appropriate power domain on
> the GPU (on my T480s, this is the PORT_DDI_A_IO domain). This is where
> things start breaking, since this all happens before
> intel_power_domains_enable() is called we end up leaking the wakeref
> that was acquired and never releasing it later. Come next suspend/resume
> cycle, this causes us to fail to shut down the GPU properly, which
> causes it not to resume properly and die a horrible complicated death.
> 
> (as a note: this only happens when there's both an eDP panel and MST
> topology connected which is removed mid-suspend. One or the other seems
> to always be OK).
> 
> We could try to fix the VDD wakeref leak, but this doesn't seem like
> it's worth it at all since we aren't able to handle hotplug detection
> while resuming anyway. So, let's go with a more robust solution inspired
> by nouveau: block fbdev from handling hotplug events until we resume
> fbdev. This allows us to still send sysfs hotplug events to be handled
> later by user space while we're resuming, while also preventing us from
> actually processing any hotplug events we receive until it's safe.
> 
> This fixes the wakeref leak observed on the T480s and as such, also
> fixes suspend/resume with MST topologies connected on this machine.
> 
> Signed-off-by: Lyude Paul <lyude@xxxxxxxxxx>
> Fixes: 0e32b39ceed6 ("drm/i915: add DP 1.2 MST support (v0.7)")
> Cc: Todd Previte <tprevite@xxxxxxxxx>
> Cc: Dave Airlie <airlied@xxxxxxxxxx>
> Cc: Jani Nikula <jani.nikula@xxxxxxxxxxxxxxx>
> Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> Cc: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx>
> Cc: Imre Deak <imre.deak@xxxxxxxxx>
> Cc: intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: <stable@xxxxxxxxxxxxxxx> # v3.17+
> ---
>  drivers/gpu/drm/i915/intel_drv.h   | 10 ++++++++++
>  drivers/gpu/drm/i915/intel_fbdev.c | 30 +++++++++++++++++++++++++++++-
>  2 files changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 85b913ea6e80..c8549588b2ce 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -213,6 +213,16 @@ struct intel_fbdev {
>         unsigned long vma_flags;
>         async_cookie_t cookie;
>         int preferred_bpp;
> +
> +       /* Whether or not fbdev hpd processing is temporarily suspended */
> +       bool hpd_suspended : 1;
> +       /* Set when a hotplug was received while HPD processing was
> +        * suspended
> +        */
> +       bool hpd_waiting : 1;
> +
> +       /* Protects hpd_suspended */
> +       struct mutex hpd_lock;
>  };
>  
>  struct intel_encoder {
> diff --git a/drivers/gpu/drm/i915/intel_fbdev.c b/drivers/gpu/drm/i915/intel_fbdev.c
> index 8cf3efe88f02..3a6c0bebaaf9 100644
> --- a/drivers/gpu/drm/i915/intel_fbdev.c
> +++ b/drivers/gpu/drm/i915/intel_fbdev.c
> @@ -681,6 +681,7 @@ int intel_fbdev_init(struct drm_device *dev)
>         if (ifbdev == NULL)
>                 return -ENOMEM;
>  
> +       mutex_init(&ifbdev->hpd_lock);
>         drm_fb_helper_prepare(dev, &ifbdev->helper, &intel_fb_helper_funcs);
>  
>         if (!intel_fbdev_init_bios(dev, ifbdev))
> @@ -754,6 +755,23 @@ void intel_fbdev_fini(struct drm_i915_private *dev_priv)
>         intel_fbdev_destroy(ifbdev);
>  }
>  
> +/* Suspends/resumes fbdev processing of incoming HPD events. When resuming HPD
> + * processing, fbdev will perform a full connector reprobe if a hotplug event
> + * was received while HPD was suspended.
> + */
> +static void intel_fbdev_hpd_set_suspend(struct intel_fbdev *ifbdev, int state)
> +{
> +       mutex_lock(&ifbdev->hpd_lock);
> +       ifbdev->hpd_suspended = state == FBINFO_STATE_SUSPENDED;
> +       if (ifbdev->hpd_waiting) {
> +               ifbdev->hpd_waiting = false;
> +
> +               DRM_DEBUG_KMS("Handling delayed fbcon HPD event\n");
> +               drm_fb_helper_hotplug_event(&ifbdev->helper);

Even when set_suspend(true) ?

Then on resume, we don't care as another hotplug is generated. Calling
the hotplug_even from under the mutex feels like unnecessary lock
spreading.

> +       }
> +       mutex_unlock(&ifbdev->hpd_lock);

I am thinking along the lines of:

bool send_event = false;

mutex_lock(&hpd_lock);
ifbdev->hpd_suspended = state == SUSPENDED;
send_event = !hpd_suspended && hpd_waiting;
hpd_waiting = false;
mutex_unlock(&hpd_lock)

if (send_event) {
	DRM_DEBUG_KMS("Handling delayed fbcon HPD event\n");
	drm_fb_helper_hotplug_event(&ifbdev->helper);
}

>  void intel_fbdev_output_poll_changed(struct drm_device *dev)
> @@ -812,8 +833,15 @@ void intel_fbdev_output_poll_changed(struct drm_device *dev)
>                 return;
>  
>         intel_fbdev_sync(ifbdev);
> -       if (ifbdev->vma || ifbdev->helper.deferred_setup)
> +
> +       mutex_lock(&ifbdev->hpd_lock);
> +       if (ifbdev->hpd_suspended) {
> +               DRM_DEBUG_KMS("fbdev HPD event deferred until resume\n");
> +               ifbdev->hpd_waiting = true;
> +       } else if (ifbdev->vma || ifbdev->helper.deferred_setup) {
>                 drm_fb_helper_hotplug_event(&ifbdev->helper);
> +       }

Seems ripe for simplification. We can always just set hpd_waiting as it
will be reset over suspend, so then we just need a way of tidying up the
"am I ready to send" check.
-Chris