On Thu, 2017-03-30 at 07:55 +0200, Takashi Iwai wrote: > On Thu, 30 Mar 2017 02:24:37 +0200, > Lyude Paul wrote: > > > > On Wed, 2017-03-29 at 15:54 +0200, Takashi Iwai wrote: > > > On Wed, 29 Mar 2017 15:34:15 +0200, > > > Ville Syrjälä wrote: > > > > > > > > On Wed, Mar 29, 2017 at 03:10:09PM +0200, Takashi Iwai wrote: > > > > > On Mon, 27 Mar 2017 18:02:13 +0200, > > > > > Takashi Iwai wrote: > > > > > > > > > > > > Hi, > > > > > > > > > > > > the upstream fix a16b7658f4e0d4aec9bc3e75a5f0cc3f7a3a0422 > > > > > > drm/i915: Call intel_dp_mst_resume() before resuming > > > > > > displays > > > > > > > > > > > > seems to trigger a kernel panic when some modeset change > > > > > > happens after > > > > > > S3 resume. The details are found in openSUSE bugzilla, > > > > > > https://bugzilla.suse.com/show_bug.cgi?id=1029634 > > > > > > > > > > > > In short, the following procedure causes a kernel panic > > > > > > (supposedly) > > > > > > almost 100% on Dell Latitude with Skylake with MST DP on > > > > > > dock: > > > > > > > > > > > > - Boot with a docking station, DP-1 connected. > > > > > > - Login on X > > > > > > - xrandr --output eDP-1 --primary --auto --output DP-1-1 -- > > > > > > auto > > > > > > --left-of eDP-1 > > > > > > ==> This changes the mode. > > > > > > - Suspend ("systemctl suspend" in my case), and close the > > > > > > lid. > > > > > > - Remove from the dock (keep the lid closed). > > > > > > - Open the lid, which resumes automatically. It works. > > > > > > - Suspend again. > > > > > > - Connect to the dock again (keep the lid closed). > > > > > > - Open the lid, which resumes automatically. It's still > > > > > > OK. > > > > > > - xrandr --output eDP-1 --primary --auto --output DP-1-1 -- > > > > > > auto > > > > > > --left-of eDP-1 > > > > > > ==> Now the kernel feezes. > > > > > > > > > > > > Reverting the commit mentioned above fixes the problem. > > > > > > > > > > > > The problem is present in all versions I tested. The > > > > > > reported > > > > > > kernel > > > > > > in the Bugzilla is 4.4.x-based one, but the issue is seen > > > > > > in > > > > > > 4.11-rc3, > > > > > > too. Note that the S3 resume itself works in 4.11-rc3; the > > > > > > kernel > > > > > > panic happens when invoking xrandr manually after that. > > > > > > > > > > > > Unfortunately, I couldn't get a kernel panic message, so > > > > > > far. kdump > > > > > > didn't work well in this case by some reason. There are > > > > > > some > > > > > > screenshots taken by the original reporter (could switch VT > > > > > > beforehand), but I don't know whether it helps. > > > > > > > > > > > > If you have any hints for further debugging, it'd be highly > > > > > > appreciated. > > > > > > > > > > It seems that the patch below works around the problem. > > > > > Can anyone enlighten what's going on there? > > > > > > > > > > > > > > > thanks, > > > > > > > > > > Takashi > > > > > > > > > > -- 8< -- > > > > > From: Takashi Iwai <tiwai@xxxxxxx> > > > > > Subject: [PATCH] drm/i915: Fix crash after S3 resume with DP > > > > > MST > > > > > mode change > > > > > > > > > > We've got a bug report showing that Skylake Dell machines > > > > > with a > > > > > docking station causes a kernel panic after S3 resume and > > > > > modeset. > > > > > The details are found in the openSUSE bugzilla entry > > > > > below. The > > > > > typical test procedure is: > > > > > > > > > > - Laptop is Dell Latitude with eDP (1366x768) > > > > > - Boot with docking station connected to a DP (1920x1080) > > > > > - Login, change the mode via > > > > > xrandr --output eDP-1 --auto --output DP-1-1 --auto --left- > > > > > of > > > > > eDP-1 > > > > > - Suspend, and close the lid after the suspend > > > > > (or close the lid to trigger the suspend) > > > > > - Undock while keeping the lid closed. > > > > > - Open the lid, which triggers the resume; > > > > > the machine wakes up well, and X shows up. No problem, so > > > > > far. > > > > > - Suspend again, close the lid. > > > > > - Dock again while keeping the lid closed. > > > > > - Open the lid, triggering the resume; this wakes up still > > > > > fine. > > > > > - At this moment, run xrandr again to re-setup DP-1 > > > > > xrandr --output eDP-1 --auto --output DP-1-1 --auto --left- > > > > > of > > > > > eDP-1 > > > > > ==> This triggers a hard crash. > > > > > > > > > > I could bisect it, and this leaded to the commit a16b7658f4e0 > > > > > ("drm/i915: Call intel_dp_mst_resume() before resuming > > > > > displays"). > > > > > > > > > > Basically the commit just shuffles the calls of > > > > > intel_display_resume() > > > > > and intel_dp_mst_resume(). So as a workaround, I tried to > > > > > split > > > > > intel_dp_mst_resume() call to postpone the suspected code > > > > > (the > > > > > invocation of intel_dp_check_mst_status()), then bingo, this > > > > > cured the > > > > > problem. > > > > > > > > > > But don't ask me *why* this fixes. It's still in a cargo- > > > > > cult > > > > > state. > > > > > > > > > > Fixes: a16b7658f4e0 ("drm/i915: Call intel_dp_mst_resume() > > > > > before > > > > > resuming displays") > > > > > Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1029634 > > > > > Signed-off-by: Takashi Iwai <tiwai@xxxxxxx> > > > > > --- > > > > > drivers/gpu/drm/i915/i915_drv.c | 5 ++++- > > > > > drivers/gpu/drm/i915/intel_dp.c | 20 +++++++++++++++++++- > > > > > drivers/gpu/drm/i915/intel_drv.h | 3 ++- > > > > > 3 files changed, 25 insertions(+), 3 deletions(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.c > > > > > b/drivers/gpu/drm/i915/i915_drv.c > > > > > index 1c75402a59c1..62c40090ceed 100644 > > > > > --- a/drivers/gpu/drm/i915/i915_drv.c > > > > > +++ b/drivers/gpu/drm/i915/i915_drv.c > > > > > @@ -1559,6 +1559,7 @@ static int > > > > > i915_suspend_switcheroo(struct > > > > > drm_device *dev, pm_message_t state) > > > > > static int i915_drm_resume(struct drm_device *dev) > > > > > { > > > > > struct drm_i915_private *dev_priv = to_i915(dev); > > > > > + int mst_pending; > > > > > int ret; > > > > > > > > > > disable_rpm_wakeref_asserts(dev_priv); > > > > > @@ -1608,10 +1609,12 @@ static int i915_drm_resume(struct > > > > > drm_device *dev) > > > > > dev_priv->display.hpd_irq_setup(dev_priv); > > > > > spin_unlock_irq(&dev_priv->irq_lock); > > > > > > > > > > - intel_dp_mst_resume(dev); > > > > > + mst_pending = intel_dp_mst_resume(dev); > > > > > > > > > > intel_display_resume(dev); > > > > > > > > > > + intel_dp_mst_resume_post(dev, mst_pending); > > > > > + > > > > > drm_kms_helper_poll_enable(dev); > > > > > > > > > > /* > > > > > diff --git a/drivers/gpu/drm/i915/intel_dp.c > > > > > b/drivers/gpu/drm/i915/intel_dp.c > > > > > index d1670b8afbf5..fc5ea900e6f3 100644 > > > > > --- a/drivers/gpu/drm/i915/intel_dp.c > > > > > +++ b/drivers/gpu/drm/i915/intel_dp.c > > > > > @@ -6027,9 +6027,10 @@ void intel_dp_mst_suspend(struct > > > > > drm_device *dev) > > > > > } > > > > > } > > > > > > > > > > -void intel_dp_mst_resume(struct drm_device *dev) > > > > > +int intel_dp_mst_resume(struct drm_device *dev) > > > > > { > > > > > struct drm_i915_private *dev_priv = to_i915(dev); > > > > > + int pending = 0; > > > > > int i; > > > > > > > > > > for (i = 0; i < I915_MAX_PORTS; i++) { > > > > > @@ -6041,6 +6042,23 @@ void intel_dp_mst_resume(struct > > > > > drm_device > > > > > *dev) > > > > > > > > > > ret = > > > > > drm_dp_mst_topology_mgr_resume(&intel_dig_port->dp.mst_mgr); > > > > > if (ret) > > > > > + pending |= 1 << i; > > > > > + } > > > > > + > > > > > + return pending; > > > > > +} > > > > > + > > > > > +void intel_dp_mst_resume_post(struct drm_device *dev, int > > > > > pending) > > > > > +{ > > > > > + struct drm_i915_private *dev_priv = to_i915(dev); > > > > > + int i; > > > > > + > > > > > + for (i = 0; i < I915_MAX_PORTS; i++) { > > > > > + struct intel_digital_port *intel_dig_port = > > > > > + dev_priv->hotplug.irq_port[i]; > > > > > + if (!intel_dig_port || !intel_dig_port- > > > > > > dp.can_mst) > > > > > > > > > > + continue; > > > > > + if (pending & (1 << i)) > > > > > intel_dp_check_mst_status(&intel_dig > > > > > _por > > > > > t->dp); > > > > > } > > > > > } > > > > > > > > The whole MST resume is a bit of chicken and egg type of > > > > situation. > > > > We > > > > need the HPD interrupts to resume the previous state, but we > > > > don't > > > > want > > > > to actually process real hotplugs until we've done the resume. > > > > The > > > > current code is definitely broken IMO. > > > > > > > > But I'm not really sure why this patch fixes things because the > > > > HPD > > > > processing that will occur when we talk to the sink during the > > > > display > > > > resume should also call intel_dp_check_mst_status(). > > > > > > Actually, just dropping intel_dp_check_mst_status() calls in > > > intel_dp_mst_resume() seems enough to fix the problem. > > > > > > Below is the v2 patch doing that. Does it make more sense? > > > > Barely, unfortunately :(. I'll be honest I'm a little surprised > > this > > patch doesn't break anything, but you do seem to be right in saying > > that hotplugging re-sets up the displays in the event that things > > go > > wrong. > > > > Going through the backtrace and the dmesg that you gave on the > > bugzilla > > I can't figure out any definitive answer for why this is crashing. > > We > > will have to get dmesg output from when it crashes to understand > > this > > I'm pretty sure. > > > > JFYI (I really need to make a guide for this somewhere...) I > > usually > > use the netconsole module to get kernel output from machines when > > they > > kernel panic. In your case I would set this up right before you do > > the > > final `xrandr` call that kills the machine, since netconsole has > > not > > been very reliable in my experience over suspend/resume cycles. It > > might take a few tries to get it to print the whole crash to the > > netconsole (and you will need to make sure the system has a PCI > > ethernet adapter, USB or wifi won't do), but this usually works. > > I've already tried netconsole (even before kdump), but unfortunately > it doesn't work in this case. As mentioned in Bugzilla, the network > goes crazy when the machine crashes (even the whole network segment > the machine connects to is stalling). It's similar like the symptom > we've seen with the SKL CPU-state screw up in the past with > intel_idle > driver. Do you think you might be able to get a translation of the stack dump to the actual lines this issues is occurring on? You should be able to do this against i915.ko using eu-addr2line (it will have to be against the same kernel build you're running on the crashing machine, hence why I can't just do it myself). I might see if I can give you some patches to spit some info out. > > > If you could try getting a full dmesg with this, that would be > > super > > appreciated. > > > > JFYI: in the future you should use git send-email for sending these > > patches to intel-gfx, since the format you sent it in is preventing > > your patch from getting noticed by patchwork. Other then that > > though: > > > > Reviewed-by: Lyude <lyude@xxxxxxxxxx> > > Thanks, it's not meant to be a real submission of the patch, but > currently it's still a sort of RFC. If there is no better solution > insight, I'll submit it as a proper fix patch. > > > Takashi > > > > > > > > > > thanks, > > > > > > Takashi > > > > > > -- 8< -- > > > From: Takashi Iwai <tiwai@xxxxxxx> > > > Subject: [PATCH] drm/i915: Fix crash after S3 resume with DP MST > > > mode > > > change > > > (v2) > > > > > > We've got a bug report showing that Skylake Dell machines with a > > > docking station causes a kernel panic after S3 resume and > > > modeset. > > > The details are found in the openSUSE bugzilla entry below. The > > > typical test procedure is: > > > > > > - Laptop is Dell Latitude with eDP (1366x768) > > > - Boot with docking station connected to a DP (1920x1080) > > > - Login, change the mode via > > > xrandr --output eDP-1 --auto --output DP-1-1 --auto --left-of > > > eDP-1 > > > - Suspend, and close the lid after the suspend > > > (or close the lid to trigger the suspend) > > > - Undock while keeping the lid closed. > > > - Open the lid, which triggers the resume; > > > the machine wakes up well, and X shows up. No problem, so far. > > > - Suspend again, close the lid. > > > - Dock again while keeping the lid closed. > > > - Open the lid, triggering the resume; this wakes up still fine. > > > - At this moment, run xrandr again to re-setup DP-1 > > > xrandr --output eDP-1 --auto --output DP-1-1 --auto --left-of > > > eDP-1 > > > ==> This triggers a hard crash. > > > > > > I could bisect it, and this leaded to the commit a16b7658f4e0 > > > ("drm/i915: Call intel_dp_mst_resume() before resuming > > > displays"). > > > > > > This patch tries to work around the crash by ignoring the failed > > > port > > > from drm_dp_mst_topology_mgr_resume(). They should be handled in > > > hpd > > > later in anyway. > > > > > > v1->v2: just ignore the drm_dp_mst_topology_mgr_resume() error > > > codes > > > instead of postponing. > > > > > > Fixes: a16b7658f4e0 ("drm/i915: Call intel_dp_mst_resume() before > > > resuming displays") > > > Bugzilla: https://bugzilla.suse.com/show_bug.cgi?id=1029634 > > > Signed-off-by: Takashi Iwai <tiwai@xxxxxxx> > > > --- > > > drivers/gpu/drm/i915/intel_dp.c | 3 ++- > > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/gpu/drm/i915/intel_dp.c > > > b/drivers/gpu/drm/i915/intel_dp.c > > > index d1670b8afbf5..a6c0f0ac16eb 100644 > > > --- a/drivers/gpu/drm/i915/intel_dp.c > > > +++ b/drivers/gpu/drm/i915/intel_dp.c > > > @@ -6041,6 +6041,7 @@ void intel_dp_mst_resume(struct drm_device > > > *dev) > > > > > > ret = > > > drm_dp_mst_topology_mgr_resume(&intel_dig_port->dp.mst_mgr); > > > if (ret) > > > - intel_dp_check_mst_status(&intel_dig_por > > > t- > > > > dp); > > > > > > + DRM_DEBUG_KMS("DP MST resume failed for > > > port-%c\n", > > > + port_name(intel_dig_port- > > > > port)); > > > > > > } > > > } > > > > -- > > Cheers, > > Lyude > > _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx