On Thu, Feb 26, 2015 at 08:50:48PM +0200, Imre Deak wrote: > On to, 2015-02-26 at 10:34 +0100, Bjørn Mork wrote: > > Imre Deak <imre.deak@xxxxxxxxx> writes: > > > > >> That patch fixes the problem, with only pci_set_power_state commented > > >> out. Do you still want me to try with pci_disable_device() commented > > >> out as well? > > > > > > No, but it would help if you could still try the two attached patch > > > separately, without any of the previous workarounds. Based on the > > > result, we'll follow up with a fix that adds for your specific platform > > > either of the attached workarounds or simply avoids putting the device > > > into D3 (corresponding to the patch you already tried). > > > > None of those patches made any difference. The laptop still hangs at > > power-off. > > > > Not really surprising, is it? Previous testing shows that the hang > > occurs at the pci_set_power_state(drm_dev->pdev, PCI_D3hot) call in the > > poweroff_late hook. It is hard to see how replacing it by an attempt to > > set D3cold, or adding any call after this point, could possibly change > > anything. The system is stil hanging at the pci_set_power_state() call. > > Judging from the blinking LED, I assume that it's not > pci_set_power_state() that hangs the machine, but the hang happens in > BIOS code. > > > The generic pci-driver code will put the i915 device into PCI_D3hot for > > you, won't it? Why do you need to duplicate that in the driver, > > *knowing* that doing so breaks (at least some) systems? > > Letting the pci core put the device into D3 wouldn't get rid of the problem. > It's putting the device into D3 in the first place what causes it. > > > I honestly don't think this "let's try some random code" is the proper > > way to fix this bug (or any other bug for that matter). You need to > > start understanding the code you write, and the first step is by > > actually explaining the changes you make. > > We have a good understanding about the issue: the BIOS on your platform > does something unexpected behind the back of the driver/kernel. In that > sense the patches were not random, for instance the first one is based on > an existing workaround for an issue that resembles quite a lot yours, see > pci_pm_poweroff_noirq(). > > > I also believe that you completely miss the fact that this bug has > > survived a full release cycle before you became aware of it, and the > > implications this has wrt other affected systems/users. I assume you > > understand that my system isn't one-of-a-kind, This means that there are > > other affected users with identical/similar systems. Now, if none of > > those users reported the bug to you (we all know why: Linux kernel > > development is currently limited by the available testing resources, NOT > > by the available developer resources), then how do you know that there > > aren't a number of other systems affected as well? > > > > Let me answer that for you: You don't. > > > > Which is why you must explain the mechanism triggering the bug, proving > > that it is a chipset/system specific issue. Because that's the only way > > you will *know* that you have solved the problem not only for me, but for > > all affected users. > > > > IMHO, the only safe and sane solution at the moment is the revert patch > > I posted. It's a simple fix, reverting back to the *known* working > > state before this regression was introduced. > > > > Then you can start over from there, trying to implement this properly. > > The current way is the proper one that we want for the generic case. The issue > on your platform is the exception, so working around that is a sensible choice. > > Attached is the proposed fix for this issue. > > --Imre > > From 5c23657bc168db12a1ba100ab65fabd305c89c8a Mon Sep 17 00:00:00 2001 > From: Imre Deak <imre.deak@xxxxxxxxx> > Date: Thu, 26 Feb 2015 18:38:53 +0200 > Subject: [PATCH] drm/i915: gm45: work around hang during hibernation > MIME-Version: 1.0 > Content-Type: text/plain; charset=UTF-8 > Content-Transfer-Encoding: 8bit > > Bjørn reported that his machine hang during hibernation and eventually > bisected the problem to the following commit: > > commit da2bc1b9db3351addd293e5b82757efe1f77ed1d > Author: Imre Deak <imre.deak@xxxxxxxxx> > Date: Thu Oct 23 19:23:26 2014 +0300 > > drm/i915: add poweroff_late handler > > The problem seems to be that after the kernel puts the device into D3 > the BIOS still tries to access it, or otherwise assumes that it's in D0. > This is clearly bogus, since ACPI mandates that devices are put into D3 > by the OSPM if they are not wake-up sources. In the future we want to > unify more of the driver's runtime and system suspend paths, for example > by skipping all the system suspend/hibernation hooks if the device is > runtime suspended already. Accordingly for all other platforms the goal > is still to properly power down the device during hibernation. > > Reference: http://lists.freedesktop.org/archives/intel-gfx/2015-February/060633.html > Reported-and-bisected-by: Bjørn Mork <bjorn@xxxxxxx> > Signed-off-by: Imre Deak <imre.deak@xxxxxxxxx> > --- > drivers/gpu/drm/i915/i915_drv.c | 27 ++++++++++++++++++++++----- > 1 file changed, 22 insertions(+), 5 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c > index 4badb23..67d212b 100644 > --- a/drivers/gpu/drm/i915/i915_drv.c > +++ b/drivers/gpu/drm/i915/i915_drv.c > @@ -637,7 +637,7 @@ static int i915_drm_suspend(struct drm_device *dev) > return 0; > } > > -static int i915_drm_suspend_late(struct drm_device *drm_dev) > +static int i915_drm_suspend_late(struct drm_device *drm_dev, bool hibernation) > { > struct drm_i915_private *dev_priv = drm_dev->dev_private; > int ret; > @@ -651,7 +651,14 @@ static int i915_drm_suspend_late(struct drm_device *drm_dev) > } > > pci_disable_device(drm_dev->pdev); > - pci_set_power_state(drm_dev->pdev, PCI_D3hot); > + /* > + * During hibernation on some GM45 platforms the BIOS may try to access > + * the device even though it's already in D3 and hang the machine. So > + * leave the device in D0 on those platforms and hope the BIOS will > + * power down the device properly. > + */ > + if (!(hibernation && IS_GM45(dev_priv))) > + pci_set_power_state(drm_dev->pdev, PCI_D3hot); If we see more of these troublesome machines we might need to apply an even bigger hammer like gen < 5 or so. But as long as there's just 1 report I think this is the right approach. Bjorn, as soon as we have your tested-by that this work we can apply this and shovel it through the backporting machinery. Thanks, Daniel > > return 0; > } > @@ -677,7 +684,7 @@ int i915_suspend_legacy(struct drm_device *dev, pm_message_t state) > if (error) > return error; > > - return i915_drm_suspend_late(dev); > + return i915_drm_suspend_late(dev, false); > } > > static int i915_drm_resume(struct drm_device *dev) > @@ -965,7 +972,17 @@ static int i915_pm_suspend_late(struct device *dev) > if (drm_dev->switch_power_state == DRM_SWITCH_POWER_OFF) > return 0; > > - return i915_drm_suspend_late(drm_dev); > + return i915_drm_suspend_late(drm_dev, false); > +} > + > +static int i915_pm_poweroff_late(struct device *dev) > +{ > + struct drm_device *drm_dev = dev_to_i915(dev)->dev; > + > + if (drm_dev->switch_power_state == DRM_SWITCH_POWER_OFF) > + return 0; > + > + return i915_drm_suspend_late(drm_dev, true); > } > > static int i915_pm_resume_early(struct device *dev) > @@ -1535,7 +1552,7 @@ static const struct dev_pm_ops i915_pm_ops = { > .thaw_early = i915_pm_resume_early, > .thaw = i915_pm_resume, > .poweroff = i915_pm_suspend, > - .poweroff_late = i915_pm_suspend_late, > + .poweroff_late = i915_pm_poweroff_late, > .restore_early = i915_pm_resume_early, > .restore = i915_pm_resume, > > -- > 2.1.0 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx > http://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html