Re: [Intel-gfx] [PATCH] drm/i915: fix failure to power off after hibernate

Daniel Vetter <daniel@xxxxxxxx> · Thu, 26 Feb 2015 21:01:27 +0100

On Thu, Feb 26, 2015 at 08:50:48PM +0200, Imre Deak wrote:
> On to, 2015-02-26 at 10:34 +0100, Bjørn Mork wrote:
> > Imre Deak <imre.deak@xxxxxxxxx> writes:
> >
> > >> That patch fixes the problem, with only pci_set_power_state commented
> > >> out.  Do you still want me to try with pci_disable_device() commented
> > >> out as well?
> > >
> > > No, but it would help if you could still try the two attached patch
> > > separately, without any of the previous workarounds. Based on the
> > > result, we'll follow up with a fix that adds for your specific platform
> > > either of the attached workarounds or simply avoids putting the device
> > > into D3 (corresponding to the patch you already tried).
> >
> > None of those patches made any difference.  The laptop still hangs at
> > power-off.
> >
> > Not really surprising, is it?  Previous testing shows that the hang
> > occurs at the pci_set_power_state(drm_dev->pdev, PCI_D3hot) call in the
> > poweroff_late hook.  It is hard to see how replacing it by an attempt to
> > set D3cold, or adding any call after this point, could possibly change
> > anything.  The system is stil hanging at the pci_set_power_state() call.
> 
> Judging from the blinking LED, I assume that it's not
> pci_set_power_state() that hangs the machine, but the hang happens in
> BIOS code.
> 
> > The generic pci-driver code will put the i915 device into PCI_D3hot for
> > you, won't it? Why do you need to duplicate that in the driver,
> > *knowing* that doing so breaks (at least some) systems?
> 
> Letting the pci core put the device into D3 wouldn't get rid of the problem.
> It's putting the device into D3 in the first place what causes it.
> 
> > I honestly don't think this "let's try some random code" is the proper
> > way to fix this bug (or any other bug for that matter).  You need to
> > start understanding the code you write, and the first step is by
> > actually explaining the changes you make.
> 
> We have a good understanding about the issue: the BIOS on your platform
> does something unexpected behind the back of the driver/kernel. In that
> sense the patches were not random, for instance the first one is based on
> an existing workaround for an issue that resembles quite a lot yours, see
> pci_pm_poweroff_noirq().
> 
> > I also believe that you completely miss the fact that this bug has
> > survived a full release cycle before you became aware of it, and the
> > implications this has wrt other affected systems/users.  I assume you
> > understand that my system isn't one-of-a-kind, This means that there are
> > other affected users with identical/similar systems.  Now, if none of
> > those users reported the bug to you (we all know why: Linux kernel
> > development is currently limited by the available testing resources, NOT
> > by the available developer resources), then how do you know that there
> > aren't a number of other systems affected as well?
> >
> > Let me answer that for you:  You don't.
> >
> > Which is why you must explain the mechanism triggering the bug, proving
> > that it is a chipset/system specific issue.  Because that's the only way
> > you will *know* that you have solved the problem not only for me, but for
> > all affected users.
> >
> > IMHO, the only safe and sane solution at the moment is the revert patch
> > I posted.  It's a simple fix, reverting back to the *known* working
> > state before this regression was introduced.
> >
> > Then you can start over from there, trying to implement this properly.
> 
> The current way is the proper one that we want for the generic case. The issue
> on your platform is the exception, so working around that is a sensible choice.
> 
> Attached is the proposed fix for this issue.
> 
> --Imre
> 

> From 5c23657bc168db12a1ba100ab65fabd305c89c8a Mon Sep 17 00:00:00 2001
> From: Imre Deak <imre.deak@xxxxxxxxx>
> Date: Thu, 26 Feb 2015 18:38:53 +0200
> Subject: [PATCH] drm/i915: gm45: work around hang during hibernation
> MIME-Version: 1.0
> Content-Type: text/plain; charset=UTF-8
> Content-Transfer-Encoding: 8bit
> 
> Bjørn reported that his machine hang during hibernation and eventually
> bisected the problem to the following commit:
> 
> commit da2bc1b9db3351addd293e5b82757efe1f77ed1d
> Author: Imre Deak <imre.deak@xxxxxxxxx>
> Date:   Thu Oct 23 19:23:26 2014 +0300
> 
>     drm/i915: add poweroff_late handler
> 
> The problem seems to be that after the kernel puts the device into D3
> the BIOS still tries to access it, or otherwise assumes that it's in D0.
> This is clearly bogus, since ACPI mandates that devices are put into D3
> by the OSPM if they are not wake-up sources. In the future we want to
> unify more of the driver's runtime and system suspend paths, for example
> by skipping all the system suspend/hibernation hooks if the device is
> runtime suspended already. Accordingly for all other platforms the goal
> is still to properly power down the device during hibernation.
> 
> Reference: http://lists.freedesktop.org/archives/intel-gfx/2015-February/060633.html
> Reported-and-bisected-by: Bjørn Mork <bjorn@xxxxxxx>
> Signed-off-by: Imre Deak <imre.deak@xxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 27 ++++++++++++++++++++++-----
>  1 file changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 4badb23..67d212b 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -637,7 +637,7 @@ static int i915_drm_suspend(struct drm_device *dev)
>  	return 0;
>  }
>  
> -static int i915_drm_suspend_late(struct drm_device *drm_dev)
> +static int i915_drm_suspend_late(struct drm_device *drm_dev, bool hibernation)
>  {
>  	struct drm_i915_private *dev_priv = drm_dev->dev_private;
>  	int ret;
> @@ -651,7 +651,14 @@ static int i915_drm_suspend_late(struct drm_device *drm_dev)
>  	}
>  
>  	pci_disable_device(drm_dev->pdev);
> -	pci_set_power_state(drm_dev->pdev, PCI_D3hot);
> +	/*
> +	 * During hibernation on some GM45 platforms the BIOS may try to access
> +	 * the device even though it's already in D3 and hang the machine. So
> +	 * leave the device in D0 on those platforms and hope the BIOS will
> +	 * power down the device properly.
> +	 */
> +	if (!(hibernation && IS_GM45(dev_priv)))
> +		pci_set_power_state(drm_dev->pdev, PCI_D3hot);

If we see more of these troublesome machines we might need to apply an
even bigger hammer like gen < 5 or so. But as long as there's just 1
report I think this is the right approach.

Bjorn, as soon as we have your tested-by that this work we can apply this
and shovel it through the backporting machinery.

Thanks, Daniel

>  
>  	return 0;
>  }
> @@ -677,7 +684,7 @@ int i915_suspend_legacy(struct drm_device *dev, pm_message_t state)
>  	if (error)
>  		return error;
>  
> -	return i915_drm_suspend_late(dev);
> +	return i915_drm_suspend_late(dev, false);
>  }
>  
>  static int i915_drm_resume(struct drm_device *dev)
> @@ -965,7 +972,17 @@ static int i915_pm_suspend_late(struct device *dev)
>  	if (drm_dev->switch_power_state == DRM_SWITCH_POWER_OFF)
>  		return 0;
>  
> -	return i915_drm_suspend_late(drm_dev);
> +	return i915_drm_suspend_late(drm_dev, false);
> +}
> +
> +static int i915_pm_poweroff_late(struct device *dev)
> +{
> +	struct drm_device *drm_dev = dev_to_i915(dev)->dev;
> +
> +	if (drm_dev->switch_power_state == DRM_SWITCH_POWER_OFF)
> +		return 0;
> +
> +	return i915_drm_suspend_late(drm_dev, true);
>  }
>  
>  static int i915_pm_resume_early(struct device *dev)
> @@ -1535,7 +1552,7 @@ static const struct dev_pm_ops i915_pm_ops = {
>  	.thaw_early = i915_pm_resume_early,
>  	.thaw = i915_pm_resume,
>  	.poweroff = i915_pm_suspend,
> -	.poweroff_late = i915_pm_suspend_late,
> +	.poweroff_late = i915_pm_poweroff_late,
>  	.restore_early = i915_pm_resume_early,
>  	.restore = i915_pm_resume,
>  
> -- 
> 2.1.0
> 

> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html