On 09/12/15 12:46, ankitprasad.r.sharma@xxxxxxxxx wrote:
From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>

Ville reminded us that stolen memory is not preserved across hibernation, and a result of this was that context objects, now being allocated from stolen, were being corrupted on S4 and promptly hanging the GPU on resume.

We want to utilise stolen for as much as possible (nothing else will use that wasted memory otherwise), so we need a strategy for handling general objects allocated from stolen and hibernation. A simple solution is to do a CPU copy through the GTT of the stolen object into a fresh shmemfs backing store and thenceforth treat it as a normal object. This can be refined in future either to use a GPU copy to avoid the slow uncached reads (though it's hibernation!) or to recreate stolen objects upon resume/first-use. For now, a simple approach should suffice for testing the object migration.

v2: Swap PTEs for pinned bindings over to the shmemfs. This adds a complicated dance, but is required as many stolen objects are likely to be pinned for use by the hardware. Swapping the PTEs should not result in externally visible behaviour, as each PTE update should be atomic and the two pages identical. (danvet)

safe-by-default, or the principle of least surprise. We need a new flag to mark objects that we can wilfully discard and recreate across hibernation. (danvet)

Just use the global_list rather than invent a new stolen_list. This is the slowpath hibernate and so adding a new list and the associated complexity isn't worth it.
v3: Rebased on drm-intel-nightly (Ankit)
v4: Use insert_page to map stolen memory backed pages for migration to shmem (Chris)
v5: Acquire mutex lock while copying stolen buffer objects to shmem (Chris)

Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
Signed-off-by: Ankitprasad Sharma <ankitprasad.r.sharma@xxxxxxxxx>
---
 drivers/gpu/drm/i915/i915_drv.c         |  17 ++-
 drivers/gpu/drm/i915/i915_drv.h         |   7 +
 drivers/gpu/drm/i915/i915_gem.c         | 232 ++++++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_display.c    |   3 +
 drivers/gpu/drm/i915/intel_fbdev.c      |   6 +
 drivers/gpu/drm/i915/intel_pm.c         |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |   6 +
 7 files changed, 261 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 9f55209..2bb9e9e 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -1036,6 +1036,21 @@ static int i915_pm_suspend(struct device *dev)
 	return i915_drm_suspend(drm_dev);
 }
 
+static int i915_pm_freeze(struct device *dev)
+{
+	int ret;
+
+	ret = i915_gem_freeze(pci_get_drvdata(to_pci_dev(dev)));
+	if (ret)
+		return ret;
+
+	ret = i915_pm_suspend(dev);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
 static int i915_pm_suspend_late(struct device *dev)
 {
 	struct drm_device *drm_dev = dev_to_i915(dev)->dev;
@@ -1700,7 +1715,7 @@ static const struct dev_pm_ops i915_pm_ops = {
 	 * @restore, @restore_early : called after rebooting and restoring the
 	 * hibernation image [PMSG_RESTORE]
 	 */
-	.freeze = i915_pm_suspend,
+	.freeze = i915_pm_freeze,
 	.freeze_late = i915_pm_suspend_late,
 	.thaw_early = i915_pm_resume_early,
 	.thaw = i915_pm_resume,
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e0b09b0..0d18b07 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2080,6 +2080,12 @@ struct drm_i915_gem_object {
 	 * Advice: are the backing pages purgeable?
 	 */
 	unsigned int madv:2;
+	/**
+	 * Whereas madv is for userspace, there are certain situations
+	 * where we want I915_MADV_DONTNEED behaviour on internal objects
+	 * without conflating the userspace setting.
+	 */
+	unsigned int internal_volatile:1;
Does this new flag need to be examined by other code that currently checks 'madv', e.g. put_pages()? Or does it indicate not-really-volatile-in-normal-use, only-across-hibernation?
.Dave.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx