Re: [PATCH v1] drm/i915/guc: Fix a fw content lost issue after it is evicted

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10/11/15 23:27, yu.dai@xxxxxxxxx wrote:
From: Alex Dai <yu.dai@xxxxxxxxx>

We keep a copy of GuC fw in a GEM obj. However its content is lost
if the GEM obj is swapped (igt/gem_evict_*). Therefore, the later
fw loading during GPU reset will fail. Mark the obj dirty after
copying data into the pages. So its content will be kept during
swapped out.

Signed-off-by: Alex Dai <yu.dai@xxxxxxxxx>
---
  drivers/gpu/drm/i915/i915_gem.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f1e3fde..3b15167 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -5199,6 +5199,7 @@ i915_gem_object_create_from_data(struct drm_device *dev,
  	i915_gem_object_pin_pages(obj);
  	sg = obj->pages;
  	bytes = sg_copy_from_buffer(sg->sgl, sg->nents, (void *)data, size);
+	obj->dirty = 1;
  	i915_gem_object_unpin_pages(obj);

  	if (WARN_ON(bytes != size)) {

Hi,

although I liked this at first, on further checking I've found that there are other cases where the CPU writes on a GEM object without marking it dirty, so in theory all of those could also be subject to the same issue.

Functions that may suffer from this problem include copy_batch(),
relocate_entry_cpu(), and render_state_setup(); hence, the objects that might be affected (CPU updates discarded) would in general be batch buffers.

Obviously also i915_gem_object_create_from_data() currently used only for firmware objects; and populate_lr_context() also modifies a GEM object using the CPU, but it already explicitly marks the object dirty and so avoid any problems.

At present, setting an object into the GTT domain with intent-to-write causes the object to be marked as dirty, whereas setting it into the CPU domain with intent-to-write doesn't. Since the obj->dirty flag doesn't mean "CPU and GPU views differ" but rather "In-memory version is newer than backing store", setting it should depend only on the intent-to-write flag, not on which domain it's being put into.

So my preferred solution would be to set the obj->dirty flag inside i915_gem_object_set_to_cpu_domain(obj, true). Then objects that have been written by the CPU would be swapped out to backing store rather than discarded if the shrinker decides to reclaim the memory.

To make this work properly, we also need to audit all the callers of i915_gem_object_set_to_cpu_domain(), because I think in several places the callers pass the second argument as true where it should be false; this would cause objects to unnecessarily be marked dirty, increasing swap requirements and impacting performance.

I'll follow up with an actual patch once people have had an opportunity to check that the above analysis is accurate.

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux