On Tue, Nov 24, 2015 at 05:47:08PM +0000, Dave Gordon wrote: > On 24/11/15 16:04, Daniel Vetter wrote: > >On Tue, Nov 24, 2015 at 03:47:02PM +0000, Dave Gordon wrote: > >>On 10/11/15 23:27, yu.dai@xxxxxxxxx wrote: > >>>From: Alex Dai <yu.dai@xxxxxxxxx> > >>> > >>>We keep a copy of GuC fw in a GEM obj. However its content is lost > >>>if the GEM obj is swapped (igt/gem_evict_*). Therefore, the later > >>>fw loading during GPU reset will fail. Mark the obj dirty after > >>>copying data into the pages. So its content will be kept during > >>>swapped out. > >>> > >>>Signed-off-by: Alex Dai <yu.dai@xxxxxxxxx> > >>>--- > >>> drivers/gpu/drm/i915/i915_gem.c | 1 + > >>> 1 file changed, 1 insertion(+) > >>> > >>>diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > >>>index f1e3fde..3b15167 100644 > >>>--- a/drivers/gpu/drm/i915/i915_gem.c > >>>+++ b/drivers/gpu/drm/i915/i915_gem.c > >>>@@ -5199,6 +5199,7 @@ i915_gem_object_create_from_data(struct drm_device *dev, > >>> i915_gem_object_pin_pages(obj); > >>> sg = obj->pages; > >>> bytes = sg_copy_from_buffer(sg->sgl, sg->nents, (void *)data, size); > >>>+ obj->dirty = 1; > >>> i915_gem_object_unpin_pages(obj); > >>> > >>> if (WARN_ON(bytes != size)) { > >> > >>Hi, > >> > >>although I liked this at first, on further checking I've found that there > >>are other cases where the CPU writes on a GEM object without marking it > >>dirty, so in theory all of those could also be subject to the same issue. > >> > >>Functions that may suffer from this problem include copy_batch(), > >>relocate_entry_cpu(), and render_state_setup(); hence, the objects that > >>might be affected (CPU updates discarded) would in general be batch buffers. > >> > >>Obviously also i915_gem_object_create_from_data() currently used only for > >>firmware objects; and populate_lr_context() also modifies a GEM object using > >>the CPU, but it already explicitly marks the object dirty and so avoid any > >>problems. > >> > >>At present, setting an object into the GTT domain with intent-to-write > >>causes the object to be marked as dirty, whereas setting it into the CPU > >>domain with intent-to-write doesn't. Since the obj->dirty flag doesn't mean > >>"CPU and GPU views differ" but rather "In-memory version is newer than > >>backing store", setting it should depend only on the intent-to-write flag, > >>not on which domain it's being put into. > >> > >>So my preferred solution would be to set the obj->dirty flag inside > >>i915_gem_object_set_to_cpu_domain(obj, true). Then objects that have been > >>written by the CPU would be swapped out to backing store rather than > >>discarded if the shrinker decides to reclaim the memory. > >> > >>To make this work properly, we also need to audit all the callers of > >>i915_gem_object_set_to_cpu_domain(), because I think in several places the > >>callers pass the second argument as true where it should be false; this > >>would cause objects to unnecessarily be marked dirty, increasing swap > >>requirements and impacting performance. > >> > >>I'll follow up with an actual patch once people have had an opportunity to > >>check that the above analysis is accurate. > > > >set_to_cpu_domain won't cover everyone, since some callers like > >shmem_pwrite/pread do clever tricks with inline flushing. > > > >Also there's a fundamental difference between writing through GTT (where > >we always pin all the pages, even though now we do have partial views), > >and CPU writes, which in most places access the bo page-by-page. > > > >Given that I'm still slightly leaning towards a "cpu writes need to mark > >pages dirty themselves" world-view. At least teaching this to > >set_to_cpu_domain won't be a fully effective magic bullet. > >-Daniel > > shmem_pwrite doesn't call set_to_cpu_domain, and it already explicitly marks > the GEM object as dirty; shmem_pread obviously doesn't need to as it's a > read operation and doesn't change the object contents. > > There's no difference (as far as backing store is concerned) whether the > object is written via a CPU mapping, a GTT mapping, or by the GPU; the end > result is that the object in memory is different from the version on backing > store, and the object should be marked dirty if we want the changes to > persist across eviction. > > There may indeed be additional functions that modify GEM objects in some way > without first setting them into the CPU domain, in which case they will also > be responsible for marking the objects dirty. But all the (correct) callers > of i915_gem_object_set_to_cpu_domain(obj, true) follow it by writing on the > object, and only one marks the object as dirty, so all others are potential > victims of having the shrinker discard their uncommitted changes. > > Fixing it here (i.e when mapping whole objects for CPU write access) means > that object-level operations then don't have to worry about page-level > details. Functions that want to be smarter about tracking updates at the > level of individual pages could still do so, but I don't want to impose that > requirement on code that just wants to manipulate objects. It's still not entirely equivalent: With GTT writes it's impossible to write through the GTT with get_pages having been called, which means someone eventually will do the put_pages. But with CPU writes you can do a set_to_cpu_domain just for flushing, and then write through some completely different path. Just setting obj->dirty only works if you also have the pages. Which means moving the obj->dirty = true into set_to_cpu_domain won't be a solution that works everywhere, but it will work mostly. That's a bit fragile. For create_from_data it will work perfectly, and if we put the set_to_cpu_domain after get_pages it's even obvious that this is so. But since it's fundamentally fragile I don't like it that much. But it's also not awesome that set_to_gtt_domain does this for callers. For lack of clear solutions I'd go with sprinkling obj->dirty or page_set_dirty over callers. Aside: relocate_entry_cpu probably gets away because of the unconditional obj->dirty we do later on, and that we redo all relocs if a fault happens. Still would be good to fix it, just for safety. > BTW, I note that the different implementations of put_pages() are not > consistent about how they treat dirty pages within objects not marked as > dirty. Does this matter? Just checked them. All the struct page backed ones handle it (shmem+userptr) all the others (dma-buf+stolen) don't. Seems consistent. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx