On Tue, 19 Mar 2019 at 11:58, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
>
> When we return pages to the system, we ensure that they are marked as
> being in the CPU domain since any external access is uncontrolled and we
> must assume the worst. This means that we need to always flush the pages
> on acquisition if we need to use them on the GPU, and from the beginning
> have used set-domain. Set-domain is overkill for the purpose as it is a
> general synchronisation barrier, but our intent is to only flush the
> pages being swapped in. If we move that flush into the pages acquisition
> phase, we then know that when we have obj->mm.pages, they are coherent
> with the GPU and need only maintain that status without resorting to
> heavy-handed use of set-domain.
>
> The principal knock-on effect for userspace is through mmap-gtt
> pagefaulting. Our uAPI has always implied that the GTT mmap was async
> (especially as when any pagefault occurs is unpredictable to userspace)
> and so userspace had to apply explicit domain control itself
> (set-domain). However, swapping is transparent to the kernel, and so on
> first fault we need to acquire the pages and make them coherent for
> access through the GTT. Our use of set-domain here leaks into the uABI
> the fact that the first pagefault was synchronous. This is unintentional
> and, barring a few igt, should go unnoticed; nevertheless we bump the
> uABI version for mmap-gtt to reflect the change in behaviour.
>
> Another implication of the change is that gem_create() is presumed to
> create an object that is coherent with the CPU and is in the CPU write
> domain, so a set-domain(CPU) following a gem_create() would be a minor
> operation that merely checked whether we could allocate all pages for
> the object. On applying this change, a set-domain(CPU) causes a clflush
> as we acquire the pages.
> This will have a small impact on mesa as we move
> the clflush here on !llc from execbuf time to create, but that should
> have minimal performance impact as the same clflush exists but is now
> done early. Because of the clflush issue, userspace recycles bo and
> so should resist allocating fresh objects.
>
> Internally, the presumption that objects are created in the CPU
> write-domain and remain so through writes to obj->mm.mapping is more
> prevalent than I expect; but easy enough to catch and apply a manual
> flush.
>
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Matthew Auld <matthew.william.auld@xxxxxxxxx>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
> Cc: Antonio Argenziano <antonio.argenziano@xxxxxxxxx>
> Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_drv.h               |  8 +++
>  drivers/gpu/drm/i915/i915_gem.c               | 57 ++++++++++++-----
>  drivers/gpu/drm/i915/i915_gem_dmabuf.c        |  1 +
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c    |  7 +--
>  drivers/gpu/drm/i915/i915_gem_render_state.c  |  2 +-
>  drivers/gpu/drm/i915/i915_perf.c              |  4 +-
>  drivers/gpu/drm/i915/intel_engine_cs.c        |  4 +-
>  drivers/gpu/drm/i915/intel_lrc.c              | 63 +++++++++----------
>  drivers/gpu/drm/i915/intel_ringbuffer.c       | 62 +++++++-----------
>  drivers/gpu/drm/i915/selftests/huge_pages.c   |  5 +-
>  .../gpu/drm/i915/selftests/i915_gem_context.c | 17 ++---
>  .../gpu/drm/i915/selftests/i915_gem_dmabuf.c  |  1 +
>  .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +-
>  drivers/gpu/drm/i915/selftests/i915_request.c | 14 ++---
>  drivers/gpu/drm/i915/selftests/igt_spinner.c  |  2 +-
>  .../gpu/drm/i915/selftests/intel_hangcheck.c  |  2 +-
>  drivers/gpu/drm/i915/selftests/intel_lrc.c    |  5 +-
>  .../drm/i915/selftests/intel_workarounds.c    |  3 +
>  18 files changed, 127 insertions(+), 134 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index c65c2e6649df..395aa9d5ba02 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2959,6 +2959,14 @@ i915_coherent_map_type(struct drm_i915_private *i915)
>  void *__must_check i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
>                                             enum i915_map_type type);
>
> +void __i915_gem_object_flush_map(struct drm_i915_gem_object *obj,
> +                                unsigned long offset,
> +                                unsigned long size);
> +static inline void i915_gem_object_flush_map(struct drm_i915_gem_object *obj)
> +{
> +       __i915_gem_object_flush_map(obj, 0, obj->base.size);
> +}
> +
>  /**
>   * i915_gem_object_unpin_map - releases an earlier mapping
>   * @obj: the object to unmap
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b7086c8d4726..41d96414ef18 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1713,6 +1713,9 @@ static unsigned int tile_row_pages(const struct drm_i915_gem_object *obj)
>   * 2 - Recognise WC as a separate cache domain so that we can flush the
>   *     delayed writes via GTT before performing direct access via WC.
>   *
> + * 3 - Remove implicit set-domain(GTT) and synchronisation on initial
> + *     pagefault; swapin remains transparent.
> + *
>   * Restrictions:
>   *
>   *  * snoopable objects cannot be accessed via the GTT. It can cause machine
> @@ -1740,7 +1743,7 @@ static unsigned int tile_row_pages(const struct drm_i915_gem_object *obj)
>   */
>  int i915_gem_mmap_gtt_version(void)
>  {
> -       return 2;
> +       return 3;
>  }
>
>  static inline struct i915_ggtt_view
> @@ -1808,17 +1811,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
>
>         trace_i915_gem_object_fault(obj, page_offset, true, write);
>
> -       /* Try to flush the object off the GPU first without holding the lock.
> -        * Upon acquiring the lock, we will perform our sanity checks and then
> -        * repeat the flush holding the lock in the normal manner to catch cases
> -        * where we are gazumped.
> -        */
> -       ret = i915_gem_object_wait(obj,
> -                                  I915_WAIT_INTERRUPTIBLE,
> -                                  MAX_SCHEDULE_TIMEOUT);
> -       if (ret)
> -               goto err;
> -
>         ret = i915_gem_object_pin_pages(obj);
>         if (ret)
>                 goto err;
> @@ -1874,10 +1866,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
>                 goto err_unlock;
>         }
>
> -       ret = i915_gem_object_set_to_gtt_domain(obj, write);
> -       if (ret)
> -               goto err_unpin;
> -
>         ret = i915_vma_pin_fence(vma);
>         if (ret)
>                 goto err_unpin;
> @@ -2534,6 +2522,14 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
>
>         lockdep_assert_held(&obj->mm.lock);
>
> +       /* Make the pages coherent with the GPU (flushing any swapin). */
> +       if (obj->cache_dirty) {
> +               obj->write_domain = 0;
> +               if (i915_gem_object_has_struct_page(obj))
> +                       drm_clflush_sg(pages);
> +               obj->cache_dirty = false;
> +       }

Is it worth adding some special casing here for volatile objects, so that
we avoid doing the clflush_sg every time we do set_pages for !llc?

if (obj->cache_dirty && obj->mm.madvise == WILLNEED)

Or is that meh?
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx