Quoting Thomas Hellström (Intel) (2020-06-23 12:22:11) > Hi, Chris, > > On 6/22/20 11:59 AM, Chris Wilson wrote: > > In order to actually handle eviction and what not, we need to process > > all the objects together under a common lock, reservation_ww_class. As > > such, do a memory reservation pass after looking up the object/vma, > > which then feeds into the rest of execbuf [relocation, cmdparsing, > > flushing and ofc execution]. > > > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > --- > > .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 91 ++++++++++++++----- > > 1 file changed, 70 insertions(+), 21 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c > > index 46fcbdf8161c..8db2e013465f 100644 > > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c > > @@ -53,10 +53,9 @@ struct eb_vma_array { > > > > #define __EXEC_OBJECT_HAS_PIN BIT(31) > > #define __EXEC_OBJECT_HAS_FENCE BIT(30) > > -#define __EXEC_OBJECT_HAS_PAGES BIT(29) > > -#define __EXEC_OBJECT_NEEDS_MAP BIT(28) > > -#define __EXEC_OBJECT_NEEDS_BIAS BIT(27) > > -#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 27) /* all of the above */ > > +#define __EXEC_OBJECT_NEEDS_MAP BIT(29) > > +#define __EXEC_OBJECT_NEEDS_BIAS BIT(28) > > +#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 28) /* all of the above */ > > > > #define __EXEC_HAS_RELOC BIT(31) > > #define __EXEC_INTERNAL_FLAGS (~0u << 31) > > @@ -241,6 +240,8 @@ struct i915_execbuffer { > > struct intel_context *context; /* logical state for the request */ > > struct i915_gem_context *gem_context; /** caller's context */ > > > > + struct dma_fence *mm_fence; > > + > > struct i915_request *request; /** our request to build */ > > struct eb_vma *batch; /** identity of the batch obj/vma */ > > struct i915_vma *trampoline; /** trampoline used for chaining */ > > @@ -331,12 +332,7 @@ static inline void eb_unreserve_vma(struct eb_vma *ev) > > if (ev->flags & __EXEC_OBJECT_HAS_PIN) > > __i915_vma_unpin(vma); > > > > - if (ev->flags & __EXEC_OBJECT_HAS_PAGES) > > - i915_gem_object_unpin_pages(vma->obj); > > - > > - ev->flags &= ~(__EXEC_OBJECT_HAS_PIN | > > - __EXEC_OBJECT_HAS_FENCE | > > - __EXEC_OBJECT_HAS_PAGES); > > + ev->flags &= ~(__EXEC_OBJECT_HAS_PIN | __EXEC_OBJECT_HAS_FENCE); > > } > > > > static void eb_vma_array_destroy(struct kref *kref) > > @@ -667,6 +663,55 @@ eb_add_vma(struct i915_execbuffer *eb, > > list_add_tail(&ev->lock_link, &eb->lock); > > } > > > > +static int eb_vma_get_pages(struct i915_execbuffer *eb, > > + struct eb_vma *ev, > > + u64 idx) > > +{ > > + struct i915_vma *vma = ev->vma; > > + int err; > > + > > + /* XXX also preallocate PD for vma */ > > + > > + err = ____i915_gem_object_get_pages_async(vma->obj); > > + if (err) > > + return err; > > + > > + return i915_active_ref(&vma->obj->mm.active, idx, eb->mm_fence); > > +} > > + > > +static int eb_reserve_mm(struct i915_execbuffer *eb) > > +{ > > + const u64 idx = eb->context->timeline->fence_context; > > + struct ww_acquire_ctx acquire; > > + struct eb_vma *ev; > > + int err; > > + > > + eb->mm_fence = __dma_fence_create_proxy(0, 0); > > + if (!eb->mm_fence) > > + return -ENOMEM; > > Question: eb is local to this thread, right, so eb->mm_fence is not > considered "published" yet? > > > + > > + ww_acquire_init(&acquire, &reservation_ww_class); > > + > > + err = eb_lock_vma(eb, &acquire); > > + if (err) > > + goto out; > > + > > + ww_acquire_done(&acquire); > > + > > + list_for_each_entry(ev, &eb->lock, lock_link) { > > + struct i915_vma *vma = ev->vma; > > + > > + if (err == 0) > > + err = eb_vma_get_pages(eb, ev, idx); > > I figure this is where you publish the proxy fence? If so, the fence > signaling critical path starts with this loop, Hmm, actually at this moment, the fence is still very much internal being only used as a reference token, and the async fence for the pages is still only in the internal migration slot [along side the reference tokens]. Those fences will not be attached to the dma_resv until the chains are completed in move-to-gpu. That might be enough of a difference to consider. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx