Quoting Thomas Hellström (Intel) (2020-06-24 06:42:33)
>
> On 6/23/20 11:15 PM, Chris Wilson wrote:
> > Quoting Thomas Hellström (Intel) (2020-06-23 21:31:38)
> >> On 6/23/20 8:41 PM, Chris Wilson wrote:
> >>> Quoting Thomas Hellström (Intel) (2020-06-23 19:21:28)
> >>>> On 6/23/20 6:36 PM, Chris Wilson wrote:
> >>>>> Quoting Thomas Hellström (Intel) (2020-06-23 12:22:11)
> >>>>>> Hi, Chris,
> >>>>>>
> >>>>>> On 6/22/20 11:59 AM, Chris Wilson wrote:
> >>>>>>> In order to actually handle eviction and what not, we need to process
> >>>>>>> all the objects together under a common lock, reservation_ww_class. As
> >>>>>>> such, do a memory reservation pass after looking up the object/vma,
> >>>>>>> which then feeds into the rest of execbuf [relocation, cmdparsing,
> >>>>>>> flushing and ofc execution].
> >>>>>>>
> >>>>>>> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >>>>>>> ---
> >>>>>>>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 91 ++++++++++++++-----
> >>>>>>>  1 file changed, 70 insertions(+), 21 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> >>>>>>> index 46fcbdf8161c..8db2e013465f 100644
> >>>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> >>>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> >>>>>>> @@ -53,10 +53,9 @@ struct eb_vma_array {
> >>>>>>>
> >>>>>>>  #define __EXEC_OBJECT_HAS_PIN BIT(31)
> >>>>>>>  #define __EXEC_OBJECT_HAS_FENCE BIT(30)
> >>>>>>> -#define __EXEC_OBJECT_HAS_PAGES BIT(29)
> >>>>>>> -#define __EXEC_OBJECT_NEEDS_MAP BIT(28)
> >>>>>>> -#define __EXEC_OBJECT_NEEDS_BIAS BIT(27)
> >>>>>>> -#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 27) /* all of the above */
> >>>>>>> +#define __EXEC_OBJECT_NEEDS_MAP BIT(29)
> >>>>>>> +#define __EXEC_OBJECT_NEEDS_BIAS BIT(28)
> >>>>>>> +#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 28) /* all of the above */
> >>>>>>>
> >>>>>>>  #define __EXEC_HAS_RELOC BIT(31)
> >>>>>>>  #define __EXEC_INTERNAL_FLAGS (~0u << 31)
> >>>>>>> @@ -241,6 +240,8 @@ struct i915_execbuffer {
> >>>>>>>         struct intel_context *context; /* logical state for the request */
> >>>>>>>         struct i915_gem_context *gem_context; /** caller's context */
> >>>>>>>
> >>>>>>> +       struct dma_fence *mm_fence;
> >>>>>>> +
> >>>>>>>         struct i915_request *request; /** our request to build */
> >>>>>>>         struct eb_vma *batch; /** identity of the batch obj/vma */
> >>>>>>>         struct i915_vma *trampoline; /** trampoline used for chaining */
> >>>>>>> @@ -331,12 +332,7 @@ static inline void eb_unreserve_vma(struct eb_vma *ev)
> >>>>>>>         if (ev->flags & __EXEC_OBJECT_HAS_PIN)
> >>>>>>>                 __i915_vma_unpin(vma);
> >>>>>>>
> >>>>>>> -       if (ev->flags & __EXEC_OBJECT_HAS_PAGES)
> >>>>>>> -               i915_gem_object_unpin_pages(vma->obj);
> >>>>>>> -
> >>>>>>> -       ev->flags &= ~(__EXEC_OBJECT_HAS_PIN |
> >>>>>>> -                      __EXEC_OBJECT_HAS_FENCE |
> >>>>>>> -                      __EXEC_OBJECT_HAS_PAGES);
> >>>>>>> +       ev->flags &= ~(__EXEC_OBJECT_HAS_PIN | __EXEC_OBJECT_HAS_FENCE);
> >>>>>>>  }
> >>>>>>>
> >>>>>>>  static void eb_vma_array_destroy(struct kref *kref)
> >>>>>>> @@ -667,6 +663,55 @@ eb_add_vma(struct i915_execbuffer *eb,
> >>>>>>>         list_add_tail(&ev->lock_link, &eb->lock);
> >>>>>>>  }
> >>>>>>>
> >>>>>>> +static int eb_vma_get_pages(struct i915_execbuffer *eb,
> >>>>>>> +                           struct eb_vma *ev,
> >>>>>>> +                           u64 idx)
> >>>>>>> +{
> >>>>>>> +       struct i915_vma *vma = ev->vma;
> >>>>>>> +       int err;
> >>>>>>> +
> >>>>>>> +       /* XXX also preallocate PD for vma */
> >>>>>>> +
> >>>>>>> +       err = ____i915_gem_object_get_pages_async(vma->obj);
> >>>>>>> +       if (err)
> >>>>>>> +               return err;
> >>>>>>> +
> >>>>>>> +       return i915_active_ref(&vma->obj->mm.active, idx, eb->mm_fence);
> >>>>>>> +}
> >>>>>>> +
> >>>>>>> +static int eb_reserve_mm(struct i915_execbuffer *eb)
> >>>>>>> +{
> >>>>>>> +       const u64 idx = eb->context->timeline->fence_context;
> >>>>>>> +       struct ww_acquire_ctx acquire;
> >>>>>>> +       struct eb_vma *ev;
> >>>>>>> +       int err;
> >>>>>>> +
> >>>>>>> +       eb->mm_fence = __dma_fence_create_proxy(0, 0);
> >>>>>>> +       if (!eb->mm_fence)
> >>>>>>> +               return -ENOMEM;
> >>>>>> Question: eb is local to this thread, right, so eb->mm_fence is not
> >>>>>> considered "published" yet?
> >>>>>>
> >>>>>>> +
> >>>>>>> +       ww_acquire_init(&acquire, &reservation_ww_class);
> >>>>>>> +
> >>>>>>> +       err = eb_lock_vma(eb, &acquire);
> >>>>>>> +       if (err)
> >>>>>>> +               goto out;
> >>>>>>> +
> >>>>>>> +       ww_acquire_done(&acquire);
> >>>>>>> +
> >>>>>>> +       list_for_each_entry(ev, &eb->lock, lock_link) {
> >>>>>>> +               struct i915_vma *vma = ev->vma;
> >>>>>>> +
> >>>>>>> +               if (err == 0)
> >>>>>>> +                       err = eb_vma_get_pages(eb, ev, idx);
> >>>>>> I figure this is where you publish the proxy fence? If so, the fence
> >>>>>> signaling critical path starts with this loop,
> >>>>> Hmm, actually at this moment, the fence is still very much internal
> >>>>> being only used as a reference token,
> >>>> I think as long as another thread, running in this driver or another gpu
> >>>> driver can theoretically reference the fence pointer from the
> >>>> reservation object and wait for the fence it's considered published.
> >>> It's not in the reservation object.
> >>>
> >>>> Also the ww_mutexes in this context are really all about grabbing a
> >>>> random set of resources and associate them with a point in a timeline,
> >>>> as the ww_mutexes are released, the fence pointer(s) need to point to
> >>>> published fence(s).
> >>> That's not the purpose of these fences, though. They exist to provide
> >>> reference counting on the backing store, along side the migration fence.
> >>> It's extra detail tacked on the equivalent of bo->moving.
> >>>
> >>> That is not to say that one could build up an async migration chain which
> >>> form a graph back to these, that chain could only be formed once the
> >>> operation itself has been published in the dma_resv though.
> >> Hmm. So let's say another thread grabs one of the just released
> >> ww_mutexes and wants to schedule a blit from one of the buffers in the
> >> current operation with high priority. How would that thread know how to
> >> order that blit operation w r t the current operation?
> > Why would it order?
> So let's say it's an eviction blit, needing to incorporate the data from
> the current operation. Or, for that matter a ttm-style cpu copy eviction:
>
> ww_mutex_lock
> wait_for_idle
> copy
> ww_mutex_unlock

We have a scheduler. Eviction does not block. Submission never blocks.

lock
swap allocation blocks
unlock
copy.
-Chris
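
For readers following along: eb_lock_vma() is called by eb_reserve_mm() in the
hunk above, but its body is not part of this patch. Below is a minimal sketch
of the wound/wait acquire loop such a helper conventionally implements with
the dma_resv API (see Documentation/locking/ww-mutex-design.rst). The eb->lock
list, the eb_vma/lock_link fields and vma->resv are taken from the patch; the
loop body and error handling here are assumptions for illustration, not the
actual i915 implementation.

#include <linux/dma-resv.h>
#include <linux/list.h>

/*
 * Sketch only: take the reservation_ww_class ww_mutex embedded in every
 * object's dma_resv. On -EDEADLK we were wounded by an older transaction:
 * release everything we hold, sleep on the lock we lost, then retry the
 * whole list in order with that lock already held.
 */
static int eb_lock_vma(struct i915_execbuffer *eb,
                       struct ww_acquire_ctx *acquire)
{
        struct eb_vma *ev, *pos, *contended = NULL;
        int err;

retry:
        list_for_each_entry(ev, &eb->lock, lock_link) {
                if (ev == contended) {
                        /* Already taken via dma_resv_lock_slow() below. */
                        contended = NULL;
                        continue;
                }

                err = dma_resv_lock(ev->vma->resv, acquire);
                if (err == 0)
                        continue;

                /* Back off: drop every lock acquired before the failure. */
                list_for_each_entry(pos, &eb->lock, lock_link) {
                        if (pos == ev)
                                break;
                        dma_resv_unlock(pos->vma->resv);
                }
                /* A previously contended lock may sit after ev in the list. */
                if (contended)
                        dma_resv_unlock(contended->vma->resv);

                if (err != -EDEADLK)
                        return err;

                /* Wounded: wait for the winner, then retry from the top. */
                contended = ev;
                dma_resv_lock_slow(ev->vma->resv, acquire);
                goto retry;
        }

        return 0;
}

As in eb_reserve_mm() above, the caller brackets this with
ww_acquire_init(&acquire, &reservation_ww_class) beforehand and
ww_acquire_done(&acquire) once the whole set is held; a real implementation
also has to cope with -EALREADY when the same object appears twice in the
execbuf object list.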