Re: [PATCH 7/7] drm/i915/gem: Acquire all vma/objects under reservation_ww_class

On 6/23/20 6:36 PM, Chris Wilson wrote:
Quoting Thomas Hellström (Intel) (2020-06-23 12:22:11)
Hi, Chris,

On 6/22/20 11:59 AM, Chris Wilson wrote:
In order to actually handle eviction and what not, we need to process
all the objects together under a common lock, reservation_ww_class. As
such, do a memory reservation pass after looking up the object/vma,
which then feeds into the rest of execbuf [relocation, cmdparsing,
flushing and ofc execution].

Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
---
   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 91 ++++++++++++++-----
   1 file changed, 70 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 46fcbdf8161c..8db2e013465f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -53,10 +53,9 @@ struct eb_vma_array {
#define __EXEC_OBJECT_HAS_PIN BIT(31)
   #define __EXEC_OBJECT_HAS_FENCE             BIT(30)
-#define __EXEC_OBJECT_HAS_PAGES              BIT(29)
-#define __EXEC_OBJECT_NEEDS_MAP              BIT(28)
-#define __EXEC_OBJECT_NEEDS_BIAS     BIT(27)
-#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 27) /* all of the above */
+#define __EXEC_OBJECT_NEEDS_MAP              BIT(29)
+#define __EXEC_OBJECT_NEEDS_BIAS     BIT(28)
+#define __EXEC_OBJECT_INTERNAL_FLAGS (~0u << 28) /* all of the above */
#define __EXEC_HAS_RELOC BIT(31)
   #define __EXEC_INTERNAL_FLAGS       (~0u << 31)
@@ -241,6 +240,8 @@ struct i915_execbuffer {
       struct intel_context *context; /* logical state for the request */
       struct i915_gem_context *gem_context; /** caller's context */
+ struct dma_fence *mm_fence;
+
       struct i915_request *request; /** our request to build */
       struct eb_vma *batch; /** identity of the batch obj/vma */
       struct i915_vma *trampoline; /** trampoline used for chaining */
@@ -331,12 +332,7 @@ static inline void eb_unreserve_vma(struct eb_vma *ev)
       if (ev->flags & __EXEC_OBJECT_HAS_PIN)
               __i915_vma_unpin(vma);
- if (ev->flags & __EXEC_OBJECT_HAS_PAGES)
-             i915_gem_object_unpin_pages(vma->obj);
-
-     ev->flags &= ~(__EXEC_OBJECT_HAS_PIN |
-                    __EXEC_OBJECT_HAS_FENCE |
-                    __EXEC_OBJECT_HAS_PAGES);
+     ev->flags &= ~(__EXEC_OBJECT_HAS_PIN | __EXEC_OBJECT_HAS_FENCE);
   }
static void eb_vma_array_destroy(struct kref *kref)
@@ -667,6 +663,55 @@ eb_add_vma(struct i915_execbuffer *eb,
       list_add_tail(&ev->lock_link, &eb->lock);
   }
+static int eb_vma_get_pages(struct i915_execbuffer *eb,
+                         struct eb_vma *ev,
+                         u64 idx)
+{
+     struct i915_vma *vma = ev->vma;
+     int err;
+
+     /* XXX also preallocate PD for vma */
+
+     err = ____i915_gem_object_get_pages_async(vma->obj);
+     if (err)
+             return err;
+
+     return i915_active_ref(&vma->obj->mm.active, idx, eb->mm_fence);
+}
+
+static int eb_reserve_mm(struct i915_execbuffer *eb)
+{
+     const u64 idx = eb->context->timeline->fence_context;
+     struct ww_acquire_ctx acquire;
+     struct eb_vma *ev;
+     int err;
+
+     eb->mm_fence = __dma_fence_create_proxy(0, 0);
+     if (!eb->mm_fence)
+             return -ENOMEM;
Question: eb is local to this thread, right, so eb->mm_fence is not
considered "published" yet?

+
+     ww_acquire_init(&acquire, &reservation_ww_class);
+
+     err = eb_lock_vma(eb, &acquire);
+     if (err)
+             goto out;
+
+     ww_acquire_done(&acquire);
+
+     list_for_each_entry(ev, &eb->lock, lock_link) {
+             struct i915_vma *vma = ev->vma;
+
+             if (err == 0)
+                     err = eb_vma_get_pages(eb, ev, idx);
I figure this is where you publish the proxy fence? If so, the fence
signaling critical path starts with this loop,
Hmm, actually at this moment the fence is still very much internal,
being only used as a reference token,
I think that as long as another thread, running in this driver or in another GPU driver, can theoretically reference the fence pointer from the reservation object and wait for the fence, it is considered published.

Also, the ww_mutexes in this context are really all about grabbing a random set of resources and associating them with a point on a timeline; by the time the ww_mutexes are released, the fence pointer(s) need to point to published fence(s).

/Thomas


_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



