Re: [PATCH 4/7] drm/i915: Combine multiple internal plists into the same i915_priolist bucket

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 28/09/2018 14:08, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2018-09-28 14:04:34)

On 27/09/2018 14:05, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2018-09-25 10:48:44)

On 25/09/2018 09:32, Chris Wilson wrote:
As we are about to allow ourselves to slightly bump the user priority
into a few different sublevels, packthose internal priority lists
into the same i915_priolist to keep the rbtree compact and avoid having
to allocate the default user priority even after the internal bumping.
The downside to having an requests[] rather than a node per active list,
is that we then have to walk over the empty higher priority lists. To
compensate, we track the active buckets and use a small bitmap to skip
over any inactive ones.

v2: Use MASK of internal levels to simplify our usage.

Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
---
    drivers/gpu/drm/i915/intel_engine_cs.c      |  6 +-
    drivers/gpu/drm/i915/intel_guc_submission.c | 12 ++-
    drivers/gpu/drm/i915/intel_lrc.c            | 87 ++++++++++++++-------
    drivers/gpu/drm/i915/intel_ringbuffer.h     | 13 ++-
    4 files changed, 80 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 217ed3ee1cab..83f2f7774c1f 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1534,10 +1534,10 @@ void intel_engine_dump(struct intel_engine_cs *engine,
        count = 0;
        drm_printf(m, "\t\tQueue priority: %d\n", execlists->queue_priority);
        for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) {
-             struct i915_priolist *p =
-                     rb_entry(rb, typeof(*p), node);
+             struct i915_priolist *p = rb_entry(rb, typeof(*p), node);
+             int i;
- list_for_each_entry(rq, &p->requests, sched.link) {
+             priolist_for_each_request(rq, p, i) {
                        if (count++ < MAX_REQUESTS_TO_SHOW - 1)
                                print_request(m, rq, "\t\tQ ");
                        else
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 4874a212754c..ac862b42f6a1 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -746,30 +746,28 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
        while ((rb = rb_first_cached(&execlists->queue))) {
                struct i915_priolist *p = to_priolist(rb);
                struct i915_request *rq, *rn;
+             int i;
- list_for_each_entry_safe(rq, rn, &p->requests, sched.link) {
+             priolist_for_each_request_consume(rq, rn, p, i) {
                        if (last && rq->hw_context != last->hw_context) {
-                             if (port == last_port) {
-                                     __list_del_many(&p->requests,
-                                                     &rq->sched.link);
+                             if (port == last_port)
                                        goto done;
-                             }
if (submit)
                                        port_assign(port, last);
                                port++;
                        }
- INIT_LIST_HEAD(&rq->sched.link);
+                     list_del_init(&rq->sched.link);
__i915_request_submit(rq);
                        trace_i915_request_in(rq, port_index(port, execlists));
+
                        last = rq;
                        submit = true;
                }
rb_erase_cached(&p->node, &execlists->queue);
-             INIT_LIST_HEAD(&p->requests);
                if (p->priority != I915_PRIORITY_NORMAL)
                        kmem_cache_free(engine->i915->priorities, p);
        }
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e8de250c3413..b1b3f67d1120 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -259,14 +259,49 @@ intel_lr_context_descriptor_update(struct i915_gem_context *ctx,
        ce->lrc_desc = desc;
    }
-static struct i915_priolist *
+static void assert_priolists(struct intel_engine_execlists * const execlists,
+                          int queue_priority)
+{
+     struct rb_node *rb;
+     int last_prio, i;
+
+     if (!IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
+             return;
+
+     GEM_BUG_ON(rb_first_cached(&execlists->queue) !=
+                rb_first(&execlists->queue.rb_root));
+
+     last_prio = (queue_priority >> I915_USER_PRIORITY_SHIFT) + 1;
+     for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) {
+             struct i915_priolist *p = to_priolist(rb);
+
+             GEM_BUG_ON(p->priority >= last_prio);
+             last_prio = p->priority;
+
+             GEM_BUG_ON(!p->used);
+             for (i = 0; i < ARRAY_SIZE(p->requests); i++) {
+                     if (list_empty(&p->requests[i]))
+                             continue;
+
+                     GEM_BUG_ON(!(p->used & BIT(i)));
+             }
+     }
+}
+
+static struct list_head *
    lookup_priolist(struct intel_engine_cs *engine, int prio)
    {
        struct intel_engine_execlists * const execlists = &engine->execlists;
        struct i915_priolist *p;
        struct rb_node **parent, *rb;
        bool first = true;
+     int idx, i;
+
+     assert_priolists(execlists, INT_MAX);
+ /* buckets sorted from highest [in slot 0] to lowest priority */
+     idx = I915_PRIORITY_MASK - (prio & I915_PRIORITY_MASK);

How about:

#define I915_INTERNAL_PRIO(p) (p & I915_PRIORITY_MASK)

Not convinced. It's the only place we use it (one use of MASK later to
assert correct less). There's two views, here the user/kernel priority
levels are not as important as what we are concerned about are the split
lists with the chunking of lowest priority bits. At the extreme we could
separate the two and make the chunks BITS_PER_LONG -- I think that's a
waste and my goal was simply to avoid kmallocs for the common case of
default user priority.

The intention is that we don't even think about the user/internal split,
and simply consider it an integrated field, with most positive
priorities executing first and most negative last.

I just find MASK - INT confusing, those two data flavours do not match
in my head.  With the local macro I clarified that we are getting an INT
from prio, and then the bit you did not quote I suggested we do not
substract from the mask but from the count - 1. It is effectively the
same thing just IMHO easier to understand while reading the code. You
can skip the macro and decrement from the count, I'd be happy with that
as well.

I swear that COUNT-1 was the bit you complained about last time ;)

Blast.. I complained that count was one when it actually should be zero, and you were also negating the mask for extra confusion.. no big harm done I hope?

I put the comment there to explain why we reverse the index, as leftmost
is highest priority.

Preemptive r-b for COUNT-1 - (prio & I915_PRIORITY_MASK) then?

Yes r-b.

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux