Quoting Tvrtko Ursulin (2019-11-20 13:16:51)
>
> On 20/11/2019 09:33, Chris Wilson wrote:
> > The major drawback of commit 7e34f4e4aad3 ("drm/i915/gen8+: Add RC6 CTX
> > corruption WA") is that it disables RC6 while Skylake (and friends) is
> > active, and we do not consider the GPU idle until all outstanding
> > requests have been retired and the engine switched over to the kernel
> > context. If userspace is idle, this task falls onto our background idle
> > worker, which only runs roughly once a second, meaning that userspace has
> > to have been idle for a couple of seconds before we enable RC6 again.
> > Naturally, this causes us to consume considerably more energy than
> > before as powersaving is effectively disabled while a display server
> > (here's looking at you Xorg) is running.
> >
> > As execlists will get a completion event as each context is completed,
> > we can use this interrupt to queue a retire worker bound to this engine
> > to cleanup idle timelines. We will then immediately notice the idle
> > engine (without userspace intervention or the aid of the background
> > retire worker) and start parking the GPU. Thus during light workloads,
> > we will do much more work to idle the GPU faster... Hopefully with
> > commensurate power saving!
> >
> > v2: Watch context completions and only look at those local to the engine
> > when retiring to reduce the amount of excess work we perform.
> >
> > References: https://bugs.freedesktop.org/show_bug.cgi?id=112315
> > References: 7e34f4e4aad3 ("drm/i915/gen8+: Add RC6 CTX corruption WA")
> > References: 2248a28384fe ("drm/i915/gen8+: Add RC6 CTX corruption WA")
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/gt/intel_engine_cs.c    |  8 +-
> >  drivers/gpu/drm/i915/gt/intel_engine_types.h |  8 ++
> >  drivers/gpu/drm/i915/gt/intel_gt_requests.c  | 74 +++++++++++++++++++
> >  drivers/gpu/drm/i915/gt/intel_gt_requests.h  | 17 ++++-
> >  drivers/gpu/drm/i915/gt/intel_lrc.c          |  9 +++
> >  drivers/gpu/drm/i915/gt/intel_timeline.c     |  1 +
> >  .../gpu/drm/i915/gt/intel_timeline_types.h   |  3 +
> >  7 files changed, 116 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > index b9613d044393..8f6e353caa66 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> > @@ -28,13 +28,13 @@
> >
> >  #include "i915_drv.h"
> >
> > -#include "gt/intel_gt.h"
> > -
> > +#include "intel_context.h"
> >  #include "intel_engine.h"
> >  #include "intel_engine_pm.h"
> >  #include "intel_engine_pool.h"
> >  #include "intel_engine_user.h"
> > -#include "intel_context.h"
> > +#include "intel_gt.h"
> > +#include "intel_gt_requests.h"
> >  #include "intel_lrc.h"
> >  #include "intel_reset.h"
> >  #include "intel_ring.h"
> > @@ -617,6 +617,7 @@ static int intel_engine_setup_common(struct intel_engine_cs *engine)
> >  	intel_engine_init_execlists(engine);
> >  	intel_engine_init_cmd_parser(engine);
> >  	intel_engine_init__pm(engine);
> > +	intel_engine_init_retire(engine);
> >
> >  	intel_engine_pool_init(&engine->pool);
> >
> > @@ -839,6 +840,7 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
> >
> >  	cleanup_status_page(engine);
> >
> > +	intel_engine_fini_retire(engine);
> >  	intel_engine_pool_fini(&engine->pool);
> >  	intel_engine_fini_breadcrumbs(engine);
> >  	intel_engine_cleanup_cmd_parser(engine);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > index 758f0e8ec672..17f1f1441efc 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> > @@ -451,6 +451,14 @@ struct intel_engine_cs {
> >
> >  	struct intel_engine_execlists execlists;
> >
> > +	/*
> > +	 * Keep track of completed timelines on this engine for early
> > +	 * retirement with the goal of quickly enabling powersaving as
> > +	 * soon as the engine is idle.
> > +	 */
> > +	struct intel_timeline *retire;
> > +	struct work_struct retire_work;
> > +
> >  	/* status_notifier: list of callbacks for context-switch changes */
> >  	struct atomic_notifier_head context_status_notifier;
> >
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_requests.c b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > index 4dc3cbeb1b36..4a98fefdf915 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_requests.c
> > @@ -29,6 +29,80 @@ static void flush_submission(struct intel_gt *gt)
> >  		intel_engine_flush_submission(engine);
> >  }
> >
> > +static void engine_retire(struct work_struct *work)
> > +{
> > +	struct intel_engine_cs *engine =
> > +		container_of(work, typeof(*engine), retire_work);
> > +	struct intel_timeline *tl = xchg(&engine->retire, NULL);
> Shouldn't this be atomic_xchg to avoid racing with add_retire?
> > +
> > +	do {
> > +		struct intel_timeline *next = xchg(&tl->retire, NULL);
> Here as well?

xchg() is always locked. atomic_xchg() operates on atomic_t; xchg()
works on any variable, like cmpxchg().

> > +
> > +		/*
> > +		 * Our goal here is to retire _idle_ timelines as soon as
> > +		 * possible (as they are idle, we do not expect userspace
> > +		 * to be cleaning up anytime soon).
> > +		 *
> > +		 * If the tl->active_count is already zero, someone else
> > +		 * should have retired the timeline. Equally if the timeline
> > +		 * is currently locked, either it is being retired elsewhere
> > +		 * or about to be!
> > +		 */
> > +		if (atomic_read(&tl->active_count) &&
> > +		    mutex_trylock(&tl->mutex)) {
> > +			retire_requests(tl);
> > +			mutex_unlock(&tl->mutex);
> > +		}
> > +		intel_timeline_put(tl);
> > +
> > +		GEM_BUG_ON(!next);
> > +		tl = ptr_mask_bits(next, 1);
> You sometimes expect engine->retire to contain 0x1?

Yes, imagine that we are submitting very fast such that we schedule_out
the same context before the worker ran; we would then try to
add_retire() the same timeline again. So I was using BIT(0) to tag an
active element in the retirement list.

> > +	} while (tl);
> > +}
> > +
> > +static bool add_retire(struct intel_engine_cs *engine,
> > +		       struct intel_timeline *tl)
> > +{
> > +	struct intel_timeline *first = READ_ONCE(engine->retire);
> > +
> > +	/*
> > +	 * We open-code a llist here to include the additional tag [BIT(0)]
> > +	 * so that we know when the timeline is already on a
> > +	 * retirement queue: either this engine or another.
> > +	 *
> > +	 * However, we rely on that a timeline can only be active on a single
> > +	 * engine at any one time and that add_retire() is called before the
> > +	 * engine releases the timeline and transferred to another to retire.
> > +	 */
> > +
> > +	if (READ_ONCE(tl->retire)) /* already queued */
> > +		return false;
> Can't this go first in the function?

Conceptually it is. And I made it so because I also decided against
having the READ_ONCE() at the top.
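To make the earlier xchg()/atomic_xchg() point concrete, a minimal
sketch (the helper names below are hypothetical and assume the usual
i915 engine/timeline types are in scope; they are not part of the
patch): both are locked read-modify-write operations, and the only
difference is the type they accept, so xchg() can operate directly on
the engine->retire pointer while atomic_xchg() is the atomic_t-typed
flavour.

#include <linux/atomic.h>

/* Illustration only: steal the engine's pending-retire list. */
static struct intel_timeline *
steal_retire_list(struct intel_engine_cs *engine)
{
	/* xchg() takes any native-word lvalue, here a plain pointer */
	return xchg(&engine->retire, NULL);
}

/* Illustration only: drain a counter kept in an atomic_t. */
static int drain_count(atomic_t *v)
{
	/* atomic_xchg() is the same locked exchange, typed for atomic_t */
	return atomic_xchg(v, 0);
}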
> > +
> > +	intel_timeline_get(tl);
> > +	do
> > +		tl->retire = ptr_pack_bits(first, 1, 1);
> Here you rely on assignment being atomic right?

Ish. Here we rely on the timeline being owned by the engine so it
cannot be submitted by another (and so schedule_out called) until this
engine has released it. It is a weak point for generality, but the
ordering is strong in execlists.

> > +	while (!try_cmpxchg(&engine->retire, &first, tl));
> So the loop is effectively creating a chain of timelines to retire on
> this engine.
>
> What happens with virtual engines when a timeline goes to different
> engine before (well or any single timeline context) the retire worker
> runs? Ah okay, it gets re-assigned to the most recent engine.

Right. The engine_retire() doesn't care which engine the timelines were
run on, it's just a list of suspected idle timelines.

> I am not sure about the BIT(0) business. It's always set on write so I
> am not getting why it is useful.

It's also set to 0 on consumption :)

-Chris
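For anyone wanting to poke at the tagged-pointer scheme in isolation,
below is a standalone userspace model of the pattern (hypothetical
names throughout; C11 atomics stand in for READ_ONCE()/xchg()/
try_cmpxchg(), and a plain low bit stands in for ptr_pack_bits()/
ptr_mask_bits()). It is not the i915 code, just a sketch of why BIT(0)
is set when a timeline is queued and cleared again when the worker
consumes the chain; the single-threaded main() demonstrates only the
tag's life cycle, not the concurrency.

/* Standalone C11 model; build with: cc -std=c11 tagged_retire.c */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define QUEUED ((uintptr_t)1)	/* plays the role of BIT(0) */

/* Stand-in for intel_timeline: just the fields the pattern needs. */
struct timeline {
	int id;
	_Atomic uintptr_t retire;	/* next pointer | QUEUED tag */
};

/* Stand-in for engine->retire: head of the pending-retire chain. */
static _Atomic uintptr_t engine_retire;

/* Analogue of add_retire(): push tl onto the chain, tagging it as
 * queued so that a second schedule_out of the same timeline is a no-op. */
static bool add_retire(struct timeline *tl)
{
	uintptr_t first = atomic_load(&engine_retire);

	if (atomic_load(&tl->retire))	/* already queued */
		return false;

	do {
		/* pack the old head and set the tag: "tl is on a queue" */
		atomic_store(&tl->retire, first | QUEUED);
	} while (!atomic_compare_exchange_weak(&engine_retire, &first,
					       (uintptr_t)tl));

	return true;	/* the real code would now schedule the worker */
}

/* Analogue of engine_retire(): steal the whole chain, then walk it.
 * Exchanging each tl->retire back to 0 clears the tag, so the timeline
 * may be queued again afterwards. */
static void engine_retire_worker(void)
{
	struct timeline *tl =
		(struct timeline *)atomic_exchange(&engine_retire, (uintptr_t)0);

	while (tl) {
		uintptr_t next = atomic_exchange(&tl->retire, (uintptr_t)0);

		printf("retiring timeline %d\n", tl->id);

		/* mask off the tag to recover the next element */
		tl = (struct timeline *)(next & ~QUEUED);
	}
}

int main(void)
{
	struct timeline a = { .id = 1 }, b = { .id = 2 };

	add_retire(&a);
	add_retire(&b);
	add_retire(&a);		/* duplicate: rejected, tag still set */
	engine_retire_worker();	/* prints 2 then 1, clearing the tags */

	return 0;
}

In this model the duplicate add_retire(&a) is rejected because a's tag
is still set; once the worker has consumed the chain, both tags are back
to zero and either timeline could be queued again, which is the
set-on-write / clear-on-consumption cycle described above.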