On Wed, 2017-09-27 at 17:44 +0100, Chris Wilson wrote:
> When we write to ELSP, it triggers a context preemption at the earliest
> arbitration point (3DPRIMITIVE, some PIPECONTROLs, a few other
> operations and the explicit MI_ARB_CHECK). If this is to the same
> context, it triggers a LITE_RESTORE where the RING_TAIL is merely
> updated (used currently to chain requests from the same context
> together, avoiding bubbles). However, if it is to a different context, a
> full context-switch is performed and it will start to execute the new
> context saving the image of the old for later execution.
> 
> Previously we avoided preemption by only submitting a new context when
> the old was idle. But now we wish embrace it, and if the new request has
> a higher priority than the currently executing request, we write to the
> ELSP regardless, thus triggering preemption, but we tell the GPU to
> switch to our special preemption context (not the target). In the
> context-switch interrupt handler, we know that the previous contexts
> have finished execution and so can unwind all the incomplete requests
> and compute the new highest priority request to execute.
> 
> It would be feasible to avoid the switch-to-idle intermediate by
> programming the ELSP with the target context. The difficulty is in
> tracking which request that should be whilst maintaining the dependency
> change, the error comes in with coalesced requests. As we only track the
> most recent request and its priority, we may run into the issue of being
> tricked in preempting a high priority request that was followed by a
> low priority request from the same context (e.g. for PI);

"followed" is a bit ambiguous here, depending on how you view the
ordering: wall time or ports.

> worse still
> that earlier request may be our own dependency and the order then broken
> by preemption. By injecting the switch-to-idle and then recomputing the
> priority queue, we avoid the issue with tracking in-flight coalesced
> requests.
> Having tried the preempt-to-busy approach, and failed to find
> a way around the coalesced priority issue, Michal's original proposal to
> inject an idle context (based on handling GuC preemption) succeeds.
> 
> The current heuristic for deciding when to preempt are only if the new
> request is of higher priority, and has the privileged priority of
> greater than 0. Note that the scheduler remains unfair!
> 
> v2: Disable for gen8 (bdw/bsw) as we need additional w/a for GPGPU.
> Since, the feature is now conditional and not always available when we
> have a scheduler, make it known via the HAS_SCHEDULER GETPARAM (now a
> capability mask).
> 
> Suggested-by: Michal Winiarski <michal.winiarski@xxxxxxxxx>
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Michal Winiarski <michal.winiarski@xxxxxxxxx>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> Cc: Arkadiusz Hiler <arkadiusz.hiler@xxxxxxxxx>
> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
> Cc: Ben Widawsky <benjamin.widawsky@xxxxxxxxx>
> Cc: Zhenyu Wang <zhenyuw@xxxxxxxxxxxxxxx>
> Cc: Zhi Wang <zhi.a.wang@xxxxxxxxx>

<SNIP>

> @@ -489,26 +489,44 @@ static void port_assign(struct execlist_port *port,
>  	port_set(port, port_pack(i915_gem_request_get(rq), port_count(port)));
>  }
>  
> +static void inject_preempt_context(struct intel_engine_cs *engine)
> +{
> +	struct intel_context *ce =
> +		&engine->i915->preempt_context->engine[engine->id];
> +	u32 __iomem *elsp =
> +		engine->i915->regs + i915_mmio_reg_offset(RING_ELSP(engine));

engine_elsp() helper or so?

> +	unsigned int n;
> +
> +	GEM_BUG_ON(engine->i915->preempt_context->hw_id != PREEMPT_ID);

I think this could/should be done way earlier?

> +
> +	memset(ce->ring->vaddr + ce->ring->tail, 0, 8);
> +	ce->ring->tail += 8;
> +	ce->ring->tail &= (ce->ring->size - 1);
> +	ce->lrc_reg_state[CTX_RING_TAIL+1] = ce->ring->tail;

An awful lot of pre-expectations here, would be a shame if somebody
documented them.
> +
> +	for (n = execlists_num_ports(&engine->execlists); --n; ) {

This is a fine detail compared to the other loop; "<=" vs "<" (or maybe
even <= -1) would make a clearer distinction, but I'm not arguing.

> +		writel(0, elsp);
> +		writel(0, elsp);
> +	}
> +	writel(upper_32_bits(ce->lrc_desc), elsp);
> +	writel(lower_32_bits(ce->lrc_desc), elsp);

Could also be an elsp_write inline helper.

> @@ -696,7 +746,7 @@ static void intel_lrc_irq_handler(unsigned long data)
>  {
>  	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
>  	struct intel_engine_execlists * const execlists = &engine->execlists;
> -	struct execlist_port *port = execlists->port;
> +	struct execlist_port * const port = execlists->port;
>  	struct drm_i915_private *dev_priv = engine->i915;
>  
>  	/* We can skip acquiring intel_runtime_pm_get() here as it was taken
> @@ -781,6 +831,23 @@ static void intel_lrc_irq_handler(unsigned long data)
>  			if (!(status & GEN8_CTX_STATUS_COMPLETED_MASK))
>  				continue;
>  
> +			if (status & GEN8_CTX_STATUS_ACTIVE_IDLE &&
> +			    buf[2*head + 1] == PREEMPT_ID) {

(2 * head + 1), could be a helper again, potentially with a descriptive
name so the below comment can be removed and doesn't need to be
duplicated :)

> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -238,6 +238,8 @@ struct intel_engine_execlists {
>  #define EXECLIST_MAX_PORTS 2
>  	} port[EXECLIST_MAX_PORTS];
>  
> +	bool preempt;

Definitely want to improve the variable name, "preempting" would be the
smallest acceptable change.

Combine that with kerneldoc here and a fix at memset(), this is

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>

Regards, Joonas
-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
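
For illustration only, here is a minimal standalone sketch of two points
raised in the review above: the unstated preconditions behind the
ring-tail update, and an elsp_write()-style helper. Everything prefixed
mock_ or MOCK_ is hypothetical and merely models the quoted hunk in plain
C. It does not use the i915 structures, and writel() is replaced by
recording values in an array, so treat it as a sketch under those
assumptions rather than as driver code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MOCK_ELSP_MAX_WRITES 16

/* Hypothetical stand-in for the ring; not the i915 definition. */
struct mock_ring {
	uint8_t *vaddr;	/* start of the ring buffer */
	uint32_t size;	/* assumed to be a power of two */
	uint32_t tail;	/* byte offset of the next write */
};

/* Records dword writes in order instead of touching MMIO. */
struct mock_elsp {
	uint32_t writes[MOCK_ELSP_MAX_WRITES];
	unsigned int count;
};

/* One descriptor submission: upper dword first, then lower, as in the patch. */
static void mock_elsp_write(struct mock_elsp *elsp, uint64_t desc)
{
	assert(elsp->count + 2 <= MOCK_ELSP_MAX_WRITES);
	elsp->writes[elsp->count++] = (uint32_t)(desc >> 32);
	elsp->writes[elsp->count++] = (uint32_t)desc;
}

/*
 * Zero 'bytes' at the tail and advance it, spelling out the preconditions
 * the open-coded version relies on: the mask trick needs a power-of-two
 * size, and the memset() must not run past the end of the buffer.
 */
static uint32_t mock_ring_pad_zero(struct mock_ring *ring, uint32_t bytes)
{
	assert(ring->size && (ring->size & (ring->size - 1)) == 0);
	assert(ring->tail + bytes <= ring->size);

	memset(ring->vaddr + ring->tail, 0, bytes);
	ring->tail = (ring->tail + bytes) & (ring->size - 1);
	return ring->tail;
}

/* Clear every port but one, then submit the preempt descriptor last. */
static void mock_inject_preempt(struct mock_elsp *elsp,
				unsigned int num_ports, uint64_t preempt_desc)
{
	unsigned int n;

	assert(num_ports >= 1);	/* --n would wrap if num_ports were 0 */

	/* Same loop shape as the patch: the body runs num_ports - 1 times. */
	for (n = num_ports; --n; )
		mock_elsp_write(elsp, 0);

	mock_elsp_write(elsp, preempt_desc);
}
```

With two ports, mock_inject_preempt() records one zeroed descriptor
followed by the preempt descriptor, four dword writes in total. Putting
the assertions inside the helper is one way to make the expectations
around the tail update visible at the call site.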