Quoting Tvrtko Ursulin (2020-05-27 10:36:31) > > On 27/05/2020 10:20, Chris Wilson wrote: > > Quoting Tvrtko Ursulin (2020-05-27 09:53:22) > >> From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > >> > >> Our generic mechanism for specifying aligned request start, > >> I915_EXEC_FENCE_SUBMIT, has a bit too relaxed rules when it comes to the > >> actual use case of media scalability on Tigerlake. > >> > >> While the submit fence can provide the aligned start, the actual media > >> pipeline expects that execution remains in lockstep from then onwards. > >> Otherwise the hardware cannot handle well one of the pair getting > >> preempted leading to GPU hangs and corrupted media playback. > >> > >> This puts us in a difficult position where the only visible solution, > >> which does not involve adding new uapi, is to give more meaning to the > >> generic submit fence. At least when used between the video engines. > >> > >> The special semantics this patch applies in that case are two fold. First > >> is that both of the aligned pairs will be marked as non-preemptable and > >> second is ensuring both are not sharing ELSP ports with any other context. > >> > >> Non-preemptable property will ensure that once the start has been aligned > >> the requests remain executing until completion. > >> > >> Single port ELSP property will ensure there are no possible inversions > >> between different independent pairs of aligned requests. > > > > Nak. > > > >> Going forward we can think of introducing new uapi to request this > >> behaviour as a separate, more explicit flag, and at that point retire this > >> semi-generic special handling. > > > > As CI will say, such behaviour will already need a new flag. Forcing > > no-preemption should be a privileged flag, so I would expect some > > acknowledge that this is a HW problem that we have to workaround. > > I don't know how hw exactly works so from my side it's only empirical. > > I agree new flag is needed but we also need a fix ASAP for TGL. And I > don't think we can add new uapi in a reasonable time frame here. We > would need the ctx set_parallel extension with a dont-preempt flag and > multi-batch execbuf. And a lot of work in media-driver which will not be > ready for TGL. I don't think a flag at the I915_EXEC_FENCE_SUBMIT level > is a solution. I'm going to say the starting point is to remove the I915_ENGINE_HAS_PREEMPTION flag from vcs*. That way the tests that require preemption will skip. Sadly that's the level of our current API. > >> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > >> Suggested-by: Xiaogang Li <xiaogang.li@xxxxxxxxx> > >> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > >> --- > >> drivers/gpu/drm/i915/gt/intel_context.h | 2 +- > >> drivers/gpu/drm/i915/gt/intel_lrc.c | 46 +++++++++++++++++++++---- > >> 2 files changed, 41 insertions(+), 7 deletions(-) > >> > >> diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h > >> index 07be021882cc..576d11c0cdd9 100644 > >> --- a/drivers/gpu/drm/i915/gt/intel_context.h > >> +++ b/drivers/gpu/drm/i915/gt/intel_context.h > >> @@ -212,7 +212,7 @@ intel_context_force_single_submission(const struct intel_context *ce) > >> static inline void > >> intel_context_set_single_submission(struct intel_context *ce) > >> { > >> - __set_bit(CONTEXT_FORCE_SINGLE_SUBMISSION, &ce->flags); > >> + set_bit(CONTEXT_FORCE_SINGLE_SUBMISSION, &ce->flags); > >> } > >> > >> static inline bool > >> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c > >> index 3214a4ecc31a..88cf20fd92c8 100644 > >> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c > >> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c > >> @@ -1734,8 +1734,7 @@ static void execlists_submit_ports(struct intel_engine_cs *engine) > >> > >> static bool ctx_single_port_submission(const struct intel_context *ce) > >> { > >> - return (IS_ENABLED(CONFIG_DRM_I915_GVT) && > >> - intel_context_force_single_submission(ce)); > >> + return intel_context_force_single_submission(ce); > >> } > >> > >> static bool can_merge_ctx(const struct intel_context *prev, > >> @@ -4929,9 +4928,41 @@ static void execlists_park(struct intel_engine_cs *engine) > >> cancel_timer(&engine->execlists.preempt); > >> } > >> > >> +static void > >> +mark_bonded_pair(struct i915_request *rq, struct i915_request *signal) > >> +{ > >> + /* > >> + * Give (temporary) special meaning to a pair requests with requested > >> + * aligned start along the video engines. > >> + * > >> + * They should be non-preemptable and have all ELSP ports to themselves > >> + * to avoid any deadlocks caused by inversions. > > > > This sentence needs expanding, because you are implying that this is an > > issue we need to address in the code. If there is a deadlock resulting > > from incorrect submission ordering, that is a bug in the code. > > If we add no-preempt I think we have to have single elsp because there > is no ordering between unrelated pairs and then elsp inversion certainly > sounds like a deadlock. You don't have a deadlock between unrelated pairs :) > > ctx-A: vcs0 + vcs1 > ctx-B: vcs0 + vcs1 > > There is no ordering between A and B but we'd have to pick same port for > both pairs of A and B respectively. I don't see that we can do that > since all four submissions are async. > > Regards, > > Tvrtko > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx > https://lists.freedesktop.org/mailman/listinfo/intel-gfx > _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx