My current theory is that this masks interrupt delivery to the local CPU
during a critical phase. This is purely papering over the symptoms with a
delay plucked out of thin air from testing on tgl1-gem.

Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
Cc: Andi Shyti <andi.shyti@xxxxxxxxx>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index fa385218ce92..fe8f4625f04f 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1186,6 +1186,21 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
 	/* we need to manually load the submit queue */
 	if (execlists->ctrl_reg)
 		writel(EL_CTRL_LOAD, execlists->ctrl_reg);
+
+	/*
+	 * Now this is evil magic.
+	 *
+	 * Adding the same udelay() to process_csb before we clear
+	 * execlists->pending (that is after we receive the HW ack for this
+	 * submit and before we can submit again) does not relieve the symptoms
+	 * (machine lockup). So is the active difference here the wait under
+	 * the irq-off spinlock? That gives more credence to the theory that
+	 * the issue is interrupt delivery. Also note that we still rely on
+	 * disabling RPS, again that seems like an issue with simultaneous
+	 * GT interrupts being delivered to the same CPU.
+	 */
+	if (IS_TIGERLAKE(engine->i915))
+		udelay(250);
 }
 
 static bool ctx_single_port_submission(const struct intel_context *ce)
-- 
2.23.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx