During GuC reset prepare, interrupts are disabled. If an interrupt event
has already occurred and is still in progress, there is always some
latency between the interrupt event and the tasklet actually running.
When that latency is long, two rare race conditions can occur:

1. The tasklet runs after the IRQ flush and adds a request to the queue
   after the worker flush has started. This causes unexpected processing
   of a G2H message request while the reset prepare code has already
   destroyed the context. The request handler then reports an error
   about a bad context state.

2. The tasklet runs after intel_guc_submission_reset_prepare, and
   ct_try_receive_message starts to run while intel_uc_reset_prepare has
   already finished the GuC sanitize and set ct->enable to false. This
   causes a warning about an incorrect ct->enable state.

Fix this by disabling the CT receive tasklet during reset preparation to
avoid the above race conditions.

Signed-off-by: Zhanjun Dong <zhanjun.dong@xxxxxxxxx>
---
 drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 9ede6f240d79..f82fec33c432 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -1684,15 +1684,20 @@ void intel_guc_submission_reset_prepare(struct intel_guc *guc)
 	guc->interrupts.disable(guc);
 	__reset_guc_busyness_stats(guc);
 
-	/* Flush IRQ handler */
-	spin_lock_irq(guc_to_gt(guc)->irq_lock);
-	spin_unlock_irq(guc_to_gt(guc)->irq_lock);
+	/*
+	 * Disable tasklet until end of prepare, if tasklet is active,
+	 * tasklet_disable will wait until it finished
+	 */
+	tasklet_disable(&guc->ct.receive_tasklet);
 
 	guc_flush_submissions(guc);
 	guc_flush_destroyed_contexts(guc);
 	flush_work(&guc->ct.requests.worker);
 
 	scrub_guc_desc_for_outstanding_g2h(guc);
+
+	/* Enable tasklet at the end, before HW reset */
+	tasklet_enable(&guc->ct.receive_tasklet);
 }
 
 static struct intel_engine_cs *
-- 
2.34.1