Flush the g2h worker explicitly if TLB timeout happens which is observed on LNL and that points to the recent scheduling issue with E-cores on LNL. This is similar to the recent fix: commit e51527233804 ("drm/xe/guc/ct: Flush g2h worker in case of g2h response timeout") and should be removed once there is E core scheduling fix. v2: Add platform check(Himal) v3: Remove gfx platform check as the issue related to cpu platform(John) Use the common WA macro(John) and print when the flush resolves timeout(Matt B) Cc: Badal Nilawar <badal.nilawar@xxxxxxxxx> Cc: Matthew Brost <matthew.brost@xxxxxxxxx> Cc: Matthew Auld <matthew.auld@xxxxxxxxx> Cc: John Harrison <John.C.Harrison@xxxxxxxxx> Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@xxxxxxxxx> Cc: Lucas De Marchi <lucas.demarchi@xxxxxxxxx> Cc: <stable@xxxxxxxxxxxxxxx> # v6.11+ Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/2687 Signed-off-by: Nirmoy Das <nirmoy.das@xxxxxxxxx> --- drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c index 773de1f08db9..0bdb3ba5220a 100644 --- a/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c +++ b/drivers/gpu/drm/xe/xe_gt_tlb_invalidation.c @@ -81,6 +81,15 @@ static void xe_gt_tlb_fence_timeout(struct work_struct *work) if (msecs_to_jiffies(since_inval_ms) < tlb_timeout_jiffies(gt)) break; + LNL_FLUSH_WORK(>->uc.guc.ct.g2h_worker); + since_inval_ms = ktime_ms_delta(ktime_get(), + fence->invalidation_time); + if (msecs_to_jiffies(since_inval_ms) < tlb_timeout_jiffies(gt)) { + xe_gt_dbg(gt, "LNL_FLUSH_WORK resolved TLB invalidation fence timeout, seqno=%d recv=%d", + fence->seqno, gt->tlb_invalidation.seqno_recv); + break; + } + trace_xe_gt_tlb_invalidation_fence_timeout(xe, fence); xe_gt_err(gt, "TLB invalidation fence timeout, seqno=%d recv=%d", fence->seqno, gt->tlb_invalidation.seqno_recv); -- 2.46.0