This is a note to let you know that I've just added the patch titled drm/xe: Enlarge the invalidation timeout from 150 to 500 to the 6.11-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: drm-xe-enlarge-the-invalidation-timeout-from-150-to-.patch and it can be found in the queue-6.11 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxxxxxxx> know about it. commit 7a6e2ac974af4db9a277c0a479c9cbb4d79645dd Author: Shuicheng Lin <shuicheng.lin@xxxxxxxxx> Date: Tue Oct 15 16:12:07 2024 +0000 drm/xe: Enlarge the invalidation timeout from 150 to 500 [ Upstream commit c8fb95e7a54315460b45090f0968167a332e1657 ] There are error messages like below that are occurring during stress testing: "[ 31.004009] xe 0000:03:00.0: [drm] ERROR GT0: Global invalidation timeout". Previously it was hitting this 3 out of 1000 executions of warm reboot. After raising it to 500, 1000 warm reboot executions passed and it didn't fail. Due to the way xe_mmio_wait32() is implemented, the timeout is able to expire early when the register matches the expected value due to the wait increments starting small. So, the larger timeout value should have no effect during normal use cases. v2 (Jonathan): - rework the commit message v3 (Lucas): - add conclusive message for the fail rate and test case v4: - add suggested-by Suggested-by: Jia Yao <jia.yao@xxxxxxxxx> Signed-off-by: Shuicheng Lin <shuicheng.lin@xxxxxxxxx> Cc: Lucas De Marchi <lucas.demarchi@xxxxxxxxx> Cc: Matthew Auld <matthew.auld@xxxxxxxxx> Cc: Nirmoy Das <nirmoy.das@xxxxxxxxx> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@xxxxxxxxx> Tested-by: Zongyao Bai <zongyao.bai@xxxxxxxxx> Reviewed-by: Nirmoy Das <nirmoy.das@xxxxxxxxx> Signed-off-by: Matthew Auld <matthew.auld@xxxxxxxxx> Link: https://patchwork.freedesktop.org/patch/msgid/20241015161207.1373401-1-shuicheng.lin@xxxxxxxxx (cherry picked from commit 2eb460ab9f4bc5b575f52568d17936da0af681d8) [ Fix conflict with gt->mmio ] Signed-off-by: Lucas De Marchi <lucas.demarchi@xxxxxxxxx> Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx> diff --git a/drivers/gpu/drm/xe/xe_device.c b/drivers/gpu/drm/xe/xe_device.c index fb394189d9e23..13213f39e52f9 100644 --- a/drivers/gpu/drm/xe/xe_device.c +++ b/drivers/gpu/drm/xe/xe_device.c @@ -870,7 +870,7 @@ void xe_device_l2_flush(struct xe_device *xe) spin_lock(>->global_invl_lock); xe_mmio_write32(gt, XE2_GLOBAL_INVAL, 0x1); - if (xe_mmio_wait32(gt, XE2_GLOBAL_INVAL, 0x1, 0x0, 150, NULL, true)) + if (xe_mmio_wait32(gt, XE2_GLOBAL_INVAL, 0x1, 0x0, 500, NULL, true)) xe_gt_err_once(gt, "Global invalidation timeout\n"); spin_unlock(>->global_invl_lock);