Hi,
I noticed that somewhere between commits 70d201a40823 and 052d534373b7 my GPU
started hanging randomly when I open the GNOME Shell Activities screen.

I found a reliable way to reproduce it:
- Launch the game Elden Ring
- Continue the game (the game world must be loaded)
- Press the Start (Windows) key

At this point the GPU hangs with 99% probability. If it does not hang, press
the Start key several more times to make sure.

The bisect found the following bad commit:

f7fe64ad0f22ff034f8ebcfbd7299ee9cc9b57d7 is the first bad commit
commit f7fe64ad0f22ff034f8ebcfbd7299ee9cc9b57d7
Author: Matthew Brost <matthew.brost@xxxxxxxxx>
Date:   Mon Oct 30 20:24:37 2023 -0700

    drm/sched: Split free_job into own work item

    Rather than call free_job and run_job in same work item have a dedicated
    work item for each. This aligns with the design and intended use of work
    queues.

    v2:
      - Test for DMA_FENCE_FLAG_TIMESTAMP_BIT before setting
        timestamp in free_job() work item (Danilo)
    v3:
      - Drop forward dec of drm_sched_select_entity (Boris)
      - Return in drm_sched_run_job_work if entity NULL (Boris)
    v4:
      - Replace dequeue with peek and invert logic (Luben)
      - Wrap to 100 lines (Luben)
      - Update comments for *_queue / *_queue_if_ready functions (Luben)
    v5:
      - Drop peek argument, blindly reinit idle (Luben)
      - s/drm_sched_free_job_queue_if_ready/drm_sched_free_job_queue_if_done (Luben)
      - Update work_run_job & work_free_job kernel doc (Luben)
    v6:
      - Do not move drm_sched_select_entity in file (Luben)

    Signed-off-by: Matthew Brost <matthew.brost@xxxxxxxxx>
    Link: https://lore.kernel.org/r/20231031032439.1558703-4-matthew.brost@xxxxxxxxx
    Reviewed-by: Luben Tuikov <ltuikov89@xxxxxxxxx>
    Signed-off-by: Luben Tuikov <ltuikov89@xxxxxxxxx>

 drivers/gpu/drm/scheduler/sched_main.c | 146 ++++++++++++++++++++++-----------
 include/drm/gpu_scheduler.h            |   4 +-
 2 files changed, 101 insertions(+), 49 deletions(-)

Unfortunately, the GPU hang still occurs even on 6.8-rc1, which is why I am
filing this bug report here.

GPU: Radeon 7900XTX
CPU: Ryzen 7950X
Full hardware specs are here: https://linux-hardware.org/?probe=9e5edb123e

I have also attached the full bisect log and the kernel logs from each bisect
step as archives.

Could someone look into this, please?

--
Best Regards,
Mike Gavrilov.
<<attachment: bisect-GPU-hang-issue-log.zip>>
<<attachment: kernel-logs.zip>>