After the context lock was removed by commit e68efb27647f21 ("drm/amdgpu:
remove ctx->lock"), we see BO list corruption, as documented in the bug
linked below. While reverting the removal of the context lock does fix the
issue, a more comprehensive approach was suggested, and this patch
implements it.

I'm currently running this kernel and it works fine. However, when running
IGT's amd_cs_nop test, I see a hang in the 4th sub-test, "sync-gfx0".
Previously I've seen it get stuck in the 6th sub-test, "fork-gfx0". The
hang generally looks like this:

[<0>] ttm_eu_reserve_buffers+0xe7/0x2c0 [ttm]
[<0>] amdgpu_gem_va_ioctl+0x31c/0x540 [amdgpu]
[<0>] drm_ioctl_kernel+0x8c/0x120 [drm]
[<0>] drm_ioctl+0x220/0x3e0 [drm]
[<0>] amdgpu_drm_ioctl+0x49/0x80 [amdgpu]
[<0>] __x64_sys_ioctl+0x82/0xb0
[<0>] do_syscall_64+0x3b/0x90
[<0>] entry_SYSCALL_64_after_hwframe+0x44/0xae

That is, the task is blocked in a call chain along the lines of
ttm_eu_reserve_buffers() --> ttm_bo_reserve() --> ... --> dma_resv_lock()
--> ww_mutex_lock().

I don't observe such hangs during normal use of the system, only when
running the IGT amd_cs_nop test.

Luben Tuikov (1):
  drm/amdgpu: Protect the amdgpu_bo_list list with a mutex

 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c |  3 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |  4 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      | 31 +++++++++++++++++++--
 3 files changed, 35 insertions(+), 3 deletions(-)

Suggested-by: Christian König <christian.koenig@xxxxxxx>
Cc: Alex Deucher <Alexander.Deucher@xxxxxxx>
Cc: Andrey Grodzovsky <Andrey.Grodzovsky@xxxxxxx>
Cc: Vitaly Prosyak <Vitaly.Prosyak@xxxxxxx>
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2048
Signed-off-by: Luben Tuikov <luben.tuikov@xxxxxxx>

base-commit: ab7e60938be74e21c723223e7eb96cac7b441e5e
-- 
2.36.1.74.g277cf0bc36