If another thread accesses the GPU while the GPU is being reset, the
reset could fail. This is especially problematic on SRIOV, since the
host may reset the GPU even if the guest is not yet ready. There is code
in place that tries to prevent stray access, but over time bugs have
crept in, making it unreliable. This series aims to address these bugs.

v4:
  From testing, it seems that removing the flush from GART enable
  sometimes causes the GART to not be flushed at all. So dropped:
    drm/amd/amdgpu: remove unnecessary flush when enable gart
  and replaced it with this patch instead:
    drm/amdgpu: call flush_gpu_tlb directly in gfxhub enable

  Split
    drm/amdgpu: fix missing reset domain locks
  into multiple commits:
    drm/amdgpu: add lock in amdgpu_gart_invalidate_tlb
    drm/amdgpu: add lock in kfd_process_dequeue_from_device

v3:
  dropped:
    drm/amdgpu: abort fence poll if reset is started
    Revert "drm/amdgpu: Queue KFD reset workitem in VF FED"
  updated:
    drm/amdgpu: fix sriov host flr handler
    drm/amdgpu: fix missing reset domain locks

Yunxiang Li (9):
  drm/amdgpu: add skip_hw_access checks for sriov
  drm/amdgpu: fix sriov host flr handler
  drm/amdgpu/kfd: remove is_hws_hang and is_resetting
  drm/amdgpu: remove tlb flush in amdgpu_gtt_mgr_recover
  drm/amdgpu: use helper in amdgpu_gart_unbind
  drm/amdgpu: call flush_gpu_tlb directly in gfxhub enable
  drm/amdgpu: fix locking scope when flushing tlb
  drm/amdgpu: add lock in amdgpu_gart_invalidate_tlb
  drm/amdgpu: add lock in kfd_process_dequeue_from_device

 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  2 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_gart.c      | 11 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c       | 70 ++++++++--------
 drivers/gpu/drm/amd/amdgpu/amdgpu_gtt_mgr.c   |  2 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c      | 23 ++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h      |  2 +
 drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c        |  2 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c        |  2 +-
 drivers/gpu/drm/amd/amdgpu/mxgpu_ai.c         | 39 ++++-----
 drivers/gpu/drm/amd/amdgpu/mxgpu_nv.c         | 39 ++++-----
 drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c         |  6 --
 drivers/gpu/drm/amd/amdkfd/kfd_device.c       |  1 -
 .../drm/amd/amdkfd/kfd_device_queue_manager.c | 79 ++++++++-----------
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |  1 -
 drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 11 ++-
 .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   |  4 +-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h         |  4 +-
 .../amd/amdkfd/kfd_process_queue_manager.c    | 13 ++-
 18 files changed, 154 insertions(+), 157 deletions(-)

-- 
2.34.1