[Why] Current vm_pte entities have NORMAL priority, in SRIOV multi-vf use case, the vf flr happens first and then job time out is found. There can be several jobs timeout during a very small time slice. And if the innocent sdma job time out is found before the real bad job, then the innocent sdma job will be set to guilty as it only has NORMAL priority. This will lead to a page fault after resubmitting job. [How] sdma should always have KERNEL priority. The kernel job will always be resubmitted. Signed-off-by: Jingwen Chen <Jingwen.Chen2@xxxxxxx> --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c index 358316d6a38c..f7526b67cc5d 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c @@ -2923,13 +2923,13 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm) INIT_LIST_HEAD(&vm->done); /* create scheduler entities for page table updates */ - r = drm_sched_entity_init(&vm->immediate, DRM_SCHED_PRIORITY_NORMAL, + r = drm_sched_entity_init(&vm->immediate, DRM_SCHED_PRIORITY_KERNEL, adev->vm_manager.vm_pte_scheds, adev->vm_manager.vm_pte_num_scheds, NULL); if (r) return r; - r = drm_sched_entity_init(&vm->delayed, DRM_SCHED_PRIORITY_NORMAL, + r = drm_sched_entity_init(&vm->delayed, DRM_SCHED_PRIORITY_KERNEL, adev->vm_manager.vm_pte_scheds, adev->vm_manager.vm_pte_num_scheds, NULL); if (r) -- 2.25.1 _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx