Am 19.07.21 um 07:57 schrieb Jingwen Chen:
[Why]
Current vm_pte entities have NORMAL priority, in SRIOV multi-vf
use case, the vf flr happens first and then job time out is found.
There can be several jobs timeout during a very small time slice.
And if the innocent sdma job time out is found before the real bad
job, then the innocent sdma job will be set to guilty as it only
has NORMAL priority. This will lead to a page fault after
resubmitting job.
[How]
sdma should always have KERNEL priority. The kernel job will always
be resubmitted.
I'm not sure if that is a good idea. We intentionally didn't gave the
page table updates kernel priority to avoid clashing with the move jobs.
Christian.
Signed-off-by: Jingwen Chen <Jingwen.Chen2@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 358316d6a38c..f7526b67cc5d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2923,13 +2923,13 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm)
INIT_LIST_HEAD(&vm->done);
/* create scheduler entities for page table updates */
- r = drm_sched_entity_init(&vm->immediate, DRM_SCHED_PRIORITY_NORMAL,
+ r = drm_sched_entity_init(&vm->immediate, DRM_SCHED_PRIORITY_KERNEL,
adev->vm_manager.vm_pte_scheds,
adev->vm_manager.vm_pte_num_scheds, NULL);
if (r)
return r;
- r = drm_sched_entity_init(&vm->delayed, DRM_SCHED_PRIORITY_NORMAL,
+ r = drm_sched_entity_init(&vm->delayed, DRM_SCHED_PRIORITY_KERNEL,
adev->vm_manager.vm_pte_scheds,
adev->vm_manager.vm_pte_num_scheds, NULL);
if (r)
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx