[AMD Official Use Only]

Besides, I think our current KMD has three types of kernel SDMA jobs:
1) adev->mman.entity, which is already a KERNEL priority entity
2) vm->immediate
3) vm->delayed

Do you mean that vm->immediate or vm->delayed are now used for move jobs instead of mman.entity?

Thanks

------------------------------------------
Monk Liu | Cloud-GPU Core team
------------------------------------------

-----Original Message-----
From: Liu, Monk
Sent: Monday, July 19, 2021 5:40 PM
To: 'Christian König' <ckoenig.leichtzumerken@xxxxxxxxx>; Chen, JingWen <JingWen.Chen2@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Chen, Horace <Horace.Chen@xxxxxxx>
Subject: RE: [PATCH] drm/amd/amdgpu: vm entities should have kernel priority

[AMD Official Use Only]

If move jobs are clashing there, we probably need to fix the bugs in those move jobs.

As I believe you remember, we previously agreed to always trust kernel jobs, especially paging jobs. Without setting the paging jobs' priority to KERNEL level, how can we keep that protocol? Do you have a better idea?

Thanks

------------------------------------------
Monk Liu | Cloud-GPU Core team
------------------------------------------

-----Original Message-----
From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
Sent: Monday, July 19, 2021 4:25 PM
To: Chen, JingWen <JingWen.Chen2@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Chen, Horace <Horace.Chen@xxxxxxx>; Liu, Monk <Monk.Liu@xxxxxxx>
Subject: Re: [PATCH] drm/amd/amdgpu: vm entities should have kernel priority

On 19.07.21 at 07:57, Jingwen Chen wrote:
> [Why]
> Current vm_pte entities have NORMAL priority. In the SRIOV multi-VF use
> case, the VF FLR happens first and then the job timeout is found.
> Several jobs can time out within a very small time slice.
> If the timeout of an innocent sdma job is found before that of the real
> bad job, the innocent sdma job will be marked guilty, as it only has
> NORMAL priority. This will lead to a page fault after resubmitting the
> job.
>
> [How]
> sdma should always have KERNEL priority. The kernel job will always be
> resubmitted.

I'm not sure that is a good idea. We intentionally didn't give the page table updates kernel priority, to avoid clashing with the move jobs.

Christian.

>
> Signed-off-by: Jingwen Chen <Jingwen.Chen2@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 358316d6a38c..f7526b67cc5d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2923,13 +2923,13 @@ int amdgpu_vm_init(struct amdgpu_device *adev, struct amdgpu_vm *vm)
>  	INIT_LIST_HEAD(&vm->done);
>
>  	/* create scheduler entities for page table updates */
> -	r = drm_sched_entity_init(&vm->immediate, DRM_SCHED_PRIORITY_NORMAL,
> +	r = drm_sched_entity_init(&vm->immediate, DRM_SCHED_PRIORITY_KERNEL,
>  				  adev->vm_manager.vm_pte_scheds,
>  				  adev->vm_manager.vm_pte_num_scheds, NULL);
>  	if (r)
>  		return r;
>
> -	r = drm_sched_entity_init(&vm->delayed, DRM_SCHED_PRIORITY_NORMAL,
> +	r = drm_sched_entity_init(&vm->delayed, DRM_SCHED_PRIORITY_KERNEL,
>  				  adev->vm_manager.vm_pte_scheds,
>  				  adev->vm_manager.vm_pte_num_scheds, NULL);
>  	if (r)
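
For readers following the thread: the [Why] and [How] paragraphs rely on the rule that a timed-out job is only blamed (and later skipped on resubmission) when it does not run at kernel priority. The block below is a minimal user-space sketch of that rule, only loosely modeled on the karma check in the DRM scheduler. Every name in it (sketch_priority, sketch_job, sketch_increase_karma, sketch_should_resubmit) and the karma_limit parameter are hypothetical stand-ins, not the kernel's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are not the kernel's definitions. */
enum sketch_priority {
	SKETCH_PRIORITY_NORMAL,
	SKETCH_PRIORITY_KERNEL,
};

struct sketch_job {
	enum sketch_priority priority; /* inherited from the submitting entity */
	int karma;                     /* number of timeouts blamed on this job */
	bool guilty;                   /* set once karma exceeds the limit */
};

/* Blame a timed-out job, unless it was submitted at kernel priority. */
static void sketch_increase_karma(struct sketch_job *job, int karma_limit)
{
	assert(job != NULL);

	/* Assumption: kernel-priority jobs are always trusted, never blamed. */
	if (job->priority == SKETCH_PRIORITY_KERNEL)
		return;

	job->karma++;
	if (job->karma > karma_limit)
		job->guilty = true;
}

/* After a reset, only jobs that were not marked guilty are resubmitted. */
static bool sketch_should_resubmit(const struct sketch_job *job)
{
	assert(job != NULL);
	return !job->guilty;
}
```

In this model, a NORMAL priority page table update that happens to time out first is marked guilty and dropped, which corresponds to the page fault scenario in [Why]. The same job at KERNEL priority is left untouched and resubmitted. The sketch says nothing about the clash with move jobs that Christian raises; that concern is about scheduling order, not about blame.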