[AMD Official Use Only - Internal Distribution Only] Consider this sequence: 1. GPU reset begin 2. device reset count + 1 3. job id 1 scheduled with vm_need_flush=false 4. When handling this job in vm_flush, amdgpu_vmid_had_gpu_reset will return true, thus this job will be flush and the vmid_reset_count will be set equals to device_reset_count 5. stop drm scheduler 6. GPU do real reset 7. resubmit job id 1 with vm_need_flush=false 8. Job id 1 is the first job to resubmit after reset. This time amdgpu_vmid_had_gpu_reset returns false and the vm_need_flush==false Then no one will flush pd_addr and vmid for jobs after resubmit. In this sequence amdgpu_vmid_had_gpu_reset doesn't work. Best Regards, JingWen Chen -----Original Message----- From: Koenig, Christian <Christian.Koenig@xxxxxxx> Sent: Thursday, February 25, 2021 3:46 PM To: Chen, JingWen <JingWen.Chen2@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx Cc: Liu, Monk <Monk.Liu@xxxxxxx> Subject: Re: [PATCH 2/2] drm/amd/amdgpu: force flush resubmit job Am 25.02.21 um 06:27 schrieb Jingwen Chen: > [Why] > when a job is scheduled during TDR(after device reset count increase > and before drm_sched_stop), this job won't do vm_flush when resubmit > itself after GPU reset done. This can lead to a page fault. > > [How] > Always do vm_flush for resubmit job. NAK, what do you think amdgpu_vmid_had_gpu_reset already does? Christian. > > Signed-off-by: Jingwen Chen <Jingwen.Chen2@xxxxxxx> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > index fdbe7d4e8b8b..4af2c5d15950 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c > @@ -1088,7 +1088,8 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct amdgpu_job *job, > if (update_spm_vmid_needed && adev->gfx.rlc.funcs->update_spm_vmid) > adev->gfx.rlc.funcs->update_spm_vmid(adev, job->vmid); > > -if (amdgpu_vmid_had_gpu_reset(adev, id)) { > +if (amdgpu_vmid_had_gpu_reset(adev, id)|| > +(job->base.flags & DRM_FLAG_RESUBMIT_JOB)) { > gds_switch_needed = true; > vm_flush_needed = true; > pasid_mapping_needed = true; _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx