For SR-IOV use case, we call gpu reset under the case we have no choice ... So many places like debug fs shouldn't a good reason to trigger gpu reset You know that gpu reset under SR-IOV will have very big impact on all other VFs ... BR Monk -----Original Message----- From: Christian König [mailto:deathsimple@xxxxxxxxxxx] Sent: Monday, May 08, 2017 5:08 PM To: Liu, Monk <Monk.Liu at amd.com>; amd-gfx at lists.freedesktop.org Subject: Re: [PATCH 1/4] drm/amdgpu:don't invoke srio-gpu-reset in gpu-reset Am 08.05.2017 um 08:51 schrieb Monk Liu: > because we don't want to do sriov-gpu-reset under certain cases, so > just split those two funtion and don't invoke sr-iov one from > bare-metal one. > > Change-Id: I641126c241e2ee2dfd54e6d16c389b159f99cfe0 > Signed-off-by: Monk Liu <Monk.Liu at amd.com> > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 --- > drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 3 ++- > drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 2 +- > drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c | 3 ++- > drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 6 +++++- > 5 files changed, 10 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index 45a60a6..4985a7e 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -2652,9 +2652,6 @@ int amdgpu_gpu_reset(struct amdgpu_device *adev) > int resched; > bool need_full_reset; > > - if (amdgpu_sriov_vf(adev)) > - return amdgpu_sriov_gpu_reset(adev, true); > - > if (!amdgpu_check_soft_reset(adev)) { > DRM_INFO("No hardware hang detected. Did some blocks stall?\n"); > return 0; > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > index 5772ef2..d7523d1 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c > @@ -651,7 +651,8 @@ static int amdgpu_debugfs_gpu_reset(struct seq_file *m, void *data) > struct amdgpu_device *adev = dev->dev_private; > > seq_printf(m, "gpu reset\n"); > - amdgpu_gpu_reset(adev); > + if (!amdgpu_sriov_vf(adev)) > + amdgpu_gpu_reset(adev); Well that is clearly not a good idea. Why do you want to disable the reset here? > > return 0; > } > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c > index 67be795..5bcbea0 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c > @@ -221,7 +221,7 @@ void amdgpu_gem_object_close(struct drm_gem_object > *obj, > > static int amdgpu_gem_handle_lockup(struct amdgpu_device *adev, int r) > { > - if (r == -EDEADLK) { > + if (r == -EDEADLK && !amdgpu_sriov_vf(adev)) { Not a problem of your patch, but that stuff is outdated and should have been removed completely years ago. Going to take care of that. > r = amdgpu_gpu_reset(adev); > if (!r) > r = -EAGAIN; > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > index f8a6c95..49c6e6e 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c > @@ -89,7 +89,8 @@ static void amdgpu_irq_reset_work_func(struct work_struct *work) > struct amdgpu_device *adev = container_of(work, struct amdgpu_device, > reset_work); > > - amdgpu_gpu_reset(adev); > + if (!amdgpu_sriov_vf(adev)) > + amdgpu_gpu_reset(adev); Mhm, that disables the reset on an invalid register access or invalid command stream. Is that really what we want? Christian. > } > > /* Disable *all* interrupts */ > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > index 690ef3d..c7718af 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c > @@ -36,7 +36,11 @@ static void amdgpu_job_timedout(struct amd_sched_job *s_job) > job->base.sched->name, > atomic_read(&job->ring->fence_drv.last_seq), > job->ring->fence_drv.sync_seq); > - amdgpu_gpu_reset(job->adev); > + > + if (amdgpu_sriov_vf(job->adev)) > + amdgpu_sriov_gpu_reset(job->adev, true); > + else > + amdgpu_gpu_reset(job->adev); > } > > int amdgpu_job_alloc(struct amdgpu_device *adev, unsigned num_ibs,