On 8/5/19 2:02 AM, Pan, Xinhui wrote: > As the race of gpu reset with ras interrupts. we might lose a chance to > do gpu recovery. To guarantee the gpu has recovered successfully, we use > atomic to save the counts of gpu reset requests, and issue another gpu > reset if there are any pending requests. > > Signed-off-by: xinhui pan <xinhui.pan@xxxxxxx> How this protects against RAS triggered amdgpu_device_gpu_recover being dropped because there was another non RAS recover GPU reset in progress such as due to job timeout? And reiterating a question I already asked before, why do you have to do schedule_work for GPU resets from amdgpu_ras_reset_gpu when you already in non interrupt context for any of the ras_ih_if.cb handlers. I see why you need it in amdgpu_ras_resume but not for the other call sites. Andrey > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 10 +++++++++- > drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 2 +- > 2 files changed, 10 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > index a96b0f17c619..c1f444b74b19 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c > @@ -1220,7 +1220,15 @@ static void amdgpu_ras_do_recovery(struct work_struct *work) > container_of(work, struct amdgpu_ras, recovery_work); > > amdgpu_device_gpu_recover(ras->adev, 0); > - atomic_set(&ras->in_recovery, 0); > + /* if there is no competiton, in_recovery changes from 1 to 0. > + * if ras_reset_gpu is called while we are doing gpu recvoery, > + * bacause of the atomic protection, we may lose some recovery > + * requests. > + * So we use atomic_xchg to check the count of requests, and > + * issue another gpu reset request to perform the gpu recovery. > + */ > + if (atomic_xchg(&ras->in_recovery, 0) > 1) > + amdgpu_ras_reset_gpu(ras->adev, 0); > } > > static int amdgpu_ras_release_vram(struct amdgpu_device *adev, > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h > index 2765f2dbb1e6..ba423a4a3013 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h > @@ -498,7 +498,7 @@ static inline int amdgpu_ras_reset_gpu(struct amdgpu_device *adev, > { > struct amdgpu_ras *ras = amdgpu_ras_get_context(adev); > > - if (atomic_cmpxchg(&ras->in_recovery, 0, 1) == 0) > + if (atomic_inc_return(&ras->in_recovery) == 1) > schedule_work(&ras->recovery_work); > return 0; > } _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx