[AMD Official Use Only - General] I can see the soft reset can be called from amdgpu_device_gpu_recover and pci error handler, they have called amdgpu_device_lock_reset_domain to acquire a reset lock before soft reset. Regards, Jack -----Original Message----- From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx> Sent: Wednesday, December 20, 2023 10:06 PM To: Xiao, Jack <Jack.Xiao@xxxxxxx>; Alex Deucher <alexdeucher@xxxxxxxxx> Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Zhang, Hawking <Hawking.Zhang@xxxxxxx> Subject: Re: [PATCH] drm/amdgpu/gfx11: need acquire mutex before access CP_VMID_RESET Well not the reset lock, but there should only be a single reset queue which this runs on. Regards, Christian. Am 20.12.23 um 10:49 schrieb Xiao, Jack: > [AMD Official Use Only - General] > > It's already protected by the reset lock. In my understanding, soft reset should not run in parallel. > > Regards, > Jack > > -----Original Message----- > From: Alex Deucher <alexdeucher@xxxxxxxxx> > Sent: Wednesday, December 20, 2023 1:04 AM > To: Xiao, Jack <Jack.Xiao@xxxxxxx> > Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Deucher, Alexander > <Alexander.Deucher@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx> > Subject: Re: [PATCH] drm/amdgpu/gfx11: need acquire mutex before > access CP_VMID_RESET > > On Tue, Dec 19, 2023 at 4:30 AM Jack Xiao <Jack.Xiao@xxxxxxx> wrote: >> It's required to take the gfx mutex before access to CP_VMID_RESET, >> for there is a race condition with CP firmware to write the register. >> >> Signed-off-by: Jack Xiao <Jack.Xiao@xxxxxxx> >> --- >> drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 20 ++++++++++++++++++++ >> 1 file changed, 20 insertions(+) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c >> b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c >> index bdcf96df69e6..ae3370d34d11 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c >> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c >> @@ -4518,6 +4518,22 @@ static int gfx_v11_0_soft_reset(void *handle) >> } >> } >> > We probably want a CPU mutex or spinlock to protect this as well unless this is already protected by the reset lock. > > Alex > >> + /* Try to require the gfx mutex before access to CP_VMID_RESET */ >> + for (i = 0; i < adev->usec_timeout; i++) { >> + /* Request with MeId=2, PipeId=0 */ >> + tmp = REG_SET_FIELD(0, CP_GFX_INDEX_MUTEX, REQUEST, 1); >> + tmp = REG_SET_FIELD(tmp, CP_GFX_INDEX_MUTEX, CLIENTID, 4); >> + WREG32_SOC15(GC, 0, regCP_GFX_INDEX_MUTEX, tmp); >> + if (RREG32_SOC15(GC, 0, regCP_GFX_INDEX_MUTEX) == tmp) >> + break; >> + udelay(1); >> + } >> + >> + if (i >= adev->usec_timeout) { >> + printk("Failed to require the gfx mutex during soft reset\n"); >> + return -EINVAL; >> + } >> + >> WREG32_SOC15(GC, 0, regCP_VMID_RESET, 0xfffffffe); >> >> // Read CP_VMID_RESET register three times. >> @@ -4526,6 +4542,10 @@ static int gfx_v11_0_soft_reset(void *handle) >> RREG32_SOC15(GC, 0, regCP_VMID_RESET); >> RREG32_SOC15(GC, 0, regCP_VMID_RESET); >> >> + /* release the gfx mutex */ >> + tmp = REG_SET_FIELD(tmp, CP_GFX_INDEX_MUTEX, REQUEST, 0); >> + WREG32_SOC15(GC, 0, regCP_GFX_INDEX_MUTEX, tmp); >> + >> for (i = 0; i < adev->usec_timeout; i++) { >> if (!RREG32_SOC15(GC, 0, regCP_HQD_ACTIVE) && >> !RREG32_SOC15(GC, 0, regCP_GFX_HQD_ACTIVE)) >> -- >> 2.41.0 >>