Hw_init -> psp_hw_init -> psp_hw_start (failed) -> ras_pre_fini -> hw_fini -> psp_hw_fini On psp_hw_start failure, it destroys the cmd/fence/fw buffers. But the ras/xgmi deinit was delayed to psp_hw_fini. In ras_pre_fini, it calls the ras APIs to disable previously enabled ras features. And that needs the cmd/fence buffers and trigger a NULL pointer dereference. Maybe "defer the cmd/fence/fw buffers destroys to psp_hw_fini also" is a more proper way.. Regards, Evan > -----Original Message----- > From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx> > Sent: 2019年3月15日 16:40 > To: Quan, Evan <Evan.Quan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx > Subject: Re: [PATCH] drm/amdgpu: error handling for xgmi and ras > > Am 15.03.19 um 09:33 schrieb Evan Quan: > > This is a quick workaround. A more complete error handling around > > psp_hw_start is actually needed. > > > > Change-Id: I398efd652584e022debf237950207199a4ea78fc > > Signed-off-by: Evan Quan <evan.quan@xxxxxxx> > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 9 +++++++++ > > 1 file changed, 9 insertions(+) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > > index 5888e24219d9..f8d712d306f1 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c > > @@ -941,6 +941,11 @@ static int psp_load_fw(struct amdgpu_device > *adev) > > amdgpu_bo_free_kernel(&psp->fw_pri_bo, > > &psp->fw_pri_mc_addr, &psp->fw_pri_buf); > > failed: > > + if (psp->ras.ras_initialized) > > + psp->ras.ras_initialized = 0; > > + if (psp->ras.ras_initialized) > > + psp->ras.ras_initialized = 0; > > No idea what that code does, but it looks really odd. Just copy&pasted typo > maybe? > > A simple "psp->ras.ras_initialized = 0;" in the error path should have the > same effect. > > Christian. > > > + > > kfree(psp->cmd); > > psp->cmd = NULL; > > return ret; > > @@ -1061,6 +1066,10 @@ static int psp_resume(void *handle) > > > > failed: > > DRM_ERROR("PSP resume failed\n"); > > + if (psp->ras.ras_initialized) > > + psp->ras.ras_initialized = 0; > > + if (psp->ras.ras_initialized) > > + psp->ras.ras_initialized = 0; > > mutex_unlock(&adev->firmware.mutex); > > return ret; > > } _______________________________________________ amd-gfx mailing list amd-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/amd-gfx