I can try this approach as well. This csb is similar to the kiq mqd, which is also allocated in vram and might get corrupted after a baco reset.

BR,
Xiaojie
________________________________________
From: Zhang, Hawking <Hawking.Zhang@xxxxxxx>
Sent: Wednesday, November 20, 2019 4:54 PM
To: Koenig, Christian; Yuan, Xiaojie; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Long, Gang; Xiao, Jack
Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

I think we should evict the bo and then move it back.

Regards,
Hawking

-----Original Message-----
From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
Sent: November 20, 2019 16:47
To: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Yuan, Xiaojie <Xiaojie.Yuan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Long, Gang <Gang.Long@xxxxxxx>; Xiao, Jack <Jack.Xiao@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

A baco reset also resets the MC, doesn't it?

In this case it would be expected that the content of VRAM is corrupted.

Christian.

On 20.11.19 at 09:45, Zhang, Hawking wrote:
> Or in other words, we are still not clear when the corruption actually happens, right?
>
> Regards,
> Hawking
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Zhang, Hawking
> Sent: November 20, 2019 16:44
> To: Yuan, Xiaojie <Xiaojie.Yuan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Long, Gang <Gang.Long@xxxxxxx>; Xiao, Jack <Jack.Xiao@xxxxxxx>
> Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset
>
> Just to make sure I understand you correctly: until the fw team root-causes the csb corruption, we keep the workaround in the driver, correct?
>
> Regards,
> Hawking
> -----Original Message-----
> From: Yuan, Xiaojie <Xiaojie.Yuan@xxxxxxx>
> Sent: November 20, 2019 14:47
> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Xiao, Jack <Jack.Xiao@xxxxxxx>; Long, Gang <Gang.Long@xxxxxxx>; Yuan, Xiaojie <Xiaojie.Yuan@xxxxxxx>
> Subject: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset
>
> This patch fixes 2nd baco reset failure with gfxoff enabled on navi1x.
>
> clear state buffer (resides in vram) is corrupted after 1st baco reset, upon gfxoff exit, CPF gets garbage header in CSIB and hangs.
>
> Signed-off-by: Xiaojie Yuan <xiaojie.yuan@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 43 ++++++++++++++++++++++----
>  1 file changed, 37 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 9274bd4b6c68..8e24ea08ca39 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -1789,27 +1789,52 @@ static void gfx_v10_0_enable_gui_idle_interrupt(struct amdgpu_device *adev,
>  	WREG32_SOC15(GC, 0, mmCP_INT_CNTL_RING0, tmp);
>  }
>
> -static void gfx_v10_0_init_csb(struct amdgpu_device *adev)
> +static int gfx_v10_0_init_csb(struct amdgpu_device *adev)
>  {
> +	int r;
> +
> +	if (adev->in_gpu_reset) {
> +		r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
> +		if (r)
> +			return r;
> +
> +		r = amdgpu_bo_kmap(adev->gfx.rlc.clear_state_obj,
> +				   (void **)&adev->gfx.rlc.cs_ptr);
> +		if (!r) {
> +			adev->gfx.rlc.funcs->get_csb_buffer(adev,
> +					adev->gfx.rlc.cs_ptr);
> +			amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
> +		}
> +
> +		amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
> +		if (r)
> +			return r;
> +	}
> +
>  	/* csib */
>  	WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_HI,
>  		     adev->gfx.rlc.clear_state_gpu_addr >> 32);
>  	WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_LO,
>  		     adev->gfx.rlc.clear_state_gpu_addr & 0xfffffffc);
>  	WREG32_SOC15(GC, 0, mmRLC_CSIB_LENGTH, adev->gfx.rlc.clear_state_size);
> +
> +	return 0;
>  }
>
> -static void gfx_v10_0_init_pg(struct amdgpu_device *adev)
> +static int gfx_v10_0_init_pg(struct amdgpu_device *adev)
>  {
>  	int i;
> +	int r;
>
> -	gfx_v10_0_init_csb(adev);
> +	r = gfx_v10_0_init_csb(adev);
> +	if (r)
> +		return r;
>
>  	for (i = 0; i < adev->num_vmhubs; i++)
>  		amdgpu_gmc_flush_gpu_tlb(adev, 0, i, 0);
>
>  	/* TODO: init power gating */
> -	return;
> +	return 0;
>  }
>
>  void gfx_v10_0_rlc_stop(struct amdgpu_device *adev)
> @@ -1911,7 +1936,10 @@ static int gfx_v10_0_rlc_resume(struct amdgpu_device *adev)
>  		r = gfx_v10_0_wait_for_rlc_autoload_complete(adev);
>  		if (r)
>  			return r;
> -		gfx_v10_0_init_pg(adev);
> +
> +		r = gfx_v10_0_init_pg(adev);
> +		if (r)
> +			return r;
>
>  		/* enable RLC SRM */
>  		gfx_v10_0_rlc_enable_srm(adev);
> @@ -1937,7 +1965,10 @@ static int gfx_v10_0_rlc_resume(struct amdgpu_device *adev)
>  				return r;
>  		}
>
> -		gfx_v10_0_init_pg(adev);
> +		r = gfx_v10_0_init_pg(adev);
> +		if (r)
> +			return r;
> +
>  		adev->gfx.rlc.funcs->start(adev);
>
>  		if (adev->firmware.load_type == AMDGPU_FW_LOAD_RLC_BACKDOOR_AUTO) {
> --
> 2.20.1
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
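
For readers following the thread, the control flow of the quoted patch can be modelled in plain user-space C. This is only a rough sketch under stated assumptions: `fake_dev`, `fake_reset`, `fake_get_csb_buffer`, `fake_init_csb`, `fake_init_pg`, the `map_fails` flag and the fill pattern are all hypothetical stand-ins invented for illustration, not amdgpu code. It shows the two ideas in the patch: rebuild the buffer contents from CPU-side state when a reset is in progress, and turn the former `void` init helpers into `int` so a mapping failure propagates to the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CSB_DWORDS 8

/* Hypothetical stand-in for a device; NOT the real amdgpu structures. */
struct fake_dev {
	uint32_t vram_csb[CSB_DWORDS];	/* models the clear state buffer in VRAM */
	bool in_reset;			/* models adev->in_gpu_reset */
	bool map_fails;			/* assumption: lets a caller simulate a failed mapping */
	uint32_t csib_length_reg;	/* models the CSIB length register */
};

/* Regenerate the buffer from CPU-side state (plays the role of get_csb_buffer). */
static void fake_get_csb_buffer(uint32_t *dst)
{
	assert(dst != NULL);
	for (size_t i = 0; i < CSB_DWORDS; i++)
		dst[i] = 0xC5B00000u | (uint32_t)i;	/* arbitrary recognizable pattern */
}

/* Model a reset that leaves VRAM contents as garbage. */
static void fake_reset(struct fake_dev *dev)
{
	memset(dev->vram_csb, 0xFF, sizeof(dev->vram_csb));
	dev->in_reset = true;
}

/* Returns 0 on success or a negative errno, mirroring the void -> int change. */
static int fake_init_csb(struct fake_dev *dev)
{
	if (dev->in_reset) {
		if (dev->map_fails)
			return -ENOMEM;	/* stands in for a reserve/kmap failure */
		fake_get_csb_buffer(dev->vram_csb);
	}

	/* Program the "register" only after the contents are known to be good. */
	dev->csib_length_reg = (uint32_t)sizeof(dev->vram_csb);
	return 0;
}

/* The caller now checks and forwards the error instead of ignoring it. */
static int fake_init_pg(struct fake_dev *dev)
{
	int r = fake_init_csb(dev);

	if (r)
		return r;

	/* TLB flush and power gating setup would follow here. */
	return 0;
}
```

In this model, calling `fake_reset()` and then `fake_init_pg()` restores the pattern written by `fake_get_csb_buffer()`, while setting `map_fails` makes `fake_init_pg()` return the error instead of programming the length register. The alternative raised in the thread (evict the bo and move it back) is not modelled here.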