Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

Got it.
Thanks Hawking and Chris for your attention.

BR,
Xiaojie

________________________________________
From: Zhang, Hawking <Hawking.Zhang@xxxxxxx>
Sent: Wednesday, November 20, 2019 5:04 PM
To: Yuan, Xiaojie; Koenig, Christian; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Long, Gang; Xiao, Jack; Ma, Le
Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

I'm okay with re-initializing; it's the more straightforward approach.

Regards,
Hawking
-----Original Message-----
From: Yuan, Xiaojie <Xiaojie.Yuan@xxxxxxx>
Sent: November 20, 2019 17:00
To: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Long, Gang <Gang.Long@xxxxxxx>; Xiao, Jack <Jack.Xiao@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

I can try this approach as well.
The csb is similar to the kiq mqd: both are allocated in vram and might be corrupted after a baco reset.

BR,
Xiaojie

________________________________________
From: Zhang, Hawking <Hawking.Zhang@xxxxxxx>
Sent: Wednesday, November 20, 2019 4:54 PM
To: Koenig, Christian; Yuan, Xiaojie; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Long, Gang; Xiao, Jack
Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

I think we should evict the bo and then move it back.

Regards,
Hawking

-----Original Message-----
From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
Sent: November 20, 2019 16:47
To: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Yuan, Xiaojie <Xiaojie.Yuan@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Long, Gang <Gang.Long@xxxxxxx>; Xiao, Jack <Jack.Xiao@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after gpu reset

A baco reset also resets the MC, doesn't it? In this case it would be expected that the content of VRAM is corrupted.

Christian.

On 20.11.19 at 09:45, Zhang, Hawking wrote:
> Or in other words, we are still not clear when the corruption actually happens, right?
>
> Regards,
> Hawking
> -----Original Message-----
> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of
> Zhang, Hawking
> Sent: November 20, 2019 16:44
> To: Yuan, Xiaojie <Xiaojie.Yuan@xxxxxxx>;
> amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Long, Gang <Gang.Long@xxxxxxx>; Xiao, Jack <Jack.Xiao@xxxxxxx>
> Subject: RE: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer
> after gpu reset
>
> Just to make sure I understand you correctly: until the fw team root-causes the csb corruption, we keep the workaround in the driver, correct?
>
> Regards,
> Hawking
> -----Original Message-----
> From: Yuan, Xiaojie <Xiaojie.Yuan@xxxxxxx>
> Sent: November 20, 2019 14:47
> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Xiao, Jack
> <Jack.Xiao@xxxxxxx>; Long, Gang <Gang.Long@xxxxxxx>; Yuan, Xiaojie
> <Xiaojie.Yuan@xxxxxxx>
> Subject: [PATCH] drm/amdgpu/gfx10: re-init clear state buffer after
> gpu reset
>
> This patch fixes the 2nd baco reset failure with gfxoff enabled on navi1x.
>
> The clear state buffer (which resides in vram) is corrupted after the 1st baco reset. Upon gfxoff exit, CPF gets a garbage header in CSIB and hangs.
>
> Signed-off-by: Xiaojie Yuan <xiaojie.yuan@xxxxxxx>
> ---
>   drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 43 ++++++++++++++++++++++----
>   1 file changed, 37 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> index 9274bd4b6c68..8e24ea08ca39 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c
> @@ -1789,27 +1789,52 @@ static void gfx_v10_0_enable_gui_idle_interrupt(struct amdgpu_device *adev,
>       WREG32_SOC15(GC, 0, mmCP_INT_CNTL_RING0, tmp);
>   }
>
> -static void gfx_v10_0_init_csb(struct amdgpu_device *adev)
> +static int gfx_v10_0_init_csb(struct amdgpu_device *adev)
>   {
> +     int r;
> +
> +     if (adev->in_gpu_reset) {
> +             r = amdgpu_bo_reserve(adev->gfx.rlc.clear_state_obj, false);
> +             if (r)
> +                     return r;
> +
> +             r = amdgpu_bo_kmap(adev->gfx.rlc.clear_state_obj,
> +                                (void **)&adev->gfx.rlc.cs_ptr);
> +             if (!r) {
> +                     adev->gfx.rlc.funcs->get_csb_buffer(adev,
> +                                     adev->gfx.rlc.cs_ptr);
> +                     amdgpu_bo_kunmap(adev->gfx.rlc.clear_state_obj);
> +             }
> +
> +             amdgpu_bo_unreserve(adev->gfx.rlc.clear_state_obj);
> +             if (r)
> +                     return r;
> +     }
> +
>       /* csib */
>       WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_HI,
>                    adev->gfx.rlc.clear_state_gpu_addr >> 32);
>       WREG32_SOC15(GC, 0, mmRLC_CSIB_ADDR_LO,
>                    adev->gfx.rlc.clear_state_gpu_addr & 0xfffffffc);
>       WREG32_SOC15(GC, 0, mmRLC_CSIB_LENGTH, adev->gfx.rlc.clear_state_size);
> +
> +     return 0;
>   }
>
> -static void gfx_v10_0_init_pg(struct amdgpu_device *adev)
> +static int gfx_v10_0_init_pg(struct amdgpu_device *adev)
>   {
>       int i;
> +     int r;
>
> -     gfx_v10_0_init_csb(adev);
> +     r = gfx_v10_0_init_csb(adev);
> +     if (r)
> +             return r;
>
>       for (i = 0; i < adev->num_vmhubs; i++)
>               amdgpu_gmc_flush_gpu_tlb(adev, 0, i, 0);
>
>       /* TODO: init power gating */
> -     return;
> +     return 0;
>   }
>
>   void gfx_v10_0_rlc_stop(struct amdgpu_device *adev)
> @@ -1911,7 +1936,10 @@ static int gfx_v10_0_rlc_resume(struct amdgpu_device *adev)
>               r = gfx_v10_0_wait_for_rlc_autoload_complete(adev);
>               if (r)
>                       return r;
> -             gfx_v10_0_init_pg(adev);
> +
> +             r = gfx_v10_0_init_pg(adev);
> +             if (r)
> +                     return r;
>
>               /* enable RLC SRM */
>               gfx_v10_0_rlc_enable_srm(adev);
> @@ -1937,7 +1965,10 @@ static int gfx_v10_0_rlc_resume(struct amdgpu_device *adev)
>                               return r;
>               }
>
> -             gfx_v10_0_init_pg(adev);
> +             r = gfx_v10_0_init_pg(adev);
> +             if (r)
> +                     return r;
> +
>               adev->gfx.rlc.funcs->start(adev);
>
>               if (adev->firmware.load_type == AMDGPU_FW_LOAD_RLC_BACKDOOR_AUTO) {
> --
> 2.20.1
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
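
The control flow that the patch adds to gfx_v10_0_init_csb() (reserve the buffer, map it, refill it, unmap, unreserve, and propagate any error) can be sketched outside the kernel as follows. This is only an illustration under stated assumptions: struct mock_bo, the mock_* helpers, csb_template and mock_reinit_csb() are hypothetical stand-ins invented here, not the amdgpu_bo API, and the "corruption" is simulated with memset().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a VRAM buffer object; NOT the amdgpu_bo API. */
struct mock_bo {
	uint32_t data[4];	/* backing store that a reset may scribble over */
	bool reserved;
	bool mapped;
	bool fail_map;		/* test hook: force the map step to fail */
};

/* Hypothetical known-good contents, playing the role of the clear state. */
static const uint32_t csb_template[4] = { 0xC0DE0001u, 0x2u, 0x3u, 0x4u };

static int mock_bo_reserve(struct mock_bo *bo)
{
	if (bo->reserved)
		return -EBUSY;
	bo->reserved = true;
	return 0;
}

static void mock_bo_unreserve(struct mock_bo *bo)
{
	bo->reserved = false;
}

static int mock_bo_kmap(struct mock_bo *bo, uint32_t **ptr)
{
	if (bo->fail_map)
		return -ENOMEM;
	bo->mapped = true;
	*ptr = bo->data;
	return 0;
}

static void mock_bo_kunmap(struct mock_bo *bo)
{
	bo->mapped = false;
}

/* Simulate a reset that leaves garbage in the buffer. */
static void mock_reset_corrupt(struct mock_bo *bo)
{
	memset(bo->data, 0xA5, sizeof(bo->data));
}

/*
 * Same shape as the patched init path: the refill only runs when a reset
 * is in progress, the reservation is always dropped again, and a map
 * failure is returned to the caller instead of being ignored.
 */
static int mock_reinit_csb(struct mock_bo *bo, bool in_reset)
{
	uint32_t *ptr = NULL;
	int r;

	assert(bo != NULL);

	if (!in_reset)
		return 0;

	r = mock_bo_reserve(bo);
	if (r)
		return r;

	r = mock_bo_kmap(bo, &ptr);
	if (!r) {
		memcpy(ptr, csb_template, sizeof(csb_template));
		mock_bo_kunmap(bo);
	}

	mock_bo_unreserve(bo);
	return r;
}
```

The point of the sketch is the ordering: the unreserve step runs on both the success and the failure path, which is why the patch checks r a second time after amdgpu_bo_unreserve() rather than returning directly from the failed map.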
