Re: [PATCH] drm/amdgpu: add UTCL2 RAS poison query for gfx 9.4.3

"Lazar, Lijo" <lijo.lazar@xxxxxxx> · Mon, 19 Feb 2024 09:29:11 +0530

On 2/18/2024 12:26 PM, Tao Zhou wrote:
> Add a helper function to query and reset the RAS UTCL2 poison status.
> 
> Signed-off-by: Tao Zhou <tao.zhou1@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> index aace4594a603..de04006f8db1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
> @@ -4329,10 +4329,24 @@ static int gfx_v9_4_3_ras_late_init(struct amdgpu_device *adev, struct ras_commo
>  	return r;
>  }
>  
> +static bool gfx_v9_4_3_query_uctl2_poison_status(struct amdgpu_device *adev)
> +{
> +	u32 status = 0;
> +	struct amdgpu_vmhub *hub;
> +
> +	hub = &adev->vmhub[AMDGPU_GFXHUB(0)];

This only takes care of the first instance. What about the others?
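
To illustrate the concern, here is a rough userspace sketch of one way the query could cover every instance instead of hardcoding AMDGPU_GFXHUB(0). It is not driver code: the mock_* types, the instance mask, and the FED bit position are all hypothetical stand-ins, and the status clear is simulated in software.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All names below are hypothetical stand-ins, not the amdgpu API. */
#define MOCK_MAX_XCC 8
#define MOCK_FED_BIT (1u << 4)	/* arbitrary bit chosen for the mock */

struct mock_hub {
	uint32_t fault_status;	/* stands in for VM_L2_PROTECTION_FAULT_STATUS */
	uint32_t fault_cntl;	/* stands in for the fault control register */
};

struct mock_dev {
	uint16_t xcc_mask;	/* bit i set means instance i is present */
	struct mock_hub hub[MOCK_MAX_XCC];
};

/* Query a single instance and request a status reset, as the patch does. */
static bool mock_query_poison_one(struct mock_dev *dev, int xcc_id)
{
	struct mock_hub *hub;
	uint32_t status;

	assert(xcc_id >= 0 && xcc_id < MOCK_MAX_XCC);
	hub = &dev->hub[xcc_id];
	status = hub->fault_status;
	/* mirrors WREG32_P(reg, 1, ~1): set bit 0, preserve the other bits */
	hub->fault_cntl = (hub->fault_cntl & ~1u) | 1u;
	/* the mock clears the status itself; real hardware would do this */
	hub->fault_status = 0;

	return (status & MOCK_FED_BIT) != 0;
}

/* Visit every present instance; no early exit, so each one gets reset. */
static bool mock_query_poison_all(struct mock_dev *dev)
{
	bool poisoned = false;
	int i;

	for (i = 0; i < MOCK_MAX_XCC; i++) {
		if (!(dev->xcc_mask & (1u << i)))
			continue;
		if (mock_query_poison_one(dev, i))
			poisoned = true;
	}

	return poisoned;
}
```

Either shape would address the question: loop inside the helper as above, or take the instance id as a parameter and let the caller pick the instance.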

Thanks,
Lijo
> +	status = RREG32(hub->vm_l2_pro_fault_status);
> +	/* reset page fault status */
> +	WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1);
> +
> +	return REG_GET_FIELD(status, VM_L2_PROTECTION_FAULT_STATUS, FED);
> +}
> +
>  struct amdgpu_gfx_ras gfx_v9_4_3_ras = {
>  	.ras_block = {
>  		.hw_ops = &gfx_v9_4_3_ras_ops,
>  		.ras_late_init = &gfx_v9_4_3_ras_late_init,
>  	},
>  	.enable_watchdog_timer = &gfx_v9_4_3_enable_watchdog_timer,
> +	.query_utcl2_poison_status = &gfx_v9_4_3_query_uctl2_poison_status,
>  };