Re: [PATCH 5/5] drm/amdgpu: skip GFX FED error in page fault handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2/19/2024 1:45 PM, Tao Zhou wrote:
> Let kfd interrupt handler process it.
> 
> Signed-off-by: Tao Zhou <tao.zhou1@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 10 +++++++++-
>  1 file changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> index 773725a92cf1..70defc394b7b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
> @@ -552,7 +552,7 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
>  {
>  	bool retry_fault = !!(entry->src_data[1] & 0x80);
>  	bool write_fault = !!(entry->src_data[1] & 0x20);
> -	uint32_t status = 0, cid = 0, rw = 0;
> +	uint32_t status = 0, cid = 0, rw = 0, fed = 0;
>  	struct amdgpu_task_info task_info;
>  	struct amdgpu_vmhub *hub;
>  	const char *mmhub_cid;
> @@ -663,6 +663,14 @@ static int gmc_v9_0_process_interrupt(struct amdgpu_device *adev,
>  	status = RREG32(hub->vm_l2_pro_fault_status);
>  	cid = REG_GET_FIELD(status, VM_L2_PROTECTION_FAULT_STATUS, CID);
>  	rw = REG_GET_FIELD(status, VM_L2_PROTECTION_FAULT_STATUS, RW);
> +	fed = REG_GET_FIELD(status, VM_L2_PROTECTION_FAULT_STATUS, FED);
> +
> +	/* for gfx fed error, kfd will handle it, return directly */
> +	if (fed && amdgpu_ras_is_poison_mode_supported(adev) &&
> +	    amdgpu_ip_version(adev, GC_HWIP, 0) >= IP_VERSION(9, 4, 2) &&
> +	    !strcmp(hub_name, "gfxhub0"))
> +		return 1;

amdgpu_irq_dispatch() gives the impression that return value of 1 is
treated as handled, hence won't be passed to kfd. The commit description
says it is intended to pass to kfd for handling.

Also, FED status check may be moved up so that it's not misunderstood as
a regular page fault with the extra prints coming to dmesg log.
Otherwise, poison status also needs to be added to dmesg.

Thanks,
Lijo

> +
>  	WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1);
>  #ifdef HAVE_STRUCT_XARRAY
>  	amdgpu_vm_update_fault_cache(adev, entry->pasid, addr, status, vmhub);



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux