[AMD Official Use Only - General] > -----Original Message----- > From: Lazar, Lijo <Lijo.Lazar@xxxxxxx> > Sent: Monday, February 19, 2024 8:40 PM > To: Zhou1, Tao <Tao.Zhou1@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx > Subject: Re: [PATCH 5/5] drm/amdgpu: skip GFX FED error in page fault handling > > > > On 2/19/2024 1:45 PM, Tao Zhou wrote: > > Let kfd interrupt handler process it. > > > > Signed-off-by: Tao Zhou <tao.zhou1@xxxxxxx> > > --- > > drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 10 +++++++++- > > 1 file changed, 9 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c > > b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c > > index 773725a92cf1..70defc394b7b 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c > > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c > > @@ -552,7 +552,7 @@ static int gmc_v9_0_process_interrupt(struct > > amdgpu_device *adev, { > > bool retry_fault = !!(entry->src_data[1] & 0x80); > > bool write_fault = !!(entry->src_data[1] & 0x20); > > - uint32_t status = 0, cid = 0, rw = 0; > > + uint32_t status = 0, cid = 0, rw = 0, fed = 0; > > struct amdgpu_task_info task_info; > > struct amdgpu_vmhub *hub; > > const char *mmhub_cid; > > @@ -663,6 +663,14 @@ static int gmc_v9_0_process_interrupt(struct > amdgpu_device *adev, > > status = RREG32(hub->vm_l2_pro_fault_status); > > cid = REG_GET_FIELD(status, VM_L2_PROTECTION_FAULT_STATUS, CID); > > rw = REG_GET_FIELD(status, VM_L2_PROTECTION_FAULT_STATUS, RW); > > + fed = REG_GET_FIELD(status, VM_L2_PROTECTION_FAULT_STATUS, > FED); > > + > > + /* for gfx fed error, kfd will handle it, return directly */ > > + if (fed && amdgpu_ras_is_poison_mode_supported(adev) && > > + amdgpu_ip_version(adev, GC_HWIP, 0) >= IP_VERSION(9, 4, 2) && > > + !strcmp(hub_name, "gfxhub0")) > > + return 1; > > amdgpu_irq_dispatch() gives the impression that return value of 1 is treated as > handled, hence won't be passed to kfd. The commit description says it is intended > to pass to kfd for handling. [Tao] good catch, it should return 0 here, will update it in v2, thanks. > > Also, FED status check may be moved up so that it's not misunderstood as a > regular page fault with the extra prints coming to dmesg log. > Otherwise, poison status also needs to be added to dmesg. [Tao] there is poison consumption dmesg log in kfd interrupt handler, no neeed to add extra print here. My intention is to skip " WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1)", moving up the check will make the change a little bit more and I think the page fault log is acceptable. > > Thanks, > Lijo > > > + > > WREG32_P(hub->vm_l2_pro_fault_cntl, 1, ~1); #ifdef > > HAVE_STRUCT_XARRAY > > amdgpu_vm_update_fault_cache(adev, entry->pasid, addr, status, > vmhub);