Any comments? I believe this is a nice stability improvement: with it, VM faults no longer take down the whole GPU with an interrupt storm. With KFD we can then recover in many cases without a GPU reset, just by unmapping the offending process's queues.

Regards,
  Felix


On 17-07-03 05:11 PM, Felix Kuehling wrote:
> From: Jay Cornwall <Jay.Cornwall at amd.com>
>
> A subset of VM fault types currently send retry XNACK to the client.
> This causes a storm of interrupts from the VM to the host.
>
> Until the storm is throttled by other means send no-retry XNACK for
> all fault types instead. No change in behavior to the client which
> will stall indefinitely with the current configuration in any case.
> Improves system stability under GC or MMHUB faults.
>
> Signed-off-by: Jay Cornwall <Jay.Cornwall at amd.com>
> Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
> ---
>  drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 3 +++
>  drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 3 +++
>  2 files changed, 6 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
> index a42f483..f957b18 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
> @@ -206,6 +206,9 @@ static void gfxhub_v1_0_setup_vmid_config(struct amdgpu_device *adev)
>  		tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>  				PAGE_TABLE_BLOCK_SIZE,
>  				adev->vm_manager.block_size - 9);
> +		/* Send no-retry XNACK on fault to suppress VM fault storm. */
> +		tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
> +				RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
>  		WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_CNTL, i, tmp);
>  		WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_LO32, i*2, 0);
>  		WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_HI32, i*2, 0);
> diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
> index 01918dc..b760018 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
> @@ -222,6 +222,9 @@ static void mmhub_v1_0_setup_vmid_config(struct amdgpu_device *adev)
>  		tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>  				PAGE_TABLE_BLOCK_SIZE,
>  				adev->vm_manager.block_size - 9);
> +		/* Send no-retry XNACK on fault to suppress VM fault storm. */
> +		tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
> +				RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
>  		WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_CNTL, i, tmp);
>  		WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_LO32, i*2, 0);
>  		WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_HI32, i*2, 0);
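
For anyone reviewing who has not worked with the SOC15 register helpers, here is a minimal stand-alone sketch of the read-modify-write the patch adds in both hubs: program PAGE_TABLE_BLOCK_SIZE, then clear RETRY_PERMISSION_OR_INVALID_PAGE_FAULT so faults get a no-retry XNACK. The REG_* macros are re-declared here in the style of the amdgpu ones so the snippet compiles on its own. The shift/mask values and the helper name program_vm_context1_cntl() are assumptions for illustration only; the authoritative field layout comes from the generated gc_9_0 / mmhub_1_0 *_sh_mask.h headers.

```c
#include <stdint.h>

/* ASSUMED field layout, for illustration only; the real values live in the
 * generated gc_9_0 / mmhub_1_0 *_sh_mask.h headers. */
#define VM_CONTEXT1_CNTL__PAGE_TABLE_BLOCK_SIZE__SHIFT                  0x3
#define VM_CONTEXT1_CNTL__PAGE_TABLE_BLOCK_SIZE_MASK                    0x00000078u
#define VM_CONTEXT1_CNTL__RETRY_PERMISSION_OR_INVALID_PAGE_FAULT__SHIFT 0x7
#define VM_CONTEXT1_CNTL__RETRY_PERMISSION_OR_INVALID_PAGE_FAULT_MASK   0x00000080u

/* Token-pasting helpers written in the style of the amdgpu REG_* macros. */
#define REG_FIELD_SHIFT(reg, field) reg##__##field##__SHIFT
#define REG_FIELD_MASK(reg, field)  reg##__##field##_MASK

/* Clear the field's bits in orig_val, then OR in field_val shifted into
 * place; the mask keeps an oversized value out of neighbouring fields. */
#define REG_SET_FIELD(orig_val, reg, field, field_val)                  \
	(((orig_val) & ~REG_FIELD_MASK(reg, field)) |                    \
	 (REG_FIELD_MASK(reg, field) &                                   \
	  ((field_val) << REG_FIELD_SHIFT(reg, field))))

/* Hypothetical helper mirroring the per-VMID sequence in the patch:
 * encode the block size (stored as block_size - 9), then force the
 * retry-on-fault bit to 0 so the hub answers faults with no-retry XNACK.
 * All other bits of tmp are preserved. */
static uint32_t program_vm_context1_cntl(uint32_t tmp, unsigned int block_size)
{
	tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
			    PAGE_TABLE_BLOCK_SIZE, block_size - 9);
	tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
			    RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
	return tmp;
}
```

Under the assumed layout, program_vm_context1_cntl(0x80, 13) starts with only the retry bit set and returns 0x20: the block-size field now holds 4 and the retry bit is cleared. Because REG_SET_FIELD only touches the named field, adding the second call after the existing PAGE_TABLE_BLOCK_SIZE assignment cannot disturb the value already programmed.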