>-----Original Message-----
>From: amd-gfx [mailto:amd-gfx-bounces at lists.freedesktop.org] On Behalf
>Of Alex Deucher
>Sent: Wednesday, July 12, 2017 11:59 AM
>To: Kuehling, Felix
>Cc: amd-gfx list
>Subject: Re: [PATCH 05/12] drm/amdgpu: Send no-retry XNACK for all fault
>types
>
>On Wed, Jul 12, 2017 at 1:40 AM, Felix Kuehling <felix.kuehling at amd.com>
>wrote:
>> Any comments?
>>
>> I believe this is a nice stability improvement. In case of VM faults
>> they don't take down the whole GPU with an interrupt storm. With KFD
>> we can recover without a GPU reset in many cases just by unmapping the
>> offending process' queues.
>
>Will this cause any problems with enabling recoverable page faults later? If
>not,
>Acked-by: Alex Deucher <alexander.deucher at amd.com>

We will need to back this out in order to enable recoverable page faults later, but probably still worth doing in the short term IMO.

>
>>
>> Regards,
>> Felix
>>
>>
>> On 17-07-03 05:11 PM, Felix Kuehling wrote:
>>> From: Jay Cornwall <Jay.Cornwall at amd.com>
>>>
>>> A subset of VM fault types currently send retry XNACK to the client.
>>> This causes a storm of interrupts from the VM to the host.
>>>
>>> Until the storm is throttled by other means send no-retry XNACK for
>>> all fault types instead. No change in behavior to the client which
>>> will stall indefinitely with the current configuration in any case.
>>> Improves system stability under GC or MMHUB faults.
>>>
>>> Signed-off-by: Jay Cornwall <Jay.Cornwall at amd.com>
>>> Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>> ---
>>>  drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 3 +++
>>>  drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 3 +++
>>>  2 files changed, 6 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
>>> index a42f483..f957b18 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
>>> @@ -206,6 +206,9 @@ static void gfxhub_v1_0_setup_vmid_config(struct amdgpu_device *adev)
>>>                 tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>>>                                     PAGE_TABLE_BLOCK_SIZE,
>>>                                     adev->vm_manager.block_size - 9);
>>> +               /* Send no-retry XNACK on fault to suppress VM fault storm. */
>>> +               tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>>> +                                   RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
>>>                 WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_CNTL, i, tmp);
>>>                 WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_LO32, i*2, 0);
>>>                 WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_HI32, i*2, 0);
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
>>> index 01918dc..b760018 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
>>> @@ -222,6 +222,9 @@ static void mmhub_v1_0_setup_vmid_config(struct amdgpu_device *adev)
>>>                 tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>>>                                     PAGE_TABLE_BLOCK_SIZE,
>>>                                     adev->vm_manager.block_size - 9);
>>> +               /* Send no-retry XNACK on fault to suppress VM fault storm. */
>>> +               tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>>> +                                   RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
>>>                 WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_CNTL, i, tmp);
>>>                 WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_LO32, i*2, 0);
>>>                 WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_HI32, i*2, 0);
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
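
For anyone reading the patch without the register headers at hand: each hunk is a read-modify-write that clears one field of VM_CONTEXT1_CNTL before the value is written back. Below is a minimal standalone sketch of that idea, not the driver code. The bit position is an assumption made up for illustration (the real mask and shift for RETRY_PERMISSION_OR_INVALID_PAGE_FAULT live in the amdgpu register headers and are not shown in this thread), and all demo_* / DEMO_* names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical field layout, for illustration only. The real mask and
 * shift are defined in the amdgpu register headers, not in this thread.
 */
#define DEMO_RETRY_FAULT_SHIFT 31u
#define DEMO_RETRY_FAULT_MASK  (UINT32_C(1) << DEMO_RETRY_FAULT_SHIFT)

/* Replace one field of a 32-bit register value, leaving other bits alone. */
static inline uint32_t demo_set_field(uint32_t reg, uint32_t mask,
                                      unsigned int shift, uint32_t value)
{
        /* The new value must fit inside the field described by mask/shift. */
        assert((value & ~(mask >> shift)) == 0);
        return (reg & ~mask) | ((value << shift) & mask);
}

/* Clear the (hypothetical) retry bit, i.e. request a no-retry XNACK. */
static inline uint32_t demo_disable_fault_retry(uint32_t cntl)
{
        return demo_set_field(cntl, DEMO_RETRY_FAULT_MASK,
                              DEMO_RETRY_FAULT_SHIFT, 0);
}
```

Passing 0 as the field value mirrors what the patch does for both the GC and MMHUB copies of the register; every other bit of tmp, such as PAGE_TABLE_BLOCK_SIZE set just before, is preserved.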