On Wed, Jul 12, 2017 at 1:37 PM, Felix Kuehling <felix.kuehling at amd.com> wrote:
> On 17-07-12 11:59 AM, Alex Deucher wrote:
>> On Wed, Jul 12, 2017 at 1:40 AM, Felix Kuehling <felix.kuehling at amd.com> wrote:
>>> Any comments?
>>>
>>> I believe this is a nice stability improvement. In case of VM faults
>>> they don't take down the whole GPU with an interrupt storm. With KFD we
>>> can recover without a GPU reset in many cases just by unmapping the
>>> offending process' queues.
>> Will this cause any problems with enabling recoverable page faults
>> later? If not,
>> Acked-by: Alex Deucher <alexander.deucher at amd.com>
>
> Like John said, this will need to be backed out when we enable
> recoverable page faults. The nice thing on Vega10 is, that it's a
> per-VMID setting. That will allow us for example to enable recoverable
> page faults for KFD VMIDs for implementing a real HSA memory model,
> without affecting the graphics VMIDs.
>
> Still OK to add your Acked-by?

Yes, go ahead.

Alex

>
> Regards,
> Felix
>
>>
>>> Regards,
>>> Felix
>>>
>>>
>>> On 17-07-03 05:11 PM, Felix Kuehling wrote:
>>>> From: Jay Cornwall <Jay.Cornwall at amd.com>
>>>>
>>>> A subset of VM fault types currently send retry XNACK to the client.
>>>> This causes a storm of interrupts from the VM to the host.
>>>>
>>>> Until the storm is throttled by other means send no-retry XNACK for
>>>> all fault types instead. No change in behavior to the client which
>>>> will stall indefinitely with the current configuration in any case.
>>>> Improves system stability under GC or MMHUB faults.
>>>>
>>>> Signed-off-by: Jay Cornwall <Jay.Cornwall at amd.com>
>>>> Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>>> ---
>>>>  drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 3 +++
>>>>  drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 3 +++
>>>>  2 files changed, 6 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
>>>> index a42f483..f957b18 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
>>>> @@ -206,6 +206,9 @@ static void gfxhub_v1_0_setup_vmid_config(struct amdgpu_device *adev)
>>>>          tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>>>>                              PAGE_TABLE_BLOCK_SIZE,
>>>>                              adev->vm_manager.block_size - 9);
>>>> +        /* Send no-retry XNACK on fault to suppress VM fault storm. */
>>>> +        tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>>>> +                            RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
>>>>          WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_CNTL, i, tmp);
>>>>          WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_LO32, i*2, 0);
>>>>          WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_HI32, i*2, 0);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
>>>> index 01918dc..b760018 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
>>>> @@ -222,6 +222,9 @@ static void mmhub_v1_0_setup_vmid_config(struct amdgpu_device *adev)
>>>>          tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>>>>                              PAGE_TABLE_BLOCK_SIZE,
>>>>                              adev->vm_manager.block_size - 9);
>>>> +        /* Send no-retry XNACK on fault to suppress VM fault storm. */
>>>> +        tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>>>> +                            RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
>>>>          WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_CNTL, i, tmp);
>>>>          WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_LO32, i*2, 0);
>>>>          WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_HI32, i*2, 0);
>>> _______________________________________________
>>> amd-gfx mailing list
>>> amd-gfx at lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
>
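
For readers unfamiliar with the pattern in the quoted patch: it builds the control value in tmp one field at a time, clears the retry field, and only then writes the register. Below is a minimal standalone sketch of that masked field update. Every name in it (demo_set_field, demo_setup_context_cntl, the DEMO_* masks and shifts) is hypothetical and chosen for illustration only. The bit positions are NOT the real VM_CONTEXT1_CNTL layout, and this is not the kernel's REG_SET_FIELD implementation.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical field layout, for illustration only.
 * These are NOT the real Vega10 VM_CONTEXT1_CNTL bit positions.
 */
#define DEMO_BLOCK_SIZE_SHIFT   3u
#define DEMO_BLOCK_SIZE_MASK    (0xFu << DEMO_BLOCK_SIZE_SHIFT)
#define DEMO_RETRY_FAULT_SHIFT  7u
#define DEMO_RETRY_FAULT_MASK   (0x1u << DEMO_RETRY_FAULT_SHIFT)

/* Replace a single field of reg and leave every other bit untouched. */
static uint32_t demo_set_field(uint32_t reg, uint32_t mask,
                               unsigned int shift, uint32_t value)
{
    /* The new value must fit inside the field. */
    assert(((value << shift) & ~mask) == 0);
    return (reg & ~mask) | ((value << shift) & mask);
}

/*
 * Mirror the order of operations in the patch: set the block size field,
 * then clear the retry field so that faults get a no-retry response.
 * The caller would write the returned value to the register.
 */
static uint32_t demo_setup_context_cntl(uint32_t current, uint32_t block_size)
{
    uint32_t tmp = current;

    tmp = demo_set_field(tmp, DEMO_BLOCK_SIZE_MASK,
                         DEMO_BLOCK_SIZE_SHIFT, block_size - 9u);
    tmp = demo_set_field(tmp, DEMO_RETRY_FAULT_MASK,
                         DEMO_RETRY_FAULT_SHIFT, 0u);
    return tmp;
}
```

With this made-up layout, demo_setup_context_cntl(0, 12) yields 0x18: the block size field holds 3 and the retry bit is clear. Because each update touches only its own field, backing the change out later, as discussed above for recoverable page faults, would amount to writing 1 to the retry field instead of 0.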