[PATCH 05/12] drm/amdgpu: Send no-retry XNACK for all fault types

alexdeucher@xxxxxxxxx (Alex Deucher) · Wed, 12 Jul 2017 13:40:52 -0400

On Wed, Jul 12, 2017 at 1:37 PM, Felix Kuehling <felix.kuehling at amd.com> wrote:
> On 17-07-12 11:59 AM, Alex Deucher wrote:
>> On Wed, Jul 12, 2017 at 1:40 AM, Felix Kuehling <felix.kuehling at amd.com> wrote:
>>> Any comments?
>>>
>>> I believe this is a nice stability improvement. With this change, VM
>>> faults no longer take down the whole GPU with an interrupt storm. With
>>> KFD we can recover without a GPU reset in many cases just by unmapping
>>> the offending process's queues.
>> Will this cause any problems with enabling recoverable page faults
>> later?  If not,
>> Acked-by: Alex Deucher <alexander.deucher at amd.com>
>
> Like John said, this will need to be backed out when we enable
> recoverable page faults. The nice thing on Vega10 is that it's a
> per-VMID setting. That will allow us, for example, to enable recoverable
> page faults for KFD VMIDs to implement a real HSA memory model without
> affecting the graphics VMIDs.
>
> Still OK to add your Acked-by?

Yes, go ahead.

Alex

>
> Regards,
>   Felix
>
>>
>>> Regards,
>>>   Felix
>>>
>>>
>>> On 17-07-03 05:11 PM, Felix Kuehling wrote:
>>>> From: Jay Cornwall <Jay.Cornwall at amd.com>
>>>>
>>>> A subset of VM fault types currently sends retry XNACK to the client.
>>>> This causes a storm of interrupts from the VM to the host.
>>>>
>>>> Until the storm is throttled by other means, send no-retry XNACK for
>>>> all fault types instead. This does not change behavior for the client,
>>>> which will stall indefinitely with the current configuration in any
>>>> case. It improves system stability under GC or MMHUB faults.
>>>>
>>>> Signed-off-by: Jay Cornwall <Jay.Cornwall at amd.com>
>>>> Reviewed-by: Felix Kuehling <Felix.Kuehling at amd.com>
>>>> ---
>>>>  drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 3 +++
>>>>  drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c  | 3 +++
>>>>  2 files changed, 6 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
>>>> index a42f483..f957b18 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
>>>> @@ -206,6 +206,9 @@ static void gfxhub_v1_0_setup_vmid_config(struct amdgpu_device *adev)
>>>>               tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>>>>                               PAGE_TABLE_BLOCK_SIZE,
>>>>                               adev->vm_manager.block_size - 9);
>>>> +             /* Send no-retry XNACK on fault to suppress VM fault storm. */
>>>> +             tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>>>> +                                 RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
>>>>               WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_CNTL, i, tmp);
>>>>               WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_LO32, i*2, 0);
>>>>               WREG32_SOC15_OFFSET(GC, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_HI32, i*2, 0);
>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
>>>> index 01918dc..b760018 100644
>>>> --- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
>>>> +++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c
>>>> @@ -222,6 +222,9 @@ static void mmhub_v1_0_setup_vmid_config(struct amdgpu_device *adev)
>>>>               tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>>>>                               PAGE_TABLE_BLOCK_SIZE,
>>>>                               adev->vm_manager.block_size - 9);
>>>> +             /* Send no-retry XNACK on fault to suppress VM fault storm. */
>>>> +             tmp = REG_SET_FIELD(tmp, VM_CONTEXT1_CNTL,
>>>> +                                 RETRY_PERMISSION_OR_INVALID_PAGE_FAULT, 0);
>>>>               WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_CNTL, i, tmp);
>>>>               WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_LO32, i*2, 0);
>>>>               WREG32_SOC15_OFFSET(MMHUB, 0, mmVM_CONTEXT1_PAGE_TABLE_START_ADDR_HI32, i*2, 0);
>
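
For anyone reading the quoted diff without the register headers at hand, here is a rough standalone sketch of what clearing the retry field amounts to. The example_* names and the bit position are made up for illustration; in the driver, REG_SET_FIELD takes the mask and shift of RETRY_PERMISSION_OR_INVALID_PAGE_FAULT from the generated register headers, and the real layout may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical field layout, for illustration only. The real
 * VM_CONTEXT1_CNTL mask and shift come from the register headers. */
#define EXAMPLE_RETRY_FAULT_SHIFT 5u
#define EXAMPLE_RETRY_FAULT_MASK  (1u << EXAMPLE_RETRY_FAULT_SHIFT)

/* Replace one field of a 32-bit register value, leaving other bits alone. */
static uint32_t example_set_field(uint32_t orig, uint32_t mask,
                                  unsigned int shift, uint32_t val)
{
	/* The new value must fit inside the field. */
	assert(((val << shift) & ~mask) == 0);
	return (orig & ~mask) | ((val << shift) & mask);
}

/* Return the control value with the retry bit cleared (no-retry XNACK). */
static uint32_t example_no_retry_cntl(uint32_t cntl)
{
	return example_set_field(cntl, EXAMPLE_RETRY_FAULT_MASK,
				 EXAMPLE_RETRY_FAULT_SHIFT, 0);
}
```

Because this is a read-modify-write on the value that is then written once per VMID, the same helper could later write 1 for selected VMIDs only, which is the per-VMID flexibility mentioned earlier in the thread.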