[PATCH 05/12] drm/amdgpu: Send no-retry XNACK for all fault types

jay@xxxxxxxxxxxxx (Jay Cornwall) · Wed, 12 Jul 2017 12:43:49 -0500

On Wed, Jul 12, 2017, at 12:37, Felix Kuehling wrote:
> On 17-07-12 11:59 AM, Alex Deucher wrote:
> > On Wed, Jul 12, 2017 at 1:40 AM, Felix Kuehling <felix.kuehling at amd.com> wrote:
> >> Any comments?
> >>
> >> I believe this is a nice stability improvement. With this change, VM
> >> faults no longer take down the whole GPU with an interrupt storm. With
> >> KFD we can recover without a GPU reset in many cases just by unmapping
> >> the offending process' queues.
> > Will this cause any problems with enabling recoverable page faults
> > later? If not,
> > Acked-by: Alex Deucher <alexander.deucher at amd.com>
> 
> Like John said, this will need to be backed out when we enable
> recoverable page faults. The nice thing on Vega10 is that it's a
> per-VMID setting. That will allow us, for example, to enable recoverable
> page faults for KFD VMIDs to implement a real HSA memory model,
> without affecting the graphics VMIDs.

Right, the plan is to re-enable this feature once the interrupt storm
has been resolved. A few options for this have been discussed internally,
but none is implemented yet as far as I know.

I have a backup plan for implementing recoverable page faults with
no-retry XNACK if that doesn't pan out.
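
To illustrate the per-VMID point above, here is a minimal standalone
sketch, not the actual driver code. It models one control word per VMID
and shows how no-retry could stay set for graphics VMIDs while being
cleared for KFD VMIDs. Everything in it is an assumption made for
illustration: the array standing in for the registers, the bit position,
the 8..15 KFD VMID range, and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: one control word per VMID. Not the real register map. */
#define NUM_VMIDS          16
#define FIRST_KFD_VMID     8          /* assumption: VMIDs 8..15 belong to KFD */
#define NO_RETRY_FAULT_BIT (1u << 0)  /* hypothetical bit, illustrative only */

static uint32_t vm_context_cntl[NUM_VMIDS];

/* Assumed split between graphics and KFD VMIDs. */
static bool vmid_is_kfd(unsigned int vmid)
{
    return vmid >= FIRST_KFD_VMID;
}

/*
 * Set the fault policy for every VMID. No-retry is the default for all
 * VMIDs; it is cleared only for KFD VMIDs, and only when recoverable
 * faults are requested for them.
 */
static void setup_vmid_fault_policy(bool recoverable_kfd_faults)
{
    for (unsigned int vmid = 0; vmid < NUM_VMIDS; vmid++) {
        bool retry = recoverable_kfd_faults && vmid_is_kfd(vmid);

        if (retry)
            vm_context_cntl[vmid] &= ~NO_RETRY_FAULT_BIT;
        else
            vm_context_cntl[vmid] |= NO_RETRY_FAULT_BIT;
    }
}

/* Report whether a VMID is currently configured for no-retry faults. */
static bool vmid_sends_no_retry(unsigned int vmid)
{
    return (vm_context_cntl[vmid] & NO_RETRY_FAULT_BIT) != 0;
}
```

Calling setup_vmid_fault_policy(false) corresponds to what this patch
does (no-retry everywhere). Calling it with true leaves the graphics
VMIDs untouched and switches only the KFD range back to retry faults.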