On Wed, Jul 12, 2017, at 12:37, Felix Kuehling wrote:
> On 17-07-12 11:59 AM, Alex Deucher wrote:
> > On Wed, Jul 12, 2017 at 1:40 AM, Felix Kuehling <felix.kuehling at amd.com> wrote:
> >> Any comments?
> >>
> >> I believe this is a nice stability improvement. In case of VM faults
> >> they don't take down the whole GPU with an interrupt storm. With KFD we
> >> can recover without a GPU reset in many cases just by unmapping the
> >> offending process' queues.
> > Will this cause any problems with enabling recoverable page faults
> > later? If not,
> > Acked-by: Alex Deucher <alexander.deucher at amd.com>
>
> Like John said, this will need to be backed out when we enable
> recoverable page faults. The nice thing on Vega10 is, that it's a
> per-VMID setting. That will allow us for example to enable recoverable
> page faults for KFD VMIDs for implementing a real HSA memory model,
> without affecting the graphics VMIDs.

Right, the plan is to re-enable this feature once the interrupt storm
has been resolved. There are a few options for this discussed internally
but not currently implemented as far as I know. I have a backup plan for
implementing recoverable page faults with no-retry XNACK if that doesn't
pan out.