> -----Original Message-----
> From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Sent: Wednesday, August 7, 2019 9:51 AM
> To: Bloomfield, Jon <jon.bloomfield@xxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>; Winiarski, Michal <michal.winiarski@xxxxxxxxx>
> Subject: RE: [PATCH 5/5] drm/i915: Cancel non-persistent contexts on close
>
> Quoting Bloomfield, Jon (2019-08-07 16:29:55)
> > > -----Original Message-----
> > > From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > > Sent: Wednesday, August 7, 2019 8:08 AM
> > > To: Bloomfield, Jon <jon.bloomfield@xxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> > > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>; Winiarski, Michal <michal.winiarski@xxxxxxxxx>
> > > Subject: RE: [PATCH 5/5] drm/i915: Cancel non-persistent contexts on close
> > >
> > > Quoting Bloomfield, Jon (2019-08-07 15:33:51)
> > > [skip to end]
> > > > We didn't explore the idea of terminating orphaned contexts though
> > > > (where none of their resources are referenced by any other contexts). Is
> > > > there a reason why this is not feasible? In the case of compute (certainly
> > > > HPC) workloads, there would be no compositor taking the output, so this
> > > > might be a solution.
> > >
> > > Sounds easier said than done. We have to go through each request and
> > > determine if it has an external reference (or if the object holding the
> > > reference has an external reference) to see if the output would be
> > > visible to a third party. Sounds like a conservative GC :|
> > > (Coming to that conclusion suggests that we should structure the request
> > > tracking to make reparenting easier.)
> > >
> > > We could take a pid-1 approach and move all the orphan timelines over to
> > > a new parent purely responsible for them. That honestly doesn't seem to
> > > achieve anything. (We are still stuck with tasks on the GPU and no way
> > > to kill them.)
> > >
> > > In comparison, persistence is a rarely used "feature" and cleaning up on
> > > context close fits in nicely with the process model. It just works as
> > > most users/clients would expect. (Although running non-persistent by
> > > default hasn't shown anything to explode on the desktop, it's too easy
> > > to construct scenarios where persistence turns out to be an advantage,
> > > particularly with chains of clients (the compositor model).) Between the
> > > two modes, we should have most bases covered; it's hard to argue for a
> > > third way (that is, until someone has a usecase!)
> > > -Chris
> >
> > Ok, makes sense. Thanks.
> >
> > But have we converged on a decision? :-)
> >
> > As I said, requiring compute UMD opt-in should be ok for the immediate HPC
> > issue, but I'd personally argue that it's valid to change the contract for
> > hangcheck=0 and switch the default to non-persistent.
>
> Could you tender
>
> diff --git a/runtime/os_interface/linux/drm_neo.cpp b/runtime/os_interface/linux/drm_neo.cpp
> index 31deb68b..8a9af363 100644
> --- a/runtime/os_interface/linux/drm_neo.cpp
> +++ b/runtime/os_interface/linux/drm_neo.cpp
> @@ -141,11 +141,22 @@ void Drm::setLowPriorityContextParam(uint32_t drmContextId) {
>      UNRECOVERABLE_IF(retVal != 0);
>  }
>
> +void setNonPersistent(uint32_t drmContextId) {
> +    drm_i915_gem_context_param gcp = {};
> +    gcp.ctx_id = drmContextId;
> +    gcp.param = 0xb; /* I915_CONTEXT_PARAM_PERSISTENCE; */
> +
> +    ioctl(DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &gcp);
> +}
> +
>  uint32_t Drm::createDrmContext() {
>      drm_i915_gem_context_create gcc = {};
>      auto retVal = ioctl(DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &gcc);
>      UNRECOVERABLE_IF(retVal != 0);
>
> +    /* enable cleanup of resources on process termination */
> +    setNonPersistent(gcc.ctx_id);
> +
>      return gcc.ctx_id;
>  }
>
> to interested parties?
> -Chris

Yes, that's exactly what I had in mind. I think it's enough to resolve the HPC challenges.
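
For anyone wiring the same opt-in up outside of NEO, here is a minimal standalone sketch against the raw i915 uapi. It is illustrative only: the helper name is made up, it assumes an already-open DRM render-node fd and that the uapi headers are reachable as <drm/i915_drm.h>, and it carries a local fallback define for I915_CONTEXT_PARAM_PERSISTENCE (0xb, per the patch above) until updated headers are in circulation:

/*
 * Illustrative sketch only: create a context and opt it out of
 * persistence, so its outstanding requests are cancelled when the
 * context (or the fd) is closed.
 */
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

#ifndef I915_CONTEXT_PARAM_PERSISTENCE
#define I915_CONTEXT_PARAM_PERSISTENCE 0xb
#endif

static int create_nonpersistent_context(int drm_fd, uint32_t *ctx_id)
{
        struct drm_i915_gem_context_create create;
        struct drm_i915_gem_context_param p;

        memset(&create, 0, sizeof(create));
        if (ioctl(drm_fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &create))
                return -1;

        memset(&p, 0, sizeof(p));
        p.ctx_id = create.ctx_id;
        p.param = I915_CONTEXT_PARAM_PERSISTENCE;
        p.value = 0; /* 0 = cancel outstanding work on context close */

        /* Older kernels reject the param; the context then simply stays persistent. */
        ioctl(drm_fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);

        *ctx_id = create.ctx_id;
        return 0;
}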