Quoting Bloomfield, Jon (2019-08-07 16:29:55)
> > -----Original Message-----
> > From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Sent: Wednesday, August 7, 2019 8:08 AM
> > To: Bloomfield, Jon <jon.bloomfield@xxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>; Winiarski, Michal <michal.winiarski@xxxxxxxxx>
> > Subject: RE: [PATCH 5/5] drm/i915: Cancel non-persistent contexts on close
> >
> > Quoting Bloomfield, Jon (2019-08-07 15:33:51)
> > [skip to end]
> > > We didn't explore the idea of terminating orphaned contexts though
> > > (where none of their resources are referenced by any other contexts).
> > > Is there a reason why this is not feasible? In the case of compute
> > > (certainly HPC) workloads, there would be no compositor taking the
> > > output, so this might be a solution.
> >
> > Sounds easier said than done. We have to go through each request and
> > determine if it has an external reference (or if the object holding the
> > reference has an external reference) to see if the output would be
> > visible to a third party. Sounds like a conservative GC :|
> > (Coming to that conclusion suggests that we should structure the request
> > tracking to make reparenting easier.)
> >
> > We could take a pid-1 approach and move all the orphan timelines over to
> > a new parent purely responsible for them. That honestly doesn't seem to
> > achieve anything. (We are still stuck with tasks on the GPU and no way
> > to kill them.)
> >
> > In comparison, persistence is a rarely used "feature" and cleaning up on
> > context close fits in nicely with the process model. It just works as
> > most users/clients would expect. (Although running non-persistent by
> > default hasn't shown anything to explode on the desktop, it's too easy
> > to construct scenarios where persistence turns out to be an advantage,
> > particularly with chains of clients (the compositor model).)
> > Between the two modes, we should have most bases covered; it's hard to
> > argue for a third way (that is, until someone has a usecase!)
> > -Chris
>
> Ok, makes sense. Thanks.
>
> But have we converged on a decision? :-)
>
> As I said, requiring a compute umd opt-in should be ok for the immediate
> HPC issue, but I'd personally argue that it's valid to change the
> contract for hangcheck=0 and switch the default to non-persistent.

Could you tender

diff --git a/runtime/os_interface/linux/drm_neo.cpp b/runtime/os_interface/linux/drm_neo.cpp
index 31deb68b..8a9af363 100644
--- a/runtime/os_interface/linux/drm_neo.cpp
+++ b/runtime/os_interface/linux/drm_neo.cpp
@@ -141,11 +141,22 @@ void Drm::setLowPriorityContextParam(uint32_t drmContextId) {
     UNRECOVERABLE_IF(retVal != 0);
 }
 
+void Drm::setNonPersistent(uint32_t drmContextId) {
+    drm_i915_gem_context_param gcp = {};
+    gcp.ctx_id = drmContextId;
+    gcp.param = 0xb; /* I915_CONTEXT_PARAM_PERSISTENCE */
+
+    ioctl(DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &gcp);
+}
+
 uint32_t Drm::createDrmContext() {
     drm_i915_gem_context_create gcc = {};
     auto retVal = ioctl(DRM_IOCTL_I915_GEM_CONTEXT_CREATE, &gcc);
     UNRECOVERABLE_IF(retVal != 0);
 
+    /* enable cleanup of resources on process termination */
+    setNonPersistent(gcc.ctx_id);
+
     return gcc.ctx_id;
 }

to interested parties?
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx