On Wed, Dec 16, 2015 at 10:44:19AM +0100, Daniel Vetter wrote:
> On Fri, Dec 11, 2015 at 11:33:03AM +0000, Chris Wilson wrote:
> > As the request is only valid during the same global reset epoch, we can
> > record the current reset_counter when constructing the request and reuse
> > it when waiting upon that request in future. This removes a very hairy
> > atomic check serialised by the struct_mutex at the time of waiting and
> > allows us to transfer those waits to a central dispatcher for all
> > waiters and all requests.
> >
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
>
> "global reset epoch" got me thinking about what the implications for TDR
> are. Now the current TDR patches iirc are still somewhat tacked on the
> side of the existing reset handling, and so don't end up incrementing the
> reset counter.
>
> But I don't agree with that design decision, and I think any reset must
> increment the counter (and do the reset-in-progress flag dance), since
> that's the only way to guarantee that waiters will get off locks. Even
> though we have fewer of those with every kernel release.
>
> But then there's the problem that at least for some requests they'll still
> be valid after the reset, and would break with this design here. Which can
> easily be fixed by incrementing the reset epoch for every request we
> decide should keep running (or which gets re-scheduled/emitted for another
> attempt), and doing that explicitly seems like a good idea.
>
> The only implication I see is that for per-ring reset we might want to go
> to a per-engine reset epoch counter.

Yes, I think per-engine epochs will be required. (I've been trying to
choose my words carefully so that it is clear that it only applies to
today and not to a TDR future!)

> So given all that I think this is a solid idea. But one comment below as a
> fallout.
>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h         |  2 +-
> >  drivers/gpu/drm/i915/i915_gem.c         | 40 +++++++++++----------------------
> >  drivers/gpu/drm/i915/intel_display.c    |  7 +-----
> >  drivers/gpu/drm/i915/intel_lrc.c        |  7 ------
> >  drivers/gpu/drm/i915/intel_ringbuffer.c |  6 -----
> >  5 files changed, 15 insertions(+), 47 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 1043ddd670a5..f30c305a6889 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2178,6 +2178,7 @@ struct drm_i915_gem_request {
> >  	/** On Which ring this request was generated */
> >  	struct drm_i915_private *i915;
> >  	struct intel_engine_cs *ring;
> > +	unsigned reset_counter;
> >
> >  	/** GEM sequence number associated with the previous request,
> >  	 * when the HWS breadcrumb is equal to this the GPU is processing
> > @@ -3059,7 +3060,6 @@ void __i915_add_request(struct drm_i915_gem_request *req,
> >  #define i915_add_request_no_flush(req) \
> >  	__i915_add_request(req, NULL, false)
> >  int __i915_wait_request(struct drm_i915_gem_request *req,
> > -			unsigned reset_counter,
> >  			bool interruptible,
> >  			s64 *timeout,
> >  			struct intel_rps_client *rps);
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 27e617b76418..b17cc0e42a4f 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1214,7 +1214,6 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
> >  /**
> >   * __i915_wait_request - wait until execution of request has finished
> >   * @req: duh!
> > - * @reset_counter: reset sequence associated with the given request
> >   * @interruptible: do an interruptible wait (normally yes)
> >   * @timeout: in - how long to wait (NULL forever); out - how much time remaining
> >   *
> > @@ -1229,7 +1228,6 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
> >   * errno with remaining time filled in timeout argument.
> >   */
> >  int __i915_wait_request(struct drm_i915_gem_request *req,
> > -			unsigned reset_counter,
> >  			bool interruptible,
> >  			s64 *timeout,
> >  			struct intel_rps_client *rps)
> > @@ -1288,7 +1286,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
> >
> >  	/* We need to check whether any gpu reset happened in between
> >  	 * the caller grabbing the seqno and now ... */
> > -	if (reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
> > +	if (req->reset_counter != i915_reset_counter(&dev_priv->gpu_error)) {
>
> READ_ONCE(req->reset) as future-proofing maybe for TDR? Or punt that to
> TDR? Could be confusing to READ_ONCE something that's (with the current
> code at least) invariant.

Right, my plan was to solve this in TDR! At the moment, I am thinking
that we regenerate a new request for each one resubmitted (that requires
patching the seqno embedded in the ring before restarting the ring), but
otherwise we can just duplicate the contents of the request.
-Chris

--
Chris Wilson, Intel Open Source Technology Centre
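
To make the idea concrete, below is a minimal, self-contained userspace model
of the reset-epoch scheme discussed above; the type, helper and field names
are made up for illustration and are not the i915 code. A request snapshots
the global epoch once, when it is constructed, and the wait path simply
compares that snapshot against the current epoch, rather than having every
caller sample the counter under struct_mutex just before waiting.

/* Toy model of the reset-epoch check; compile with -std=c11. */
#include <errno.h>
#include <stdatomic.h>
#include <stdio.h>

/* Stand-in for i915_reset_counter(&dev_priv->gpu_error). */
static atomic_uint global_reset_epoch;

struct fake_request {
	unsigned int reset_epoch;	/* snapshot taken at construction */
	int completed;
};

static void fake_request_init(struct fake_request *rq)
{
	/* Record the epoch once, up front, rather than re-deriving it
	 * under a lock every time someone waits on the request. */
	rq->reset_epoch = atomic_load(&global_reset_epoch);
	rq->completed = 0;
}

static int fake_wait_request(const struct fake_request *rq)
{
	/* A reset between construction and wait invalidates the request;
	 * the waiter backs off instead of waiting forever. */
	if (rq->reset_epoch != atomic_load(&global_reset_epoch))
		return -EAGAIN;
	return rq->completed ? 0 : -EBUSY;
}

int main(void)
{
	struct fake_request rq;

	fake_request_init(&rq);
	printf("before reset: %d\n", fake_wait_request(&rq));	/* -EBUSY */

	atomic_fetch_add(&global_reset_epoch, 1);	/* simulate a GPU reset */
	printf("after reset:  %d\n", fake_wait_request(&rq));	/* -EAGAIN */
	return 0;
}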