On Fri, Aug 05, 2016 at 04:13:28PM +0100, Chris Wilson wrote:
> When using RCU lookup for the request, commit 0eafec6d3244 ("drm/i915:
> Enable lockless lookup of request tracking via RCU"), we acknowledge that
> we may race with another thread that could have reallocated the request.
> In order for the first thread not to blow up, the second thread must not
> clear the request completed before overwriting it. In the RCU lookup, we
> allow for the engine/seqno to be replaced but we do not allow for it to
> be zeroed.

First few remarks:

- Commit message definitely needs to explain the tradeoff between avoiding
  the memset and just making the req->engine lookup a bit safer for _rcu,
  like below:

diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
index 6002adc43523..e55492ba20ec 100644
--- a/drivers/gpu/drm/i915/i915_gem_request.h
+++ b/drivers/gpu/drm/i915/i915_gem_request.h
@@ -244,6 +244,26 @@ i915_gem_request_started(const struct drm_i915_gem_request *req)
 }
 
 static inline bool
+i915_gem_request_completed_rcu(const struct drm_i915_gem_request *req)
+{
+	struct intel_engine_cs *engine = READ_ONCE(req->engine);
+
+	/* When we peek at a request solely under rcu protection, without
+	 * holding a full reference, the request might be in the process of
+	 * getting freed and reallocated. Make sure we don't stumble over a NULL
+	 * engine in that case.
+	 *
+	 * If we are hitting this race it means that the old request has been
+	 * released, which only happens once it has completed.
+	 */
+	if (!engine)
+		return true;
+
+	return i915_seqno_passed(intel_engine_get_seqno(engine),
+				 req->fence.seqno);
+}
+
+static inline bool
 i915_gem_request_completed(const struct drm_i915_gem_request *req)
 {
 	return i915_seqno_passed(intel_engine_get_seqno(req->engine),
@@ -384,7 +404,7 @@ i915_gem_active_peek_rcu(const struct i915_gem_active *active)
 	struct drm_i915_gem_request *request;
 
 	request = rcu_dereference(active->request);
-	if (!request || i915_gem_request_completed(request))
+	if (!request || i915_gem_request_completed_rcu(request))
 		return NULL;
 
 	return request;
@@ -459,7 +479,7 @@ __i915_gem_active_get_rcu(const struct i915_gem_active *active)
 	struct drm_i915_gem_request *request;
 
 	request = rcu_dereference(active->request);
-	if (!request || i915_gem_request_completed(request))
+	if (!request || i915_gem_request_completed_rcu(request))
 		return NULL;
 
 	request = i915_gem_request_get_rcu(request);

  I'd go as far as putting this as an alternative fix into the changelog.

- We need a big honking comment somewhere (probably right above the
  kmem_cache_alloc) explaining why this is not zalloc. Proposal:

	/* Reallocation can race with rcu-protected request lookup. The
	 * request lookup code does eventually acquire a full reference, but
	 * before that it has a fast-path to peek at the request
	 * completion. We must make sure that that code can't fall over
	 * a request in the process of getting reinitialized here. Since
	 * it's a pure optimization, data integrity is not important; the
	 * only risk is in chasing NULL pointers. Currently this is only
	 * request->engine, which must not be cleared.
	 *
	 * An alternative fix would be to make the request peeking more
	 * robust, but that's overhead. Also, requests get reallocated a
	 * lot, so avoiding the memset makes sense. Hence this is not
	 * allocated with kzalloc, which is a rare exception in the i915
	 * driver.
	 *
	 * BEWARE: Everything must be correctly initialized or set to
	 * NULL!
	 */

> 
> Fixes: 0eafec6d3244 ("drm/i915: Enable lockless lookup of request...")
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: "Goel, Akash" <akash.goel@xxxxxxxxx>
> Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_gem_request.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index b317a672040f..7529b6b5deda 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -355,7 +355,7 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
>  	if (req && i915_gem_request_completed(req))
>  		i915_gem_request_retire(req);
>  
> -	req = kmem_cache_zalloc(dev_priv->requests, GFP_KERNEL);
> +	req = kmem_cache_alloc(dev_priv->requests, GFP_KERNEL);
>  	if (!req)
>  		return ERR_PTR(-ENOMEM);
>  
> @@ -375,6 +375,10 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
>  	req->engine = engine;
>  	req->ctx = i915_gem_context_get(ctx);
>  
> +	req->signaling.wait.tsk = NULL;

Do we need to reinit this? The important bit is that we remove ourselves
from the rb tree, and we do that in intel_engine_remove_wait.

> +	req->previous_context = NULL;

Should we move that into the retire function where we call the lrc unpin?

> +	req->file_priv = NULL;

We already clear this in remove_from_client.

Admittedly I didn't do a full audit yet of whether those are all we need.
-Daniel

> +
>  	/*
>  	 * Reserve space in the ring buffer for all the commands required to
>  	 * eventually emit this request. This is to guarantee that the
> -- 
> 2.8.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
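
For illustration only, the race discussed in this mail can be modelled in
user space. This is a minimal sketch under assumptions, not driver code:
struct engine, struct request, request_completed_peek() and
request_reinit() are hypothetical stand-ins for intel_engine_cs,
drm_i915_gem_request, the proposed i915_gem_request_completed_rcu() and the
initialisation in i915_gem_request_alloc(); C11 relaxed atomics stand in
for READ_ONCE().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for intel_engine_cs. */
struct engine {
	_Atomic uint32_t hw_seqno;	/* last seqno the engine completed */
};

/* Hypothetical stand-in for drm_i915_gem_request. */
struct request {
	struct engine *_Atomic engine;
	_Atomic uint32_t seqno;
};

/* Wrap-safe "seq1 is at or after seq2", same idea as i915_seqno_passed(). */
static bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}

/*
 * Lockless peek that tolerates a request being recycled underneath it.
 * The engine pointer is loaded exactly once. A NULL engine can only mean
 * the old request was released, and that only happens after completion.
 */
static bool request_completed_peek(struct request *req)
{
	struct engine *engine =
		atomic_load_explicit(&req->engine, memory_order_relaxed);

	if (!engine)
		return true;

	return seqno_passed(
		atomic_load_explicit(&engine->hw_seqno, memory_order_relaxed),
		atomic_load_explicit(&req->seqno, memory_order_relaxed));
}

/*
 * Reuse a request without zeroing it first: the fields are overwritten
 * in place, so a concurrent peek sees either the old or the new engine
 * pointer, never NULL.
 */
static void request_reinit(struct request *req, struct engine *engine,
			   uint32_t seqno)
{
	assert(engine != NULL);
	atomic_store_explicit(&req->seqno, seqno, memory_order_relaxed);
	atomic_store_explicit(&req->engine, engine, memory_order_relaxed);
}
```

The sketch covers both options from the discussion: request_completed_peek()
is the "more robust peek" alternative, while request_reinit() mirrors the
patch's approach of never exposing a cleared engine pointer. A peek that
races with reinit may still return a stale answer, which is acceptable
because the caller takes a full reference before relying on the request.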