Re: [PATCH v3 17/28] drm/i915: Convert 'trace_irq' to use requests rather than seqnos

Daniel Vetter <daniel@xxxxxxxx> · Wed, 26 Nov 2014 19:25:48 +0100



On Wed, Nov 26, 2014 at 07:23:36PM +0100, Daniel Vetter wrote:
> On Wed, Nov 26, 2014 at 02:58:49PM +0000, John Harrison wrote:
> > On 26/11/2014 14:42, Daniel Vetter wrote:
> > >On Wed, Nov 26, 2014 at 3:12 PM, John Harrison
> > ><John.C.Harrison@xxxxxxxxx> wrote:
> > >>>>diff --git a/drivers/gpu/drm/i915/i915_drv.h
> > >>>>b/drivers/gpu/drm/i915/i915_drv.h
> > >>>>index 8bfdac6..831fae2 100644
> > >>>>--- a/drivers/gpu/drm/i915/i915_drv.h
> > >>>>+++ b/drivers/gpu/drm/i915/i915_drv.h
> > >>>>@@ -3130,4 +3130,11 @@ wait_remaining_ms_from_jiffies(unsigned long
> > >>>>timestamp_jiffies, int to_wait_ms)
> > >>>>         }
> > >>>>   }
> > >>>>   +static inline void i915_trace_irq_get(struct intel_engine_cs *ring,
> > >>>>+                                     struct drm_i915_gem_request *req)
> > >>>>+{
> > >>>>+       if (ring->trace_irq_req == NULL && ring->irq_get(ring))
> > >>>>+               i915_gem_request_assign(&ring->trace_irq_req, req);
> > >>>This looks a bit suspiciuos. Thus far ring->trace_irq_req was essentially
> > >>>ot protected at all by anything. But that was nothing troublesome since we
> > >>>didn't hang a real resource of it.
> > >>>
> > >>>But now there's a refcounted request in that pointer, which means if we
> > >>>race we leak. I'll skip this patch for now.
> > >>>-Daniel
> > >>
> > >>Race how? The assignment only ever occurs from inside execbuffer submission
> > >>at which point the driver mutex lock is held. Therefore it is very
> > >>definitely protected. Not doing the reference count means that there is now
> > >>the possibility of a dangling pointer and thus the possibility of going bang
> > >>with a kernel oops.
> > >Hm, ->trace_irq_seqno is indeed always touched from the a calling
> > >context with dev->struct_mutex held. Somehow I've misrembered that
> > >since the realtime/tracing folks are really freaked out about what
> > >we're doing here. But from that pov your patch doesn't really make
> > >things worse, so I'll pull it in.
> > >
> > >Btw I don't see the oops really without this patch. What would blow up?
> > >-Daniel
> > 
> > The sole access (and clear to null) of the trace pointer is done from retire
> > requests after the requests have been retired. Thus the request structure
> > could have just been freed immediately before it is used. The code could be
> > re-ordered to be safer but I'm not entirely sure what the trace pointer is
> > for or what it might potentially be used for in the future. With the
> > reference counting, the ordering is irrelevant. If the pointer exists then
> > it is safe to use.
> > 
> > The point is that anywhere that keeps a copy of a request pointer really
> > should reference count that copy. Otherwise there is the possibility that
> > the pointer could become stale. Either now or with future code changes. If
> > the copy is always done with the request_assign() function then the pointer
> > is guaranteed safe for all time.
> 
> Oh, I guess you've misunderstood what I've done. Ive dropped the entire
> patch here, not just the refcounting. Dropping the refcounting alone is
> obviously oops-worthy.

Trying to clarify more: The problem I've thought I've seen is _not_ with
the unref/ref done without holding struct mutex. But with different people
accessing ring->trace_irq_* without locks. Somehow I've thought we've had
no common lock thus far for that - iirc there was a bit of that code in
the irq handler once, but I can't find it any more.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx