On Wed, Nov 26, 2014 at 07:23:36PM +0100, Daniel Vetter wrote: > On Wed, Nov 26, 2014 at 02:58:49PM +0000, John Harrison wrote: > > On 26/11/2014 14:42, Daniel Vetter wrote: > > >On Wed, Nov 26, 2014 at 3:12 PM, John Harrison > > ><John.C.Harrison@xxxxxxxxx> wrote: > > >>>>diff --git a/drivers/gpu/drm/i915/i915_drv.h > > >>>>b/drivers/gpu/drm/i915/i915_drv.h > > >>>>index 8bfdac6..831fae2 100644 > > >>>>--- a/drivers/gpu/drm/i915/i915_drv.h > > >>>>+++ b/drivers/gpu/drm/i915/i915_drv.h > > >>>>@@ -3130,4 +3130,11 @@ wait_remaining_ms_from_jiffies(unsigned long > > >>>>timestamp_jiffies, int to_wait_ms) > > >>>> } > > >>>> } > > >>>> +static inline void i915_trace_irq_get(struct intel_engine_cs *ring, > > >>>>+ struct drm_i915_gem_request *req) > > >>>>+{ > > >>>>+ if (ring->trace_irq_req == NULL && ring->irq_get(ring)) > > >>>>+ i915_gem_request_assign(&ring->trace_irq_req, req); > > >>>This looks a bit suspiciuos. Thus far ring->trace_irq_req was essentially > > >>>ot protected at all by anything. But that was nothing troublesome since we > > >>>didn't hang a real resource of it. > > >>> > > >>>But now there's a refcounted request in that pointer, which means if we > > >>>race we leak. I'll skip this patch for now. > > >>>-Daniel > > >> > > >>Race how? The assignment only ever occurs from inside execbuffer submission > > >>at which point the driver mutex lock is held. Therefore it is very > > >>definitely protected. Not doing the reference count means that there is now > > >>the possibility of a dangling pointer and thus the possibility of going bang > > >>with a kernel oops. > > >Hm, ->trace_irq_seqno is indeed always touched from the a calling > > >context with dev->struct_mutex held. Somehow I've misrembered that > > >since the realtime/tracing folks are really freaked out about what > > >we're doing here. But from that pov your patch doesn't really make > > >things worse, so I'll pull it in. > > > > > >Btw I don't see the oops really without this patch. What would blow up? > > >-Daniel > > > > The sole access (and clear to null) of the trace pointer is done from retire > > requests after the requests have been retired. Thus the request structure > > could have just been freed immediately before it is used. The code could be > > re-ordered to be safer but I'm not entirely sure what the trace pointer is > > for or what it might potentially be used for in the future. With the > > reference counting, the ordering is irrelevant. If the pointer exists then > > it is safe to use. > > > > The point is that anywhere that keeps a copy of a request pointer really > > should reference count that copy. Otherwise there is the possibility that > > the pointer could become stale. Either now or with future code changes. If > > the copy is always done with the request_assign() function then the pointer > > is guaranteed safe for all time. > > Oh, I guess you've misunderstood what I've done. Ive dropped the entire > patch here, not just the refcounting. Dropping the refcounting alone is > obviously oops-worthy. Trying to clarify more: The problem I've thought I've seen is _not_ with the unref/ref done without holding struct mutex. But with different people accessing ring->trace_irq_* without locks. Somehow I've thought we've had no common lock thus far for that - iirc there was a bit of that code in the irq handler once, but I can't find it any more. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx