On Mon, 2015-06-22 at 16:09 +0000, Daniel Vetter wrote: > On Mon, Jun 22, 2015 at 02:22:54PM +0100, Chris Wilson wrote: > > On Mon, Jun 22, 2015 at 03:25:07PM +0530, sourab.gupta@xxxxxxxxx wrote: > > > From: Sourab Gupta <sourab.gupta@xxxxxxxxx> > > > > > > To collect timestamps around any GPU workload, we need to insert > > > commands to capture them into the ringbuffer. Therefore, during the stop event > > > call, we need to wait for GPU to complete processing the last request for > > > which these commands were inserted. > > > We need to ensure this processing is done before event_destroy callback which > > > deallocates the buffer for holding the data. > > > > There's no reason for this to be synchronous. Just that you need an > > active reference on the output buffer to be only released after the > > final request. > > Yeah I think the interaction between OA sampling and GEM is the critical > piece here for both patch series. Step one is to have a per-pmu lock to > keep track of data private to OA and mmio based sampling as a starting > point. Then we need to figure out how to structure the connection without > OA/PMU and gem trampling over each another's feet. > > Wrt requests those are refcounted already and just waiting on them doesn't > need dev->struct_mutex. That should be all you need, assuming you do > correctly refcount them ... > -Daniel Hi Daniel/Chris, Thanks for your feedback. I acknowledge the issue with increasing coverage scope of i915 dev mutex. I'll have a per-pmu lock to keep track of data private to OA in my next version of patches. W.r.t. having the synchronous wait during event stop, I thought through the method of having active reference on output buffer. This will let us have a delayed buffer destroy (controlled by decrement of output buffer refcount as and when gpu requests complete). This will be decoupled from the event stop here. But that is not the only thing at play here. The event stop is expected to disable the OA unit (which it is doing right now). In my usecase, I can't disable OA unit, till the time GPU processes the MI_RPC requests scheduled already, which otherwise may lead to hangs. So, I'm waiting synchronously for requests to complete before disabling OA unit. Also, the event_destroy callback should be destroying the buffers associated with event. (Right now, in current design, event_destroy callback waits for the above operations to be completed before proceeding to destroy the buffer). If I try to create a function which destroys the output buffer according to all references being released, all these operations will have to be performed together in that function, in this serial order (so that when refcount drops to 0, i.e. requests have completed, OA unit will be disabled, and then the buffer destroyed). This would lead to these operations being decoupled from the event_stop and event_destroy functions. This may be non-intentional behaviour w.r.t. these callbacks from userspace perspective. Robert, any thoughts? _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx