On Thu, Jun 25, 2015 at 06:02:35AM +0000, Gupta, Sourab wrote:
> On Mon, 2015-06-22 at 16:09 +0000, Daniel Vetter wrote:
> > On Mon, Jun 22, 2015 at 02:22:54PM +0100, Chris Wilson wrote:
> > > On Mon, Jun 22, 2015 at 03:25:07PM +0530, sourab.gupta@xxxxxxxxx wrote:
> > > > From: Sourab Gupta <sourab.gupta@xxxxxxxxx>
> > > >
> > > > To collect timestamps around any GPU workload, we need to insert
> > > > commands to capture them into the ringbuffer. Therefore, during the
> > > > stop event call, we need to wait for the GPU to complete processing
> > > > the last request for which these commands were inserted.
> > > > We need to ensure this processing is done before the event_destroy
> > > > callback, which deallocates the buffer holding the data.
> > >
> > > There's no reason for this to be synchronous. Just that you need an
> > > active reference on the output buffer to be only released after the
> > > final request.
> >
> > Yeah, I think the interaction between OA sampling and GEM is the
> > critical piece here for both patch series. Step one is to have a
> > per-pmu lock to keep track of data private to OA and mmio-based
> > sampling as a starting point. Then we need to figure out how to
> > structure the connection without OA/PMU and gem trampling over each
> > other's feet.
> >
> > Wrt requests, those are refcounted already and just waiting on them
> > doesn't need dev->struct_mutex. That should be all you need, assuming
> > you do correctly refcount them ...
> > -Daniel
>
> Hi Daniel/Chris,
>
> Thanks for your feedback. I acknowledge the issue with increasing the
> coverage scope of the i915 dev mutex. I'll have a per-pmu lock to keep
> track of data private to OA in my next version of the patches.
>
> W.r.t. having the synchronous wait during event stop, I thought through
> the method of having an active reference on the output buffer. This
> will let us have a delayed buffer destroy (controlled by dropping the
> output buffer refcount as and when GPU requests complete), decoupled
> from the event stop here. But that is not the only thing at play here.
>
> The event stop is expected to disable the OA unit (which it is doing
> right now). In my usecase, I can't disable the OA unit until the GPU
> has processed the MI_RPC requests already scheduled, as disabling it
> earlier may lead to hangs. So I'm waiting synchronously for requests
> to complete before disabling the OA unit.

There's no issue here. Just hook into the retire-notification callback
for the last active request of the oa buffer. If you are using LRI to
disable the OA context writing to the buffer, you can simply create a
request for the LRI and use the active reference as the retire
notification. (Otoh, if the oa buffer is inactive, you can simply do
the decoupling immediately.) You shouldn't need an
object-free-notification callback, aiui.
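A minimal sketch of the pattern being suggested here, under stated
assumptions: the i915_oa_* helpers, event_to_oa(), and the struct
fields are hypothetical, locking and error handling are elided, and
only the LRI-as-request plus active-reference idea is taken from the
discussion (exact i915 signatures varied across kernels of this era):

        struct i915_oa_priv {
                struct drm_i915_gem_object *obj;        /* OA output buffer */
                struct i915_vma *vma;
                /* ... */
        };

        static void i915_oa_event_stop(struct perf_event *event)
        {
                struct i915_oa_priv *oa = event_to_oa(event);   /* hypothetical */
                struct drm_i915_gem_request *req;

                /*
                 * Emit the OA-disable LRI as a request of its own, so it
                 * is ordered in the ring after any MI_RPC commands already
                 * scheduled against the buffer.
                 */
                req = i915_oa_emit_disable_lri(oa);             /* hypothetical */

                /*
                 * Mark the buffer active on that request.  GEM then holds
                 * a reference which is only dropped when the request
                 * retires, i.e. after the last possible OA write, so no
                 * synchronous wait is needed here.
                 */
                i915_vma_move_to_active(oa->vma, req);
        }

        static void i915_oa_event_destroy(struct perf_event *event)
        {
                struct i915_oa_priv *oa = event_to_oa(event);

                /*
                 * Drop only our own reference.  The object is not actually
                 * freed until retirement releases the active reference
                 * taken above, so destroy need not wait either.
                 */
                drm_gem_object_unreference(&oa->obj->base);
        }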
> Also, the event_destroy callback should be destroying the buffers
> associated with the event. (Right now, in the current design, the
> event_destroy callback waits for the above operations to complete
> before proceeding to destroy the buffer.)
>
> If I try to create a function which destroys the output buffer once
> all references to it are released, all these operations will have to
> be performed together in that function, in this serial order (so that
> when the refcount drops to 0, i.e. the requests have completed, the OA
> unit is disabled and the buffer then destroyed). This would decouple
> these operations from the event_stop and event_destroy functions,
> which may be unintended behaviour for these callbacks from a userspace
> perspective.

When the perf event is stopped, you stop generating perf events. But
the buffer may still be alive, and may be resurrected if a new event is
started, so you will want to ensure that pending oa writes are ignored
(see the sketch below).
-Chris
--
Chris Wilson, Intel Open Source Technology Centre
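For illustration, the stop/restart bookkeeping described above could
look roughly like this. Every identifier here is hypothetical, not the
actual i915 or perf API; only the idea of discarding OA writes that
were still in flight at stop time comes from the thread:

        struct i915_oa_sample {
                u32 generation;         /* generation the write was scheduled in */
                /* ... report payload ... */
        };

        struct i915_oa_state {
                bool enabled;
                u32 generation;
                /* ... buffer, offsets ... */
        };

        static void oa_event_stop(struct i915_oa_state *oa)
        {
                /* Stop forwarding samples to perf straight away ... */
                oa->enabled = false;

                /*
                 * ... but MI_RPC writes already in flight can still land
                 * in the buffer.  Bumping the generation lets them be
                 * recognised as stale when their requests complete.
                 */
                oa->generation++;
        }

        static void oa_event_start(struct i915_oa_state *oa)
        {
                /* The resurrected buffer accepts only samples tagged with
                 * the current generation. */
                oa->enabled = true;
        }

        /* Called when the request carrying an MI_RPC write retires. */
        static void oa_sample_retired(struct i915_oa_state *oa,
                                      struct i915_oa_sample *sample)
        {
                /*
                 * A write scheduled before the last stop retires late: its
                 * generation no longer matches, so drop it instead of
                 * forwarding it to perf.
                 */
                if (!oa->enabled || sample->generation != oa->generation)
                        return;

                oa_forward_to_perf(oa, sample);         /* hypothetical */
        }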