On Thu, Mar 16, 2017 at 03:22:03PM +0530, sourab gupta wrote:
> On Thu, 2017-03-16 at 02:03 -0700, Chris Wilson wrote:
> > On Thu, Mar 16, 2017 at 02:24:55PM +0530, sourab gupta wrote:
> > > On Thu, 2017-03-16 at 01:10 -0700, Chris Wilson wrote:
> > > > On Thu, Mar 16, 2017 at 11:44:10AM +0530, sourab.gupta@xxxxxxxxx wrote:
> > > > > @@ -3593,6 +3670,7 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine,
> > > > >  void i915_oa_update_reg_state(struct intel_engine_cs *engine,
> > > > >  			      struct i915_gem_context *ctx,
> > > > >  			      uint32_t *reg_state);
> > > > > +void i915_perf_command_stream_hook(struct drm_i915_gem_request *req);
> > > > > 
> > > > >  /* i915_gem_evict.c */
> > > > >  int __must_check i915_gem_evict_something(struct i915_address_space *vm,
> > > > > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > > > index aa75ea2..7af32c97 100644
> > > > > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > > > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > > > @@ -1441,12 +1441,16 @@ static void eb_export_fence(struct drm_i915_gem_object *obj,
> > > > >  	if (exec_len == 0)
> > > > >  		exec_len = params->batch->size - params->args_batch_start_offset;
> > > > > 
> > > > > +	i915_perf_command_stream_hook(params->request);
> > > > 
> > > > Could you have named it anything more cryptic and inconsistent?
> > > 
> > > Sorry. Would 'i915_perf_capture_metrics' work?
> > > Can you please suggest an appropriate name otherwise.
> > 
> > The verb we use for writing into the command stream is emit. So
> > i915_perf_emit_samples() (emit_record is clumsy as it is not clear
> > whether record is the verb or the noun).
> 
> Thanks for suggesting. I'll use 'i915_perf_emit_samples' here.
> 
> > > > Quite clearly this can fail, so justify the non-handling of errors.
> > > > 
> > > > DRM_ERROR is not error handling, it is an indication that this
> > > > patch isn't ready.
> > > 
> > > Well, the intent was to have minimal effect on the normal execbuf
> > > routine, even if we fail. But, I guess I'm mistaken.
> > > I'll rectify this to handle the case where the perf callback fails.
> > 
> > That all depends on whether or not you can handle the unbalanced
> > metrics. If simply missing a report is fine, then just kill the
> > DRM_ERROR.
> > 
> > The bigger question is whether the following emit can fail -- once
> > the batch is in the request, no more failures are tolerable. You have
> > to preallocate reserved space.
> > 
> > Don't you need a flush before the emit following the batch?
> 
> Ok. So, that would mean that we have to first of all reserve the space
> required by the two 'perf_emit_samples' calls, so that we can't fail
> for lack of space in the emit following the batch.
> Probably, I could pass an additional boolean parameter 'reserve_space'
> set to true in the first call, which would reserve the space for both
> emit_samples() calls (through intel_ring_begin)?

Hmm, yes, you can just tweak the request->reserved_space in the first
call to preallocate some space (and make sure it remains available) for
the second:

perf_emit_samples(rq, bool preallocate)
{
	/* then in each sample callback */
	cs_len = foo;

	if (preallocate)
		rq->reserved_space += cs_len;
	else
		rq->reserved_space -= cs_len;

	cs = intel_ring_begin(rq, cs_len);
}
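
The two call sites around the batch would then pair up along these
lines (an untested sketch only; the emit_bb_start plumbing is from
memory and may not match the tree exactly):

	/*
	 * Before the batch: this emit may still fail; it also grows
	 * rq->reserved_space so that the emit after the batch is
	 * guaranteed ring space.
	 */
	i915_perf_emit_samples(params->request, true);

	ret = params->engine->emit_bb_start(params->request,
					    exec_start, exec_len,
					    dispatch_flags);
	if (ret)
		return ret;

	/* After the batch: consumes the reservation, must not fail. */
	i915_perf_emit_samples(params->request, false);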
> Would it still need the flush before the emit following the batch?
> Ideally, the metrics should be collected as close to the batch as
> possible. So, if there are cache flush/tlb invalidate commands, it
> would introduce some lag between the batch and the following
> emit_samples command.
> Sorry, I'm not able to gauge the need for a flush here. I can
> understand it's needed before the batch is programmed, to flush the
> cache/TLB entries for the new workload to be submitted. But for the
> sample_emit commands, which generally only capture OA/timestamp/mmio
> metrics, is this still required?

Depends on the desired accuracy. If you want your metrics sampled after
the user's pipeline is completed, you need a stall at least, and a flush
if your metrics include e.g. fragment data. If you want samples taken
whilst the user's batch is still executing, then no.

> > > > > +void i915_perf_command_stream_hook(struct drm_i915_gem_request *request)
> > > > > +{
> > > > > +	struct intel_engine_cs *engine = request->engine;
> > > > > +	struct drm_i915_private *dev_priv = engine->i915;
> > > > > +	struct i915_perf_stream *stream;
> > > > > +
> > > > > +	if (!dev_priv->perf.initialized)
> > > > > +		return;
> > > > > +
> > > > > +	mutex_lock(&dev_priv->perf.streams_lock);
> > > > 
> > > > Justify a new global lock very, very carefully on execbuf.
> > > 
> > > The lock introduced here is to protect the perf.streams list against
> > > addition/deletion while we're processing the list during the execbuf
> > > call.
> > > The other places where the mutex is taken are when a new stream is
> > > being created (the perf_open ioctl) or a stream is being destroyed
> > > (the perf_release ioctl), and those just protect the list_add and
> > > list_del on the perf.streams list.
> > > As such, there should not be much impact on the execbuf path.
> > > Does this seem reasonable?
> > 
> > It doesn't sound like it needs a mutex, which will simplify the other
> > users as well (if switched to, for example, an RCU protected list).
> 
> Ok. Sounds reasonable, I'll switch to using an RCU protected list here.

(I may be overthinking this, but it still seems overkill and made the
timer callback uglier than expected.)
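
For the execbuf side, the lookup I had in mind would reduce to something
like this (again only a sketch; the open/close paths would pair
list_add_rcu/list_del_rcu with kfree_rcu, or a synchronize_rcu before
freeing the stream):

	rcu_read_lock();
	list_for_each_entry_rcu(stream, &dev_priv->perf.streams, link) {
		if (stream->state == I915_PERF_STREAM_ENABLED &&
		    stream->cs_mode)
			stream->ops->command_stream_hook(stream, request);
	}
	rcu_read_unlock();

Note that only works if the callbacks never sleep inside the read-side
critical section (so no GFP_KERNEL allocations, no waiting for ring
space); otherwise you would need SRCU, or to collect the streams and
emit outside the lock.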
> > > > > +	list_for_each_entry(stream, &dev_priv->perf.streams, link) {
> > > > > +		if ((stream->state == I915_PERF_STREAM_ENABLED) &&
> > > > > +		    stream->cs_mode)
> > > > > +			stream->ops->command_stream_hook(stream, request);
> > > > > +	}
> > > > > +	mutex_unlock(&dev_priv->perf.streams_lock);
> > > > > +}
> > > > 
> > > > > +static void i915_perf_command_stream_hook_oa(struct i915_perf_stream *stream,
> > > > > +					     struct drm_i915_gem_request *request)
> > > > > +{
> > > > > +	struct drm_i915_private *dev_priv = request->i915;
> > > > > +	struct i915_gem_context *ctx = request->ctx;
> > > > > +	struct i915_perf_cs_sample *sample;
> > > > > +	u32 addr = 0;
> > > > > +	u32 cmd, *cs;
> > > > > +
> > > > > +	sample = kzalloc(sizeof(*sample), GFP_KERNEL);
> > > > > +	if (sample == NULL) {
> > > > > +		DRM_ERROR("Perf sample alloc failed\n");
> > > > > +		return;
> > > > > +	}
> > > > > +
> > > > > +	cs = intel_ring_begin(request, 4);
> > > > > +	if (IS_ERR(cs)) {
> > > > > +		kfree(sample);
> > > > > +		return;
> > > > > +	}
> > > > > +
> > > > > +	sample->ctx_id = ctx->hw_id;
> > > > > +	i915_gem_request_assign(&sample->request, request);
> > > > > +
> > > > > +	i915_gem_active_set(&stream->last_request, request);
> > > > 
> > > > Hint, you don't need your own request trap.
> > > 
> > > Sorry, I didn't understand you fully here. I'm storing the reference
> > > to the last active request associated with the stream inside the
> > > last_request 'gem_active' field. Do I need to do something
> > > differently here?
> > 
> > It's the duplication.
> 
> Are you suggesting that request_assign() and active_set() are the
> duplication?
> Actually, I'm using the last_request active tracker for the purpose of
> waiting on the last request to complete, whenever the need arises.
> But I still need to take a reference on every request for which the
> commands for collection of metrics are emitted, because I need to
> check for their completion before collecting the associated metrics.

Missed the sample / stream difference. request_assign means update the
request field; had you used sample->request = i915_gem_request_get(request)
it would have been clearer.

Note that the requests are not ordered for the stream, so you cannot
track the last_request so easily.

> This is done inside the 'append_command_stream_samples' function, which
> also takes care of releasing the reference taken here.
> Am I missing something w.r.t. the active_set() functionality?

I was mostly thrown by the idea that you were reassigning requests,
which history has shown to be a bad idea (hence i915_gem_active).
However, stream->last_request should be a reservation_object.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx