Re: [PATCH 3/8] drm/i915: Framework for capturing command stream based OA reports

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Thu, 16 Mar 2017 09:03:15 +0000

On Thu, Mar 16, 2017 at 02:24:55PM +0530, sourab gupta wrote:
> On Thu, 2017-03-16 at 01:10 -0700, Chris Wilson wrote:
> > On Thu, Mar 16, 2017 at 11:44:10AM +0530, sourab.gupta@xxxxxxxxx wrote:
> > > @@ -3593,6 +3670,7 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine,
> > >  void i915_oa_update_reg_state(struct intel_engine_cs *engine,
> > >  			      struct i915_gem_context *ctx,
> > >  			      uint32_t *reg_state);
> > > +void i915_perf_command_stream_hook(struct drm_i915_gem_request *req);
> > >  
> > >  /* i915_gem_evict.c */
> > >  int __must_check i915_gem_evict_something(struct i915_address_space *vm,
> > > diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > index aa75ea2..7af32c97 100644
> > > --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> > > @@ -1441,12 +1441,16 @@ static void eb_export_fence(struct drm_i915_gem_object *obj,
> > >  	if (exec_len == 0)
> > >  		exec_len = params->batch->size - params->args_batch_start_offset;
> > >  
> > > +	i915_perf_command_stream_hook(params->request);
> > 
> > Could you have named it anything more cyptic and inconsistent?
> 
> Sorry. Would 'i915_perf_capture_metrics' work?
> Can you please suggest an appropriate name otherwise.

The verb we use for writting into the command stream is emit. So
i915_perf_emit_samples() (emit_record record is clumsy as it is not clear
whether it is the verb or noun).

> > 
> > Quite clearly this can fail, so justify the non handling of errors.
> > 
> > DRM_ERROR is not error handling, it is an indication that this patch
> > isn't ready.
> 
> Well, the intent was to have minimal effect to execbuf normal routine,
> even if we fail. But, I guess I'm mistaken.
> I'll rectify this to handle the case, if perf callback fails.

That all depends on whether or not you can handle the unbalanced
metrics. If simply missing a report is fine, then just kill the
DRM_ERROR.

The bigger question is whether the following emit can to fail -- once
the batch is in the request, no more failures are tolerable. You have to
preallocate reserved space.

Don't you need a flush before the emit following the batch?

> > > +void i915_perf_command_stream_hook(struct drm_i915_gem_request *request)
> > > +{
> > > +	struct intel_engine_cs *engine = request->engine;
> > > +	struct drm_i915_private *dev_priv = engine->i915;
> > > +	struct i915_perf_stream *stream;
> > > +
> > > +	if (!dev_priv->perf.initialized)
> > > +		return;
> > > +
> > > +	mutex_lock(&dev_priv->perf.streams_lock);
> > 
> > Justify a new global lock very, very carefully on execbuf.
> 
> The lock introduced here is to protect the the perf.streams list against
> addition/deletion while we're processing the list during execbuf call.
> The other places where the mutex is taken is when a new stream is being
> created (using perf_open ioctl) or a stream is being destroyed
> (perf_release ioctl), which just protect the list_add and list_del into
> the perf.streams list.
> As such, there should not be much impact on execbuf path.
> Does this seem reasonable?

It doesn't sound like it needs a mutex, which will simplify the other
users as well (if switched to, for example, an RCU protected list).

> > > +	list_for_each_entry(stream, &dev_priv->perf.streams, link) {
> > > +		if ((stream->state == I915_PERF_STREAM_ENABLED) &&
> > > +					stream->cs_mode)
> > > +			stream->ops->command_stream_hook(stream, request);
> > > +	}
> > > +	mutex_unlock(&dev_priv->perf.streams_lock);
> > > +}
> > 
> > > +static void i915_perf_command_stream_hook_oa(struct i915_perf_stream *stream,
> > > +					struct drm_i915_gem_request *request)
> > > +{
> > > +	struct drm_i915_private *dev_priv = request->i915;
> > > +	struct i915_gem_context *ctx = request->ctx;
> > > +	struct i915_perf_cs_sample *sample;
> > > +	u32 addr = 0;
> > > +	u32 cmd, *cs;
> > > +
> > > +	sample = kzalloc(sizeof(*sample), GFP_KERNEL);
> > > +	if (sample == NULL) {
> > > +		DRM_ERROR("Perf sample alloc failed\n");
> > > +		return;
> > > +	}
> > > +
> > > +	cs = intel_ring_begin(request, 4);
> > > +	if (IS_ERR(cs)) {
> > > +		kfree(sample);
> > > +		return;
> > > +	}
> > > +
> > > +	sample->ctx_id = ctx->hw_id;
> > > +	i915_gem_request_assign(&sample->request, request);
> > 
> > > +
> > > +	i915_gem_active_set(&stream->last_request, request);
> > 
> > Hint, you don't need you own request trap.
> Sorry, I didn't understand you fully here. I'm storing the reference to
> the last active request associated with the stream inside the
> last_request 'gem_active' field. Do I need to do something differently
> here?

It's the duplication.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx