Re: [PATCH v5 09/10] drm/i915/perf: execute OA configuration from command stream

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Thu, 27 Jun 2019 16:04:46 +0100

Quoting Lionel Landwerlin (2019-06-27 15:50:38)
> On 27/06/2019 13:02, Chris Wilson wrote:
> > Quoting Lionel Landwerlin (2019-06-27 09:00:44)
> >> @@ -1813,67 +1826,86 @@ static int alloc_noa_wait(struct drm_i915_private *i915)
> >>          return ret;
> >>   }
> >>   
> >> -static void config_oa_regs(struct drm_i915_private *dev_priv,
> >> -                          const struct i915_oa_reg *regs,
> >> -                          u32 n_regs)
> >> +static int config_oa_regs(struct drm_i915_private *i915,
> >> +                         struct i915_oa_config *oa_config)
> >>   {
> >> -       u32 i;
> >> +       struct i915_request *rq;
> >> +       struct i915_vma *vma;
> >> +       long timeout;
> >> +       u32 *cs;
> >> +       int err;
> >>   
> >> -       for (i = 0; i < n_regs; i++) {
> >> -               const struct i915_oa_reg *reg = regs + i;
> >> +       rq = i915_request_create(i915->engine[RCS0]->kernel_context);
> >> +       if (IS_ERR(rq))
> >> +               return PTR_ERR(rq);
> >>   
> >> -               I915_WRITE(reg->addr, reg->value);
> >> +       err = i915_active_request_set(&i915->engine[RCS0]->last_oa_config,
> >> +                                     rq);
> > This will need an explicit mutex as it is not serialised by any single
> > timeline.
> >
> > i915->perf.metrics_lock viable? Needs to be taken here and in execbuf.
> 
> 
> Not sure that I understand. This function is called under struct_mutex lock.

The active_request is serialised by the timeline lock, which for about
the next 10 minutes struct_mutex remains in lieu of. I want you to
nominate an explicit lock for serialisation of the last_oa_config
barrier

> >
> >> +       if (err) {
> >> +               i915_request_add(rq);
> >> +               return err;
> >> +       }
> >> +
> >> +       vma = i915_vma_instance(oa_config->obj, &i915->ggtt.vm, NULL);
> >> +       if (unlikely(IS_ERR(vma))) {
> >> +               i915_request_add(rq);
> >> +               return PTR_ERR(vma);
> >> +       }
> >> +
> >> +       err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL);
> >> +       if (err) {
> >> +               i915_request_add(rq);
> >> +               return err;
> >>          }
> > This time there is no excuse for not pinning outside of the
> > timeline->mutex.
> 
> 
> Right, okay, but there is so much going on when turning on i915-perf, 
> draining + idling + context image edition.
> 
> Do we really need to optimize this in this patch?

It's not optimizing, it's about the lock nesting. To pin a vma, we may
ultimately need to emit requests, at which point trying to pin from
inside a request critical section is going to make lockdep rightfully
complain.

> After all this removes the cpu wait that was done under struct_mutex lock.

Puzzling, as all of this is under struct_mutex and may require waits :(
Fear not, removing struct_mutex here is easier than execbuf, so only 50
patches!

So long as you haven't used struct_mutex for serialising perf :)

> >> +       timeout = i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED,
> >> +                                   MAX_SCHEDULE_TIMEOUT);

One other thing, I915_WAIT_LOCKED is no longer used.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx