Quoting Lionel Landwerlin (2019-06-27 15:50:38) > On 27/06/2019 13:02, Chris Wilson wrote: > > Quoting Lionel Landwerlin (2019-06-27 09:00:44) > >> @@ -1813,67 +1826,86 @@ static int alloc_noa_wait(struct drm_i915_private *i915) > >> return ret; > >> } > >> > >> -static void config_oa_regs(struct drm_i915_private *dev_priv, > >> - const struct i915_oa_reg *regs, > >> - u32 n_regs) > >> +static int config_oa_regs(struct drm_i915_private *i915, > >> + struct i915_oa_config *oa_config) > >> { > >> - u32 i; > >> + struct i915_request *rq; > >> + struct i915_vma *vma; > >> + long timeout; > >> + u32 *cs; > >> + int err; > >> > >> - for (i = 0; i < n_regs; i++) { > >> - const struct i915_oa_reg *reg = regs + i; > >> + rq = i915_request_create(i915->engine[RCS0]->kernel_context); > >> + if (IS_ERR(rq)) > >> + return PTR_ERR(rq); > >> > >> - I915_WRITE(reg->addr, reg->value); > >> + err = i915_active_request_set(&i915->engine[RCS0]->last_oa_config, > >> + rq); > > This will need an explicit mutex as it is not serialised by any single > > timeline. > > > > i915->perf.metrics_lock viable? Needs to be taken here and in execbuf. > > > Not sure that I understand. This function is called under struct_mutex lock. The active_request is serialised by the timeline lock, which for about the next 10 minutes struct_mutex remains in lieu of. I want you to nominate an explicit lock for serialisation of the last_oa_config barrier > > > >> + if (err) { > >> + i915_request_add(rq); > >> + return err; > >> + } > >> + > >> + vma = i915_vma_instance(oa_config->obj, &i915->ggtt.vm, NULL); > >> + if (unlikely(IS_ERR(vma))) { > >> + i915_request_add(rq); > >> + return PTR_ERR(vma); > >> + } > >> + > >> + err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL); > >> + if (err) { > >> + i915_request_add(rq); > >> + return err; > >> } > > This time there is no excuse for not pinning outside of the > > timeline->mutex. > > > Right, okay, but there is so much going on when turning on i915-perf, > draining + idling + context image edition. > > Do we really need to optimize this in this patch? It's not optimizing, it's about the lock nesting. To pin a vma, we may ultimately need to emit requests, at which point trying to pin from inside a request critical section is going to make lockdep rightfully complain. > After all this removes the cpu wait that was done under struct_mutex lock. Puzzling, as all of this is under struct_mutex and may require waits :( Fear not, removing struct_mutex here is easier than execbuf, so only 50 patches! So long as you haven't used struct_mutex for serialising perf :) > >> + timeout = i915_request_wait(rq, I915_WAIT_INTERRUPTIBLE | I915_WAIT_LOCKED, > >> + MAX_SCHEDULE_TIMEOUT); One other thing, I915_WAIT_LOCKED is no longer used. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx