Quoting Tvrtko Ursulin (2021-01-26 11:12:53)
> On 25/01/2021 14:01, Chris Wilson wrote:
> > +static void ipi_schedule(struct work_struct *wrk)
> > +{
> > +	struct i915_sched_ipi *ipi = container_of(wrk, typeof(*ipi), work);
> > +	struct i915_request *rq = xchg(&ipi->list, NULL);
> > +
> > +	do {
> > +		struct i915_request *rn = xchg(&rq->sched.ipi_link, NULL);
> > +		int prio;
> > +
> > +		prio = ipi_get_prio(rq);
> > +
> > +		/*
> > +		 * For cross-engine scheduling to work we rely on one of two
> > +		 * things:
> > +		 *
> > +		 * a) The requests are using dma-fence fences and so will not
> > +		 * be scheduled until the previous engine is completed, and
> > +		 * so we cannot cross back onto the original engine and end up
> > +		 * queuing an earlier request after the first (due to the
> > +		 * interrupted DFS).
> > +		 *
> > +		 * b) The requests are using semaphores and so may already
> > +		 * be in flight, in which case if we cross back onto the same
> > +		 * engine, we will already have put the interrupted DFS into
> > +		 * the priolist, and the continuation will now be queued
> > +		 * afterwards [out-of-order]. However, since we are using
> > +		 * semaphores in this case, we also perform yield on semaphore
> > +		 * waits and so will reorder the requests back into the correct
> > +		 * sequence. This occurrence (of promoting a request chain
> > +		 * that crosses the engines using semaphores back unto itself)
> > +		 * should be unlikely enough that it probably does not matter...
> > +		 */
> > +		local_bh_disable();
> > +		i915_request_set_priority(rq, prio);
> > +		local_bh_enable();
>
> Is it that important, and wouldn't the priority order restore eventually
> due to timeslicing?

There would be a window in which we executed userspace code out-of-order.
That's enough to scare me! However, for our PI dependency chains it
should not matter, as the only time we do submit out-of-order, we are
stuck on _our_ semaphore that cannot be resolved until the requests are
back in-order.

I've tried to trick this into causing problems with
i915_selftest/igt_schedule_cycle and gem_exec_schedule/noreorder.
Fortunately for my sanity, neither test has caught any problems.

This is the handwaving part of removing the global lock.

> > +	/*
> > +	 * If we are setting the priority before being submitted, see if we
> > +	 * can quickly adjust our own priority in-situ and avoid taking
> > +	 * the contended engine->active.lock. If we need priority inheritance,
> > +	 * take the slow route.
> > +	 */
> > +	if (rq_prio(rq) == I915_PRIORITY_INVALID) {
> > +		struct i915_dependency *p;
> > +
> > +		rcu_read_lock();
> > +		for_each_signaler(p, rq) {
> > +			struct i915_request *s =
> > +				container_of(p->signaler, typeof(*s), sched);
> > +
> > +			if (rq_prio(s) >= prio)
> > +				continue;
> > +
> > +			if (__i915_request_is_complete(s))
> > +				continue;
> > +
> > +			break;
> > +		}
> > +		rcu_read_unlock();
>
> We exit this loop with the first lower-priority incomplete signaler. What
> does the block below then do? Feels like it needs a comment.

I thought I had sufficiently explained that in the comment above.

	/* Update priority in place if no PI required */
> > +	if (&p->signal_link == &rq->sched.signalers_list &&
> > +	    cmpxchg(&rq->sched.attr.priority,
> > +		    I915_PRIORITY_INVALID,
> > +		    prio) == I915_PRIORITY_INVALID)
> > +		return;

It could do a few more tricks to change the priority in-place a second
time, but I did not think that would be frequent enough to matter.
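For anyone reading along, the fast path boils down to a single cmpxchg:
publish the priority in place while it is still INVALID, otherwise fall
back to the locked slow route. A minimal userspace sketch of that
pattern (the struct and helper names here are illustrative stand-ins,
not the actual i915 types):

	#include <limits.h>
	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>

	#define PRIORITY_INVALID INT_MIN

	struct request {
		atomic_int priority; /* stand-in for rq->sched.attr.priority */
	};

	/* stand-in for the lock-taking, PI-walking slow path */
	static void set_priority_slow(struct request *rq, int prio)
	{
		atomic_store(&rq->priority, prio);
	}

	static void set_priority(struct request *rq, int prio, bool needs_pi)
	{
		int expected = PRIORITY_INVALID;

		/*
		 * Fast path: if no signaler needs boosting and the priority
		 * has never been set, publish it with one cmpxchg and skip
		 * the lock. If someone raced us and the priority is no
		 * longer INVALID, the cmpxchg fails and we take the slow
		 * route instead.
		 */
		if (!needs_pi &&
		    atomic_compare_exchange_strong(&rq->priority,
						   &expected, prio))
			return;

		set_priority_slow(rq, prio);
	}

	int main(void)
	{
		struct request rq;

		atomic_init(&rq.priority, PRIORITY_INVALID);
		set_priority(&rq, 2, false);
		printf("priority = %d\n", atomic_load(&rq.priority));
		return 0;
	}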
Whereas we always adjust the priority from INVALID once before
submission, so avoiding taking the lock there does make a difference
to the profiles.
-Chris
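P.S. The handoff at the top of ipi_schedule() is the familiar
xchg()-claimed list: submitters push requests on with cmpxchg, and the
worker claims the entire chain with a single xchg(), leaving the list
empty for new arrivals. A minimal userspace sketch of that pattern,
with illustrative names rather than the i915 structures:

	#include <stdatomic.h>
	#include <stdio.h>

	struct node {
		struct node *next;
		int prio;
	};

	/* shared head of the pending list; NULL when empty */
	static struct node *_Atomic ipi_list;

	/* producer: lock-free push, morally the queueing of the IPI */
	static void ipi_push(struct node *n)
	{
		struct node *head = atomic_load(&ipi_list);

		do {
			n->next = head;
		} while (!atomic_compare_exchange_weak(&ipi_list, &head, n));
	}

	/* worker: one xchg claims the whole chain and empties the list */
	static void ipi_work(void)
	{
		struct node *n = atomic_exchange(&ipi_list, NULL);

		while (n) {
			struct node *next = n->next;

			printf("applying prio %d\n", n->prio);
			n = next;
		}
	}

	int main(void)
	{
		struct node a = { .prio = 1 }, b = { .prio = 2 };

		ipi_push(&a);
		ipi_push(&b);
		ipi_work(); /* LIFO: prints prio 2 then prio 1 */
		return 0;
	}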