Quoting Tvrtko Ursulin (2020-05-26 17:00:06)
>
> On 26/05/2020 10:07, Chris Wilson wrote:
> > When we push a virtual request onto the HW, we update the rq->engine to
> > point to the physical engine. A request that is then submitted by the
> > user that waits upon the virtual engine, but along the physical engine
> > in use, will then see that it is due to be submitted to the same engine
> > and take a shortcut (and be queued without waiting for the completion
> > fence). However, the virtual request may be preempted (either by higher
> > priority users, or by timeslicing) and removed from the physical engine
> > to be migrated over to one of its siblings. The dependent normal request
> > however is oblivious to the removal of the virtual request and remains
> > queued to execute on HW, believing that once it reaches the head of its
> > queue all of its predecessors will have completed executing!
> >
> > v2: Beware restriction of signal->execution_mask prior to submission.
> >
> > Fixes: 6d06779e8672 ("drm/i915: Load balancing across a virtual engine")
> > Testcase: igt/gem_exec_balancer/sliced
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > Cc: <stable@xxxxxxxxxxxxxxx> # v5.3+
> > ---
> >  drivers/gpu/drm/i915/i915_request.c | 25 +++++++++++++++++++++----
> >  1 file changed, 21 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 33bbad623e02..0b07ccc7e9bc 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -1237,6 +1237,25 @@ i915_request_await_execution(struct i915_request *rq,
> >  	return 0;
> >  }
> >
> > +static int
> > +await_request_submit(struct i915_request *to, struct i915_request *from)
> > +{
> > +	/*
> > +	 * If we are waiting on a virtual engine, then it may be
> > +	 * constrained to execute on a single engine *prior* to submission.
> > +	 * When it is submitted, it will be first submitted to the virtual
> > +	 * engine and then passed to the physical engine. We cannot allow
> > +	 * the waiter to be submitted immediately to the physical engine
> > +	 * as it may then bypass the virtual request.
> > +	 */
> > +	if (to->engine == READ_ONCE(from->engine))
> > +		return i915_sw_fence_await_sw_fence_gfp(&to->submit,
> > +							&from->submit,
> > +							I915_FENCE_GFP);
> > +	else
>
> When can engines be different and the mask test below brought us here?

We change the mask during evaluation of the bond, which is from the
signaler's signaler's execute_cb before the signaler is submitted. So
there will be a period where the from->execution_mask is constrained to
a single engine, but it is still waiting to be queued. Once it is
executed on HW, it will remain on that engine.
-Chris
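
To make the window described above concrete, here is a minimal
userspace sketch of the three states a signaler can be in. It is a toy
model, not the i915 code: every toy_* name is hypothetical, the mask
test is assumed to mean "exactly one bit set in execution_mask", and
the else path is left abstract because the quoted hunk does not show
it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins for illustration only; NOT the i915 structures. */
struct toy_engine {
	uint32_t mask;	/* one bit per physical engine, 0 for the virtual engine */
};

struct toy_request {
	const struct toy_engine *engine;	/* engine the request currently points at */
	uint32_t execution_mask;		/* physical engines it is allowed to run on */
};

/* Which branch a waiter would take in this model. */
enum toy_path {
	TOY_PATH_NOT_TAKEN,	/* mask test fails: signaler may still run on several engines */
	TOY_PATH_SAME_ENGINE,	/* single engine and to->engine == from->engine */
	TOY_PATH_ELSE		/* single engine in the mask, but engine pointers differ */
};

/* Assumed form of the mask test: exactly one bit set. */
static bool toy_is_single_engine(uint32_t mask)
{
	return mask != 0 && (mask & (mask - 1)) == 0;
}

/* Bond evaluation narrows the mask but does not move the request. */
static void toy_apply_bond(struct toy_request *rq, uint32_t allowed)
{
	rq->execution_mask &= allowed;
}

/* Submission to HW: only now does rq->engine point at the physical engine. */
static void toy_submit_to_hw(struct toy_request *rq, const struct toy_engine *phys)
{
	assert(rq->execution_mask & phys->mask);
	rq->engine = phys;
}

static enum toy_path toy_classify(const struct toy_request *to,
				  const struct toy_request *from)
{
	if (!toy_is_single_engine(from->execution_mask))
		return TOY_PATH_NOT_TAKEN;
	if (to->engine == from->engine)
		return TOY_PATH_SAME_ENGINE;
	return TOY_PATH_ELSE;
}
```

In this model a signaler that starts on the virtual engine with two
bits in its mask is TOY_PATH_NOT_TAKEN, becomes TOY_PATH_ELSE once
toy_apply_bond() leaves a single bit set, and only becomes
TOY_PATH_SAME_ENGINE after toy_submit_to_hw() repoints it at the
physical engine the waiter is using. The middle state is the period
where the mask is already constrained but the request is still waiting
to be queued.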