Quoting Tvrtko Ursulin (2019-09-23 14:48:44) > > On 21/09/2019 10:55, Chris Wilson wrote: > > Force bonded requests to run on distinct engines so that they cannot be > > shuffled onto the same engine where timeslicing will reverse the order. > > A bonded request will often wait on a semaphore signaled by its master, > > creating an implicit dependency -- if we ignore that implicit dependency > > and allow the bonded request to run on the same engine and before its > > master, we will cause a GPU hang. [Whether it will hang the GPU is > > debatable, we should keep on timeslicing and each timeslice should be > > "accidentally" counted as forward progress, in which case it should run > > but at one-half to one-third speed.] > > > > We can prevent this inversion by restricting which engines we allow > > ourselves to jump to upon preemption, i.e. baking in the arrangement > > established at first execution. (We should also consider capturing the > > implicit dependency using i915_sched_add_dependency(), but first we need > > to think about the constraints that requires on the execution/retirement > > ordering.) > > > > Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing") > > References: ee1136908e9b ("drm/i915/execlists: Virtual engine bonding") > > Testcase: igt/gem_exec_balancer/bonded-slice > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > > --- > > drivers/gpu/drm/i915/gt/intel_lrc.c | 20 ++++++++++++-------- > > 1 file changed, 12 insertions(+), 8 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c > > index 3eadd294bcd7..9872bb4c99fc 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c > > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c > > @@ -3846,18 +3846,22 @@ static void > > virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal) > > { > > struct virtual_engine *ve = to_virtual_engine(rq->engine); > > + intel_engine_mask_t allowed, exec; > > struct ve_bond *bond; > > > > + allowed = ~to_request(signal)->engine->mask; > > + > > bond = virtual_find_bond(ve, to_request(signal)->engine); > > - if (bond) { > > - intel_engine_mask_t old, new, cmp; > > + if (bond) > > + allowed &= bond->sibling_mask; > > > > - cmp = READ_ONCE(rq->execution_mask); > > - do { > > - old = cmp; > > - new = cmp & bond->sibling_mask; > > - } while ((cmp = cmpxchg(&rq->execution_mask, old, new)) != old); > > - } > > + /* Restrict the bonded request to run on only the available engines */ > > + exec = READ_ONCE(rq->execution_mask); > > + while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed)) > > + ; > > + > > + /* Prevent the master from being re-run on the bonded engines */ > > + to_request(signal)->execution_mask &= ~allowed; > > I had an open here whether we should also fix the slave. At re-submit > time what is preventing the slave being put on the same engine as the > master? We removed the master engine from the list of allowed engines for the slave, and we never reset the execution_mask on unsubmit (and we don't re-execute the bonding on resubmit). That's the hard bit I left open for the reader. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx