Quoting Tvrtko Ursulin (2019-09-20 15:51:35) > > On 20/09/2019 13:42, Chris Wilson wrote: > > Quoting Tvrtko Ursulin (2019-09-20 13:24:47) > >> > >> On 20/09/2019 09:36, Chris Wilson wrote: > >>> Force bonded requests to run on distinct engines so that they cannot be > >>> shuffled onto the same engine where timeslicing will reverse the order. > >>> A bonded request will often wait on a semaphore signaled by its master, > >>> creating an implicit dependency -- if we ignore that implicit dependency > >>> and allow the bonded request to run on the same engine and before its > >>> master, we will cause a GPU hang. > >>> > >>> We can prevent this inversion by restricting which engines we allow > >>> ourselves to jump to upon preemption, i.e. baking in the arrangement > >>> established at first execution. (We should also consider capturing the > >>> implicit dependency using i915_sched_add_dependency(), but first we need > >>> to think about the constraints that requires on the execution/retirement > >>> ordering.) > >>> > >>> Fixes: 8ee36e048c98 ("drm/i915/execlists: Minimalistic timeslicing") > >>> References: ee1136908e9b ("drm/i915/execlists: Virtual engine bonding") > >>> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > >>> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> > >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > >>> --- > >>> drivers/gpu/drm/i915/gt/intel_lrc.c | 19 +++++++++++-------- > >>> 1 file changed, 11 insertions(+), 8 deletions(-) > >>> > >>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c > >>> index a99166a2d2eb..7920649e4d87 100644 > >>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c > >>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c > >>> @@ -3755,18 +3755,21 @@ static void > >>> virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal) > >>> { > >>> struct virtual_engine *ve = to_virtual_engine(rq->engine); > >>> + intel_engine_mask_t allowed, exec; > >>> struct ve_bond *bond; > >>> > >>> bond = virtual_find_bond(ve, to_request(signal)->engine); > >>> - if (bond) { > >>> - intel_engine_mask_t old, new, cmp; > >>> + if (!bond) > >>> + return; > >>> > >>> - cmp = READ_ONCE(rq->execution_mask); > >>> - do { > >>> - old = cmp; > >>> - new = cmp & bond->sibling_mask; > >>> - } while ((cmp = cmpxchg(&rq->execution_mask, old, new)) != old); > >>> - } > >>> + /* Restrict the bonded request to run on only the slaved engines */ > >>> + allowed = bond->sibling_mask & ~to_request(signal)->engine->mask; > >> > >> Hmm.. isn't it a miss on the uapi level that we allow master to be > >> mentioned in the list of bonds? That's the only scenario where this line > >> does something I think. So should we just forbid this setup on the uapi > >> level? > > > > That's just a lot of digging! > > It's simple, in set_engines__bond in the bonds loop we just do "if > (master == bond) oh_no_you_wont". Am I missing something? There doesn't even need to be a bond, which is something I forgot to handle. So I'm looking at something more like static void virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal) { struct virtual_engine *ve = to_virtual_engine(rq->engine); intel_engine_mask_t allowed, exec; struct ve_bond *bond; allowed = ~to_request(signal)->engine->mask; bond = virtual_find_bond(ve, to_request(signal)->engine); if (bond) allowed &= bond->sibling_mask; /* Restrict the bonded request to run on only the available engines */ exec = READ_ONCE(rq->execution_mask); while (!try_cmpxchg(&rq->execution_mask, &exec, exec & allowed)) ; /* Prevent the master from being re-run on the bonded engines */ to_request(signal)->execution_mask &= ~allowed; } (The joy of trying to write a test case.) -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx