Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:

> Quoting Mika Kuoppala (2018-08-06 12:12:15)
>> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
>>
>> > By using TYPESAFE_BY_RCU, we accept that requests may be swapped out from
>> > underneath us, even when using rcu_read_lock(). We use a strong barrier
>> > on acquiring the refcount during lookup, but this needs to be paired
>> > with a barrier on re-initialising it. Currently we call dma_fence_init,
>> > which ultimately does a plain atomic_set(1) on the refcount, not
>> > providing any memory barriers. As we inspect some state before even
>> > acquiring the refcount in the lookup (by arguing that we can detect
>> > inconsistent requests), that state should be initialised before the
>> > refcount.
>> >
>> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
>> > ---
>> >  drivers/gpu/drm/i915/i915_request.c | 7 +++++++
>> >  1 file changed, 7 insertions(+)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>> > index 5c2c93cbab12..04a0b8e75533 100644
>> > --- a/drivers/gpu/drm/i915/i915_request.c
>> > +++ b/drivers/gpu/drm/i915/i915_request.c
>> > @@ -768,6 +768,13 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>> >  	rq->timeline = ce->ring->timeline;
>> >  	GEM_BUG_ON(rq->timeline == &engine->timeline);
>> >
>> > +	/*
>> > +	 * In order to coordinate with our RCU lookup,
>> > +	 * __i915_gem_active_get_rcu(), we need to ensure that the change
>> > +	 * to rq->engine is visible before acquiring the refcount in the lookup.
>> > +	 */
>> > +	smp_wmb();
>> > +
>>
>> There is quite a lot going on here as we try to get a reference
>> to a shapeshifting request.
>>
>> Looking at the code acquiring it, dma_fence_get_rcu and
>> dma_fence_init, and then the precheck of the request, should
>> the memory barrier be:
>>
>> smp_mb__before_atomic()?
>
> No. The code does have a mb; smp_mb__before_atomic() is only for atomics
> that don't themselves enforce a mb, where you need a bit of extra
> weight. On x86, it's not even a mb, just a compiler barrier.
>
>> Admittedly that would be uglier, as fence_init hides the atomic_set,
>> but it is the atomic we are serialising on, especially as there is
>> no atomic in sight at the call site.
>
> Right, the suggestion in the thread was to use atomic_set_release(), but
> that requires a lot of deconstruction merely to do the same: it adds an
> smp_mb() before the atomic_set.
>
>> Further, as the engine and the kref are tightly bound, should we
>> initialise everything unrelated first, then do the engine init,
>> the wmb and the fence init in tight proximity?
>
> As we do. The existing order is sufficient for our needs. Everything
> that needs to be initialised before the kref is -- though I think it's
> overkill, as our argument about checking stale state is still correct
> and safe. So what this nails down is the stability of a fully
> referenced request -- which is less worrisome, as it will only be
> exposed to the rcu onlookers much later; we don't have the same danger
> of immediate exposure to rcu walkers.
>
> What I do think is useful overall is that it gives the companion mb to
> the one referenced by __i915_gem_active_get_rcu, and dma_fence_get_rcu
> generally.

Agreed. I tried to think of how to improve the comment pairing, but the
rabbit hole is deep here. Mentioning the refcount should guide the
reader to the right spots, though.
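For anyone following along at home, here is roughly how the two sides of
the barrier pair up. A minimal sketch with made-up names, not the actual
i915/dma-fence code:

#include <linux/atomic.h>
#include <linux/rcupdate.h>

struct sketch_request {
	atomic_t refcount;	/* lives in a SLAB_TYPESAFE_BY_RCU slab */
	void *engine;		/* state inspected before taking the ref */
};

/* Writer side: re-initialising a possibly recycled request. */
static void sketch_request_reinit(struct sketch_request *rq, void *engine)
{
	rq->engine = engine;

	/*
	 * Publish the state before the refcount goes live; pairs with
	 * the full barrier of the successful atomic_add_unless() below.
	 * atomic_set_release() would fold this barrier into the store,
	 * at the cost of open-coding what dma_fence_init() does today.
	 */
	smp_wmb();

	atomic_set(&rq->refcount, 1); /* what dma_fence_init() boils down to */
}

/* Reader side: RCU lookup that must tolerate the object being recycled. */
static struct sketch_request *
sketch_request_get_rcu(struct sketch_request __rcu **slot)
{
	struct sketch_request *rq;

	rcu_read_lock();
	rq = rcu_dereference(*slot);
	if (rq && !atomic_add_unless(&rq->refcount, 1, 0))
		rq = NULL;

	/* The slab may have recycled rq under us; recheck the slot. */
	if (rq && rcu_dereference(*slot) != rq) {
		atomic_dec(&rq->refcount); /* real code would put the fence */
		rq = NULL;
	}

	rcu_read_unlock();
	return rq;
}

Without the smp_wmb(), the reader could take a reference on the recycled
request and still observe a stale rq->engine from before the reinit.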
Reviewed-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>

> -Chris