Quoting Mika Kuoppala (2018-08-06 12:12:15)
> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
>
> > By using TYPESAFE_BY_RCU, we accept that requests may be swapped out from
> > underneath us, even when using rcu_read_lock(). We use a strong barrier
> > on acquiring the refcount during lookup, but this needs to be paired
> > with a barrier on re-initialising it. Currently we call dma_fence_init,
> > which ultimately does a plain atomic_set(1) on the refcount, not
> > providing any memory barriers. As we inspect some state before even
> > acquiring the refcount in the lookup (by arguing that we can detect
> > inconsistent requests), that state should be initialised before the
> > refcount.
> >
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/i915_request.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 5c2c93cbab12..04a0b8e75533 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -768,6 +768,13 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> >  	rq->timeline = ce->ring->timeline;
> >  	GEM_BUG_ON(rq->timeline == &engine->timeline);
> >
> > +	/*
> > +	 * In order to coordinate with our RCU lookup,
> > +	 * __i915_gem_active_get_rcu(), we need to ensure that the change
> > +	 * to rq->engine is visible before acquiring the refcount in the lookup.
> > +	 */
> > +	smp_wmb();
> > +
>
> There is quite a lot going on here as we try to get a reference
> into a shapeshifting request.
>
> Looking at the code acquiring it, dma_fence_get_rcu
> and dma_fence_init, and then the precheck of the request,
> should the memory barrier be:
>
> smp_mb__before_atomic()?

No. The code does have a mb; smp_mb__before_atomic() is only for atomics
that do not themselves enforce a mb, where you need a bit of extra weight.
On x86, it's not even a mb, just a compiler barrier.
> Admittedly that would be uglier, as fence_init hides the atomic_set,
> but it is the atomic on which we are serialising -- especially
> as there is no atomic in sight at the call site.

Right, the suggestion in the thread was to use atomic_set_release(), but
that requires a lot of deconstruction merely to do the same: it adds the
smp_mb() before the atomic_set.

> Further, as engine and the kref are tightly bound,
> should we initialise everything not related first, then
> do engine init, wmb, fence init in tight proximity?

As we do. The existing order is sufficient for our needs. Everything
that needs to be initialised before the kref, is -- though I think it's
overkill, as our argument about checking stale state is still correct
and safe. So what this nails down is the stability of a fully referenced
request -- which is less worrisome, as it will only be exposed to the
rcu onlookers much later; we don't have the same danger of immediate
exposure to rcu walkers.

What I do think is useful overall is that it gives the companion mb to
the one referenced by __i915_gem_active_get_rcu(), and by
dma_fence_get_rcu() generally.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx