On Fri, Oct 06, 2017 at 11:06:37AM +0200, Daniel Vetter wrote:
> I pondered whether we should annotate engine->submit_request as __rcu
> and use rcu_assign_pointer and rcu_dereference on it. But the reason
> behind those is to make sure the compiler/cpu barriers are there for
> when you have an actual data structure you point at, to make sure all
> the writes are seen correctly on the read side. But we just have a
> function pointer, and .text isn't changed, so no need for these
> barriers and hence no need for annotations.

synchronize_*() provides an smp_mb() on the calling CPU and ensures an
smp_mb() on all other CPUs before completion, such that everybody agrees
on the state prior to calling synchronize_rcu(). So yes, no additional
ordering requirements.

>  drivers/gpu/drm/i915/i915_gem.c                   | 31 +++++++++--------------
>  drivers/gpu/drm/i915/i915_gem_request.c           |  2 ++
>  drivers/gpu/drm/i915/selftests/i915_gem_request.c |  2 ++
>  3 files changed, 16 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index ab8c6946fea4..e79a6ca60265 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3020,16 +3020,8 @@ static void nop_submit_request(struct drm_i915_gem_request *request)
>  	intel_engine_init_global_seqno(request->engine, request->global_seqno);
>  }
> 
> +static void engine_complete_requests(struct intel_engine_cs *engine)
>  {
>  	/* Mark all executing requests as skipped */
>  	engine->cancel_requests(engine);
> 
> @@ -3041,24 +3033,25 @@ static void engine_set_wedged(struct intel_engine_cs *engine)
>  					 intel_engine_last_submit(engine));
>  }
> 
> +void i915_gem_set_wedged(struct drm_i915_private *i915)
>  {
>  	struct intel_engine_cs *engine;
>  	enum intel_engine_id id;
> 
>  	for_each_engine(engine, i915, id)
> +		engine->submit_request = nop_submit_request;
> 
> +	/* Make sure no one is running the old callback before we proceed with
> +	 * cancelling requests and resetting the completion tracking. Otherwise
> +	 * we might submit a request to the hardware which never completes.
> +	 */

ARGH @ horrid comment style..
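That is, the canonical style outside net/ wants the opening /* on a line
of its own; same text, just reformatted, for reference:

	/*
	 * Make sure no one is running the old callback before we proceed
	 * with cancelling requests and resetting the completion tracking.
	 * Otherwise we might submit a request to the hardware which never
	 * completes.
	 */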
http://lkml.iu.edu/hypermail/linux/kernel/1607.1/00627.html :-)

> +	synchronize_rcu();
> 
> +	for_each_engine(engine, i915, id)
> +		engine_complete_requests(engine);
> 
> +	set_bit(I915_WEDGED, &i915->gpu_error.flags);
> +	wake_up_all(&i915->gpu_error.reset_queue);
>  }
> 
>  bool i915_gem_unset_wedged(struct drm_i915_private *i915)
> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index b100b38f1dd2..ef78a85cb845 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
>  	switch (state) {
>  	case FENCE_COMPLETE:
>  		trace_i915_gem_request_submit(request);
> +		rcu_read_lock();
>  		request->engine->submit_request(request);
> +		rcu_read_unlock();
>  		break;
> 
>  	case FENCE_FREE:
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_request.c b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> index 78b9f811707f..a999161e8db1 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_request.c
> @@ -215,7 +215,9 @@ static int igt_request_rewind(void *arg)
>  	}
>  	i915_gem_request_get(vip);
>  	i915_add_request(vip);
> +	rcu_read_lock();
>  	request->engine->submit_request(request);
> +	rcu_read_unlock();
> 
>  	mutex_unlock(&i915->drm.struct_mutex);

Yes, this is a correct and good replacement; however, you said:

> Chris said parts of the reasons for going with stop_machine() was that
> it's no overhead for the fast-path. But these callbacks use irqsave
> spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast.

This means that you could simply do synchronize_sched() without the
addition of rcu_read_lock()s and still be fine.
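I.e., a minimal sketch of that variant (hypothetical, not the patch as
posted; it leans on the callbacks running under an irqsave spinlock):

	for_each_engine(engine, i915, id)
		engine->submit_request = nop_submit_request;

	/*
	 * The old callbacks run with an irqsave spinlock held, i.e.
	 * with preemption disabled, which already constitutes an
	 * RCU-sched read-side critical section; synchronize_sched()
	 * waits for all of those, so no rcu_read_lock()/unlock()
	 * pairs are needed around the indirect call.
	 */
	synchronize_sched();

	for_each_engine(engine, i915, id)
		engine_complete_requests(engine);

That would also leave the submit fast-path completely untouched, which
was the point of the stop_machine() approach to begin with.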