On Thu, Oct 05, 2017 at 03:55:19PM +0100, Tvrtko Ursulin wrote: > > On 05/10/2017 15:09, Daniel Vetter wrote: > > stop_machine is not really a locking primitive we should use, except > > when the hw folks tell us the hw is broken and that's the only way to > > work around it. > > > > This patch here is just a suggestion for how to fix it up, possible > > changes needed to make it actually work: > > > > - Set the nop_submit_request first for _all_ engines, before > > proceeding. > > > > - Make sure engine->cancel_requests copes with the possibility that > > not all tests have consistently used the new or old version. I dont > > think this is a problem, since the same can happen really with the > > stop_machine() locking - stop_machine also doesn't give you any kind > > of global ordering against other cpu threads, it just makes them > > stop. > > > > This patch tries to address the locking snafu from > > > > commit 20e4933c478a1ca694b38fa4ac44d99e659941f5 > > Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > Date: Tue Nov 22 14:41:21 2016 +0000 > > > > drm/i915: Stop the machine as we install the wedged submit_request handler > > > > Chris said parts of the reasons for going with stop_machine() was that > > it's no overhead for the fast-path. But these callbacks use irqsave > > spinlocks and do a bunch of MMIO, and rcu_read_lock is _real_ fast. > > > > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx> > > Signed-off-by: Daniel Vetter <daniel.vetter@xxxxxxxxx> > > --- > > drivers/gpu/drm/i915/i915_gem.c | 18 +++++------------- > > drivers/gpu/drm/i915/i915_gem_request.c | 2 ++ > > drivers/gpu/drm/i915/selftests/i915_gem_request.c | 2 ++ > > 3 files changed, 9 insertions(+), 13 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > > index ab8c6946fea4..0b260e576b4b 100644 > > --- a/drivers/gpu/drm/i915/i915_gem.c > > +++ b/drivers/gpu/drm/i915/i915_gem.c > > @@ -3022,13 +3022,13 @@ static void nop_submit_request(struct drm_i915_gem_request *request) > > static void engine_set_wedged(struct intel_engine_cs *engine) > > { > > + engine->submit_request = nop_submit_request; > > Should this be rcu_assign_pointer? Those provide additional barriers, needed when you change/allocate the stuff you're pointing to. We point to immutable functions, so shouldn't be necessary (and would be confusing imo). > > + > > /* We need to be sure that no thread is running the old callback as > > * we install the nop handler (otherwise we would submit a request > > - * to hardware that will never complete). In order to prevent this > > - * race, we wait until the machine is idle before making the swap > > - * (using stop_machine()). > > + * to hardware that will never complete). > > */ > > - engine->submit_request = nop_submit_request; > > + synchronize_rcu(); > > Consumers of this are running in irq disabled or softirq. Does this mean we > would need synchronize_rcu_bh? Would either guarantee all tasklets and irq > handlers have exited? Oh ... tbh I didn't even digg that deep (much less ran this stuff). This really is an RFC so people with real clue could say whether it has a chance of working or not. Looking at rcu docs we don't want _bh variants, since rcu_read_lock should be safe in even hardirq context. _bh and _sched otoh require that all critical sections are either in bottom halfs or hardirq context, since they treat scheduling of those as a grace period. Cheers, Daniel > > /* Mark all executing requests as skipped */ > > engine->cancel_requests(engine); > > @@ -3041,9 +3041,8 @@ static void engine_set_wedged(struct intel_engine_cs *engine) > > intel_engine_last_submit(engine)); > > } > > -static int __i915_gem_set_wedged_BKL(void *data) > > +void i915_gem_set_wedged(struct drm_i915_private *i915) > > { > > - struct drm_i915_private *i915 = data; > > struct intel_engine_cs *engine; > > enum intel_engine_id id; > > @@ -3052,13 +3051,6 @@ static int __i915_gem_set_wedged_BKL(void *data) > > set_bit(I915_WEDGED, &i915->gpu_error.flags); > > wake_up_all(&i915->gpu_error.reset_queue); > > - > > - return 0; > > -} > > - > > -void i915_gem_set_wedged(struct drm_i915_private *dev_priv) > > -{ > > - stop_machine(__i915_gem_set_wedged_BKL, dev_priv, NULL); > > } > > bool i915_gem_unset_wedged(struct drm_i915_private *i915) > > diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c > > index b100b38f1dd2..ef78a85cb845 100644 > > --- a/drivers/gpu/drm/i915/i915_gem_request.c > > +++ b/drivers/gpu/drm/i915/i915_gem_request.c > > @@ -556,7 +556,9 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state) > > switch (state) { > > case FENCE_COMPLETE: > > trace_i915_gem_request_submit(request); > > + rcu_read_lock(); > > request->engine->submit_request(request); > > + rcu_read_unlock(); > > And _bh for these? Although this already runs with preemption off, but I > guess it is good for documentation. > > Regards, > > Tvrtko > > > break; > > case FENCE_FREE: > > diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_request.c b/drivers/gpu/drm/i915/selftests/i915_gem_request.c > > index 78b9f811707f..a999161e8db1 100644 > > --- a/drivers/gpu/drm/i915/selftests/i915_gem_request.c > > +++ b/drivers/gpu/drm/i915/selftests/i915_gem_request.c > > @@ -215,7 +215,9 @@ static int igt_request_rewind(void *arg) > > } > > i915_gem_request_get(vip); > > i915_add_request(vip); > > + rcu_read_lock(); > > request->engine->submit_request(request); > > + rcu_read_unlock(); > > mutex_unlock(&i915->drm.struct_mutex); > > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx