Quoting Tvrtko Ursulin (2018-05-08 18:47:42)
> 
> On 08/05/2018 16:35, Chris Wilson wrote:
> > CI noticed
> > 
> > <4>[ 23.430701] ============================================
> > <4>[ 23.430706] WARNING: possible recursive locking detected
> > <4>[ 23.430713] 4.17.0-rc4-CI-CI_DRM_4156+ #1 Not tainted
> > <4>[ 23.430720] --------------------------------------------
> > <4>[ 23.430725] systemd-udevd/169 is trying to acquire lock:
> > <4>[ 23.430732] (ptrval) (&(&timeline->lock)->rlock){....}, at: move_to_timeline+0x48/0x12c [i915]
> > <4>[ 23.430888] 
> > but task is already holding lock:
> > <4>[ 23.430894] (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915]
> > <4>[ 23.430995] 
> > other info that might help us debug this:
> > <4>[ 23.431002]  Possible unsafe locking scenario:
> > 
> > <4>[ 23.431007]        CPU0
> > <4>[ 23.431010]        ----
> > <4>[ 23.431013]   lock(&(&timeline->lock)->rlock);
> > <4>[ 23.431021]   lock(&(&timeline->lock)->rlock);
> > <4>[ 23.431028] 
> >  *** DEADLOCK ***
> > 
> > <4>[ 23.431036]  May be due to missing lock nesting notation
> > 
> > <4>[ 23.431044] 5 locks held by systemd-udevd/169:
> > <4>[ 23.431049] #0: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x42/0xe0
> > <4>[ 23.431065] #1: (ptrval) (&dev->mutex){....}, at: __driver_attach+0x50/0xe0
> > <4>[ 23.431078] #2: (ptrval) (&dev->struct_mutex){+.+.}, at: i915_gem_init+0xca/0x630 [i915]
> > <4>[ 23.431174] #3: (ptrval) (rcu_read_lock){....}, at: submit_notify+0x35/0x124 [i915]
> > <4>[ 23.431271] #4: (ptrval) (&(&timeline->lock)->rlock){....}, at: i915_request_submit+0x1a/0x40 [i915]
> > <4>[ 23.431369] 
> > stack backtrace:
> > <4>[ 23.431377] CPU: 0 PID: 169 Comm: systemd-udevd Not tainted 4.17.0-rc4-CI-CI_DRM_4156+ #1
> > <4>[ 23.431385] Hardware name: Dell Inc. OptiPlex GX280 /0G8310, BIOS A04 02/09/2005
> > <4>[ 23.431394] Call Trace:
> > <4>[ 23.431403]  dump_stack+0x67/0x9b
> > <4>[ 23.431411]  __lock_acquire+0xc67/0x1b50
> > <4>[ 23.431421]  ? ring_buffer_lock_reserve+0x154/0x3f0
> > <4>[ 23.431429]  ? lock_acquire+0xa6/0x210
> > <4>[ 23.431435]  lock_acquire+0xa6/0x210
> > <4>[ 23.431530]  ? move_to_timeline+0x48/0x12c [i915]
> > <4>[ 23.431540]  _raw_spin_lock+0x2a/0x40
> > <4>[ 23.431634]  ? move_to_timeline+0x48/0x12c [i915]
> > <4>[ 23.431730]  move_to_timeline+0x48/0x12c [i915]
> > <4>[ 23.431826]  __i915_request_submit+0xfa/0x280 [i915]
> > <4>[ 23.431923]  i915_request_submit+0x25/0x40 [i915]
> > <4>[ 23.432024]  i9xx_submit_request+0x11/0x140 [i915]
> > <4>[ 23.432120]  submit_notify+0x8d/0x124 [i915]
> > <4>[ 23.432202]  __i915_sw_fence_complete+0x81/0x250 [i915]
> > <4>[ 23.432300]  __i915_request_add+0x31c/0x7c0 [i915]
> > <4>[ 23.432395]  i915_gem_init+0x621/0x630 [i915]
> > <4>[ 23.432476]  i915_driver_load+0xbee/0x10b0 [i915]
> > <4>[ 23.432485]  ? trace_hardirqs_on_caller+0xe0/0x1b0
> > <4>[ 23.432566]  i915_pci_probe+0x29/0x90 [i915]
> > <4>[ 23.432574]  pci_device_probe+0xa1/0x130
> > <4>[ 23.432582]  driver_probe_device+0x306/0x480
> > <4>[ 23.432589]  __driver_attach+0xb7/0xe0
> > <4>[ 23.432596]  ? driver_probe_device+0x480/0x480
> > <4>[ 23.432602]  ? driver_probe_device+0x480/0x480
> > <4>[ 23.432609]  bus_for_each_dev+0x74/0xc0
> > <4>[ 23.432616]  bus_add_driver+0x15f/0x250
> > <4>[ 23.432623]  ? 0xffffffffa02d7000
> > <4>[ 23.432629]  driver_register+0x52/0xc0
> > <4>[ 23.432635]  ? 0xffffffffa02d7000
> > <4>[ 23.432642]  do_one_initcall+0x58/0x370
> > <4>[ 23.432653]  ? do_init_module+0x1d/0x1ea
> > <4>[ 23.432660]  ? rcu_read_lock_sched_held+0x6f/0x80
> > <4>[ 23.432667]  ? kmem_cache_alloc_trace+0x282/0x2e0
> > <4>[ 23.432675]  do_init_module+0x56/0x1ea
> > <4>[ 23.432682]  load_module+0x2435/0x2b20
> > <4>[ 23.432694]  ? __se_sys_finit_module+0xd3/0xf0
> > <4>[ 23.432701]  __se_sys_finit_module+0xd3/0xf0
> > <4>[ 23.432710]  do_syscall_64+0x55/0x190
> > <4>[ 23.432717]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > <4>[ 23.432724] RIP: 0033:0x7fa780782839
> > <4>[ 23.432729] RSP: 002b:00007ffcea73e668 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
> > <4>[ 23.432738] RAX: ffffffffffffffda RBX: 0000561a472a4b30 RCX: 00007fa780782839
> > <4>[ 23.432745] RDX: 0000000000000000 RSI: 00007fa7804610e5 RDI: 000000000000000e
> > <4>[ 23.432752] RBP: 00007fa7804610e5 R08: 0000000000000000 R09: 00007ffcea73e780
> > <4>[ 23.432758] R10: 000000000000000e R11: 0000000000000246 R12: 0000000000000000
> > <4>[ 23.432765] R13: 0000561a47296450 R14: 0000000000020000 R15: 0000561a472a4b30
> > 
> > but did not report it as an issue as it only occurred during the first
> > module load on boot. This is due to the removal of the distinct global
> > timeline and its separate lock class. So instead mark up the expected
> > nesting. An alternative would be to define a separate lock class for the
> > engine, but since we only expect to have a single point of nesting, we
> > can avoid having multiple lock classes for the struct.
> > 
> > Fixes: a89d1f921c15 ("drm/i915: Split i915_gem_timeline into individual timelines")
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/i915_request.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index f336942229cf..8928894dd9c7 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -502,7 +502,7 @@ static void move_to_timeline(struct i915_request *request,
> >  	GEM_BUG_ON(request->timeline == &request->engine->timeline);
> >  	lockdep_assert_held(&request->engine->timeline.lock);
> >  
> > -	spin_lock(&request->timeline->lock);
> > +	spin_lock_nested(&request->timeline->lock, SINGLE_DEPTH_NESTING);
> >  	list_move_tail(&request->link, &timeline->requests);
> >  	spin_unlock(&request->timeline->lock);
> >  }
> > 
> 
> Looks like I haven't been paying enough attention!

Nor me, and I've definitely run it through lockdep myself, at least I
think I have! (though admittedly less often than simple kasan runs as I
tend to be more concerned about accessing requests after they are freed).
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
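
For reference, a minimal sketch of the nesting pattern the patch annotates. This is
not the i915 code itself: demo_timeline and demo_move_request are invented names for
illustration, and only the spinlock/lockdep calls (spin_lock_nested(),
SINGLE_DEPTH_NESTING, lockdep_assert_held()) are the real kernel API.

/*
 * Illustrative sketch only -- not the i915 code. "demo_timeline" and
 * "demo_move_request" are invented names; the spinlock and lockdep
 * calls are the real kernel API.
 */
#include <linux/spinlock.h>
#include <linux/lockdep.h>

struct demo_timeline {
	spinlock_t lock;	/* every instance shares one lockdep class */
};

static void demo_move_request(struct demo_timeline *engine_tl,
			      struct demo_timeline *request_tl)
{
	/* Outer lock: the engine's timeline, taken by the submit path. */
	spin_lock(&engine_tl->lock);
	lockdep_assert_held(&engine_tl->lock);

	/*
	 * Inner lock: the request's own timeline. Both locks were
	 * initialised from the same structure member, so lockdep puts
	 * them in the same class and a plain spin_lock() here is
	 * reported as "possible recursive locking". Annotating the one
	 * expected level of nesting tells lockdep this is intentional,
	 * without defining a second lock class for the engine timeline.
	 */
	spin_lock_nested(&request_tl->lock, SINGLE_DEPTH_NESTING);

	/* ... move the request from one timeline list to the other ... */

	spin_unlock(&request_tl->lock);
	spin_unlock(&engine_tl->lock);
}

SINGLE_DEPTH_NESTING is simply subclass 1, so both timelines stay in the same lock
class; that is the trade-off the commit message describes against defining a separate
class for the engine timeline.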