On Wed, Oct 12, 2016 at 12:16:33PM +0100, Chris Wilson wrote: > During rpm resume we restore the fences, but we do not have the > protection of struct_mutex. This rules out updating the activity > tracking on the fences, and requires us to rely on the rpm as the > serialisation barrier instead. > > [ 350.298052] [drm:intel_runtime_resume [i915]] Resuming device > [ 350.308606] > [ 350.310520] =============================== > [ 350.315560] [ INFO: suspicious RCU usage. ] > [ 350.320554] 4.8.0-rc8-bsw-rapl+ #3133 Tainted: G U W > [ 350.327208] ------------------------------- > [ 350.331977] ../drivers/gpu/drm/i915/i915_gem_request.h:371 suspicious rcu_dereference_protected() usage! > [ 350.342619] > [ 350.342619] other info that might help us debug this: > [ 350.342619] > [ 350.351593] > [ 350.351593] rcu_scheduler_active = 1, debug_locks = 0 > [ 350.358952] 3 locks held by Xorg/320: > [ 350.363077] #0: (&dev->mode_config.mutex){+.+.+.}, at: [<ffffffffa030589c>] drm_modeset_lock_all+0x3c/0xd0 [drm] > [ 350.375162] #1: (crtc_ww_class_acquire){+.+.+.}, at: [<ffffffffa03058a6>] drm_modeset_lock_all+0x46/0xd0 [drm] > [ 350.387022] #2: (crtc_ww_class_mutex){+.+.+.}, at: [<ffffffffa0305056>] drm_modeset_lock+0x36/0x110 [drm] > [ 350.398236] > [ 350.398236] stack backtrace: > [ 350.403196] CPU: 1 PID: 320 Comm: Xorg Tainted: G U W 4.8.0-rc8-bsw-rapl+ #3133 > [ 350.412457] Hardware name: Intel Corporation CHERRYVIEW C0 PLATFORM/Braswell CRB, BIOS BRAS.X64.X088.R00.1510270350 10/27/2015 > [ 350.425212] 0000000000000000 ffff8801680a78c8 ffffffff81332187 ffff88016c5c5000 > [ 350.433611] 0000000000000001 ffff8801680a78f8 ffffffff810ca6da ffff88016cc8b0f0 > [ 350.442012] ffff88016cc80000 ffff88016cc80000 ffff880177ad0000 ffff8801680a7948 > [ 350.450409] Call Trace: > [ 350.453165] [<ffffffff81332187>] dump_stack+0x67/0x90 > [ 350.458931] [<ffffffff810ca6da>] lockdep_rcu_suspicious+0xea/0x120 > [ 350.466002] [<ffffffffa039e8dd>] fence_update+0xbd/0x670 [i915] > [ 350.472766] [<ffffffffa039efe2>] i915_gem_restore_fences+0x52/0x70 [i915] > [ 350.480496] [<ffffffffa0368f42>] vlv_resume_prepare+0x72/0x570 [i915] > [ 350.487839] [<ffffffffa0369802>] intel_runtime_resume+0x102/0x210 [i915] > [ 350.495442] [<ffffffff8137f26f>] pci_pm_runtime_resume+0x7f/0xb0 > [ 350.502274] [<ffffffff8137f1f0>] ? pci_restore_standard_config+0x40/0x40 > [ 350.509883] [<ffffffff814401c5>] __rpm_callback+0x35/0x70 > [ 350.516037] [<ffffffff8137f1f0>] ? pci_restore_standard_config+0x40/0x40 > [ 350.523646] [<ffffffff81440224>] rpm_callback+0x24/0x80 > [ 350.529604] [<ffffffff8137f1f0>] ? pci_restore_standard_config+0x40/0x40 > [ 350.537212] [<ffffffff814417bd>] rpm_resume+0x4ad/0x740 > [ 350.543161] [<ffffffff81441aa1>] __pm_runtime_resume+0x51/0x80 > [ 350.549824] [<ffffffffa03889c8>] intel_runtime_pm_get+0x28/0x90 [i915] > [ 350.557265] [<ffffffffa0388a53>] intel_display_power_get+0x23/0x50 [i915] > [ 350.565001] [<ffffffffa03ef23d>] intel_atomic_commit_tail+0xdfd/0x10b0 [i915] > [ 350.573106] [<ffffffffa034b2e9>] ? drm_atomic_helper_swap_state+0x159/0x300 [drm_kms_helper] > [ 350.582659] [<ffffffff81615091>] ? _raw_spin_unlock+0x31/0x50 > [ 350.589205] [<ffffffffa034b2e9>] ? drm_atomic_helper_swap_state+0x159/0x300 [drm_kms_helper] > [ 350.598787] [<ffffffffa03ef8a5>] intel_atomic_commit+0x3b5/0x500 [i915] > [ 350.606319] [<ffffffffa03061dc>] ? drm_atomic_set_crtc_for_connector+0xcc/0x100 [drm] > [ 350.615209] [<ffffffffa0306b49>] drm_atomic_commit+0x49/0x50 [drm] > [ 350.622242] [<ffffffffa034dee8>] drm_atomic_helper_set_config+0x88/0xc0 [drm_kms_helper] > [ 350.631419] [<ffffffffa02f94ac>] drm_mode_set_config_internal+0x6c/0x120 [drm] > [ 350.639623] [<ffffffffa02fa94c>] drm_mode_setcrtc+0x22c/0x4d0 [drm] > [ 350.646760] [<ffffffffa02f0f19>] drm_ioctl+0x209/0x460 [drm] > [ 350.653217] [<ffffffffa02fa720>] ? drm_mode_getcrtc+0x150/0x150 [drm] > [ 350.660536] [<ffffffff810c984a>] ? __lock_is_held+0x4a/0x70 > [ 350.666885] [<ffffffff81202303>] do_vfs_ioctl+0x93/0x6b0 > [ 350.672939] [<ffffffff8120f843>] ? __fget+0x113/0x200 > [ 350.678797] [<ffffffff8120f735>] ? __fget+0x5/0x200 > [ 350.684361] [<ffffffff81202964>] SyS_ioctl+0x44/0x80 > [ 350.690030] [<ffffffff81001deb>] do_syscall_64+0x5b/0x120 > [ 350.696184] [<ffffffff81615ada>] entry_SYSCALL64_slow_path+0x25/0x25 > > Note we also have to remember the lesson from commit 4fc788f5ee3d > ("drm/i915: Flush delayed fence releases after reset") where we have to > flush any changes to the fence on restore. > > v2: Replace call to release user mmaps with an assertion that they have > already been zapped. > > Fixes: 49ef5294cda2 ("drm/i915: Move fence tracking from object to vma") > Reported-by: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > Cc: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx> > --- > drivers/gpu/drm/i915/i915_gem_fence.c | 21 +++++++++++++++++++-- > 1 file changed, 19 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem_fence.c b/drivers/gpu/drm/i915/i915_gem_fence.c > index 8df1fa7234e8..f7081f4b5d22 100644 > --- a/drivers/gpu/drm/i915/i915_gem_fence.c > +++ b/drivers/gpu/drm/i915/i915_gem_fence.c > @@ -290,6 +290,8 @@ i915_vma_put_fence(struct i915_vma *vma) > { > struct drm_i915_fence_reg *fence = vma->fence; > > + assert_rpm_wakelock_held(to_i915(vma->vm->dev)); > + > if (!fence) > return 0; > > @@ -341,6 +343,8 @@ i915_vma_get_fence(struct i915_vma *vma) > struct drm_i915_fence_reg *fence; > struct i915_vma *set = i915_gem_object_is_tiled(vma->obj) ? vma : NULL; > > + assert_rpm_wakelock_held(to_i915(vma->vm->dev)); > + > /* Just update our place in the LRU if our fence is getting reused. */ > if (vma->fence) { > fence = vma->fence; > @@ -371,6 +375,12 @@ void i915_gem_restore_fences(struct drm_device *dev) > struct drm_i915_private *dev_priv = to_i915(dev); > int i; > > + /* Note that this may be called outside of struct_mutex, by > + * runtime suspend/resume. The barrier we require is enforced by > + * rpm itself - all access to fences/GTT are only within an rpm > + * wakeref, and to acquire that wakeref you must pass through here. > + */ > + > for (i = 0; i < dev_priv->num_fence_regs; i++) { > struct drm_i915_fence_reg *reg = &dev_priv->fence_regs[i]; > struct i915_vma *vma = reg->vma; > @@ -379,10 +389,17 @@ void i915_gem_restore_fences(struct drm_device *dev) > * Commit delayed tiling changes if we have an object still > * attached to the fence, otherwise just clear the fence. > */ > - if (vma && !i915_gem_object_is_tiled(vma->obj)) > + if (vma && !i915_gem_object_is_tiled(vma->obj)) { > + GEM_BUG_ON(!reg->dirty); > + GEM_BUG_ON(!list_empty(&vma->obj->userfault_link)); > + > + list_move(®->link, &dev_priv->mm.fence_list); > + vma->fence = NULL; > vma = NULL; > + } > > - fence_update(reg, vma); > + fence_write(reg, vma); > + reg->vma = vma; Same comments as with the userfault_list: Using rpm ordering to enforce consistency causes mild panic attacks here with me ;-) Is the above (delayed tiling change commit) even possible here, at least for rpm resume? Same for system s/r (both s3 and s4) since the pagetables won't survive anyway. Can't we simply make this an impossibility? -Daniel > } > } > > -- > 2.9.3 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx > https://lists.freedesktop.org/mailman/listinfo/intel-gfx -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx