Re: [PATCH] drm/i915: Remove WARN_ONCE for failing to pm_runtime_if_in_use

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Quoting Imre Deak (2018-02-19 13:23:47)
> On Mon, Feb 19, 2018 at 12:50:46PM +0000, Chris Wilson wrote:
> > As the driver is called to handle circumstances beyond it's control, we
> > cannot assume that the pm_runtime core is happy to see us. For example,
> > if we are called from shrink_slab to free up our pages during suspend,
> > rpm may be disabled and pm_runtime_if_in_use() decides to fail with
> > -EINVAL rather than simply say no. This is expected to happen, so don't
> > warn.  For example,
> > 
> > [  217.429228] Suspending console(s) (use no_console_suspend to debug)
> > [  217.557179] sd 0:0:0:0: [sda] Synchronizing SCSI cache
> > [  217.559399] sd 0:0:0:0: [sda] Stopping disk
> > [  218.661567] i915 0000:00:02.0: Resetting chip after gpu hang
> > [  219.523879] ------------[ cut here ]------------
> > [  219.524474] pm_runtime_get_if_in_use() failed: -22
> > [  219.524817] WARNING: CPU: 1 PID: 14 at drivers/gpu/drm/i915/intel_runtime_pm.c:3351 intel_runtime_pm_get_if_in_use+0xe3/0x150 [i915]
> > [  219.524836] Modules linked in: vgem i915 snd_hda_codec_realtek snd_hda_codec_generic coretemp snd_hda_intel snd_hda_codec r8169 lpc_ich snd_hwdep mii snd_hda_core snd_pcm prime_numbers
> > [  219.525054] CPU: 1 PID: 14 Comm: cpuhp/1 Tainted: G     U           4.16.0-rc1-g740f57c54ecf-kasan_6+ #1
> > [  219.525070] Hardware name:  /D510MO, BIOS MOPNV10J.86A.0311.2010.0802.2346 08/02/2010
> > [  219.525294] RIP: 0010:intel_runtime_pm_get_if_in_use+0xe3/0x150 [i915]
> > [  219.525313] RSP: 0018:ffff880018f5edf8 EFLAGS: 00010286
> > [  219.525344] RAX: dffffc0000000008 RBX: ffff880007fc0000 RCX: 0000000000000000
> > [  219.525361] RDX: 0000000000000001 RSI: ffffffff850609c0 RDI: ffffffff872992a0
> > [  219.525377] RBP: 0000000000000000 R08: 0000000000000001 R09: 0000000000000000
> > [  219.525394] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880007fc0000
> > [  219.525411] R13: ffff880018f5f0f8 R14: ffff880007fc8de8 R15: ffff880018f5f0f0
> > [  219.525429] FS:  0000000000000000(0000) GS:ffff880019c80000(0000) knlGS:0000000000000000
> > [  219.525446] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [  219.525463] CR2: 0000564df7897e86 CR3: 0000000000d7c000 CR4: 00000000000006e0
> > [  219.525478] Call Trace:
> > [  219.525734]  i915_gem_shrink+0x841/0xb50 [i915]
> > [  219.525802]  ? debug_check_no_locks_freed+0x2a0/0x2a0
> > [  219.525842]  ? trace_hardirqs_on_thunk+0x1a/0x1c
> > [  219.526083]  ? i915_gem_shrinker_count+0x2f0/0x2f0 [i915]
> > [  219.526131]  ? lock_acquire+0x138/0x3c0
> > [  219.526157]  ? lock_acquire+0x138/0x3c0
> > [  219.526391]  ? shrinker_lock+0x49/0x210 [i915]
> > [  219.526465]  ? mutex_trylock+0x15c/0x1a0
> > [  219.526694]  ? shrinker_lock+0x49/0x210 [i915]
> > [  219.526969]  ? i915_gem_shrinker_scan+0xc4/0x320 [i915]
> > [  219.527200]  i915_gem_shrinker_scan+0xc4/0x320 [i915]
> > [  219.527448]  ? i915_gem_shrinker_vmap+0x3a0/0x3a0 [i915]
> > [  219.527533]  shrink_slab.part.18+0x2d0/0x8d0
> > [  219.527613]  ? unregister_shrinker+0x1f0/0x1f0
> > [  219.527668]  ? mem_cgroup_iter+0x37d/0xc50
> > [  219.527728]  shrink_node+0x882/0xbe0
> > [  219.527847]  ? shrink_node_memcg+0x11c0/0x11c0
> > [  219.527882]  ? mark_held_locks+0xa8/0xf0
> > [  219.527931]  ? trace_hardirqs_on_caller+0x33f/0x590
> > [  219.527961]  ? ktime_get+0xad/0x140
> > [  219.528015]  do_try_to_free_pages+0x2d3/0xd70
> > [  219.528099]  ? allow_direct_reclaim.part.23+0x1d0/0x1d0
> > [  219.528132]  ? shrink_node+0xbe0/0xbe0
> > [  219.528213]  try_to_free_pages+0x1cd/0x570
> > [  219.528257]  ? do_try_to_free_pages+0xd70/0xd70
> > [  219.528355]  __alloc_pages_nodemask+0xadf/0x2110
> > [  219.528423]  ? unwind_next_frame+0x870/0x1970
> > [  219.528465]  ? deref_stack_reg+0x97/0xc0
> > [  219.528503]  ? gfp_pfmemalloc_allowed+0x150/0x150
> > [  219.528539]  ? register_sched_domain_sysctl+0x23a/0x1b90
> > [  219.528588]  ? unwind_next_frame+0x138/0x1970
> > [  219.528619]  ? kthread+0x30a/0x3d0
> > [  219.528677]  ? __read_once_size_nocheck.constprop.4+0x10/0x10
> > [  219.528698]  ? deref_stack_reg+0xc0/0xc0
> > [  219.528762]  ? __save_stack_trace+0x6e/0xd0
> > [  219.528822]  depot_save_stack+0x3bc/0x430
> > [  219.528870]  kasan_kmalloc+0x142/0x170
> > [  219.528912]  ? __kmalloc+0xf7/0x340
> > [  219.528935]  ? register_sched_domain_sysctl+0x23a/0x1b90
> > [  219.528957]  ? partition_sched_domains+0x4d4/0x840
> > [  219.528978]  ? sched_cpu_deactivate+0x11b/0x150
> > [  219.529001]  ? cpuhp_invoke_callback+0x160/0x15f0
> > [  219.529023]  ? cpuhp_thread_fun+0x35e/0x710
> > [  219.529044]  ? smpboot_thread_fn+0x50a/0x7f0
> > [  219.529065]  ? kthread+0x30a/0x3d0
> > [  219.529086]  ? ret_from_fork+0x24/0x50
> > [  219.529141]  ? register_sched_domain_sysctl+0x23a/0x1b90
> > [  219.529169]  ? register_sched_domain_sysctl+0x23a/0x1b90
> > [  219.529198]  ? set_track+0x87/0x100
> > [  219.529225]  ? init_object+0x6e/0x80
> > [  219.529275]  ? ___slab_alloc.constprop.36+0x232/0x3e0
> > [  219.529303]  ? ___slab_alloc.constprop.36+0x232/0x3e0
> > [  219.529325]  ? register_sched_domain_sysctl+0x23a/0x1b90
> > [  219.529410]  ? mark_held_locks+0xa8/0xf0
> > [  219.529453]  ? register_sched_domain_sysctl+0x23a/0x1b90
> > [  219.529479]  ? trace_hardirqs_on_caller+0x33f/0x590
> > [  219.529532]  __kmalloc+0xf7/0x340
> > [  219.529557]  ? register_sched_domain_sysctl+0x23a/0x1b90
> > [  219.529604]  register_sched_domain_sysctl+0x23a/0x1b90
> > [  219.529684]  ? sched_debug_show+0x20/0x20
> > [  219.529713]  ? debug_object_activate+0x530/0x530
> > [  219.529771]  ? rcu_lockdep_current_cpu_online+0xdc/0x130
> > [  219.529802]  ? partition_sched_domains+0x4ae/0x840
> > [  219.529825]  ? rcu_read_lock_sched_held+0x10f/0x130
> > [  219.529875]  partition_sched_domains+0x4d4/0x840
> > [  219.529955]  ? sched_init_domains+0x110/0x110
> > [  219.529981]  ? __wait_rcu_gp+0x24f/0x390
> > [  219.530054]  sched_cpu_deactivate+0x11b/0x150
> > [  219.530086]  ? sched_cpu_activate+0x1e0/0x1e0
> > [  219.530112]  ? __call_rcu.constprop.53+0x680/0x680
> > [  219.530132]  ? call_rcu_bh+0x10/0x10
> > [  219.530166]  ? debug_check_no_locks_freed+0x2a0/0x2a0
> > [  219.530201]  ? trace_raw_output_rcu_utilization+0xa0/0xa0
> > [  219.530267]  ? trace_raw_output_rcu_utilization+0xa0/0xa0
> > [  219.530337]  ? rcu_lockdep_current_cpu_online+0xdc/0x130
> > [  219.530370]  ? sched_cpu_activate+0x1e0/0x1e0
> > [  219.530397]  cpuhp_invoke_callback+0x160/0x15f0
> > [  219.530424]  ? lock_acquire+0x138/0x3c0
> > [  219.530445]  ? lock_acquire+0x138/0x3c0
> > [  219.530471]  ? cpuhp_thread_fun+0xaf/0x710
> > [  219.530507]  ? pci_mmcfg_check_reserved+0x100/0x100
> > [  219.530565]  cpuhp_thread_fun+0x35e/0x710
> > [  219.530618]  ? cpuhp_complete_idle_dead+0x10/0x10
> > [  219.530639]  smpboot_thread_fn+0x50a/0x7f0
> > [  219.530678]  ? sort_range+0x20/0x20
> > [  219.530709]  ? __kthread_parkme+0xba/0x1f0
> > [  219.530739]  ? schedule+0x84/0x1a0
> > [  219.530768]  ? __kthread_parkme+0xbf/0x1f0
> > [  219.530805]  ? sort_range+0x20/0x20
> > [  219.530831]  kthread+0x30a/0x3d0
> > [  219.530859]  ? _kthread_create_on_node+0xb0/0xb0
> > [  219.530900]  ret_from_fork+0x24/0x50
> > [  219.530999] Code: 01 00 00 00 85 c0 74 4a 89 e8 5b 5d c3 80 3d 48 37 4e 00 00 75 f2 89 c6 48 c7 c7 40 f0 61 c0 c6 05 36 37 4e 00 01 e8 ed 2a e1 c2 <0f> ff eb d9 80 3d 3f 37 4e 00 00 75 94 48 c7 c7 60 e8 61 c0 c6
> > [  219.531880] ---[ end trace 18ec0139488ea0c8 ]---
> > [  219.607967] IRQ 16: no longer affine to CPU1
> > [  219.670291] IRQ 24: no longer affine to CPU2
> > [  219.701489] IRQ 8: no longer affine to CPU3
> > [  219.701529] IRQ 9: no longer affine to CPU3
> > [  219.701582] IRQ 18: no longer affine to CPU3
> > [  219.701640] IRQ 25: no longer affine to CPU3
> > [  219.743857]  cache: parent cpu1 should not be sleeping
> > [  219.784549]  cache: parent cpu2 should not be sleeping
> > [  219.816041]  cache: parent cpu3 should not be sleeping
> > 
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Imre Deak <imre.deak@xxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/intel_runtime_pm.c | 10 +++-------
> >  1 file changed, 3 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_runtime_pm.c b/drivers/gpu/drm/i915/intel_runtime_pm.c
> > index 16790f2576ec..73ae17065e7c 100644
> > --- a/drivers/gpu/drm/i915/intel_runtime_pm.c
> > +++ b/drivers/gpu/drm/i915/intel_runtime_pm.c
> > @@ -3335,11 +3335,9 @@ void intel_runtime_pm_get(struct drm_i915_private *dev_priv)
> >   */
> >  bool intel_runtime_pm_get_if_in_use(struct drm_i915_private *dev_priv)
> >  {
> > -     struct pci_dev *pdev = dev_priv->drm.pdev;
> > -     struct device *kdev = &pdev->dev;
> > -
> >       if (IS_ENABLED(CONFIG_PM)) {
> > -             int ret = pm_runtime_get_if_in_use(kdev);
> > +             struct pci_dev *pdev = dev_priv->drm.pdev;
> > +             struct device *kdev = &pdev->dev;
> >  
> >               /*
> >                * In cases runtime PM is disabled by the RPM core and we get
> > @@ -3347,9 +3345,7 @@ bool intel_runtime_pm_get_if_in_use(struct drm_i915_private *dev_priv)
> >                * function, since the power state is undefined. This applies
> >                * atm to the late/early system suspend/resume handlers.
> >                */
> > -             WARN_ONCE(ret < 0,
> > -                       "pm_runtime_get_if_in_use() failed: %d\n", ret);
> > -             if (ret <= 0)
> > +             if (pm_runtime_get_if_in_use(kdev) <= 0)
> >                       return false;
> 
> Ok, this case means the same then for the caller: don't access the HW.
> The above comment needs to be updated too, with that:

I felt the code comment hinting where the ret < 0 was coming from just
worked, so I took your suggestion to add:

@@ -3328,10 +3328,13 @@ void intel_runtime_pm_get(struct drm_i915_private *dev_priv)
  * @dev_priv: i915 device instance
  *
  * This function grabs a device-level runtime pm reference if the device is
- * already in use and ensures that it is powered up.
+ * already in use and ensures that it is powered up. It is illegal to try
+ * and access the HW should intel_runtime_pm_get_if_in_use() report failure.
  *
  * Any runtime pm reference obtained by this function must have a symmetric
  * call to intel_runtime_pm_put() to release the reference again.
+ *
+ * Returns: True if the wakeref was acquired, or False otherwise.
  */

to the kerneldoc instead.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux