Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> writes:

> Daniel Vetter <daniel@xxxxxxxx> writes:
>
>> On Thu, Aug 14, 2014 at 03:46:43PM +0300, Mika Kuoppala wrote:
>>> We lost the software state tracking due to reset, so don't
>>> complain if it doesn't match.
>>
>> This sounds more like gpu reset should be a bit more careful (even more
>> careful than we already are compared to earlier kernels) with making sure
>> the irq state is still sane after a reset?
>>
>> Or what exactly is the failure mode here? The commit message lacks a bit
>> details in form of a nice text or even better: A testcase ;-)
>
> We have pm ref during reset. And then after reset, we kick
> intel_gt_reset_powersave to re-enable the rps. Contrary to
> suspend/thaw, we never disabled the interrupts. And the warn
> triggers.
>
> I tried to disable the interrupts during reset handling but the
> nonblocking __wait_seqno() triggered another state warning:
> it was taking a pm ref during or right after reset recovery for hw
> access.
> -Mika
>

It is also pretty difficult to hit. I needed multiple tries of ctrl-c
on the process that submitted the hang, with another client running
in the background doing gpu access.

Could this be a timing issue related to the fact that we enable the rps
through a delayed workqueue?
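To make the suspected sequence concrete, here is a minimal userspace sketch
of it. This is not driver code: struct rps_model and the model_* helpers are
hypothetical names made up for illustration, and the reset_in_progress flag
only stands in for i915_reset_in_progress(). It assumes the warning fires
whenever the software copy of pm_iir is non-zero at enable time, which is
what the WARN_ON in the patch checks.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model; none of these names are i915 API. */
struct rps_model {
	unsigned int pm_iir;     /* software copy of pending PM interrupt bits */
	bool reset_in_progress;  /* stands in for i915_reset_in_progress() */
	unsigned int warnings;   /* number of times the model has warned */
};

/* Interrupt stand-in: records pending bits in the software state. */
static void model_pm_irq(struct rps_model *m, unsigned int bits)
{
	m->pm_iir |= bits;
}

/* Suspend-path stand-in: interrupts are disabled and the state is cleared. */
static void model_disable_rps_interrupts(struct rps_model *m)
{
	m->pm_iir = 0;
}

/*
 * Enable-path stand-in. With guarded == false this mirrors the plain
 * WARN_ON(pm_iir); with guarded == true the check is skipped while a
 * reset is in progress, as in the proposed patch.
 */
static void model_enable_rps_interrupts(struct rps_model *m, bool guarded)
{
	bool skip_check = guarded && m->reset_in_progress;

	if (!skip_check && m->pm_iir != 0) {
		m->warnings++;
		fprintf(stderr, "model warning: stale pm_iir 0x%x\n", m->pm_iir);
	}
}
```

In this model, an interrupt that arrives during reset leaves pm_iir set
because nothing called model_disable_rps_interrupts(), so the unguarded
enable warns while the guarded one stays quiet until the reset flag is
cleared.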

Here is the trace:

[  635.478701] [drm] Simulated gpu hang, resetting stop_rings
[  637.457126] ------------[ cut here ]------------
[  637.458711] WARNING: CPU: 5 PID: 3595 at drivers/gpu/drm/i915/intel_pm.c:3607 gen6_enable_rps_interrupts+0x72/0x80 [i915]()
[  637.460361] Modules linked in: i915 drm_kms_helper drm kvm_intel kvm snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi snd_seq_midi_event snd_seq mxm_wmi snd_timer snd_seq_device psmouse snd serio_raw ehci_pci bnep ehci_hcd rfcomm soundcore bluetooth wmi mac_hid parport_pc ppdev lp parport dm_crypt usbhid firewire_ohci firewire_core crc_itu_t e1000e ptp pps_core xhci_hcd usbcore i2c_algo_bit video usb_common [last unloaded: drm]
[  637.468170] CPU: 5 PID: 3595 Comm: kworker/5:0 Tainted: G W 3.16.0+ #240
[  637.469545] Workqueue: events intel_gen6_powersave_work [i915]
[  637.471042]  00000000 00000000 ca0d3e54 c15adcca f8898260 ca0d3e84 c1047224 c17536b0
[  637.472616]  00000005 00000e0b f8898260 00000e17 f87ff852 f87ff852 f6ec8000 f6ecbe68
[  637.474301]  ee851c00 ca0d3e94 c1047262 00000009 00000000 ca0d3ea8 f87ff852 f6ec8000
[  637.475920] Call Trace:
[  637.477504]  [<c15adcca>] dump_stack+0x48/0x60
[  637.479060]  [<c1047224>] warn_slowpath_common+0x84/0xa0
[  637.480708]  [<f87ff852>] ? gen6_enable_rps_interrupts+0x72/0x80 [i915]
[  637.481880]  [<f87ff852>] ? gen6_enable_rps_interrupts+0x72/0x80 [i915]
[  637.483220]  [<c1047262>] warn_slowpath_null+0x22/0x30
[  637.484258]  [<f87ff852>] gen6_enable_rps_interrupts+0x72/0x80 [i915]
[  637.485503]  [<f8808ecd>] intel_gen6_powersave_work+0x57d/0x1020 [i915]
[  637.486516]  [<c105e8bc>] process_one_work+0x10c/0x3c0
[  637.487630]  [<c105f523>] worker_thread+0xf3/0x470
[  637.488618]  [<c105f430>] ? create_and_start_worker+0x50/0x50
[  637.489802]  [<c1064cdb>] kthread+0x9b/0xb0
[  637.490804]  [<c15b4e01>] ret_from_kernel_thread+0x21/0x30
[  637.491872]  [<c1064c40>] ? flush_kthread_worker+0xb0/0xb0
[  637.492862] ---[ end trace b31c16cec8a7abaa ]---

-Mika

>> Thanks, Daniel
>>
>>>
>>> v2: fix build error
>>>
>>> Signed-off-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
>>> ---
>>>  drivers/gpu/drm/i915/intel_pm.c |    6 ++++--
>>>  1 file changed, 4 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
>>> index 12f4e14..7a1309c 100644
>>> --- a/drivers/gpu/drm/i915/intel_pm.c
>>> +++ b/drivers/gpu/drm/i915/intel_pm.c
>>> @@ -3593,7 +3593,8 @@ static void gen8_enable_rps_interrupts(struct drm_device *dev)
>>>  	struct drm_i915_private *dev_priv = dev->dev_private;
>>>
>>>  	spin_lock_irq(&dev_priv->irq_lock);
>>> -	WARN_ON(dev_priv->rps.pm_iir);
>>> +	if (!i915_reset_in_progress(&dev_priv->gpu_error))
>>> +		WARN_ON(dev_priv->rps.pm_iir);
>>>  	gen8_enable_pm_irq(dev_priv, dev_priv->pm_rps_events);
>>>  	I915_WRITE(GEN8_GT_IIR(2), dev_priv->pm_rps_events);
>>>  	spin_unlock_irq(&dev_priv->irq_lock);
>>> @@ -3604,7 +3605,8 @@ static void gen6_enable_rps_interrupts(struct drm_device *dev)
>>>  	struct drm_i915_private *dev_priv = dev->dev_private;
>>>
>>>  	spin_lock_irq(&dev_priv->irq_lock);
>>> -	WARN_ON(dev_priv->rps.pm_iir);
>>> +	if (!i915_reset_in_progress(&dev_priv->gpu_error))
>>> +		WARN_ON(dev_priv->rps.pm_iir);
>>>  	gen6_enable_pm_irq(dev_priv, dev_priv->pm_rps_events);
>>>  	I915_WRITE(GEN6_PMIIR, dev_priv->pm_rps_events);
>>>  	spin_unlock_irq(&dev_priv->irq_lock);
>>> --
>>> 1.7.9.5
>>>
>>> _______________________________________________
>>> Intel-gfx mailing list
>>> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
>>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>>
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch