Re: i915 irq storm mitigation in 3.10

Egbert, Daniel, others,

On 22.07.2013 10:04, Egbert Eich wrote:
Daniel Vetter writes:
 > On Sun, Jul 21, 2013 at 10:23 PM, Jan Niggemann <jn@xxxxxx> wrote:
 > >> But every time this happens we only let through a few interrupts, so this
 > >> shouldn't affect you badly. Can you please check whether those slowdowns
 > >> line up with 2 minute intervals?
 > >
 > > I observed these slowdowns for a couple of weeks now. On my machine, they
 > > only happen once, some minutes after a cold boot.
 > > They last for a minute or two, and then they are gone.
 > > I'd have guessed that the storm detection kicks in pretty quickly after a
 > > storm is detected and that it would go unnoticed.
 >
 > Hm, that sounds like something doesn't quite work as expected. We
 > should kill things once we get 5 interrupts or so in 1 second. So if
 > it's bad enough that it slows your machine down it really should only
 > be barely noticeable.
 >

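To make the numbers in the quoted exchange concrete, here is a minimal userspace sketch of a per-pin storm detector. It is not the i915 code: the struct, the function names and the constants are hypothetical, and the values (roughly 5 interrupts within 1 second, re-enable after 2 minutes) are only taken from the discussion above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed values, read off the thread rather than the driver source. */
#define STORM_PERIOD_MS   1000u
#define STORM_THRESHOLD   5u
#define STORM_REENABLE_MS (2u * 60u * 1000u)

enum pin_state { PIN_ENABLED, PIN_DISABLED };

/* Hypothetical per-pin bookkeeping. */
struct pin_stats {
    uint64_t window_start_ms; /* start of the current counting window */
    unsigned count;           /* interrupts seen in that window */
    enum pin_state state;
    uint64_t disabled_at_ms;  /* when the pin was marked disabled */
};

/*
 * Record one interrupt at time now_ms. Returns true if this interrupt
 * pushes the count past the threshold and the pin gets marked disabled.
 */
bool storm_note_irq(struct pin_stats *p, uint64_t now_ms)
{
    assert(p != NULL);
    if (p->state == PIN_DISABLED)
        return false;
    if (now_ms - p->window_start_ms >= STORM_PERIOD_MS) {
        /* Window expired: start counting afresh. */
        p->window_start_ms = now_ms;
        p->count = 0;
    }
    p->count++;
    if (p->count > STORM_THRESHOLD) {
        p->state = PIN_DISABLED;
        p->disabled_at_ms = now_ms;
        return true;
    }
    return false;
}

/* Re-enable a disabled pin once the re-enable delay has passed. */
bool storm_try_reenable(struct pin_stats *p, uint64_t now_ms)
{
    assert(p != NULL);
    if (p->state == PIN_DISABLED &&
        now_ms - p->disabled_at_ms >= STORM_REENABLE_MS) {
        p->state = PIN_ENABLED;
        p->count = 0;
        p->window_start_ms = now_ms;
        return true;
    }
    return false;
}
```

With a zero-initialised struct pin_stats, six calls to storm_note_irq() inside the same second mark the pin disabled, and storm_try_reenable() only succeeds two minutes later. In this model a storm is cut off after a handful of interrupts, which is why a slowdown lasting a minute or two looks unexpected.
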
The logs show that the disable mechanism got triggered, so there was
a storm that got detected.
The respective message is generated by the worker, everything up to
there (detection and marking disabled) seems to be fine.
I bet we are still getting interrupts but the respective bit in
hpd_event_bits doesn't get set any more. Since we unconditionally
queue the worker on interrupt, it is no surprise that it is so busy.

Then this points to the call to hpd_irq_setup() in intel_hpd_irq_handler()
not doing what is expected, i.e. masking out the stormy interrupt.
Could it be that we can't mask/disable an interrupt before ACKing
it?
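
The hypothesis above can be illustrated with a small userspace model. This is only a sketch of the suspected behaviour, not the driver's code: struct hpd_model and both functions are hypothetical, event_bits merely stands in for hpd_event_bits, and incrementing worker_runs stands in for queueing the worker. The second function is an assumed variant for comparison, not a proposed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PINS 8u

/* Hypothetical model state. */
struct hpd_model {
    uint32_t event_bits;    /* stands in for hpd_event_bits */
    uint32_t disabled_mask; /* pins marked disabled by storm detection */
    unsigned worker_runs;   /* how often the worker was queued */
};

/*
 * Suspected behaviour: a disabled pin no longer sets its event bit,
 * but the worker is still queued on every interrupt.
 */
void irq_unconditional(struct hpd_model *m, unsigned pin)
{
    assert(m != NULL && pin < NUM_PINS);
    if (!(m->disabled_mask & (1u << pin)))
        m->event_bits |= 1u << pin;
    m->worker_runs++; /* queued regardless of whether a bit was set */
}

/* Comparison variant: queue the worker only if an event bit was set. */
void irq_conditional(struct hpd_model *m, unsigned pin)
{
    bool have_work = false;

    assert(m != NULL && pin < NUM_PINS);
    if (!(m->disabled_mask & (1u << pin))) {
        m->event_bits |= 1u << pin;
        have_work = true;
    }
    if (have_work)
        m->worker_runs++;
}
```

Feeding 1000 interrupts from a disabled pin into each function leaves event_bits at zero in both cases, but the first model queues the worker 1000 times and the second never does. Neither variant stops the interrupts themselves, so masking the stormy source at the hardware level would still be needed.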

@Jan, could you also specify what hardware you are using (i.e. give us
an output of lspci -n)?

It's a Lenovo ThinkPad T400, the model is 7434-AG2.
root@muretop:~# lspci -n
00:00.0 0600: 8086:2a40 (rev 07)
00:02.0 0300: 8086:2a42 (rev 07)
00:02.1 0380: 8086:2a43 (rev 07)
00:03.0 0780: 8086:2a44 (rev 07)
00:19.0 0200: 8086:10f5 (rev 03)
00:1a.0 0c03: 8086:2937 (rev 03)
00:1a.1 0c03: 8086:2938 (rev 03)
00:1a.2 0c03: 8086:2939 (rev 03)
00:1a.7 0c03: 8086:293c (rev 03)
00:1b.0 0403: 8086:293e (rev 03)
00:1c.0 0604: 8086:2940 (rev 03)
00:1c.1 0604: 8086:2942 (rev 03)
00:1c.3 0604: 8086:2946 (rev 03)
00:1c.4 0604: 8086:2948 (rev 03)
00:1d.0 0c03: 8086:2934 (rev 03)
00:1d.1 0c03: 8086:2935 (rev 03)
00:1d.2 0c03: 8086:2936 (rev 03)
00:1d.7 0c03: 8086:293a (rev 03)
00:1e.0 0604: 8086:2448 (rev 93)
00:1f.0 0601: 8086:2917 (rev 03)
00:1f.2 0106: 8086:2929 (rev 03)
00:1f.3 0c05: 8086:2930 (rev 03)
03:00.0 0280: 8086:4237
15:00.0 0607: 1180:0476 (rev ba)

As for the log: I messed up the kernel parameters this morning... I was out of coffee and my 1.5-year-old daughter was playing around me :-)

Here's my kernel log with drm.debug and printk.time enabled:
Uncompressed (22M): http://files.hz6.de/kern_20130722.log
bzip2'd (some 600 KB): http://files.hz6.de/kern_20130722.log.bz2

Regards
jan
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx



