On Thu, Jan 10, 2013 at 10:02:38AM -0500, Egbert Eich wrote:
> Despite the many attempts to fix the issue with noisy hotplug interrupt lines
> we are still seeing systems that suffer from this:
> Recently we encountered a rather large scale installation of Q35 systems
> which was hit by this issue rather severely: It seemed as if not all machines
> of the same model were hit equally badly; in the worst case hotplug
> interrupt noise caused several 1000 interrupts / s. Those machines would not
> even boot, instead the interrupt handler and the scheduled workers would keep
> the CPU so busy that eventually the watchdog would kick in and issue an NMI.
> Other machines only received several 10s to 100s of interrupts per sec - those
> machines would run properly - just with an excessive system load.
> More thorough investigations seemed to indicate that this condition
> only happens at certain video modes.
>
> On another system - a laptop - a hotplug interrupt 'storm' occurred when
> it was charging and the batteries were at certain charge levels. While
> the system was still running fine its load was high enough that the user
> noticed from the fan noise that a problem existed.
> The latter system had a Sandybridge chipset, thus a totally different
> generation from the former.
>
> All those cases seemed to have been caused by cross talk on badly routed
> hotplug signal lines (or voltage instabilities).
> This led to the conclusion that instead of trying to work around these
> 'storms' for each individual system, there should be a generic way to detect
> such a condition and take appropriate action:
>
> This patch series implements a hotplug 'storm' detection, disables the
> respective interrupt for the hotplug pin when this condition is detected
> and reverts to periodic output polling on the affected connector.
> After a grace period of 2 minutes it will reenable hotplug on the affected
> line. This will take care of cases in which this condition is only temporary.
> Should the 'storm' condition persist, this cycle will start over again.
>
> To implement this some rearrangements in the code were required:
> - The interrupt status bit which signals a hotplug needed to be recorded
>   for each connector.
> - The interrupt enable functions needed to be separate, also they need
>   to be able to enable interrupts for each hotplug line independently.

Nice work, and we've known for quite a while that we need this. But
unfortunately we haven't yet gotten around to implementing something.
Some high-level comments on how I think this should best be handled:

- imo dev_priv->hotplug_supported_mask should die - it leaks platform
  specific irq magic from i915_irq.c into every connector/encoder. And we
  have had the bugs and confusions to prove that it's not a good idea. I
  think it'd be better if we add a new HOTPLUG_PIN_FOO enum that encoders
  register interest in, and the platform code in i915_irq.c then maps
  from/to that. On a quick check we have hotplug pins for CRT, TV,
  SDVO_B&C and PORT_A-D (for DP&HDMI). Also note that on PCH_SPLIT
  platforms port A is not in the same register, further platforms will
  make an even cuter mess of this ...

- I think the hpd pin should be tracked in the encoder, not in the
  connector. The only encoders where there's not a 1:1 relationship (sdvo
  and ddi on hsw) want it there. Also, we already have the ->hot_plug
  callback in the encoder, which will be useful for later extensions.

- Since some encoders share the same hpd pin (HDMI&DP on pre-hsw) I think
  we should keep the noise statistic data in the device's dev_priv
  somewhere in an array, with one set for each hpd pin from the enum
  above.

- In 3.8 the drm hpd/polling helpers are much improved and don't randomly
  poll everything any more. So if a hpd connector isn't marked as
  OUTPUT_POLL, it won't ever get polled. Which means if you disable the
  hpd irq for it, we need to have our own poll work to do that for us.

  The long-term goal I have is to pimp the encoder->hot_plug callback also
  for this case, to avoid re-running the connector detect code on
  unrelated outputs (which can sometimes cause havoc). Eventually I want a
  hpd interrupt to only run the ->hot_plug callbacks on encoders which are
  interested in that signal, hence this slight overkill ... Ofc, that
  requires that we move a lot of the ->detect logic into ->hot_plug, but
  that's the only way to do sane EDID caching and similar things on
  outputs where hpd should work (DP/HDMI).

- The math buff in me would like hpd storms to gracefully degrade into
  polling at 10s or so. We could achieve that with irq source masking and
  scheduling the work item to do the hotplug handling with an
  (increasing) delay if there are too many interrupts from a given hpd
  pin. But that requires that we can mask hotplug interrupts properly,
  which seems to be impossible with the PORT_HOTPLUG regs on gmch/SoC
  platforms :( So I think your logic is nice enough ;-)

Yours, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
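
For illustration only, here is a minimal userspace sketch of the per-pin
bookkeeping discussed above: a platform-independent hpd pin enum, one
set of noise statistics per pin kept in a device-level array, storm
detection, and re-enabling after the 2 minute grace period. It is not
code from the patch series or from i915. All identifiers (enum hpd_pin,
struct hpd_stats, struct hpd_device, hpd_irq_event(), hpd_poll_tick())
are hypothetical, and the window length and threshold are assumed values
picked only to make the example concrete. Time is passed in as
milliseconds so the logic can be exercised outside the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical platform-independent hotplug pin identifiers. */
enum hpd_pin {
	HPD_CRT,
	HPD_TV,
	HPD_SDVO_B,
	HPD_SDVO_C,
	HPD_PORT_A,
	HPD_PORT_B,
	HPD_PORT_C,
	HPD_PORT_D,
	HPD_NUM_PINS
};

/* Assumed values: only the 2 minute grace period comes from the thread. */
#define HPD_STORM_WINDOW_MS   1000u
#define HPD_STORM_THRESHOLD   5u
#define HPD_REENABLE_DELAY_MS (2u * 60u * 1000u)

enum hpd_state { HPD_ENABLED, HPD_DISABLED_POLLING };

/* Noise statistics for a single hpd pin. */
struct hpd_stats {
	uint64_t window_start_ms; /* start of the current counting window */
	unsigned int count;       /* interrupts seen in that window */
	enum hpd_state state;
	uint64_t disabled_at_ms;  /* when the irq was switched off */
};

/* Stand-in for the device private data: one stats slot per pin. */
struct hpd_device {
	struct hpd_stats stats[HPD_NUM_PINS];
};

/*
 * Account for one hotplug interrupt on @pin at time @now_ms.
 * Returns true if this interrupt tripped storm detection, i.e. the
 * caller should mask the irq for this pin and start polling.
 */
static bool hpd_irq_event(struct hpd_device *dev, enum hpd_pin pin,
			  uint64_t now_ms)
{
	struct hpd_stats *s;

	assert(pin < HPD_NUM_PINS);
	s = &dev->stats[pin];

	if (s->state != HPD_ENABLED)
		return false;

	/* Start a fresh window on the first event or after expiry. */
	if (s->count == 0 ||
	    now_ms - s->window_start_ms > HPD_STORM_WINDOW_MS) {
		s->window_start_ms = now_ms;
		s->count = 1;
		return false;
	}

	if (++s->count > HPD_STORM_THRESHOLD) {
		s->state = HPD_DISABLED_POLLING;
		s->disabled_at_ms = now_ms;
		return true;
	}
	return false;
}

/*
 * Called from the periodic poll work for @pin. Returns true once the
 * grace period has elapsed and the irq should be unmasked again.
 */
static bool hpd_poll_tick(struct hpd_device *dev, enum hpd_pin pin,
			  uint64_t now_ms)
{
	struct hpd_stats *s;

	assert(pin < HPD_NUM_PINS);
	s = &dev->stats[pin];

	if (s->state != HPD_DISABLED_POLLING)
		return false;
	if (now_ms - s->disabled_at_ms < HPD_REENABLE_DELAY_MS)
		return false;

	s->state = HPD_ENABLED;
	s->count = 0;
	return true;
}
```

Because the statistics are indexed by pin rather than by connector, two
encoders that share one hpd line share one counter in this sketch, and a
storm on one pin leaves the state of the other pins untouched.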