On Tue, Feb 26, 2013 at 05:39:46PM -0800, Linus Torvalds wrote:
> On Mon, Feb 25, 2013 at 4:05 PM, Dave Airlie <airlied@xxxxxxxx> wrote:
> >
> > Highlights:
> >
> > i915: all over the map, haswell power well enhancements, valleyview macro horrors cleaned up, killing lots of legacy GTT
> > code,
>
> Lowlight:
>
> There's something wrong with i915 DP detection or whatever. I get
> stuff like this:
>
> [ 5.710827] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [ 5.720810] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [ 5.730794] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [ 5.740782] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [ 5.750775] [drm:intel_dp_aux_wait_done] *ERROR* dp aux hw did not
> signal timeout (has irq: 1)!
> [ 5.750778] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
> .....
> [ 8.149931] [drm:intel_dp_aux_ch] *ERROR* dp_aux_ch not done status
> 0xa145003f
>
> and after that the screen ends up black.
>
> It's happened twice now, but is not 100% repeatable. It looks like the
> message itself is new, but the black screen is also new and does seem
> to happen when I get the message, so...
>
> The second time I touched the power button, and the machine came back.
> Apparently the suspend/resume cycle made it all magically work: the
> suspend caused the same errors, but then the resume made it all good
> again.
>
> Some kind of missed initialization at bootup? It's not reliable enough
> to bisect, but I obviously suspect commit 9ee32fea5fe8 ("drm/i915:
> irq-drive the dp aux communication") since that is where the message
> was added..
>
> Btw, looking at that commit, what do you think the semantics of the
> timeout in something like
>
>     done = wait_event_timeout(dev_priv->gmbus_wait_queue, C, 10);
>
> would be? What's that magic "10"? It's some totally random number.
>
> Guys, it should be something meaningful. If you meant a tenth of a
> second, use HZ/10 or something. Because just the plain "10" is crazy.
> I happen to have CONFIG_HZ_1000=y, and you're apparently waiting for a
> hundreth of a second. Was that what you intended? Because if it was,
> it is still crap, since CONFIG_HZ might be 100, and then you're
> waiting for ten times longer.
>
> IOW, passing in a random number like that is crazy. It cannot possibly
> be right.
>
> I have no idea whether the timeout has anything to do with anything,
> but it reinforces my suspicion that there is something wrong with that
> commit.

Ok, I've merged two patches from Paulo, one to fixup the harmless
jiffies vs. msec confusion. And the other to plug a race in our irq
handler which did lead to missed dp aux interrupts according to some
digging done by Imre. The important patch is the current tip of

  git://people.freedesktop.org/~danvet/drm-intel drm-intel-fixes

44498aea293b37af1d463acd9658cdce1ecdf427 drm/i915: also disable south interrupts when handling them

Just in case you want to give it a quick whirl. Since the failed dp aux
transaction caused the resume modeset to fail for you (resulting in the
black screen) I hope that this should fix both issues.

I'll forward the pull to Dave in a few days since atm I'm stalling a bit
for confirmation on another little regression fix. And there's nothing
earth-shattering in my -fixes queue right now.

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel
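
For anyone following along, here is a rough userspace sketch of why a bare "10" passed as a tick count means different things depending on CONFIG_HZ, which is the jiffies vs. msec confusion mentioned above. This is not kernel code and not the merged patch: ticks_from_msecs() and msecs_from_ticks() are hypothetical helpers written for illustration, with the tick rate passed in as a parameter, and the round-up behaviour is an assumption modelled on what the kernel's msecs_to_jiffies() does.

```c
#include <assert.h>

/*
 * Illustration only, plain userspace C.
 * ticks_from_msecs() is a hypothetical stand-in for a msec -> jiffies
 * conversion: it rounds up so a non-zero wait never becomes zero ticks.
 */
unsigned long ticks_from_msecs(unsigned int msecs, unsigned int hz)
{
	assert(hz > 0);
	return ((unsigned long)msecs * hz + 999) / 1000;
}

/*
 * The reverse view: how many milliseconds a raw tick count lasts
 * at a given tick rate (truncating division).
 */
unsigned long msecs_from_ticks(unsigned long ticks, unsigned int hz)
{
	assert(hz > 0);
	return ticks * 1000 / hz;
}
```

With these, msecs_from_ticks(10, 1000) gives 10 ms and msecs_from_ticks(10, 100) gives 100 ms, i.e. the "ten times longer" wait described in the quoted mail. Going the other way, ticks_from_msecs(10, hz) gives 10 ticks at 1000 Hz but only 1 tick at 100 Hz, so a wait specified in milliseconds lasts about the same wall-clock time regardless of the configured tick rate.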