On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer <jwboyer@xxxxxxxxx> wrote:
> On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
>> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer <jwboyer@xxxxxxxxx> wrote:
>>> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
>>>>>>>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>>>>>>>
>>>>>>> So I don't think that's actually the cause of the problem. Or at least
>>>>>>> not that alone. I reverted it on top of Linus' latest tree and I still
>>>>>>> get the lockups.
>>>>>>
>>>>>> Actually, git bisect does seem to have gotten it correct. Once I
>>>>>> actually tested the revert of just that on top of Linus' tree (commit
>>>>>> d895cb1af1), things seem to be working much better. I've rebooted a
>>>>>> dozen times without a lockup. The most I've seen it take on a kernel
>>>>>> with that commit included is 3 reboots, so that's definitely at least an
>>>>>> improvement.
>>>>>
>>>>> I give up. GPU issues are not my thing. 2 reboots after I sent that it
>>>>> gave me pretty rainbow static again. So it might have been an
>>>>> improvement, but reverting it is not a solution.
>>>>>
>>>>> Looking at the rest of the commits, the whole GPU rework might be
>>>>> suspect, but I clearly have no clue.
>>>>
>>>> GPUs are tricky beasts :)
>>>
>>> Understatement ;).
>>>
>>>> ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
>>>> problem anyway since it only affects 6xx/7xx and your card is handled
>>>> by the evergreen code. I'll put together some patches to help narrow
>>>> down the problem.
>>>
>>> Yeah, that's the biggest problem I have, not knowing which functions are
>>> actually being executed for this card. It looks like a combination of
>>> stuff in evergreen.c and ni.c, but I have no idea.
>>>
>>> Patches would be great. If nothing else, I'm really good at building
>>> kernels and rebooting by now.
>>
>> Two possible fixes attached.
>> The first attempts a full reset of all
>> blocks if the MC (memory controller) is hung. That may work better
>> than just resetting the MC. The second just disables MC reset. I'm
>> not sure we can reliably tell if it's busy due to display requests
>> hitting the MC periodically which would lead to needlessly resetting
>> it possibly leading to failures like you are seeing.
>
> OK. I'll test them individually. It will probably take a bit because
> I'll want to do numerous reboots if things seem "fixed" with one or the
> other.
>
> I'll let you know how things go.

I applied each individually on top of Linus' tree as of this morning
(commit 2a7d2b96d5) built, installed, and tested.

0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
two reboots.

0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
21 reboots without a hang/rainbow static.

You'll understand if I'm hesitant to declare success, but resetting the
MC does indeed appear to be the issue. I'll keep rebooting for a while
to make sure.

josh
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel