On Thu, Feb 28, 2013 at 1:59 PM, Josh Boyer <jwboyer@xxxxxxxxx> wrote:
> On Thu, Feb 28, 2013 at 10:15 AM, Josh Boyer <jwboyer@xxxxxxxxx> wrote:
>> On Thu, Feb 28, 2013 at 10:09 AM, Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
>>> On Thu, Feb 28, 2013 at 8:44 AM, Josh Boyer <jwboyer@xxxxxxxxx> wrote:
>>>> On Thu, Feb 28, 2013 at 8:38 AM, Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
>>>>>>>>> ca57802e521de54341efc8a56f70571f79ffac72 is the first bad commit
>>>>>>>>
>>>>>>>> So I don't think that's actually the cause of the problem. Or at least
>>>>>>>> not that alone. I reverted it on top of Linus' latest tree and I still
>>>>>>>> get the lockups.
>>>>>>>
>>>>>>> Actually, git bisect does seem to have gotten it correct. Once I
>>>>>>> actually tested the revert of just that on top of Linus' tree (commit
>>>>>>> d895cb1af1), things seem to be working much better. I've rebooted a
>>>>>>> dozen times without a lockup. The most I've seen it take on a kernel
>>>>>>> with that commit included is 3 reboots, so that's definitely at least an
>>>>>>> improvement.
>>>>>>
>>>>>> I give up. GPU issues are not my thing. Two reboots after I sent that,
>>>>>> it gave me pretty rainbow static again. So it might have been an
>>>>>> improvement, but reverting it is not a solution.
>>>>>>
>>>>>> Looking at the rest of the commits, the whole GPU rework might be
>>>>>> suspect, but I clearly have no clue.
>>>>>
>>>>> GPUs are tricky beasts :)
>>>>
>>>> Understatement ;).
>>>>
>>>>> ca57802e521de54341efc8a56f70571f79ffac72 most likely wasn't the
>>>>> problem anyway since it only affects 6xx/7xx and your card is handled
>>>>> by the evergreen code. I'll put together some patches to help narrow
>>>>> down the problem.
>>>>
>>>> Yeah, that's the biggest problem I have, not knowing which functions are
>>>> actually being executed for this card. It looks like a combination of
>>>> stuff in evergreen.c and ni.c, but I have no idea.
>>>>
>>>> Patches would be great. If nothing else, I'm really good at building
>>>> kernels and rebooting by now.
>>>
>>> Two possible fixes attached. The first attempts a full reset of all
>>> blocks if the MC (memory controller) is hung. That may work better
>>> than just resetting the MC. The second just disables MC reset. I'm
>>> not sure we can reliably tell whether the MC is actually hung, since
>>> display requests hit it periodically; that could lead to needlessly
>>> resetting it, possibly causing failures like the ones you are seeing.
>>
>> OK. I'll test them individually. It will probably take a bit because
>> I'll want to do numerous reboots if things seem "fixed" with one or the
>> other.
>>
>> I'll let you know how things go.
>
> I applied each individually on top of Linus' tree as of this morning
> (commit 2a7d2b96d5), then built, installed, and tested.
>
> 0001-drm-radeon-XXX-try-a-full-reset-if-the-MC-is-busy.patch failed in
> two reboots.
>
> 0001-drm-radeon-XXX-skip-MC-reset-as-it-s-probably-not-hu.patch has gone
> 21 reboots without a hang/rainbow static. You'll understand if I'm
> hesitant to declare success, but resetting the MC does indeed appear to
> be the issue. I'll keep rebooting for a while to make sure.

OK, I'm still running on the kernel with that patch and things still
work. The only other "issue" I'm seeing at the moment is that my dmesg
is full of:

[349316.595749] radeon 0000:01:00.0: MC busy: 0x00000409, clearing.
[349436.654946] radeon 0000:01:00.0: MC busy: 0x00000409, clearing.
[349436.655997] radeon 0000:01:00.0: MC busy: 0x00000409, clearing.
[349496.698441] radeon 0000:01:00.0: MC busy: 0x00000409, clearing.
[349556.726767] radeon 0000:01:00.0: MC busy: 0x00000409, clearing.
[349556.727797] radeon 0000:01:00.0: MC busy: 0x00000409, clearing.

So hopefully your patch is on the way into Linus' tree at some point soon.

josh
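
P.S. For anyone skimming the thread, here's a rough standalone sketch
(plain userspace C, not taken from the evergreen/ni code) of the decision
those two patches toggle: read a status word, mask the MC-busy bits, and
either fold the MC into the soft-reset mask or leave it alone. The
MC_BUSY_BITS layout, RESET_MC_FLAG bit, and pick_reset_mask() helper are
made-up names for illustration; only the 0x00000409 value comes from the
dmesg output above.

#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Illustrative values only -- not the driver's real register layout. */
#define MC_BUSY_BITS   UINT32_C(0x00000409)  /* mirrors the dmesg value above */
#define RESET_MC_FLAG  (UINT32_C(1) << 0)    /* hypothetical "reset the MC" bit */

/*
 * Sketch of the decision the two patches toggle: a set MC-busy bit does
 * not prove the MC is hung, because the display controller hits it
 * periodically, so resetting it on "busy" alone can wedge an otherwise
 * healthy GPU (the rainbow-static lockups in this thread).
 */
static uint32_t pick_reset_mask(uint32_t status, bool skip_mc_reset)
{
    uint32_t mc_busy = status & MC_BUSY_BITS;
    uint32_t mask = 0;

    if (mc_busy && !skip_mc_reset) {
        printf("MC busy: 0x%08" PRIX32 ", clearing.\n", mc_busy);
        mask |= RESET_MC_FLAG;
    }
    return mask;
}

int main(void)
{
    uint32_t status = UINT32_C(0x00000409);  /* pretend status register read */

    /* First patch's approach: treat a busy MC as hung and reset it. */
    printf("reset mask (MC reset allowed): 0x%08" PRIX32 "\n",
           pick_reset_mask(status, false));

    /* Second patch's approach: assume it's just display traffic, skip it. */
    printf("reset mask (MC reset skipped): 0x%08" PRIX32 "\n",
           pick_reset_mask(status, true));
    return 0;
}

With skip_mc_reset set (the second patch's behaviour), a busy MC is
treated as normal display traffic rather than a hang, so it never ends up
in the reset mask.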