On Mon, Apr 22, 2013 at 10:55 PM, Michel Dänzer <michel@xxxxxxxxxxx> wrote:
> On Mon, 2013-04-22 at 16:19 -0700, Andy Lutomirski wrote:
>> On Thu, Apr 18, 2013 at 2:12 PM, Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
>> > On Thu, Apr 18, 2013 at 5:11 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> >> On Mon, Apr 8, 2013 at 7:01 AM, Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
>> >>> On Fri, Apr 5, 2013 at 5:11 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>> >>>> Every day or so, I'll click something and my screens go blank for a
>> >>>> second or two. dmesg complains about a lockup, and afterwards
>> >>>> everything is painfully slow. (Even switching focus to other emacs
>> >>>> windows takes a second or two.) Once this happens, if I restart X, I
>> >>>> get a blank screen, although the mouse still works and I can switch
>> >>>> VTs and use the console.
>> >>>
>> >>> Try disabling hyperZ. Set env var R600_HYPERZ=0 (mesa 9.1) or
>> >>> R600_DEBUG=nohyperz (mesa git).
>> >>
>> >> It lasted longer. I have both of those environment variables set on
>> >> the Xorg process but not on clients. Do I need it everywhere?
>> >
>> > For anything that uses the 3D driver.
>>
>> This didn't appear to fix it, although it may have fixed some
>> graphical glitches in gmail's compose window.
>
> Seems rather unlikely that's directly related to HyperZ, but who knows.
>
>
>> [350788.530966] radeon 0000:08:00.0: GPU lockup CP stall for more than 40769msec
>> [350788.530970] radeon 0000:08:00.0: GPU lockup (waiting for
>> 0x000000000000178f last fence id 0x000000000000178e)
>> [350788.532047] radeon 0000:08:00.0: Saved 103 dwords of commands on ring 0.
>> [350788.532051] radeon 0000:08:00.0: GPU softreset: 0x00000003
>> [350788.547792] radeon 0000:08:00.0: GRBM_STATUS = 0xA0003828
>> [350788.547794] radeon 0000:08:00.0: GRBM_STATUS_SE0 = 0x00000007
>> [350788.547797] radeon 0000:08:00.0: GRBM_STATUS_SE1 = 0x00000007
>> [350788.547799] radeon 0000:08:00.0: SRBM_STATUS = 0x200000C0
>> [350788.547802] radeon 0000:08:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
>> [350788.547805] radeon 0000:08:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
>> [350788.547807] radeon 0000:08:00.0: R_00867C_CP_BUSY_STAT = 0x00000004
>> [350788.547810] radeon 0000:08:00.0: R_008680_CP_STAT = 0x80008647
>> [350788.547811] radeon 0000:08:00.0: GRBM_SOFT_RESET=0x00007F6B
>> [350788.547866] radeon 0000:08:00.0: GRBM_STATUS = 0x00003828
>> [350788.547869] radeon 0000:08:00.0: GRBM_STATUS_SE0 = 0x00000007
>> [350788.547872] radeon 0000:08:00.0: GRBM_STATUS_SE1 = 0x00000007
>> [350788.547874] radeon 0000:08:00.0: SRBM_STATUS = 0x200000C0
>> [350788.547877] radeon 0000:08:00.0: R_008674_CP_STALLED_STAT1 = 0x00000000
>> [350788.547879] radeon 0000:08:00.0: R_008678_CP_STALLED_STAT2 = 0x00000000
>> [350788.547882] radeon 0000:08:00.0: R_00867C_CP_BUSY_STAT = 0x00000000
>> [350788.547884] radeon 0000:08:00.0: R_008680_CP_STAT = 0x00000000
>> [350788.565361] radeon 0000:08:00.0: GPU reset succeeded, trying to resume
>> [350788.583801] [drm] probing gen 2 caps for device 8086:1d1a = 2/0
>> [350788.583807] [drm] enabling PCIE gen 2 link speeds, disable with
>> radeon.pcie_gen2=0
>> [350788.590840] [drm] PCIE GART of 512M enabled (table at 0x0000000000040000).
>> [350788.590976] radeon 0000:08:00.0: WB enabled
>> [350788.590978] radeon 0000:08:00.0: fence driver on ring 0 use gpu
>> addr 0x0000000040000c00 and cpu addr 0xffff880442f58c00
>> [350788.590979] radeon 0000:08:00.0: fence driver on ring 3 use gpu
>> addr 0x0000000040000c0c and cpu addr 0xffff880442f58c0c
>> [350788.607480] [drm] ring test on 0 succeeded in 2 usecs
>> [350788.607560] [drm] ring test on 3 succeeded in 1 usecs
>> [350788.615053] [drm] ib test on ring 0 succeeded in 0 usecs
>> [350788.615133] [drm] ib test on ring 3 succeeded in 1 usecs
>>
>> I'm not convinced there's an actual hang. 40 seconds is a long time,
>> and I've only ever seen this when clicking something, and when this
>> happens, the screen goes blank immediately (not after a 40 second
>> delay).
>
> Hmm, now that you mention this, I notice in your original report it
> claims that the CP stalled for 'more than 5102593msec', which is clearly
> bogus. Looks like something's wrong with the lockup detection.
> Did this start after a kernel update or something like that?

It's recent. It may have been when F18 switched from 3.7 to 3.8.

I think there are bugs in the lockup detection and in the lockup
recovery. Firefox, in particular, is *really* slow afterwards. Are
interrupts possibly getting dropped or misconfigured during the reset?

--Andy
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel
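
The stall durations discussed above (40769 msec in this log, 5102593 msec in the original report) can be pulled out of saved dmesg output for comparison across incidents. The following is only a rough sketch, not part of the driver or of any existing tool: the function name find_lockups and the 60-second SUSPECT_MS cutoff are assumptions made for illustration, and the pattern is written against the one "GPU lockup CP stall" line format quoted in this thread.

```python
import re

# Pattern for the lockup line as it appears in this thread, e.g.
# "[350788.530966] radeon 0000:08:00.0: GPU lockup CP stall for more than 40769msec"
# Other kernel versions may word the message differently.
LOCKUP_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+radeon\s+(?P<dev>\S+):\s+"
    r"GPU lockup CP stall for more than (?P<ms>\d+)msec"
)

# Hypothetical cutoff (an assumption, not a driver constant): reported
# stalls longer than this are flagged for a closer look.
SUSPECT_MS = 60_000


def find_lockups(lines, suspect_ms=SUSPECT_MS):
    """Return (timestamp_s, device, stall_ms, suspect) for each lockup line."""
    results = []
    for line in lines:
        match = LOCKUP_RE.search(line)  # search() tolerates ">> " quote prefixes
        if match is None:
            continue
        stall_ms = int(match.group("ms"))
        results.append(
            (float(match.group("ts")), match.group("dev"), stall_ms, stall_ms > suspect_ms)
        )
    return results


if __name__ == "__main__":
    sample = [
        "[350788.530966] radeon 0000:08:00.0: GPU lockup CP stall for more than 40769msec",
        "[350788.532051] radeon 0000:08:00.0: GPU softreset: 0x00000003",
    ]
    for row in find_lockups(sample):
        print(row)
```

Feeding it the lines of a saved log (for example, the output of dmesg redirected to a file) gives one tuple per reported lockup. Whether a flagged value points at broken lockup detection or at a real hang still has to be judged by hand.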