I've got the RC6 bug

daniel at ffwll.ch (Daniel Vetter) · Wed, 18 Jan 2012 01:24:26 +0100

On Wed, Jan 18, 2012 at 01:16:02AM +0100, CC wrote:
> On Mon, Jan 16, 2012 at 5:36 PM, Daniel Vetter <daniel at ffwll.ch> wrote:
> 
> > On Mon, Jan 16, 2012 at 05:18:17PM +0100, CC wrote:
> > > Hi,
> > >
> > > I've heard that you need users having the RC6 bug.
> > >
> > > I have the following setup:
> > > CPU: Intel Core i5-2500K
> > > Mainboard: ASRock Z68 Pro3-M
> > > Memory: Corsair Vengeance CMZ8GX3M2A1866C9
> > >
> > > Although the CPU doesn't support VT-d, I disabled all virtualization
> > > support in the UEFI setup.
> > >
> > > I use Arch Linux and Gnome 3 in the fallback mode. The problem is more
> > > drastic without fallback mode, however.
> > >
> > > Whenever I enable RC6, I get a few of these errors in dmesg:
> > >
> > > [   48.900000] WARNING: at drivers/gpu/drm/i915/i915_drv.c:387
> > > __gen6_gt_wait_for_fifo+0x94/0xa0 [i915]()
> > > [   48.900002] Hardware name: To Be Filled By O.E.M.
> > > [   48.900002] Modules linked in: ipv6 fuse ext2 snd_hda_codec_hdmi
> > > snd_hda_codec_realtek mei(C) joydev r8169 shpchp pci_hotplug usbhid hid
> > > snd_hda_intel iTCO_wdt mii iTCO_vendor_support i2c_i801 snd_hda_codec
> > > processor snd_hwdep snd_pcm snd_timer snd soundcore snd_page_alloc psmouse
> > > serio_raw pcspkr evdev ext4 mbcache jbd2 crc16 xhci_hcd ehci_hcd usbcore
> > > i915 drm_kms_helper drm intel_agp i2c_algo_bit button intel_gtt i2c_core
> > > video sd_mod ahci libahci libata scsi_mod
> > > [   48.900019] Pid: 623, comm: Xorg Tainted: G        WC  3.1.9-2-ARCH #1
> > > [   48.900020] Call Trace:
> > > [   48.900023]  [<ffffffff81061bef>] warn_slowpath_common+0x7f/0xc0
> > > [   48.900025]  [<ffffffff81061c4a>] warn_slowpath_null+0x1a/0x20
> > > [   48.900028]  [<ffffffffa00e0764>] __gen6_gt_wait_for_fifo+0x94/0xa0
> > > [i915]
> > > [   48.900032]  [<ffffffffa015d2d5>] ring_write_tail+0x65/0x120 [i915]
> > > [   48.900036]  [<ffffffffa01619bc>] render_ring_flush+0xbc/0xe0 [i915]
> > > [   48.900040]  [<ffffffffa010b803>] i915_gem_flush_ring+0x43/0x250 [i915]
> > > [   48.900044]  [<ffffffffa0112b50>]
> > > i915_gem_do_execbuffer.isra.7+0x1020/0x16d0 [i915]
> > > [   48.900048]  [<ffffffffa01136bb>] i915_gem_execbuffer2+0x8b/0x240 [i915]
> > > [   48.900051]  [<ffffffffa0098434>] drm_ioctl+0x3e4/0x4c0 [drm]
> > > [   48.900053]  [<ffffffff810746cb>] ? recalc_sigpending+0x1b/0x50
> > > [   48.900057]  [<ffffffffa0113630>] ? i915_gem_execbuffer+0x430/0x430
> > > [i915]
> > > [   48.900059]  [<ffffffff8101e9b1>] ? fpu_finit+0x21/0x40
> > > [   48.900061]  [<ffffffff8116fddf>] do_vfs_ioctl+0x8f/0x500
> > > [   48.900063]  [<ffffffff81014beb>] ? sys_rt_sigreturn+0x1eb/0x200
> > > [   48.900064]  [<ffffffff811702e1>] sys_ioctl+0x91/0xa0
> > > [   48.900066]  [<ffffffff8140c3c2>] system_call_fastpath+0x16/0x1b
> > > [   48.900067] ---[ end trace 9a23b8b32b16a424 ]---
> >
> > This is a known side-effect of a dying gpu. It essentially means that the
> > gpu refuses to wake up from deep-sleep states.
> >
> > > and then
> > >
> > > [   53.163526] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer
> > > elapsed... GPU hung
> > > [   53.165046] [drm] capturing error event; look for more information in
> > > /debug/dri/0/i915_error_state
> > > [   53.177356] [drm:i915_wait_request] *ERROR* i915_wait_request returns
> > > -11 (awaiting 1593 at 1592, next 1594)
> > > [   53.181979] [drm:init_ring_common] *ERROR* render ring initialization
> > > failed ctl 00000000 head 00000000 tail 00000000 start 00000000
> > > [   53.185522] [drm:init_ring_common] *ERROR* gen6 bsd ring initialization
> > > failed ctl 00000000 head 00000000 tail 00000000 start 00000000
> > > [   53.188558] [drm:init_ring_common] *ERROR* blt ring initialization
> > > failed ctl 00000000 head 00000000 tail 00000000 start 00000000
> > > [   55.330146] [drm:i915_hangcheck_elapsed] *ERROR* Hangcheck timer
> > > elapsed... GPU hung
> > > [   55.332202] [drm:i915_wait_request] *ERROR* i915_wait_request returns
> > > -11 (awaiting 1594 at 1591, next 1595)
> > > [   55.333258] [drm:i915_reset] *ERROR* GPU hanging too fast, declaring
> > > wedged!
> > > [   55.333260] [drm:i915_reset] *ERROR* Failed to reset chip.
> > >
> > > Of course, I'd be willing to test out stuff. I'd need a bit of guidance,
> > > however.
> >
> > Can you please attach i915_error_state from debugfs (you need to retrigger
> > the issue)? It contains a gpu dump which is useful to diagnose the bug.
> >
> > Yours, Daniel
> > --
> > Daniel Vetter
> > Mail: daniel at ffwll.ch
> > Mobile: +41 (0)79 365 57 48
> >
> 
> I attached the error state.

Nice one, your gpu seems to have simply disappeared. And the ringbuffer
contains a rather peculiar cmd sequence. Putting Chris (maybe he
recognizes the pattern) and Ben (he's got a patch in the works to dump a
debug register that might be interesting here) on cc. It's too late atm
for me to think about this some more.

Thanks, Daniel
-- 
Daniel Vetter
Mail: daniel at ffwll.ch
Mobile: +41 (0)79 365 57 48
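
For anyone trying to check whether their own dmesg shows the same failure
sequence as the one quoted in this thread (i915 warning, hangcheck, failed ring
initialization, wedged GPU, failed reset), here is a minimal sketch in Python.
It is not part of the i915 tooling. The function names, the event labels, and
the choice of substrings are assumptions made for illustration; the substrings
themselves are copied from the log excerpts above. It expects plain dmesg text
without the "> " quote prefixes, and it rejoins lines that a mail client has
wrapped.

```python
import re

# A dmesg record starts with a bracketed timestamp such as "[   48.900000]".
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s*(.*)$")

# Illustrative labels (hypothetical) mapped to substrings seen in this thread.
SIGNATURES = [
    ("i915_warning", "WARNING: at drivers/gpu/drm/i915"),
    ("gpu_hung", "Hangcheck timer elapsed"),
    ("ring_init_failed", "ring initialization failed"),
    ("wedged", "declaring wedged"),
    ("reset_failed", "Failed to reset chip"),
]


def parse_records(log_text):
    """Split dmesg text into (timestamp, message) pairs.

    Lines without a leading timestamp are treated as wrapped continuations
    of the previous record and appended to it with a single space.
    """
    records = []
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TIMESTAMP.match(line)
        if match:
            records.append([float(match.group(1)), match.group(2)])
        elif records:
            records[-1][1] += " " + line
    return [(stamp, message) for stamp, message in records]


def find_hang_events(log_text):
    """Return (timestamp, label) for every record matching a signature."""
    events = []
    for stamp, message in parse_records(log_text):
        for label, needle in SIGNATURES:
            if needle in message:
                events.append((stamp, label))
    return events
```

Feeding it the output of dmesg saved to a file, e.g.
find_hang_events(open("dmesg.txt").read()), gives a time-ordered list of the
events above. A match only says the log looks similar; the i915_error_state
dump is still what is needed to diagnose the hang.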