On Fri, Jul 12, 2013 at 7:12 PM, Ben Widawsky
<benjamin.widawsky at intel.com> wrote:
> FWD'd from our internal list now that we have more insight.
>
> ----- Forwarded message from Ben Widawsky <benjamin.widawsky at intel.com> -----
>
> Date: Thu, 11 Jul 2013 10:32:03 -0700
> From: Ben Widawsky <benjamin.widawsky at intel.com>
> To: linux-gfx at linux.intel.com
> Subject: intel_gpu_top broken for HSW. Ideas needed
> Message-ID: <20130711173202.GB8802 at intel.com>
>
> Hi everybody.
>
> While investigating a hard hang on Haswell, Eero noticed that
> intel_gpu_top helped to invoke the hang faster. I used this in my test
> case to validation, and they are suspecting it is a known issue which we
> have not yet worked around (and cannot reasonably work around).
>
> [internal bug sighting redacted]
>
> To sum up, we cannot concurrently access registers within the same
> cacheline. It has the potential to hit a known bug.
>
> I see some choices:
> 1. Don't do anything.
> 2. Try to eliminate shared registers as much as possible. Instdone is
>    used by the hangcheck, and we can eliminate hangcheck with a
>    module parameter. Eero, can you try this as a workaround, btw?
> 3. Somehow make the kernel collect the top data and serialize access
>    there.
>
> Anyone else have input? I personally do not use top very much, so I
> won't be volunteering to do any of these.

For now I'd just vote for a warning on gen6+ on the intel_gpu_top
screen that this might hang hw. If anyone cares we could add a debugfs
interface (or finally get real approval for the performance counters
the hw has and expose them properly). Not an intel_gpu_top user myself
though.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch