[benjamin.widawsky@xxxxxxxxx: intel_gpu_top broken for HSW. Ideas needed]

mika.kuoppala at linux.intel.com (Mika Kuoppala) · Mon, 15 Jul 2013 12:42:16 +0300

Ben Widawsky <benjamin.widawsky at intel.com> writes:

> FWD'd from our internal list now that we have more insight.
> ----- Forwarded message from Ben Widawsky <benjamin.widawsky at intel.com> -----
>
> Date: Thu, 11 Jul 2013 10:32:03 -0700
> From: Ben Widawsky <benjamin.widawsky at intel.com>
> To: linux-gfx at linux.intel.com
> Subject: intel_gpu_top broken for HSW. Ideas needed
> Message-ID: <20130711173202.GB8802 at intel.com>
>
> Hi everybody.
>
> While investigating a hard hang on Haswell, Eero noticed that
> intel_gpu_top helped to trigger the hang faster. I used this in the
> test case I gave to validation, and they suspect it is a known issue
> which we have not yet worked around (and cannot reasonably work around).
>
> [internal bug sighting redacted]
>
> To sum up, we cannot concurrently access registers within the same
> cacheline. It has the potential to hit a known bug.
>
> I see some choices:
> 1. Don't do anything.
> 2. Try to eliminate shared registers as much as possible. Instdone is
>    used by the hangcheck, and we can eliminate hangcheck with a
>    module parameter. Eero, can you try this as a workaround, btw?

Commit 92cab7345131db7af18f630a799ce6b2e8e624c5 gets rid of
instdone in hangcheck.

-Mika

> 3. Somehow make the kernel collect the top data and serialize access
>    there.
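
A minimal user-space sketch of the serialization idea in option 3, not
driver code: every register read goes through a single lock, so two
readers can never touch the same cacheline at the same time. The class
name, the register offsets, and the 64-byte cacheline size are all
assumptions made up for illustration; none of them come from the
hardware documentation or the i915 driver.

```python
import threading

CACHELINE_BYTES = 64  # assumed cacheline size, for illustration only


class SerializedRegisterFile:
    """Toy model: all register reads are funnelled through one lock."""

    def __init__(self, values):
        self._values = dict(values)   # offset -> value
        self._lock = threading.Lock()
        self._in_flight = 0           # reads currently inside the lock
        self.max_in_flight = 0        # highest overlap ever observed

    def read(self, offset):
        # Only one reader at a time gets past this point.
        with self._lock:
            self._in_flight += 1
            self.max_in_flight = max(self.max_in_flight, self._in_flight)
            value = self._values[offset]
            self._in_flight -= 1
            return value


def hammer(regs, offset, count, out):
    """Read one register repeatedly, as a sampling tool might."""
    for _ in range(count):
        out.append(regs.read(offset))


# Two hypothetical registers that share a cacheline (offsets are made up).
regs = SerializedRegisterFile({0x2000: 0xAAAA, 0x2004: 0xBBBB})
sampler_reads, hangcheck_reads = [], []
threads = [
    threading.Thread(target=hammer, args=(regs, 0x2000, 1000, sampler_reads)),
    threading.Thread(target=hammer, args=(regs, 0x2004, 1000, hangcheck_reads)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Here one thread stands in for the top-style sampler and the other for
hangcheck. Because both go through the same lock, max_in_flight never
exceeds 1, which is the property a kernel-side collector would have to
provide.
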
>
> Anyone else have input? I personally do not use top very much, so I
> won't be volunteering to do any of these.
>
> ----- End forwarded message -----
>
> -- 
> Ben Widawsky, Intel Open Source Technology Center