Re: [PATCH] [RFC] drm/i915: Generate a hang error code

On Wed, 5 Feb 2014 16:15:02 +0100
Daniel Vetter <daniel@xxxxxxxx> wrote:

> On Wed, Feb 05, 2014 at 02:59:08PM +0000, Jesse Barnes wrote:
> > On Tue,  4 Feb 2014 12:18:55 +0000
> > Ben Widawsky <benjamin.widawsky@xxxxxxxxx> wrote:
> > 
> > > We get a large number of bugs that attract "hey, I have that too"
> > > comments because people see a GPU hang in dmesg. While two machines of
> > > the same model both hitting a GPU hang is indeed a coincidence, it is
> > > far from enough evidence to suggest the hangs are the same.
> > > 
> > > The error message by itself has clearly been insufficient to reduce
> > > this effect and get people to file new bug reports (see the ref at
> > > the bottom for a new bug report with this characteristic).
> > > 
> > > The algorithm is purposely pretty naive. I don't think we need much in
> > > order to avoid the problem I am trying to solve, and keeping it naive
> > > gives us some ability to make a decent test case.
> > 
> > I like the direction of this.  If we can get some basic info into the
> > dmesg part of things (the only part regular users will actually look
> > at) we can probably avoid some of the "me too" action we see on general
> > GPU hangs.  Having PID, comm, and some sort of hang signature are all
> > good steps in that direction imo.
> 
> tbh I don't see much value in regular users trying to triage GPU hangs. If
> they're not damn sure that they have a dupe (which means the same platform,
> software stack versions and crashing games) I much prefer that they
> just send in a duplicate bug for us to triage.
> 
> With bugzilla's mis-design it's much harder to untangle a wrong
> me-too than to mark something as a duplicate. Long-running bugs especially
> are a royal pain if there's too much wrong me-too noise in there.
> 
> Not a comment on the patch itself, just a general comment wrt avoiding
> me-too GPU hang reports.

So you're saying the GPU error decode tool should create a bug template
for people so we don't get the "me too" reports?

What I see above is that it's really important to avoid the "me too"
stuff, and to do it in a way that minimizes false positives
(e.g. the IPEHR bit Ubuntu used to use).  So I guess I don't see what's
unconvincing here.  Today we have no way of differentiating without digging
into the error record, which users definitely won't do, and this patch
seems like it could only help with that... so count me confused.

Jesse
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx