Re: [Freedreno] [RFC 0/4] drm/msm: GPU crash state

On Fri, Jan 05, 2018 at 06:32:22PM +0000, Chris Wilson wrote:
> Quoting Jordan Crouse (2018-01-05 18:00:17)
> > This is a request for comment on code to store and dump the GPU state
> > after a hang, with inspiration from the very good i915 GPU error state
> > and the binary GPU snapshot in the downstream kernel.
> > 
> > The goal is to store and provide enough information to debug software
> > and hardware issues on the Adreno hardware in a semi human-readable
> > format that can also be parsed by scripts.
> > 
> > The goal for this request for comment is to get some consensus
> > about the format and work through some of the technical issues.
> 
> My biggest regret for i915/error is that we didn't adopt a sensible file
> format and organically grew it from dmesg-style logging. This is quite a
> hindrance when it comes to trying to improve the capture whilst
> maintaining compatibility with the existing tools. Switching to json/yaml
> at this point won't be too difficult to spot the change in format, just a
> large chunk of technical debt to pay off. So I would recommend you pick
> an adaptable, human readable, file format for ease of tool development.

This is a really great suggestion. The downstream qcom kernel uses a strictly
binary format which is also problematic for other reasons. I like the idea of
having something standard and extensible while remaining human readable without
tools.

> The second important feature for capturing error state is to include as
> much user information as possible. You want to be able to identify which
> library generated the hang in a post-mortem dump from a user in 6-12
> months time, and just as importantly, why the library did what it did. I
> like the idea of userspace being able to attach buffers that are
> included in the error state (supplied as auxiliary information to the
> guilty command stream) to provide a flight-data-recorder from the user's
> pov. So design your interface with a view to extending to include blobs.

I love the ascii85 and compression stuff that i915 does and that would fit in
well with a nice file format as well.
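
As a rough illustration of how compressed, ascii85-encoded buffer contents
could sit inside a human-readable dump, here is a minimal sketch in Python.
Everything in it is an assumption made for illustration: the section layout is
only YAML-like, the field names ("iova", "size", "encoding", "data") are
hypothetical, and it uses the Python standard library's Ascii85 and zlib,
which are not guaranteed to match the output of the i915 encoder.

```python
import base64
import zlib


def format_section(name: str, iova: int, data: bytes) -> str:
    """Render one buffer as a YAML-like text section (hypothetical layout)."""
    # Deflate first, then encode as Ascii85 wrapped at 64 columns so the
    # blob stays printable and diff-friendly.
    encoded = base64.a85encode(zlib.compress(data), wrapcol=64).decode("ascii")
    lines = [
        f"{name}:",
        f"  iova: 0x{iova:016x}",
        f"  size: {len(data)}",
        "  encoding: zlib+ascii85",
        "  data: |",
    ]
    lines += ["    " + line for line in encoded.splitlines()]
    return "\n".join(lines) + "\n"


def extract_blob(section: str) -> bytes:
    """Recover the raw bytes from a section written by format_section()."""
    lines = section.splitlines()
    start = lines.index("  data: |") + 1
    # Ascii85 never contains spaces, so stripping the indent is lossless.
    encoded = "".join(line.strip() for line in lines[start:])
    return zlib.decompress(base64.a85decode(encoded))


if __name__ == "__main__":
    payload = bytes(range(256)) * 16  # stand-in for captured buffer contents
    text = format_section("ringbuffer", 0x1000, payload)
    print(text)
    assert extract_blob(text) == payload
```

The metadata stays readable without tools, while a script can pick out the
"data" block and decode it, which is the property being asked for above.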

> It would be interesting to have a common file format... While
> interpreting the data is going to be highly specific to a gpu/driver, the
> data itself will be similar between drivers. If we had a common file
> format, we could extend something like mesa/intel/aubinator_error_decode
> and throw in a bunch of xml descriptors for the different gpus. Just a
> thought...

I'm definitely open to this. There is never anything wrong with improved
debugging for everybody.

Thanks,
Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project