Re: [RFC 0/4] drm/msm: GPU crash state

Quoting Jordan Crouse (2018-01-05 18:00:17)
> This is a request for comment on code to store and dump a GPU state
> after a hang, with inspiration from the very good i915 GPU error state
> and the binary GPU snapshot in the downstream kernel.
> 
> The goal is to store and provide enough information to debug software
> and hardware issues on the Adreno hardware in a semi human-readable
> format that can also be parsed by scripts.
> 
> The goal for this request for comment is to get some consensus
> about the format and work through some of the technical issues.

My biggest regret for i915/error is that we didn't adopt a sensible file
format and instead grew it organically from dmesg-style logging. This is
quite a hindrance when it comes to trying to improve the capture whilst
maintaining compatibility with the existing tools. Switching to json/yaml
at this point wouldn't be too difficult, since tools could easily spot
the change in format; it is just a large chunk of technical debt to pay
off. So I would recommend you pick an adaptable, human-readable file
format for ease of tool development.
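
As a rough illustration of what a structured format buys the tools,
here is a minimal Python sketch of a versioned JSON dump and a reader
that tolerates sections it does not know about. The schema, the field
names and the register name EXAMPLE_STATUS are all made up for the
example; none of this is an existing i915 or msm format.

```python
import json

# Hypothetical schema version; bumped only on incompatible changes.
FORMAT_VERSION = 1


def dump_state(registers, ring_tail):
    """Serialize a captured state (illustrative fields only) to JSON text."""
    state = {
        "version": FORMAT_VERSION,
        # Hex strings keep the dump readable by humans as well as scripts.
        "registers": {name: f"0x{value:08x}" for name, value in registers.items()},
        "ring": {"tail": ring_tail},
    }
    return json.dumps(state, indent=2, sort_keys=True)


def load_state(text):
    """Parse a dump, silently skipping any keys this reader does not know."""
    state = json.loads(text)
    if state.get("version") != FORMAT_VERSION:
        raise ValueError("unsupported dump version")
    registers = {
        name: int(value, 16) for name, value in state.get("registers", {}).items()
    }
    tail = state.get("ring", {}).get("tail")
    return registers, tail


text = dump_state({"EXAMPLE_STATUS": 0x80000001}, ring_tail=16)
registers, tail = load_state(text)
```

Because the reader looks fields up by name, the capture side can grow
new sections without breaking older tools, which is exactly what a
dmesg-style text format makes painful.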

The second important feature for capturing error state is to include as
much user information as possible. You want to be able to identify which
library generated the hang in a post-mortem dump from a user in 6-12
months' time, and just as importantly, why the library did what it did. I
like the idea of userspace being able to attach buffers that are
included in the error state (supplied as auxiliary information to the
guilty command stream) to provide a flight-data-recorder from the user's
point of view. So design your interface with a view to extending it to
include blobs.
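
On the file-format side, leaving room for such blobs could look
something like the Python sketch below. It only models how opaque
userspace data might sit inside a JSON dump; the "user_blobs" key and
the helper names are hypothetical, and the kernel interface for
attaching the buffers is not shown at all.

```python
import base64
import json


def attach_blob(state, name, data):
    """Append an opaque userspace blob to a dump dict (hypothetical layout)."""
    blobs = state.setdefault("user_blobs", [])
    blobs.append(
        {
            "name": name,
            "size": len(data),
            # base64 keeps arbitrary bytes valid inside a text format.
            "data": base64.b64encode(data).decode("ascii"),
        }
    )
    return state


def read_blobs(state):
    """Return {name: bytes} for every blob in the dump, or {} if none."""
    return {
        blob["name"]: base64.b64decode(blob["data"])
        for blob in state.get("user_blobs", [])
    }


state = attach_blob({"version": 1}, "example-trace", b"\x00\x01draw")
restored = read_blobs(json.loads(json.dumps(state)))
```

A reader that predates the blobs simply never looks at "user_blobs", so
adding them later does not require a format break.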

It would be interesting to have a common file format... While
interpreting the data is going to be highly specific to a gpu/driver, the
data itself will be similar between drivers. If we had a common file
format, we could extend something like mesa/intel/aubinator_error_decode
and throw in a bunch of xml descriptors for the different gpus. Just a
thought...
-Chris