On Fri, Jan 05, 2018 at 06:32:22PM +0000, Chris Wilson wrote:
> Quoting Jordan Crouse (2018-01-05 18:00:17)
> > This is a request for comment on code to store and dump a GPU state
> > a hang with inspiration from the very good i915 GPU error state and
> > the binary GPU snapshot in the downstream kernel.
> >
> > The goal is to store and provide enough information to debug software
> > and hardware issues on the Adreno hardware in a semi human-readable
> > format that can also be parsed by scripts.
> >
> > The goal for this request for comment is to get some consensus
> > about the format and work through some of the technical issues.
>
> My biggest regret for i915/error is that we didn't adopt a sensible file
> format and organically grew it from dmesg-style logging. This is quite a
> hindrance when it comes to trying to improve the capture whilst
> maintaining compatibility with the existing tools. Switching to json/yaml
> at this point won't be too difficult to spot the change in format, just a
> large chunk of technical debt to pay off. So I would recommend you pick a
> an adaptable, human readable, file format for ease of tool development.

This is a really great suggestion. The downstream qcom kernel uses a
strictly binary format which is also problematic for other reasons. I like
the idea of having something standard and extensible while remaining human
readable without tools.

> The second important feature for capturing error state is to include as
> much user information as possible. You want to be able to identify which
> library generated the hang in a post-mortem dump from a user in 6-12
> months time, and just as importantly, why the library did what it did. I
> like the idea of userspace being able to attach buffers that are
> included in the error state (supplied as auxiliary information to the
> guilty command stream) to provide a flight-data-recorder from the user's
> pov. So design your interface with a view to extending to include blobs.
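
To make the blob idea concrete, here is a rough sketch of how a binary
buffer might be compressed, ascii85-encoded and embedded in a text dump.
It is not taken from i915 or from the proposed Adreno code: the YAML-style
layout, the field names and the "ringbuffer" label are all assumptions
made up for illustration.

```python
import base64
import zlib


def encode_blob(name: str, data: bytes) -> str:
    """Return a YAML-style text record for one binary buffer.

    The record layout (name/size/encoding/data) is hypothetical.
    """
    compressed = zlib.compress(data)
    # wrapcol keeps every encoded line short enough to read in a terminal.
    text = base64.a85encode(compressed, wrapcol=72).decode("ascii")
    lines = [
        f"- name: {name}",
        f"  size: {len(data)}",
        "  encoding: zlib+ascii85",
        "  data: |",
    ]
    lines += ["    " + line for line in text.splitlines()]
    return "\n".join(lines) + "\n"


def decode_blob(record: str) -> bytes:
    """Recover the original bytes from a record made by encode_blob."""
    lines = record.splitlines()
    start = lines.index("  data: |") + 1
    # Ascii85 never uses spaces, so stripping the indentation is safe.
    text = "".join(line.strip() for line in lines[start:])
    return zlib.decompress(base64.a85decode(text))


if __name__ == "__main__":
    sample = bytes(range(256)) * 4  # stand-in for a captured GPU buffer
    record = encode_blob("ringbuffer", sample)
    print(record)
    assert decode_blob(record) == sample
```

A record like this stays readable next to plain-text register dumps, and a
script can still recover the exact bytes.
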

I love the ascii85 and compression stuff that i915 does, and that would fit
in well with a nice file format as well.

> It would be interesting to have a common file format... While
> interpreting the data is going to highly specific to a gpu/driver, the
> data itself will be similar between drivers. If we had a common file
> format, we could extend something like mesa/intel/aubinator_error_decode
> and throw in a bunch of xml descriptors for the different gpus. Just a
> thought...

I'm definitely open to this. There is never anything wrong with improved
debugging for everybody.

Thanks,
Jordan

-- 
The Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel