Re: How to dump gfx and waves after GPU reset happened?

Mikhail Gavrilov <mikhail.v.gavrilov@xxxxxxxxx> · Thu, 9 May 2019 15:24:53 +0500

On Mon, 6 May 2019 at 17:34, Koenig, Christian <Christian.Koenig@xxxxxxx> wrote:
>
> That won't work. The kernel can't wait for spawned processes to finish
> because it is holding locks.
>
> The script could as last operation trigger a manual reset, but that
> would not be the same as a timeout reset because you don't know the
> cause of it and would always need to do a full engine reset.
>
> Sorry, but what you are suggesting here (collect data and then reset) is
> not easily doable.
>

I understand, but I really like how this is implemented in the Intel driver.
For example, after a GPU hang all the debug data is available at
/sys/class/drm/card0/error

[  512.296756] i915 0000:00:02.0: GPU HANG: ecode 7:1:0xfffffffe, in gnome-shell [1753], hang on rcs0
[  512.296761] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[  512.296762] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[  512.296763] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[  512.296764] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[  512.296766] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[  512.296875] i915 0000:00:02.0: Resetting chip for hang on rcs0
[  563.280960] i915 0000:00:02.0: Resetting chip for hang on rcs0
[  571.281666] i915 0000:00:02.0: Resetting chip for hang on rcs0
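
What I like about this is that the dump can be collected from userspace after
the reset, at any time. A rough sketch of how I would save it for a bug
report is below. The function name, the output file name and the
"No error state collected" placeholder check are my own assumptions, not
something the driver documents here, and reading the file usually needs root.

```python
import time
from pathlib import Path

# Path taken from the dmesg output above; card0 is specific to that machine.
ERROR_STATE = Path("/sys/class/drm/card0/error")

# Assumption: the text returned when no crash dump has been captured yet.
NO_DUMP_MARKER = b"No error state collected"


def save_error_state(src=ERROR_STATE, dest_dir=Path(".")):
    """Copy the crash dump at src into dest_dir.

    Returns the path of the saved copy, or None if there is nothing to save.
    """
    src = Path(src)
    if not src.exists():
        return None

    data = src.read_bytes()
    # Skip empty files and the (assumed) "nothing captured" placeholder.
    if not data.strip() or data.startswith(NO_DUMP_MARKER):
        return None

    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    dest = dest_dir / f"gpu-error-{time.strftime('%Y%m%d-%H%M%S')}.txt"
    dest.write_bytes(data)
    return dest


if __name__ == "__main__":
    saved = save_error_state()
    print(f"saved to {saved}" if saved else "no crash dump available")
```

Something equivalent for amdgpu, where the state is captured by the kernel at
hang time and read later, is what I had in mind.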

--
Best Regards,
Mike Gavrilov.