Hi,

The goal of this patchset is to improve the debugging of device resets
on amdgpu.

The first patch creates a new module parameter to disable soft
recoveries, ensuring that every recovery goes through the full device
reset. This makes it easier to generate resets from userspace tools like
[0] and [1], which is important for validating how the stack behaves on
resets, end to end.

The last patches rework devcoredump to allocate its memory dynamically
and move the coredump code to the amdgpu_reset file.

I have dropped the patches that add more information to devcoredump for
now, until I figure out a better way to do so, like storing the IB
address in the fence.

Thanks,
	André

[0] https://gitlab.freedesktop.org/andrealmeid/gpu-timeout
[1] https://github.com/andrealmeid/vulkan-triangle-v1

Changelog:

v2: https://lore.kernel.org/dri-devel/20230713213242.680944-1-andrealmeid@xxxxxxxxxx/
 - Drop the IB and ring patch
 - Drop patch that limited information from kernel threads
 - Add patch to move coredump to amdgpu_reset

v1: https://lore.kernel.org/dri-devel/20230711213501.526237-1-andrealmeid@xxxxxxxxxx/
 - Drop "Mark contexts guilty for causing soft recoveries" patch
 - Use GFP_NOWAIT for devcoredump allocation

André Almeida (5):
  drm/amdgpu: Create a module param to disable soft recovery
  drm/amdgpu: Allocate coredump memory in a nonblocking way
  drm/amdgpu: Rework coredump to use memory dynamically
  drm/amdgpu: Move coredump code to amdgpu_reset file
  drm/amdgpu: Create version number for coredumps

 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 67 +-----------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  9 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c  | 79 ++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h  | 14 ++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   |  6 +-
 6 files changed, 111 insertions(+), 70 deletions(-)

-- 
2.41.0