Due to the complexity of the GPU stack and the apps that we run on it, GPU resets are to be taken for granted. What's left for driver developers is how to make resets as smooth an experience as possible. While some OSes can recover or show an error message in such cases, Linux is more hit-and-miss due to its lack of standardization and guidelines on what to do in such cases.

The goal of this document is to properly define what should happen after a GPU reset, so developers can start building on top of it. An IGT test should be created to validate this for each driver.

Initially my approach was to expose a uevent for GPU resets, as can be seen here [1]. However, even if a uevent can be useful for some use cases (e.g. telemetry and error reporting), for the "OS integration" case of GPU resets it would be more productive to have something defined throughout the stack.

Thanks,
André

[1] https://lore.kernel.org/amd-gfx/20221125175203.52481-1-andrealmeid@xxxxxxxxxx/

André Almeida (1):
  drm: Create documentation about device resets

 Documentation/gpu/drm-reset.rst | 51 +++++++++++++++++++++++++++++++++
 Documentation/gpu/index.rst     |  1 +
 2 files changed, 52 insertions(+)
 create mode 100644 Documentation/gpu/drm-reset.rst

--
2.39.1