On 27/06/2023 14:47, Christian König wrote:
On 27.06.23 at 15:23, André Almeida wrote:
Create a section that specifies how to deal with DRM device resets for
kernel and userspace drivers.
Acked-by: Pekka Paalanen <pekka.paalanen@xxxxxxxxxxxxx>
Signed-off-by: André Almeida <andrealmeid@xxxxxxxxxx>
---
v4:
https://lore.kernel.org/lkml/20230626183347.55118-1-andrealmeid@xxxxxxxxxx/
Changes:
- Grammar fixes (Randy)
Documentation/gpu/drm-uapi.rst | 68 ++++++++++++++++++++++++++++++++++
1 file changed, 68 insertions(+)
diff --git a/Documentation/gpu/drm-uapi.rst b/Documentation/gpu/drm-uapi.rst
index 65fb3036a580..3cbffa25ed93 100644
--- a/Documentation/gpu/drm-uapi.rst
+++ b/Documentation/gpu/drm-uapi.rst
@@ -285,6 +285,74 @@ for GPU1 and GPU2 from different vendors, and a third handler for
mmapped regular files. Threads cause additional pain with signal
handling as well.
+Device reset
+============
+
+The GPU stack is really complex and is prone to errors, from hardware bugs,
+faulty applications and everything in between the many layers. Some errors
+require resetting the device in order to make the device usable again. This
+section describes the expectations for DRM and usermode drivers when a
+device resets and how to propagate the reset status.
+
+Kernel Mode Driver
+------------------
+
+The KMD is responsible for checking if the device needs a reset, and to perform
+it as needed. Usually a hang is detected when a job gets stuck executing. KMD
+should keep track of resets, because userspace can query any time about the
+reset stats for a specific context.
Maybe drop the part "for a specific context". Essentially the reset
query could use global counters instead and we won't need the context
any more here.
Right, I wrote it like this to reflect how it's currently implemented.
If I follow correctly what you meant, the KMD could always report the
global count to the UMD, and we would move the responsibility for
managing the reset counters to the UMD, right? This would also simplify
my DRM_IOCTL_GET_RESET proposal. I'll apply your suggestion to the next
doc version.
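
To make sure we mean the same thing, here is a rough sketch of how I
picture the UMD side if the KMD only exposes a global counter. Everything
in it is hypothetical: struct fake_kmd and fake_kmd_query_reset_count()
are stand-ins for whatever query interface we end up with, not an
existing DRM API, and a real UMD would issue an ioctl where the stub
reads a field.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel side: one device-wide counter. */
struct fake_kmd {
	uint64_t global_reset_count;
};

/* Hypothetical query; a real UMD would issue an ioctl here instead. */
static uint64_t fake_kmd_query_reset_count(const struct fake_kmd *kmd)
{
	return kmd->global_reset_count;
}

/* Per-context bookkeeping lives entirely in the UMD. */
struct umd_context {
	uint64_t reset_count_at_creation;
};

/* Snapshot the global counter when the context is created. */
static void umd_context_init(struct umd_context *ctx,
			     const struct fake_kmd *kmd)
{
	ctx->reset_count_at_creation = fake_kmd_query_reset_count(kmd);
}

/* True if at least one device reset happened after the context was created. */
static bool umd_context_saw_reset(const struct umd_context *ctx,
				  const struct fake_kmd *kmd)
{
	return fake_kmd_query_reset_count(kmd) != ctx->reset_count_at_creation;
}
```

With this shape the KMD needs no per-context state for the query: a
context created before a reset reports it, while one created afterwards
starts from the new counter value and does not.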
Apart from that this sounds good to me, feel free to add my rb.
Regards,
Christian.