Re: [RFC PATCH v3 2/4] drm: Create DRM_IOCTL_GET_RESET

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 22.06.23 um 10:22 schrieb Pekka Paalanen:
On Wed, 21 Jun 2023 13:33:56 -0300
André Almeida <andrealmeid@xxxxxxxxxx> wrote:

Em 21/06/2023 05:09, Pekka Paalanen escreveu:
On Tue, 20 Jun 2023 21:57:17 -0300
André Almeida <andrealmeid@xxxxxxxxxx> wrote:
Create a new DRM ioctl operation to get the numbers of resets for a
given context. The numbers reflect just the resets that happened after
the context was created, and not since the machine was booted.

Create a debugfs interface to make easier to test the API without real
resets.

Signed-off-by: André Almeida <andrealmeid@xxxxxxxxxx>
---
   drivers/gpu/drm/drm_debugfs.c |  2 ++
   drivers/gpu/drm/drm_ioctl.c   | 58 +++++++++++++++++++++++++++++++++++
   include/drm/drm_device.h      |  3 ++
   include/drm/drm_drv.h         |  3 ++
   include/uapi/drm/drm.h        | 21 +++++++++++++
   include/uapi/drm/drm_mode.h   | 15 +++++++++
   6 files changed, 102 insertions(+)
...
diff --git a/include/uapi/drm/drm.h b/include/uapi/drm/drm.h
index a87bbbbca2d4..a84559aa0d77 100644
--- a/include/uapi/drm/drm.h
+++ b/include/uapi/drm/drm.h
@@ -1169,6 +1169,27 @@ extern "C" {
    */
   #define DRM_IOCTL_MODE_GETFB2		DRM_IOWR(0xCE, struct drm_mode_fb_cmd2)
+/**
+ * DRM_IOCTL_GET_RESET - Get information about device resets
+ *
+ * This operation requests from the device information about resets. It should
+ * consider only resets that happens after the context is created, therefore,
+ * the counter should be zero during context creation.
+ *
+ * dev_reset_count tells how many resets have happened on this device, and
+ * ctx_reset_count tells how many of such resets were caused by this context.
+ *
+ * Flags can be used to tell if a reset is in progress, and userspace should
+ * wait until it's not in progress anymore to be able to create a new context;
+ * and to tell if the VRAM is considered lost. There's no safe way to clean this
+ * flag so if a context see this flag set, it should be like that until the end
+ * of the context.
Is "this flag" the VRAM_LOST? Or any flag?

Does this mean that not all resets are fatal to the context? Is there
any kind of reset that should not be fatal to a context? All the
rendering APIs seem to assume that any reset is fatal and the context
must be destroyed.
I got this flag from the `AMDGPU_CTX_OP_QUERY_STATE2` operation, and
it's used to notify that the reset was fatal for a giving context,
although the idea of non-fatal resets seems to be a bit controversial
for now, so I think it will be better if I leave this flag for latter
improvements of the API.
Which flag is "this flag"? There are RESET_IN_PROGRESS and VRAM_LOST.
Both are fine by me to exist.

I think I made a wrong conclusion here. Somehow I read that it would be
possible to have a reset happen, and if VRAM is not lost, then the
context could work again.

Yeah, that's exactly what AMD tries to do.

And no, I'm absolutely not keen about that idea.

Regards,
Christian.

Should there be some wording added to say the context is permanently
broken on any kind of reset? Or is that for UMD to decide?


Thanks,
pq

+ */
+#define DRM_IOCTL_GET_RESET		DRM_IOWR(0xCF, struct drm_get_reset)
+
+#define DRM_RESET_IN_PROGRESS	0x1
+#define DRM_RESET_VRAM_LOST	0x2
Ok, so the dmabuf lost is being communicated here, but how would a
userspace process know on which device a dmabuf resides on?

Let's assume process A uses device 1 to draw, exports a dmabuf, sends
it to process B which imports it to device 2. Device 1 resets and loses
VRAM contents. How would process B notice that the dmabuf is lost when
it never touches device 1 itself?
+
   /*
    * Device specific ioctls should only be in their respective headers
    * The device specific ioctl range is from 0x40 to 0x9f.
diff --git a/include/uapi/drm/drm_mode.h b/include/uapi/drm/drm_mode.h
index 43691058d28f..c3257bd1af9c 100644
--- a/include/uapi/drm/drm_mode.h
+++ b/include/uapi/drm/drm_mode.h
@@ -1308,6 +1308,21 @@ struct drm_mode_rect {
   	__s32 y2;
   };
+/**
+ * struct drm_get_reset - Get information about a DRM device resets
+ * @ctx_id: the context id to be queried about resets
+ * @flags: flags
+ * @dev_reset_count: global counter of resets for a given DRM device
+ * @ctx_reset_count: of all the resets counted by this device, how many were
+ * caused by this context.
+ */
+struct drm_get_reset {
+	__u32 ctx_id;
+	__u32 flags;
+	__u64 dev_reset_count;
+	__u64 ctx_reset_count;
+};
+
   #if defined(__cplusplus)
   }
   #endif
Thanks,
pq




[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux