Re: [PATCH 01/13] drm: execution context for GEM buffers v4

Hi!

On 6/19/23 11:20, Christian König wrote:
Hi guys,

Am 19.06.23 um 10:59 schrieb Thomas Hellström (Intel):
[SNIP]

I really need to find some time to work on that anyway.
I've been playing with drm_exec for a couple weeks now, and I wanted
to share something I hacked to try and make the API simpler and
more robust against misuse (see the below diff, which is a slightly
adjusted version of your work).

It would be good if we could have someone take charge of this series and address all review comments; I see some of my comments getting lost, we have multiple submitters, and I can't find a dri-devel patchwork entry for this. Anyway, some comments below.

I can try to find some time for the series this week (as long as nobody comes along with a burning roof).



In this version, the user is no longer in control of the retry
loop. Instead, it provides an expression (a call to a
sub-function) to be re-evaluated each time a contention is
detected. IMHO, this makes the 'prepare-objs' functions easier to
apprehend, and avoids any mistake like calling
drm_exec_continue_on_contention() in an inner loop, or breaking
out of the drm_exec_while_all_locked() loop unintentionally.

In i915 we've had a very similar helper to this, and while I agree this newer version would probably help make code cleaner, OTOH there are also some places where the short drm_exec_while_all_locked()-like blocks don't really motivate a separate function. Porting i915 to the current version will take some work; for the xe driver both versions would work fine.

Yeah, this is actually what my first version of this looked like. But I abandoned that approach because we have a lot of cases where we just quickly want to lock a few GEM objects and don't want the extra overhead of putting all the state into some bag to forward it to a function.


Some additional review comments not related to the interface change below:


It also makes the internal management a bit simpler, since we
no longer call drm_exec_cleanup() on the first attempt, and can
thus get rid of the DRM_EXEC_DUMMY trick.

In the below diff, I also re-introduced native support for
duplicates as an opt-in, so we don't have to do things like:

    ret = drm_exec_prepare_obj(exec, obj, num_fences);
    if (ret == -EALREADY)
        ret = dma_resv_reserve_fences(obj->resv, num_fences);
    if (ret)
        return ret;

and can just do:

    ret = drm_exec_prepare_obj(exec, obj, num_fences);
    if (ret)
        return;

Of course drivers can open-code a wrapper doing the same thing, but
given at least pvr and radeon need this, it'd be nice if the core
could support it natively.
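For illustration, such a driver-side wrapper could look roughly like this (all names are made up here, and the DRM calls are replaced by stubs so the error-mapping logic stands alone):

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Stubs standing in for drm_exec_prepare_obj() and dma_resv_reserve_fences(),
 * so the error mapping can be shown standalone. */
struct my_exec;                        /* stand-in for struct drm_exec */
struct my_obj { int already_locked; }; /* stand-in for struct drm_gem_object */

static int stub_prepare_obj(struct my_exec *exec, struct my_obj *obj,
			    unsigned int num_fences)
{
	return obj->already_locked ? -EALREADY : 0;
}

static int stub_reserve_fences(struct my_obj *obj, unsigned int num_fences)
{
	return 0; /* assume the reservation itself succeeds */
}

/* Hypothetical driver-side wrapper: treat a duplicate as success, but
 * still reserve the requested fence slots on the already-locked object. */
static int my_prepare_obj(struct my_exec *exec, struct my_obj *obj,
			  unsigned int num_fences)
{
	int ret = stub_prepare_obj(exec, obj, num_fences);

	if (ret == -EALREADY)
		ret = stub_reserve_fences(obj, num_fences);
	return ret;
}
```

With core support this mapping disappears from every driver that needs it, which is the point being made above.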

That's mostly it. Just wanted to share what I had in case you're
interested. If not, that's fine too.

Regards,

Boris
---
  Documentation/gpu/drm-mm.rst |  12 ++
  drivers/gpu/drm/Kconfig      |   6 +
  drivers/gpu/drm/Makefile     |   2 +
  drivers/gpu/drm/drm_exec.c   | 274 +++++++++++++++++++++++++++++++++++
  include/drm/drm_exec.h       | 130 +++++++++++++++++
  5 files changed, 424 insertions(+)
  create mode 100644 drivers/gpu/drm/drm_exec.c
  create mode 100644 include/drm/drm_exec.h

diff --git a/Documentation/gpu/drm-mm.rst b/Documentation/gpu/drm-mm.rst
index fe40ee686f6e..c9f120cfe730 100644
--- a/Documentation/gpu/drm-mm.rst
+++ b/Documentation/gpu/drm-mm.rst
@@ -524,6 +524,18 @@ DRM Sync Objects
  .. kernel-doc:: drivers/gpu/drm/drm_syncobj.c
     :export:
  +DRM Execution context
+=====================
+
+.. kernel-doc:: drivers/gpu/drm/drm_exec.c
+   :doc: Overview
+
+.. kernel-doc:: include/drm/drm_exec.h
+   :internal:
+
+.. kernel-doc:: drivers/gpu/drm/drm_exec.c
+   :export:
+
  GPU Scheduler
  =============
  diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index 76991720637c..01a38fcdb1c4 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -194,6 +194,12 @@ config DRM_TTM
        GPU memory types. Will be enabled automatically if a device driver
        uses it.
  +config DRM_EXEC
+    tristate
+    depends on DRM
+    help
+      Execution context for command submissions
+
  config DRM_BUDDY
      tristate
      depends on DRM
diff --git a/drivers/gpu/drm/Makefile b/drivers/gpu/drm/Makefile
index 1873f64db171..18a02eaf2d49 100644
--- a/drivers/gpu/drm/Makefile
+++ b/drivers/gpu/drm/Makefile
@@ -79,6 +79,8 @@ obj-$(CONFIG_DRM_PANEL_ORIENTATION_QUIRKS) += drm_panel_orientation_quirks.o
  #
  # Memory-management helpers
  #
+#
+obj-$(CONFIG_DRM_EXEC) += drm_exec.o
    obj-$(CONFIG_DRM_BUDDY) += drm_buddy.o
  diff --git a/drivers/gpu/drm/drm_exec.c b/drivers/gpu/drm/drm_exec.c
new file mode 100644
index 000000000000..e0ad1a3e1610
--- /dev/null
+++ b/drivers/gpu/drm/drm_exec.c
@@ -0,0 +1,274 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+
+#include <drm/drm_exec.h>
+#include <drm/drm_gem.h>
+#include <linux/dma-resv.h>
+
+/**
+ * DOC: Overview
+ *
+ * This component mainly abstracts the retry loop necessary for locking
+ * multiple GEM objects while preparing hardware operations (e.g. command
+ * submissions, page table updates etc.).
+ *
+ * If a contention is detected while locking a GEM object the cleanup procedure
+ * unlocks all previously locked GEM objects and locks the contended one first
+ * before locking any further objects.
+ *
+ * After an object is locked, fence slots can optionally be reserved on the
+ * dma_resv object inside the GEM object.
+ *
+ * A typical usage pattern should look like this::
+ *
+ * int prepare_objs_func(struct drm_exec *exec, ...)
+ * {
+ *    struct drm_gem_object *boA, *boB;
+ *     int ret;
+ *
+ *    <retrieve boA and boB here>
+ *
+ *    ret = drm_exec_prepare_obj(&exec, boA, 1);
+ *    if (ret)
+ *        return ret;
+ *
+ *    ret = drm_exec_prepare_obj(&exec, boB, 1);
+ *    if (ret)
+ *        return ret;
+ *
+ *     return 0;
+ * }
+ *
+ * int some_func()
+ * {
+ *    struct drm_exec exec;
+ *    unsigned long index;
+ *    int ret;
+ *
+ *    drm_exec_init(&exec, true);
+ *    ret = drm_exec_until_all_locked(&exec, prepare_objs_func(&exec, ...));
+ *    if (ret)
+ *        goto error;
+ *
+ *    drm_exec_for_each_locked_object(&exec, index, obj) {
+ *        dma_resv_add_fence(obj->resv, fence, DMA_RESV_USAGE_READ);
+ *        ...
+ *    }
+ *    drm_exec_fini(&exec);
+ *    return 0;
+ *
+ * error:
+ *    drm_exec_fini(&exec);
+ *    return ret;
+ * }
+ *
+ * See struct drm_exec for more details.
+ */
+
+/* Unlock all objects and drop references */
+static void drm_exec_unlock_all(struct drm_exec *exec)
+{
+    struct drm_gem_object *obj;
+    unsigned long index;
+
+    drm_exec_for_each_locked_object(exec, index, obj) {
+        dma_resv_unlock(obj->resv);
+        drm_gem_object_put(obj);
+    }
+
+    drm_gem_object_put(exec->prelocked);
+    exec->prelocked = NULL;
+}
+
+/**
+ * drm_exec_init - initialize a drm_exec object
+ * @exec: the drm_exec object to initialize
+ * @flags: combination of DRM_EXEC_FLAG_* flags
+ *
+ * Initialize the object and make sure that we can track locked objects.
+ */
+void drm_exec_init(struct drm_exec *exec, u32 flags)
+{
+    exec->flags = flags;
+    exec->objects = kmalloc(PAGE_SIZE, GFP_KERNEL);
+
+    /* If allocation here fails, just delay that till the first use */
+    exec->max_objects = exec->objects ? PAGE_SIZE / sizeof(void *) : 0;
+    exec->num_objects = 0;
+    exec->contended = NULL;
+    exec->prelocked = NULL;
+    ww_acquire_init(&exec->ticket, &reservation_ww_class);
+}
+EXPORT_SYMBOL(drm_exec_init);
+
+/**
+ * drm_exec_fini - finalize a drm_exec object
+ * @exec: the drm_exec object to finalize
+ *
+ * Unlock all locked objects, drop the references to objects and free all memory
+ * used for tracking the state.
+ */
+void drm_exec_fini(struct drm_exec *exec)
+{
+    drm_exec_unlock_all(exec);
+    kvfree(exec->objects);
+    if (exec->contended)
+        drm_gem_object_put(exec->contended);
+    ww_acquire_fini(&exec->ticket);
+}
+EXPORT_SYMBOL(drm_exec_fini);
+
+/**
+ * drm_exec_reset - reset a drm_exec object after a contention
+ * @exec: the drm_exec object to reset
+ *
+ * Unlock all locked objects and reset the number of locked objects.
+ */
+void drm_exec_reset(struct drm_exec *exec)
+{
+    WARN_ON(!exec->contended);
+    drm_exec_unlock_all(exec);
+    exec->num_objects = 0;
+}
+EXPORT_SYMBOL(drm_exec_reset);
+
+/* Track the locked object in the array */
+static int drm_exec_obj_locked(struct drm_exec *exec,
+                   struct drm_gem_object *obj)
+{
+    if (unlikely(exec->num_objects == exec->max_objects)) {
+        size_t size = exec->max_objects * sizeof(void *);
+        void *tmp;
+
+        tmp = kvrealloc(exec->objects, size, size + PAGE_SIZE,
+                GFP_KERNEL);
+        if (!tmp)
+            return -ENOMEM;

Sometimes you need to just temporarily lock an object and then unlock it again if it goes out of scope before reaching the end of _until_all_locked(). In that case you might need to remove a lock from the array. I *think* for all use-cases in i915 it would suffice to take a snapshot of num_objects and unlock everything above that, having exec->objects behave like a stack, but was a list ever considered instead of a realloced array?

Yes, the problem is that linked lists really suck regarding their cache line locality. That's why I came up with this approach here.

What we could maybe do is to allow unlocking objects, but with the cost of linear backward searching for them in the array.
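The two ideas can be sketched with bookkeeping-only stubs (no real locking, hypothetical names): objects[] behaves like a stack, and removing an arbitrary entry searches backward and shifts the tail down, keeping the remaining entries in acquisition order.

```c
#include <assert.h>
#include <stddef.h>

#define STUB_MAX_OBJS 8

struct exec_stub {
	void *objects[STUB_MAX_OBJS];
	unsigned int num_objects;
};

/* Mirrors drm_exec_obj_locked()'s bookkeeping: push onto the array. */
static void stub_track(struct exec_stub *e, void *obj)
{
	e->objects[e->num_objects++] = obj;
}

/* Hypothetical drm_exec_unlock_obj() bookkeeping: linear backward search,
 * on the assumption that a temporarily locked object sits near the end. */
static int stub_untrack(struct exec_stub *e, void *obj)
{
	for (unsigned int i = e->num_objects; i-- > 0; ) {
		if (e->objects[i] != obj)
			continue;
		/* Shift the tail down to close the gap. */
		for (unsigned int j = i; j + 1 < e->num_objects; ++j)
			e->objects[j] = e->objects[j + 1];
		e->num_objects--;
		return 0;
	}
	return -1; /* object was not tracked */
}
```

The stack-snapshot variant would simply record num_objects before a temporary lock and pop/unlock everything above it afterwards, which avoids the linear search entirely.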


+
+        exec->objects = tmp;
+        exec->max_objects += PAGE_SIZE / sizeof(void *);
+    }
+    drm_gem_object_get(obj);
+    exec->objects[exec->num_objects++] = obj;
+
+    return 0;
+}
+
+/* Make sure the contended object is locked first */
+static int drm_exec_lock_contended(struct drm_exec *exec)
+{
+    struct drm_gem_object *obj = exec->contended;
+    int ret;
+
+    if (likely(!obj))
+        return 0;
+
+    if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE) {
+        ret = dma_resv_lock_slow_interruptible(obj->resv,
+                               &exec->ticket);
+        if (unlikely(ret))
+            goto error_dropref;
+    } else {
+        dma_resv_lock_slow(obj->resv, &exec->ticket);
+    }
+

Sometimes you want to just drop the contended lock after the above relaxation, and not add it as prelocked, if the contended object goes out of scope. Eviction would in some situations be one such example; -EDEADLK leading to an error path where the object should otherwise be freed is another. Perhaps we could add an argument to prepare_obj() as to whether the object should be immediately put after relaxation.

I was considering a try_prepare version as well, that should cover this use case.

That sounds a bit different from this use-case. The use-case above would, on -EDEADLK, actually unlock everything, then lock-slow the contending lock and then immediately unlock it and drop. It sounds like try_prepare would just skip locking and continue, with everything locked so far still locked?
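If I'm reading the proposal right, the distinction can be sketched with stubs (all names here are hypothetical, the locking is simulated): a try variant returns -EBUSY on contention without recording a contended object, so no rollback happens and the caller simply continues with everything locked so far.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct stub_obj {
	int held_elsewhere; /* simulates another context holding the lock */
	int locked;
};

struct stub_exec {
	struct stub_obj *contended; /* non-NULL would trigger the retry loop */
};

/* Hypothetical try_prepare(): attempt a trylock; on contention, skip the
 * object without an unlock-all and without touching exec->contended. */
static int stub_try_prepare(struct stub_exec *exec, struct stub_obj *obj)
{
	if (obj->held_elsewhere)
		return -EBUSY;
	obj->locked = 1;
	return 0;
}
```

Whereas the use-case above still wants the full backoff plus slow-path acquisition, just with the option to drop the lock immediately afterwards.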



+    ret = drm_exec_obj_locked(exec, obj);
+    if (unlikely(ret)) {
+        dma_resv_unlock(obj->resv);
+        goto error_dropref;
+    }
+
+    swap(exec->prelocked, obj);
+
+error_dropref:
+    /* Always cleanup the contention so that error handling can kick in */
+    drm_gem_object_put(obj);
+    exec->contended = NULL;
+    return ret;
+}
+
+/**
+ * drm_exec_prepare_obj - prepare a GEM object for use
+ * @exec: the drm_exec object with the state
+ * @obj: the GEM object to prepare
+ * @num_fences: how many fences to reserve
+ *
+ * Prepare a GEM object for use by locking it and reserving fence slots. All
+ * successfully locked objects are put into the locked container.
+ *
+ * Returns: -EDEADLK if a contention is detected, -EALREADY when object is
+ * already locked, -ENOMEM when memory allocation failed and zero for success.
+ */
+int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
+             unsigned int num_fences)
+{
+    int ret;
+
+    ret = drm_exec_lock_contended(exec);
+    if (unlikely(ret))
+        return ret;
+
+    if (exec->prelocked == obj) {
+        drm_gem_object_put(exec->prelocked);
+        exec->prelocked = NULL;
+
+        return dma_resv_reserve_fences(obj->resv, num_fences);
+    }
+
+    if (exec->flags & DRM_EXEC_FLAG_INTERRUPTIBLE)
+        ret = dma_resv_lock_interruptible(obj->resv, &exec->ticket);
+    else
+        ret = dma_resv_lock(obj->resv, &exec->ticket);
+
+    if (unlikely(ret == -EDEADLK)) {
+        drm_gem_object_get(obj);
+        exec->contended = obj;
+        return -EDEADLK;
+    }
+
+    if (unlikely(ret == -EALREADY &&
+        (exec->flags & DRM_EXEC_FLAG_ALLOW_DUPLICATES)))
+        goto reserve_fences;
+
+    if (unlikely(ret))
+        return ret;
+
+    ret = drm_exec_obj_locked(exec, obj);
+    if (ret)
+        goto error_unlock;
+
+reserve_fences:
+    /* Keep locked when reserving fences fails */
+    return dma_resv_reserve_fences(obj->resv, num_fences);

Ugh, what is the use-case for keeping things locked here? How would a caller tell the difference between an error where everything is locked and one where nothing is locked? IMO, we should unlock on error here. If there indeed is a use-case, we should add a separate function for reserving fences for all locked objects, rather than sometimes returning locked on error and sometimes not.

We return the object locked here because it was too much churn to remove it again from the array, and we are getting fully cleaned up at the end anyway.

OK, so if we add an unlock functionality, we could just have a consistent locking state on error return?

Thanks,
Thomas


Regards,
Christian.


Thanks,

Thomas


+
+error_unlock:
+    dma_resv_unlock(obj->resv);
+    return ret;
+}
+EXPORT_SYMBOL(drm_exec_prepare_obj);
+
+/**
+ * drm_exec_prepare_array - helper to prepare an array of objects
+ * @exec: the drm_exec object with the state
+ * @objects: array of GEM object to prepare
+ * @num_objects: number of GEM objects in the array
+ * @num_fences: number of fences to reserve on each GEM object
+ *
+ * Prepares all GEM objects in an array, handles contention but aborts on first
+ * error otherwise. Reserves @num_fences on each GEM object after locking it.
+ *
+ * Returns: -EALREADY when object is already locked, -ENOMEM when memory
+ * allocation failed and zero for success.
+ */
+int drm_exec_prepare_array(struct drm_exec *exec,
+               struct drm_gem_object **objects,
+               unsigned int num_objects,
+               unsigned int num_fences)
+{
+    int ret;
+
+    for (unsigned int i = 0; i < num_objects; ++i) {
+        ret = drm_exec_prepare_obj(exec, objects[i], num_fences);
+        if (ret)
+            return ret;
+    }
+
+    return 0;
+}
+EXPORT_SYMBOL(drm_exec_prepare_array);
+
+MODULE_DESCRIPTION("DRM execution context");
+MODULE_LICENSE("Dual MIT/GPL");
diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h
new file mode 100644
index 000000000000..b1a5da0509c1
--- /dev/null
+++ b/include/drm/drm_exec.h
@@ -0,0 +1,130 @@
+/* SPDX-License-Identifier: GPL-2.0 OR MIT */
+
+#ifndef __DRM_EXEC_H__
+#define __DRM_EXEC_H__
+
+#include <linux/ww_mutex.h>
+
+struct drm_gem_object;
+
+/**
+ * enum drm_exec_flags - Execution context flags
+ */
+enum drm_exec_flags {
+    /**
+     * DRM_EXEC_FLAG_INTERRUPTIBLE: Set to acquire locks using the
+     * interruptible locking functions.
+     */
+    DRM_EXEC_FLAG_INTERRUPTIBLE = BIT(0),
+
+    /**
+     * DRM_EXEC_FLAG_ALLOW_DUPLICATES: Set to allow duplicate objects
+     * instead of returning -EALREADY.
+     */
+    DRM_EXEC_FLAG_ALLOW_DUPLICATES = BIT(1),
+};
+
+/**
+ * struct drm_exec - Execution context
+ */
+struct drm_exec {
+    /**
+     * @flags: Combinations of DRM_EXEC_FLAG_* flags.
+     */
+    u32 flags;
+
+    /**
+     * @ticket: WW ticket used for acquiring locks
+     */
+    struct ww_acquire_ctx    ticket;
+
+    /**
+     * @num_objects: number of objects locked
+     */
+    unsigned int        num_objects;
+
+    /**
+     * @max_objects: maximum objects in array
+     */
+    unsigned int        max_objects;
+
+    /**
+     * @objects: array of the locked objects
+     */
+    struct drm_gem_object    **objects;
+
+    /**
+     * @contended: contended GEM object we backed off for
+     */
+    struct drm_gem_object    *contended;
+
+    /**
+     * @prelocked: already locked GEM object due to contention
+     */
+    struct drm_gem_object *prelocked;
+};
+
+/**
+ * drm_exec_for_each_locked_object - iterate over all the locked objects
+ * @exec: drm_exec object
+ * @index: unsigned long index for the iteration
+ * @obj: the current GEM object
+ *
+ * Iterate over all the locked GEM objects inside the drm_exec object.
+ */
+#define drm_exec_for_each_locked_object(exec, index, obj) \
+    for (index = 0, obj = (exec)->objects[0];        \
+         index < (exec)->num_objects;            \
+         ++index, obj = (exec)->objects[index])
+
+/**
+ * drm_exec_until_all_locked - retry objects preparation until all objects
+ * are locked
+ * @exec: drm_exec object
+ * @expr: expression to be evaluated on each attempt
+ *
+ * This helper tries to prepare objects and if a deadlock is detected,
+ * rollbacks and retries.
+ *
+ * @expr is typically a function that tries to prepare objects using
+ * drm_exec_prepare_obj().
+ *
+ * If we take drm_exec_prepare_array() as an example, you should do:
+ *
+ *    ret = drm_exec_until_all_locked(exec,
+ *                    drm_exec_prepare_array(exec,
+ *                                   objs,
+ *                                   num_objs,
+ *                                   num_fences));
+ *    if (ret)
+ *        goto error_path;
+ *
+ *    ...
+ *
+ * Returns: 0 on success, a negative error code on failure.
+ */
+#define drm_exec_until_all_locked(exec, expr)        \
+    ({                        \
+        __label__ retry;            \
+        int __ret;                \
+retry:                            \
+        __ret = expr;                \
+        if ((exec)->contended) {        \
+            WARN_ON(__ret != -EDEADLK);    \
+            drm_exec_reset(exec);        \
+            goto retry;            \
+        }                    \
+        ww_acquire_done(&(exec)->ticket);    \
+        __ret;                    \
+    })
+
+void drm_exec_init(struct drm_exec *exec, u32 flags);
+void drm_exec_fini(struct drm_exec *exec);
+void drm_exec_reset(struct drm_exec *exec);
+int drm_exec_prepare_obj(struct drm_exec *exec, struct drm_gem_object *obj,
+             unsigned int num_fences);
+int drm_exec_prepare_array(struct drm_exec *exec,
+               struct drm_gem_object **objects,
+               unsigned int num_objects,
+               unsigned int num_fences);
+
+#endif


