Re: [PATCH 3/5] dma-fence: Add a single fence fast path for fence merging

And pushed to drm-misc-next.

Sorry I'm still catching up from the holidays,
Christian.

On 09.01.25 at 11:53, Tvrtko Ursulin wrote:

Christian - it looks like this patch could be merged now.

Thanks,

Tvrtko

On 15/11/2024 10:21, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx>

Testing some workloads in two different scenarios, such as games running
under Gamescope on a Steam Deck, or vkcube under a Plasma desktop, shows
that in a significant portion of calls the dma_fence_unwrap_merge helper
is called with just a single unsignalled fence.

Therefore it is worthwhile to add a fast path for that case and so bypass
the memory allocation and insertion sort attempts.
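
The fast path described above can be sketched in userspace roughly as
follows. This is only an illustration under stated assumptions: struct
mock_fence, mock_fence_get(), mock_fence_put() and mock_single_pending()
are hypothetical stand-ins invented for this example, not kernel APIs,
and the plain integer refcount is not how struct dma_fence counts
references. The sketch only mirrors the counting and reference handling
of the patch: remember the most recently seen unsignaled fence, and hand
it back directly when it turns out to be the only one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct dma_fence (assumption). */
struct mock_fence {
	int refcount;
	bool signaled;
};

/* Take a reference; NULL is tolerated, as the patch relies on. */
static struct mock_fence *mock_fence_get(struct mock_fence *f)
{
	if (f)
		f->refcount++;
	return f;
}

/* Drop a reference; NULL is a no-op. */
static void mock_fence_put(struct mock_fence *f)
{
	if (f)
		f->refcount--;
}

/*
 * Count the unsignaled fences and keep a reference to the last one seen.
 * If exactly one fence is pending, return it with that reference held,
 * so no array allocation or sorting is needed. Otherwise drop the
 * reference and return NULL; the caller would take the slow path.
 * The number of pending fences is stored in *pending.
 */
static struct mock_fence *mock_single_pending(struct mock_fence **fences,
					      size_t n, size_t *pending)
{
	struct mock_fence *unsignaled = NULL;
	size_t count = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!fences[i]->signaled) {
			/* Release the previous candidate, keep the new one. */
			mock_fence_put(unsignaled);
			unsignaled = mock_fence_get(fences[i]);
			count++;
		}
	}

	*pending = count;
	if (count == 1)
		return unsignaled;

	/* Zero or several pending fences: no fast path. */
	mock_fence_put(unsignaled);
	return NULL;
}
```

With one signaled and one unsignaled input, the sketch returns the
unsignaled fence carrying one extra reference. With two unsignaled
inputs it returns NULL and leaves both reference counts unchanged.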

Tested scenarios:

1) Hogwarts Legacy under Gamescope

~1500 calls per second to __dma_fence_unwrap_merge.

Percentage of calls per fence-count bucket, before and after checking for
signalled status, sorting and flattening:

    N       Before      After
    0       0.85%
    1      69.80%        ->   The new fast path.
   2-9     29.36%        9%   (i.e. 91% of this bucket flattened to 1 fence)
  10-19
  20-40
   50+

2) Cyberpunk 2077 under Gamescope

~2400 calls per second.

    N       Before      After
    0       0.71%
    1      52.53%        ->    The new fast path.
   2-9     44.38%      50.60%  (i.e. half resolved to a single fence)
  10-19     2.34%
  20-40     0.06%
   50+

3) vkcube under Plasma

90 calls per second.

    N       Before      After
    0
    1
   2-9      100%         0%   (i.e. all resolved to a single fence)
  10-19
  20-40
   50+

In the case of vkcube all invocations in the 2-9 bucket were actually
just two input fences.

v2:
  * Correct local variable name and hold on to unsignaled reference. (Christian)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx>
Cc: Christian König <christian.koenig@xxxxxxx>
Cc: Friedrich Vock <friedrich.vock@xxxxxx>
---
  drivers/dma-buf/dma-fence-unwrap.c | 11 ++++++++++-
  1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/drivers/dma-buf/dma-fence-unwrap.c b/drivers/dma-buf/dma-fence-unwrap.c
index 6345062731f1..2a059ac0ed27 100644
--- a/drivers/dma-buf/dma-fence-unwrap.c
+++ b/drivers/dma-buf/dma-fence-unwrap.c
@@ -84,8 +84,8 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
                         struct dma_fence **fences,
                         struct dma_fence_unwrap *iter)
  {
+    struct dma_fence *tmp, *unsignaled = NULL, **array;
      struct dma_fence_array *result;
-    struct dma_fence *tmp, **array;
      ktime_t timestamp;
      int i, j, count;
@@ -94,6 +94,8 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
      for (i = 0; i < num_fences; ++i) {
          dma_fence_unwrap_for_each(tmp, &iter[i], fences[i]) {
              if (!dma_fence_is_signaled(tmp)) {
+                dma_fence_put(unsignaled);
+                unsignaled = dma_fence_get(tmp);
                  ++count;
              } else {
                  ktime_t t = dma_fence_timestamp(tmp);
@@ -107,9 +109,16 @@ struct dma_fence *__dma_fence_unwrap_merge(unsigned int num_fences,
      /*
       * If we couldn't find a pending fence just return a private signaled
       * fence with the timestamp of the last signaled one.
+     *
+     * Or if there was a single unsignaled fence left we can return it
+     * directly and early since that is a major path on many workloads.
       */
      if (count == 0)
          return dma_fence_allocate_private_stub(timestamp);
+    else if (count == 1)
+        return unsignaled;
+
+    dma_fence_put(unsignaled);
      array = kmalloc_array(count, sizeof(*array), GFP_KERNEL);
      if (!array)



