Re: [RFC 1/5] drm/amdgpu: Fix migration rate limiting accounting

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 15/05/2024 08:14, Christian König wrote:
Am 08.05.24 um 20:09 schrieb Tvrtko Ursulin:
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx>

The logic assumed any migration attempt worked and therefore would over-
account the amount of data migrated during buffer re-validation. As a
consequence client can be unfairly penalised by incorrectly considering
its migration budget spent.

Fix it by looking at the before and after buffer object backing store and
only account if there was a change.

FIXME:
I think this needs a better solution to account for migrations between
VRAM visible and non-visible portions.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxx>
Cc: Christian König <christian.koenig@xxxxxxx>
Cc: Friedrich Vock <friedrich.vock@xxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 26 +++++++++++++++++++++-----
  1 file changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index ec888fc6ead8..22708954ae68 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -784,12 +784,15 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
          .no_wait_gpu = false,
          .resv = bo->tbo.base.resv
      };
+    struct ttm_resource *old_res;
      uint32_t domain;
      int r;
      if (bo->tbo.pin_count)
          return 0;
+    old_res = bo->tbo.resource;
+
      /* Don't move this buffer if we have depleted our allowance
       * to move it. Don't move anything if the threshold is zero.
       */
@@ -817,16 +820,29 @@ static int amdgpu_cs_bo_validate(void *param, struct amdgpu_bo *bo)
      amdgpu_bo_placement_from_domain(bo, domain);
      r = ttm_bo_validate(&bo->tbo, &bo->placement, &ctx);
-    p->bytes_moved += ctx.bytes_moved;
-    if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
-        amdgpu_res_cpu_visible(adev, bo->tbo.resource))
-        p->bytes_moved_vis += ctx.bytes_moved;
-
      if (unlikely(r == -ENOMEM) && domain != bo->allowed_domains) {
          domain = bo->allowed_domains;
          goto retry;
      }
+    if (!r) {
+        struct ttm_resource *new_res = bo->tbo.resource;
+        bool moved = true;
+
+        if (old_res == new_res)
+            moved = false;
+        else if (old_res && new_res &&
+             old_res->mem_type == new_res->mem_type)
+            moved = false;

The old resource might already be destroyed after you return from validation. So this here won't work.

Apart from that even when a migration attempt fails the moved bytes should be accounted.

When the validation attempt doesn't caused any moves then the bytecount here would be zero.

So as far as I can see that is as fair as you can get.

Right, I think I suffered a bit of tunnel vision here and completely ignore the _ctx_.moved_bytes part. Scratch this one too then.

Regards,

Tvrtko


Regards,
Christian.

PS: Looks like our mail servers are once more not very reliable.

If you get mails from me multiple times please just ignore it.

+
+        if (moved) {
+            p->bytes_moved += ctx.bytes_moved;
+            if (!amdgpu_gmc_vram_full_visible(&adev->gmc) &&
+                amdgpu_res_cpu_visible(adev, bo->tbo.resource))
+                p->bytes_moved_vis += ctx.bytes_moved;
+        }
+    }
+
      return r;
  }




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux