Re: [PATCH 1/3] drm/i915: audit bo->resource usage

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 31/08/2022 15:53, Matthew Auld wrote:
On 31/08/2022 14:34, Christian König wrote:
Am 31.08.22 um 14:50 schrieb Matthew Auld:
On 31/08/2022 13:35, Christian König wrote:
Am 31.08.22 um 14:06 schrieb Matthew Auld:
On 31/08/2022 12:03, Christian König wrote:
Am 31.08.22 um 12:37 schrieb Matthew Auld:
[SNIP]

That hopefully just leaves i915_ttm_shrink(), which is swapping out shmem ttm_tt and is calling ttm_bo_validate() with empty placements to force the pipeline-gutting path, which importantly unpopulates the ttm_tt for us (since ttm_tt_unpopulate is not exported it seems). But AFAICT it looks like that will now also nuke the bo->resource, instead of just leaving it in system memory. My assumption is that when later calling ttm_bo_validate(), it will just do the bo_move_null() in i915_ttm_move(), instead of re-populating the ttm_tt and then potentially copying it back to local-memory?

Well you do ttm_bo_validate() with something like GTT domain, don't you? This should result in re-populating the tt object, but I'm not 100% sure if that really works as expected.

AFAIK for domains we either have system memory (which uses ttm_tt and might be shmem underneath) or local-memory. But perhaps i915 is doing something wrong here, or abusing TTM in some way. I'm not sure tbh.

Anyway, I think we have two cases here:

- We have some system memory only object. After doing i915_ttm_shrink(), bo->resource is now NULL. We then call ttm_bo_validate() at some later point, but here we don't need to copy anything, but it also looks like ttm_bo_handle_move_mem() won't populate the ttm_tt or us either, since mem_type == TTM_PL_SYSTEM. It looks like i915_ttm_move() was taking care of this, but now we just call ttm_bo_move_null().

- We have a local-memory only object, which was evicted to shmem, and then swapped out by the shrinker like above. The bo->resource is NULL. However this time when calling ttm_bo_validate() we need to actually do a copy in i915_ttm_move(), as well as re-populate the ttm_tt. i915_ttm_move() was taking care of this, but now we just call ttm_bo_move_null().

Perhaps i915 is doing something wrong in the above two cases?

Mhm, as far as I can see that should still work.

See previously you should got a transition from SYSTEM->GTT in i915_ttm_move() to re-create your backing store. Not you get NULL->SYSTEM which is handled by ttm_bo_move_null() and then SYSTEM->GTT.

What is GTT here in TTM world? Also I'm not seeing where there is this SYSTEM->GTT transition? Maybe I'm blind. Just to be clear, i915 is only calling ttm_bo_validate() once when acquiring the pages, and we don't call it again, unless it was evicted (and potentially swapped out).

Well GTT means TTM_PL_TT.

And calling it only once is perfectly fine, TTM will internally see that we need two hops to reach TTM_PL_TT and so does the NULL->SYSTEM transition and then SYSTEM->TT.

Ah interesting, so that's what the multi-hop thing does. But AFAICT i915 is not using either TTM_PL_TT or -EMULTIHOP.

Mhm, it could be that we then have a problem and the i915 driver only sees NULL->TT directly. But I really don't know the i915 driver code good enough to judge that.

Can you take a look at this and test it maybe?

I'll grab a machine and try to see what is going on here.

Well at least the issue with the firmware not loading looks to be fixed now.

So running some eviction + oom tests it looks it now does:

/* eviction kicks in */
i915_ttm_move(bo):  LMEM -> PL_SYSTEM

/* shrinker/oom kicks in at some point */
i915_ttm_shrink(bo):
    bo->resource = NULL, /* pipeline_gutting */
    shmem ttm_tt is unpopulated and pages are correctly swapped out

/* user touches the same object later */
i915_ttm_move(bo):  NULL -> LMEM, bo_move_null()

So seems to incorrectly skip swapping it back in and then copy over to lmem. It just allocates directly in lmem.

And previously the last two steps would have been:

i915_ttm_shrink(bo):
    bo->resource = PL_SYSTEM, /* pipeline_gutting */
    shmem ttm_tt is unpopulated and pages are correctly swapped out

i915_ttm_move(bo):
    PL_SYSTEM -> LMEM,
    ttm_tt is repopulated and pages are copied over to lmem




Also what is the difference between TTM_PL_TT and TM_PL_SYSTEM? When should you use one over the other?

TTM_PL_SYSTEM means the device is not accessing the buffer and TTM has the control over the backing store and can swapout/swapin as it wants it.

TTM_PL_TT means that the device is accessing the data (TT stands for translation table) and so TTM can't swap the backing store in/out.

TTM_PL_VRAM well that one is obvious.

Thanks for the explanation. So it looks like i915 is using TTM_PL_SYSTEM even for device access it seems.


Thanks,
Christian.



As far as I can see that should work like it did before.

Christian.



If you just validated to SYSTEM memory before I think the tt object wouldn't have been populated either.

Regards,
Christian.



Thanks,
Christian.



I've been considering to replacing the ttm_bo_type with a bunch of behavior flags for a bo. I'm hoping that this will clean things up a bit.

Regards,
Christian.


      caching = i915_ttm_select_tt_caching(obj);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
index 9a7e50534b84bb..c420d1ab605b6f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c
@@ -560,7 +560,7 @@ int i915_ttm_move(struct ttm_buffer_object *bo, bool evict,
      bool clear;
      int ret;
-    if (GEM_WARN_ON(!obj)) {
+    if (GEM_WARN_ON(!obj) || !bo->resource) {
          ttm_bo_move_null(bo, dst_mem);
          return 0;
      }









[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux