Re: [PATCH 37/45] drm/ttm: add a helper to allocate a temp tt for copies.

Christian König <christian.koenig@xxxxxxx> · Fri, 25 Sep 2020 15:40:32 +0200

Am 25.09.20 um 15:17 schrieb Daniel Vetter:
[SNIP]
Eviction is not a problem because the driver gets asked where to put an
evicted BO and then TTM does all the moving.
Hm I guess then I don't quite get where you see the ping-pong
happening, I thought that only happens for evicting stuff.

No, eviction is the nice case. One step after another.

E.g. from VRAM to GTT, then from GTT to SYSTEM and then eventually 
swapped out.

But hey not much actual working experience with ttm over here, I'm just reading
:-) I thought the issue is that ttm wants to evict from $something to
SYSTEM, and to do that the driver first needs to set a GTT mapping for
the SYSTEM ttm_resource allocation, so that it can use the
blitter/sdma engine or whatever to move the data over. But for
swap-in/validation I'm confused how you can end up with the "wrong"
placement, that feels like a driver bug.

The problem happens in the other direction and always directly triggered 
by the driver.

How exactly can you get into a situation with validation where ttm
gives you SYSTEM, but not GTT and the driver has to fix that up? I'm
not really following I think, I guess there's something obvious I'm
missing.

For example a buffer which was evicted to SYSTEM needs to be moved back 
directly to VRAM.

Or we want to evacuate all buffers from VRAM to SYSTEM because of 
hibernation.

etc....

Or should we instead move the entire eviction logic out from ttm into
drivers, building it up from helpers?
I've been playing with that thought for a while as well, but then
decided against it.

The main problem I see is that we sometimes need to evict things from
other drivers.

E.g. when we overcommitted system memory and move things to swap for
example.
Hm yeah ttm has that limit to avoid stepping into the shrinker,
directly calling into another driver to keep within the limit while
ignoring that there's other memory users and caches out there still
feels wrong, it's kinda a parallel world vs shrinker callbacks. And
there's nothing stopping you from doing the SYSTEM->SWAP movement from
a shrinker callback with the locking rules we've established around
dma_resv (just needs to be a trylock).

So feels a bit backwards if we design ttm eviction around this part of it ...

Ok, that's a good point. Need to think about that a bit more then.

Then drivers which need gtt for
moving stuff out of vram can do that right away. Also, this would
allow us to implement very fancy eviction algorithms like all the
nonsense we're doing in i915 for gtt handling on gen2/3 (but I really
hope that never ever becomes a thing again in future gpus, so this is
maybe more a what-if kind of thing). Not sure how that would look
like, maybe a special validate function which takes a ttm_resource the
driver already found (through evicting stuff or whatever) and then ttm
just does the move and book-keeping and everything. And drivers would
at first only call validate without allowing any eviction. Ofc anyone
without special needs could use the standard eviction function that
validate already has.
Spinning this a bit more, we could have different default eviction
functions with this, e.g. so all the drivers that need gtt mapping for
moving stuff around can share that code, but with specific&flat
control flow instead of lots of ping-ping. And drivers that don't need
gtt mapping (like i915, we just need dma_map_sg which we assume works
always, or something from the ttm dma page pool, which really always
works) can then use something simpler that's completely flat.
Ok you need to explain a bit more what exactly the problem with the GTT
eviction is here :)
So the full set of limitations are
- range limits
- power-of-two alignement of start
- some other (smaller) power of two alignment for size (lol)
- "color", i.e. different caching modes needs at least one page of
empty space in-between

Stuffing all that into a generic eviction logic is imo silly. On top
of that we have the eviction collector where we scan the entire thing
until we've built up a sufficiently big hole, then evict just these
buffers. If we don't do this then pretty much any big buffer with
constraints results in the entire GTT getting evicted. Again something
that's only worth it if you have ridiculous placement constraints like
these old intel chips. gen2/3 in i915.ko is maybe a bit extreme, but
having the driver in control of the eviction code feels like a much
better design than ttm inflicting a one-size-fits-all on everyone. Ofc
with defaults and building blocks and all that.

Yeah, that makes a lot of sense.

Christian.

-Daniel

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel