Re: [PATCH 1/2] drm/ttm: cleanup and add TTM_PL_FLAG_TEMPORARY

Thomas Hellström (Intel) <thomas_os@xxxxxxxxxxxx> · Mon, 31 May 2021 14:09:59 +0200

On 5/31/21 2:02 PM, Christian König wrote:
Am 31.05.21 um 13:19 schrieb Thomas Hellström (Intel):

On 5/31/21 12:56 PM, Christian König wrote:
Am 31.05.21 um 12:46 schrieb Thomas Hellström (Intel):

On 5/31/21 12:32 PM, Christian König wrote:
Am 31.05.21 um 11:52 schrieb Thomas Hellström (Intel):
Hi, Lang,

On 5/31/21 10:19 AM, Yu, Lang wrote:
[Public]

Hi,
On 5/27/21 3:30 AM, Lang Yu wrote:
Make TTM_PL_FLAG_* start from zero and add
TTM_PL_FLAG_TEMPORARY flag for temporary
GTT allocation use.
GTT is a driver private acronym, right? And it doesn't look like
TTM_PL_FLAG_TEMPORARY will be used in core TTM, so should we 
instead set
aside a mask in the PL flag for driver-private use?
Hi Thomas,

Thanks for your comments and advice, GTT means Graphics 
Translation Table here, seems
a general acronym. TTM_PL_FLAG_TEMPORARY may also be used by 
other drives.
I have made other patches for this. Please help review.

Regards,
Lang

My point was really that the flag naming and documentation should 
reflect what core ttm is doing with that flag. If there is no 
specific core TTM usage, IMO we should move it to a driver 
specific flag to avoid future confusion. In particular a writer 
of a generic TTM resource manager should be able to know without 
looking at an old commit message what the placement flag is 
intended for.

So here we add a flag where core TTM forces a bo move on validate 
and that's it. And that appears to be how it's used when an 
amdgpu bo is evicted to GTT, (btw should it be accounted in this 
situation?)

The other use is to force the amdgpu driver to temporarily accept 
it into GTT when there is a lack of space, and IMHO that's a 
driver specific use and we should allocate a mask for driver 
specific flags for that.

So shouldn't this be two flags, really?

Well one flag makes sense for the use case at hand that drivers 
want to signal to TTM that an allocation is only temporary and not 
considered valid.

That we then use this flag to implement temporary GTT allocations 
to avoid problems during eviction is just extending that use case.

OK, but it looked like there were actually two use-cases. One for 
possibly long-term VRAM evictions to GTT, the other for the 
temporary use-case where the hop resource allocations sometimes 
failed. Or did I misunderstand the code?

Ok sounds like we need more documentation here. That's really one 
use case.

Key point is we need temporary allocation during multihop which 
should be handled differently to normal allocations.

Yes, that part is clear from the patches. The part that I can't fit 
into that pattern is why the evict flags when evicting from visible 
VRAM to GTT or ordinary VRAM is marked with TTM_PL_FLAG_TEMPORARY. 
Wouldn't those remain evicted in that placement until re-validated to 
visible VRAM at an unknown future time?

Not necessarily.

The situation we ran into was the following:
1. OOM on VRAM, we try to evict.

2. GTT space is used up as well, ok evict directly to SYSTEM.

3. For VRAM->SYSTEM eviction we use a temporary bounce buffer.

4. Waiting for the bounce buffer to become idle is interrupted by a 
signal so BO is still backed by bounce buffer.

5. Next CS, BO is validated with VRAM|GTT. TTM sees that it is in GTT 
and doesn't move the buffer.

6. CS goes into nirvana because bounce buffers are not meant to be 
used for CS (we can ignore alignment and accounting for them).

Yes, makes sense to me.

/Thomas