Re: [Intel-xe] ttm_bo and multiple backing store segments

Christian König <christian.koenig@xxxxxxx> · Wed, 19 Jul 2023 11:02:32 +0200

Hi guys,

Sorry for the very delayed response, this mail slipped completely under
my radar.

On 17.07.23 at 19:24, Rodrigo Vivi wrote:
On Thu, Jun 29, 2023 at 02:10:58PM -0700, Welty, Brian wrote:
Hi Christian / Thomas,

Wanted to ask if you have explored or thought about adding support in TTM
such that a ttm_bo could have more than one underlying backing store segment
(that is, to have a tree of ttm_resources)?

We already use something similar on amdgpu where basically the VRAM 
resources are stitched together from multiple backing pages.

That is not exactly the same, but it comes close.

We are considering supporting such BOs in the Intel Xe driver.
They are indeed the best ones to give an opinion here.
I just have some dummy questions and comments below.

Some of the benefits:
  * devices with page fault support can fault (and migrate) backing store
    at finer granularity than the entire BO

We've considered that once as well and I even started hacking something 
together, but the problem was that at least at that point it wasn't 
doable because of limitations in the Linux memory management.

Basically the extended attributes used to control caching of pages were
only definable per VMA! So when one piece of the BO would have been in
uncached VRAM while another piece would have been in cached system
memory, you immediately ran into problems.

I think that issue is fixed by now, but I'm not 100% sure.
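
To make the per-VMA limitation concrete, here is a minimal userspace
sketch. It is not TTM or kernel code: struct bo_segment, enum seg_caching
and bo_mappable_with_single_vma() are hypothetical names made up for
illustration. It only models the constraint described above, that a
mapping with a single caching attribute can cover a BO only if all of its
segments agree on that attribute.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical caching modes, for illustration only (not the TTM enum). */
enum seg_caching { SEG_UNCACHED, SEG_WRITE_COMBINED, SEG_CACHED };

/* Hypothetical backing store segment of a buffer object. */
struct bo_segment {
    size_t num_pages;          /* size of this segment in pages */
    bool in_vram;              /* true: VRAM, false: system memory */
    enum seg_caching caching;  /* caching attribute the segment needs */
};

/*
 * Returns true if every segment needs the same caching attribute, i.e. if
 * a mapping that carries exactly one attribute could cover the whole BO.
 */
static bool bo_mappable_with_single_vma(const struct bo_segment *segs,
                                        size_t count)
{
    assert(segs != NULL && count > 0);

    for (size_t i = 1; i < count; i++) {
        if (segs[i].caching != segs[0].caching)
            return false;
    }
    return true;
}
```

With one segment in uncached VRAM and another in cached system memory the
check returns false, which is the case that caused the trouble back then.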

In general I think it might be beneficial, but I'm not 100% sure if it's 
worth the additional complexity.

Regards,
Christian.

what advantage does this bring, and to which workloads?
is it a performance gain on huge BOs?

  * BOs can support having multiple backing store segments, which can be
    in different memory domains/regions
what locking challenges would this bring?
is this more targeting gpu + cpu? or only for our multi-tile platforms?
and what's the advantage this is bringing to real use cases?
(probably the svm/hmm question below answers my questions, but...)

  * BO eviction could operate on smaller granularity than entire BO
I believe all the previous doubts apply to this item as well...

Or is the thinking that workloads should use SVM/HMM instead of GEM_CREATE
if they want above benefits?

Is this something you are open to seeing an RFC series that starts perhaps
with just extending ttm_bo_validate() to see how this might shape up?
Imho an RFC always helps... a piece of code showing the idea usually draws
more attention from devs than asking in text form. But more text explaining
the reasons behind it is also helpful, even with the RFC.

Thanks,
Rodrigo.

-Brian