-----Original Message-----
From: Zeng, Oak
Sent: August 21, 2023 11:07 AM
To: Dave Airlie <airlied@xxxxxxxxx>
Cc: Brost, Matthew <matthew.brost@xxxxxxxxx>; Thomas Hellström
<thomas.hellstrom@xxxxxxxxxxxxxxx>; Philip Yang <Philip.Yang@xxxxxxx>;
Felix Kuehling <felix.kuehling@xxxxxxx>; dri-devel@xxxxxxxxxxxxxxxxxxxxx;
intel-xe@xxxxxxxxxxxxxxxxxxxxx; Vishwanathapura, Niranjana
<niranjana.vishwanathapura@xxxxxxxxx>; Christian König
<christian.koenig@xxxxxxx>
Subject: RE: Implement svm without BO concept in xe driver
-----Original Message-----
From: dri-devel <dri-devel-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Dave Airlie
Sent: August 20, 2023 6:21 PM
To: Zeng, Oak <oak.zeng@xxxxxxxxx>
Cc: Brost, Matthew <matthew.brost@xxxxxxxxx>; Thomas Hellström
<thomas.hellstrom@xxxxxxxxxxxxxxx>; Philip Yang <Philip.Yang@xxxxxxx>;
Felix Kuehling <felix.kuehling@xxxxxxx>; Welty, Brian <brian.welty@xxxxxxxxx>;
dri-devel@xxxxxxxxxxxxxxxxxxxxx; intel-xe@xxxxxxxxxxxxxxxxxxxxx;
Vishwanathapura, Niranjana <niranjana.vishwanathapura@xxxxxxxxx>;
Christian König <christian.koenig@xxxxxxx>
Subject: Re: Implement svm without BO concept in xe driver
On Thu, 17 Aug 2023 at 12:13, Zeng, Oak <oak.zeng@xxxxxxxxx> wrote:
-----Original Message-----
From: Dave Airlie <airlied@xxxxxxxxx>
Sent: August 16, 2023 6:52 PM
To: Felix Kuehling <felix.kuehling@xxxxxxx>
Cc: Zeng, Oak <oak.zeng@xxxxxxxxx>; Christian König
<christian.koenig@xxxxxxx>; Thomas Hellström
<thomas.hellstrom@xxxxxxxxxxxxxxx>; Brost, Matthew
<matthew.brost@xxxxxxxxx>; maarten.lankhorst@xxxxxxxxxxxxxxx;
Vishwanathapura, Niranjana <niranjana.vishwanathapura@xxxxxxxxx>;
Welty, Brian <brian.welty@xxxxxxxxx>; Philip Yang <Philip.Yang@xxxxxxx>;
intel-xe@xxxxxxxxxxxxxxxxxxxxx; dri-devel@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: Implement svm without BO concept in xe driver
On Thu, 17 Aug 2023 at 08:15, Felix Kuehling <felix.kuehling@xxxxxxx> wrote:
On 2023-08-16 13:30, Zeng, Oak wrote:
I spoke with Thomas. We discussed two approaches:
1) Make ttm_resource the central place for vram management functions such as
eviction and cgroup memory accounting. Both the BO-based driver and the
BO-less SVM code call into the ttm_resource_alloc/free functions for vram
allocation/free.
*This way the BO driver and the SVM driver share the eviction/cgroup logic,
with no need to reimplement an LRU eviction list in the SVM driver. The cgroup
logic should be in the ttm_resource layer. +Maarten.
*ttm_resource is not a perfect match for SVM vram allocation. It still
carries a big overhead. The *bo* member of ttm_resource is not needed for
SVM, so this might end up requiring invasive changes to ttm... need to look
into more details.
Overhead is a problem. We'd want to be able to allocate, free and evict
memory at a similar granularity as our preferred migration and page
fault granularity, which defaults to 2MB in our SVM implementation.
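To make the 2MB granularity point concrete, here is a minimal user-space sketch of the address arithmetic involved. It is not taken from any driver: SVM_GRANULE_SIZE, struct svm_range, svm_fault_to_range and svm_granules_for are hypothetical names, and the only fact taken from the thread is the 2MB default.

```c
#include <stdint.h>

/* Assumption: a 2MB granule, matching the default quoted above. */
#define SVM_GRANULE_SIZE (2ULL << 20)

/* Hypothetical type: a granule-aligned virtual address range. */
struct svm_range {
	uint64_t start; /* inclusive, granule-aligned */
	uint64_t end;   /* exclusive, granule-aligned */
};

/* Expand a faulting address to the granule that contains it. */
static struct svm_range svm_fault_to_range(uint64_t fault_addr)
{
	struct svm_range r;

	r.start = fault_addr & ~(SVM_GRANULE_SIZE - 1);
	r.end = r.start + SVM_GRANULE_SIZE;
	return r;
}

/* Number of granules needed to cover [addr, addr + size). */
static uint64_t svm_granules_for(uint64_t addr, uint64_t size)
{
	uint64_t first, last;

	if (size == 0)
		return 0;
	first = addr / SVM_GRANULE_SIZE;
	last = (addr + size - 1) / SVM_GRANULE_SIZE;
	return last - first + 1;
}
```

An allocator whose bookkeeping cost is paid per granule would pay it once per call to svm_fault_to_range here, which is why per-allocation overhead matters at this size.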
2) The svm code allocates memory directly from the drm-buddy allocator, and
both ttm and svm expose memory eviction functions so they can evict memory
from each other. For example, expose the ttm_mem_evict_first function from
the ttm side so hmm/svm code can call it; expose a similar function from the
svm side so ttm can evict hmm memory.
I like this option. One thing that needs some thought with this is how
to get some semblance of fairness between the two types of clients.
Basically how to choose what to evict. And what share of the available
memory does each side get to use on average. E.g. an idle client may get
all its memory evicted while a busy client may get a bigger share of the
available memory.
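One possible answer to "how to choose what to evict" when each side keeps its own LRU is to compare the oldest entry of each list and evict whichever was used least recently. The sketch below is a plain user-space model under that assumption, not kernel code: struct vram_block, struct lru_list, pick_victim_side and evict_bytes are hypothetical names, and the last_used stamp is an assumed global use counter. It does not address the share-of-memory question, only victim selection.

```c
#include <stddef.h>
#include <stdint.h>

enum vram_owner { OWNER_TTM, OWNER_SVM, OWNER_NONE };

/* Hypothetical per-block bookkeeping shared by both sides. */
struct vram_block {
	uint64_t last_used;      /* assumed global, monotonically increasing stamp */
	size_t size;             /* bytes of vram held by this block */
	struct vram_block *next; /* toward more recently used blocks */
};

/* Each side keeps its own list; head is its least recently used block. */
struct lru_list {
	struct vram_block *head;
};

/* Pick the side whose oldest block is older than the other side's. */
static enum vram_owner pick_victim_side(const struct lru_list *ttm,
					const struct lru_list *svm)
{
	if (!ttm->head && !svm->head)
		return OWNER_NONE;
	if (!ttm->head)
		return OWNER_SVM;
	if (!svm->head)
		return OWNER_TTM;
	return ttm->head->last_used <= svm->head->last_used ?
		OWNER_TTM : OWNER_SVM;
}

/* Evict until at least `needed` bytes are freed; returns bytes freed. */
static size_t evict_bytes(struct lru_list *ttm, struct lru_list *svm,
			  size_t needed)
{
	size_t freed = 0;

	while (freed < needed) {
		enum vram_owner side = pick_victim_side(ttm, svm);
		struct lru_list *list;
		struct vram_block *victim;

		if (side == OWNER_NONE)
			break; /* nothing left to evict on either side */
		list = (side == OWNER_TTM) ? ttm : svm;
		victim = list->head;
		list->head = victim->next;
		victim->next = NULL;
		freed += victim->size;
	}
	return freed;
}
```

With pure timestamp ordering an idle client loses its memory first, which is exactly the behaviour described above; any weighting between the two sides would have to be added on top of this comparison.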
I'd also like to suggest we try to write any management/generic code
in a driver-agnostic way as much as possible here. I don't really see
much hw difference that should be influencing it.
I do worry about having effectively 2 LRUs here, you can't really have
two "leasts".
Like if we hit the shrinker paths, who goes first? Do we shrink one
object from each side in turn?
One way to solve this fairness problem is to create a driver-agnostic
drm_vram_mgr. Maintain a single LRU in drm_vram_mgr. Move the memory
eviction/cgroup memory accounting logic from the ttm_resource manager to
drm_vram_mgr. Both the BO-based driver and the SVM driver call into
drm_vram_mgr to allocate/free memory.
I am not sure whether this meets the 2M allocate/free/evict granularity
requirement Felix mentioned above. SVM can allocate 2M blocks, but the BO
driver should be able to allocate arbitrarily sized blocks, so eviction is
also of arbitrary size.
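As a rough illustration of the single-LRU idea, the following user-space model puts BO and SVM allocations on one list and evicts from the cold end regardless of which client owns the block. drm_vram_mgr does not exist in the kernel as far as this thread goes; struct vram_mgr, struct vram_alloc and the vram_mgr_* functions below are hypothetical, and cgroup accounting and locking are left out.

```c
#include <stddef.h>

enum vram_client { CLIENT_BO, CLIENT_SVM };

/* Hypothetical allocation record; size == 0 means "evicted". */
struct vram_alloc {
	enum vram_client client;
	size_t size;
	struct vram_alloc *prev, *next; /* links in the shared LRU */
};

/* Hypothetical manager: one capacity counter and one LRU for all clients. */
struct vram_mgr {
	size_t capacity;
	size_t used;
	struct vram_alloc lru; /* sentinel; lru.next is least recently used */
};

static void vram_mgr_init(struct vram_mgr *m, size_t capacity)
{
	m->capacity = capacity;
	m->used = 0;
	m->lru.prev = m->lru.next = &m->lru;
}

static void lru_unlink(struct vram_alloc *a)
{
	a->prev->next = a->next;
	a->next->prev = a->prev;
	a->prev = a->next = NULL;
}

static void lru_add_tail(struct vram_mgr *m, struct vram_alloc *a)
{
	a->prev = m->lru.prev;
	a->next = &m->lru;
	m->lru.prev->next = a;
	m->lru.prev = a;
}

/* Mark a resident allocation as most recently used. */
static void vram_mgr_touch(struct vram_mgr *m, struct vram_alloc *a)
{
	lru_unlink(a);
	lru_add_tail(m, a);
}

/* Allocate `size` bytes, evicting from the cold end until it fits.
 * Returns 0 on success, -1 if the request exceeds total capacity. */
static int vram_mgr_alloc(struct vram_mgr *m, struct vram_alloc *a,
			  enum vram_client client, size_t size)
{
	if (size > m->capacity)
		return -1;
	while (m->capacity - m->used < size) {
		/* used > 0 here, so the list cannot be empty */
		struct vram_alloc *victim = m->lru.next;

		lru_unlink(victim);
		m->used -= victim->size;
		victim->size = 0; /* owner must notice and re-fault or re-validate */
	}
	a->client = client;
	a->size = size;
	m->used += size;
	lru_add_tail(m, a);
	return 0;
}
```

Victims here are whole allocations, so a 2M SVM request can push out a BO of any size; this is the arbitrary-size eviction concern raised above, and the sketch does nothing to resolve it.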
Also, will we have systems where we can expose system SVM but userspace
may choose not to use the fine-grained SVM and use one of the older
modes? Will that path get emulated on top of SVM or use the BO paths?
If by "older modes" you meant gem_bo_create (such as xe_gem_create or
amdgpu_gem_create), then today both amd and intel implement those
interfaces using the BO path. We don't have a plan to emulate that old mode
on top of SVM, afaict.
I'm not sure how the older modes manifest in the kernel; I assume as bo
creates (but they may use userptr). SVM isn't a specific thing, it's a
group of 3 things.
1) coarse-grained SVM which I think is BO
2) fine-grained SVM which is page level
3) fine-grained system SVM which is HMM
I suppose I'm asking about the previous versions and how they would
operate in a system SVM capable system.
I got your question now.
As I understand it, system SVM provides similar functionality to BO-based
SVM (i.e., a shared virtual address space between cpu and gpu programs, no
explicit memory placement for the gpu program), but they have different user
interfaces (malloc, mmap vs bo create, vm bind).
From a functionality perspective, on a system SVM capable system, we don't
need #1/#2. Once #3 is implemented and turns out to be as performant as
#1/#2, we can ask user space to switch to #3.
As far as I know, AMD doesn't have #1/#2 - their BO-based driver *requires*
all valid GPU virtual addresses to be mapped in the GPU page table *before*
GPU kernel submission, i.e. a GPU page fault is treated as fatal. Felix,
please correct me, as my AMD knowledge is fading away...