On 9/20/23 09:44, Thomas Hellström wrote:
Hi,
On 9/20/23 07:37, Christian König wrote:
On 19.09.23 at 17:23, Thomas Hellström wrote:
On 9/19/23 17:16, Danilo Krummrich wrote:
On 9/19/23 14:21, Thomas Hellström wrote:
Hi Christian
On 9/19/23 14:07, Christian König wrote:
On 13.09.23 at 17:46, Danilo Krummrich wrote:
On 9/13/23 17:33, Christian König wrote:
On 13.09.23 at 17:15, Danilo Krummrich wrote:
On 9/13/23 16:26, Christian König wrote:
On 13.09.23 at 14:16, Danilo Krummrich wrote:
As mentioned in a different mail thread, the reply is based on the assumption that we don't support anything other than GPUVM updates from the IOCTL.
I think that this assumption is incorrect.
Well, more precisely I should have said "don't support GPUVM updates from within fence signaling critical sections". And looking at the code, that doesn't seem to be what you're doing there.
Vulkan is just one specific use case, but this here should probably be able to handle other use cases as well. Especially with HMM you get the requirement that you need to be able to invalidate GPUVM mappings without grabbing a reservation lock.
What do you mean by "invalidate GPUVM mappings" in this context? drm_gpuvm_bo_evict() should only be called from a ttm_device_funcs::move callback; we should hold the dma-resv lock there.
Well, the question is which dma-resv lock do we hold? In the move callback we only hold the dma-resv lock of the BO which is moved, but when that is a shared BO then that's not the same as the one for the VM.
Correct, Thomas' idea was to use the GEM's dma-resv lock to protect drm_gpuvm_bo::evicted and then actually move the drm_gpuvm_bo to the VM's evicted list once we grabbed all dma-resv locks when locking the VM's BOs using drm_exec. We can remove them from the evicted list on validate(). This way we never touch the evicted list without holding at least the VM's dma-resv lock.

Do you have any concerns about that?
I'm scratching my head a bit over how that is supposed to work. This implies that you go over all the evicted BOs during validation and not just the ones mentioned in the CS. That might work for Vulkan, but is pretty much a no-go for OpenGL. See what the eviction lock in amdgpu is doing, for example.
The eviction_lock seems to protect a VM state "evicting", i.e. whether any BO that is associated with the VM is currently evicting. At the same time amdgpu protects the evicted list of the VM with a different lock, so this seems to be entirely unrelated. Tracking a "currently evicting" state is not part of the GPUVM implementation currently and hence nothing would change for amdgpu there.
Sorry for the confusion; we use different terminology in amdgpu. The eviction lock and evicted state are for the VM page tables, e.g. if the whole VM is currently not used and swapped out or even de-allocated. This is necessary because we have cases where we need to access the VM data without holding the dma-resv lock of this VM, especially for figuring out which parts of an address space contain mappings and which don't.
I think this is fine; this has nothing to do with lists of evicted GEM objects or external GEM objects, right? Marking mappings (drm_gpuva) as invalidated (DRM_GPUVA_INVALIDATED) or accessing the VA space does not require any dma-resv locks.
I hope so, but I'm not 100% sure.
This is a requirement which comes with HMM handling; you won't see this with Vulkan (or OpenGL, VAAPI, etc.).
The invalidation lock on the other hand is what in this discussion is called the eviction lock. This one is needed because of what I wrote above: during the move callback only the dma-resv of the BO which is moved is locked, but not necessarily the dma-resv of the VM.
That's yet another thing, right? This is used to track whether *any* BO that belongs to the VM is currently being evicted, correct? As mentioned, as of now this is not supported in GPUVM and hence this would be the same driver-specific code with the same driver-specific lock.
That is most likely a show stopper for using this for OpenGL based workloads as far as I can see. For those you need to be able to figure out which non-VM BOs have been evicted and which parts of the VM need updates.
We identify those with a bool in the gpuvm_bo, and that bool is protected by the bo_resv. In essence, the "evicted" list must be made up-to-date with all relevant locks held before traversing it in the next exec.
What I still miss with this idea is how we find all the drm_gpuvm_bo structures with the evicted bool set to true. When doing the drm_exec dance we come across all external ones and can add them to the list if needed, but what about the BOs having the VM's dma-resv?
Oh, they can be added to the evicted list directly (no bool needed) in the eviction code, like in v3, since for those we indeed hold the VM's dma-resv, as it's aliased with the object's dma-resv.
Yeah, I wanted to note what Danilo seems to be thinking about as well: how do we figure out which non-VM BOs were evicted?

We can't walk over the list of all non-VM BOs on every submission; that's too much overhead for cases with lots of non-VM BOs. And we can't rely on userspace sending all non-VM BOs as the used list down to the kernel with each submission.
Regards,
Christian.
No, that's not needed: Mechanism below.
1) We maintain an evicted list. Typically protected by the vm resv.

2) Each gpuvm_bo has a bool "evicted". Protected by the bo resv.

a) Evicting a vm bo: The vm resv is held by the eviction code. Just put it on the evicted list.

b) Evicting a shared/external bo: The bo resv is held by the eviction code. Set the "evicted" bool.

c) Validating the evicted list on exec: Loop through all *external/shared* bos. Lock them. After locking, check the "evicted" bool; if it's true, put the bo on the evicted list (we hold the VM resv at this point) and clear the "evicted" bool. Note that other vms will have their own gpuvm_bo which is marked evicted.
I have this coded up in a patch for Xe and it seems to be working properly.
/Thomas
Something along the lines of the attached patch.
From 12778a3f1b2ca055ff658864c538f944550c9adf Mon Sep 17 00:00:00 2001
From: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx>
Date: Thu, 14 Sep 2023 10:23:52 +0200
Subject: [PATCH] drm/gpuvm: Adjustment for extobj eviction.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Signed-off-by: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx>
---
drivers/gpu/drm/drm_gpuvm.c | 32 ++++++++++++++++++++++++--------
include/drm/drm_gpuvm.h | 7 ++++++-
2 files changed, 30 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/drm_gpuvm.c b/drivers/gpu/drm/drm_gpuvm.c
index 11a0aee1c038..029c38d7fa4d 100644
--- a/drivers/gpu/drm/drm_gpuvm.c
+++ b/drivers/gpu/drm/drm_gpuvm.c
@@ -956,6 +956,11 @@ drm_gpuvm_prepare_objects(struct drm_gpuvm *gpuvm,
ret = drm_exec_prepare_obj(exec, vm_bo->obj, num_fences);
if (ret)
break;
+
+ if (vm_bo->evicted) {
+ drm_gpuvm_bo_list_add(vm_bo, evict);
+ vm_bo->evicted = false;
+ }
}
/* Drop ref in case we break out of the loop. */
drm_gpuvm_bo_put(vm_bo);
@@ -1431,6 +1436,21 @@ drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo)
}
EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
+void
+drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict)
+{
+ if (drm_gpuvm_is_extobj(vm_bo->vm, vm_bo->obj)) {
+ vm_bo->evicted = evict;
+ return;
+ }
+
+ if (evict)
+ drm_gpuvm_bo_list_add(vm_bo, evict);
+ else
+ drm_gpuvm_bo_list_del(vm_bo, evict);
+}
+EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
+
/**
* drm_gpuvm_bo_evict() - add / remove a &drm_gem_object to / from a
* &drm_gpuvms evicted list
@@ -1441,18 +1461,14 @@ EXPORT_SYMBOL_GPL(drm_gpuvm_bo_extobj_add);
* list containing a mapping of this &drm_gem_object.
*/
void
-drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict)
+drm_gpuvm_gem_evict(struct drm_gem_object *obj, bool evict)
{
struct drm_gpuvm_bo *vm_bo;
- drm_gem_for_each_gpuvm_bo(vm_bo, obj) {
- if (evict)
- drm_gpuvm_bo_list_add(vm_bo, evict);
- else
- drm_gpuvm_bo_list_del(vm_bo, evict);
- }
+ drm_gem_for_each_gpuvm_bo(vm_bo, obj)
+ drm_gpuvm_bo_evict(vm_bo, evict);
}
-EXPORT_SYMBOL_GPL(drm_gpuvm_bo_evict);
+EXPORT_SYMBOL_GPL(drm_gpuvm_gem_evict);
static int
__drm_gpuva_insert(struct drm_gpuvm *gpuvm,
diff --git a/include/drm/drm_gpuvm.h b/include/drm/drm_gpuvm.h
index dce26a923d5d..c2216f18243f 100644
--- a/include/drm/drm_gpuvm.h
+++ b/include/drm/drm_gpuvm.h
@@ -550,6 +550,9 @@ struct drm_gpuvm_bo {
*/
struct kref kref;
+ /** @evicted: Whether the bo needs revalidation and rebinding. */
+ bool evicted;
+
/**
* @list: Structure containing all &list_heads.
*/
@@ -615,7 +618,9 @@ struct drm_gpuvm_bo *
drm_gpuvm_bo_find(struct drm_gpuvm *gpuvm,
struct drm_gem_object *obj);
-void drm_gpuvm_bo_evict(struct drm_gem_object *obj, bool evict);
+void drm_gpuvm_bo_evict(struct drm_gpuvm_bo *vm_bo, bool evict);
+
+void drm_gpuvm_gem_evict(struct drm_gem_object *obj, bool evict);
void drm_gpuvm_bo_extobj_add(struct drm_gpuvm_bo *vm_bo);
/**
--
2.41.0