On 25.04.24 09:15, Christian König wrote:
Am 25.04.24 um 09:06 schrieb Friedrich Vock:
On 25.04.24 08:58, Christian König wrote:
Am 25.04.24 um 08:46 schrieb Friedrich Vock:
On 25.04.24 08:32, Christian König wrote:
Am 24.04.24 um 18:57 schrieb Friedrich Vock:
Used by userspace to adjust buffer priorities in response to
changes in
application demand and memory pressure.
Yeah, that was discussed over and over again. One big design criteria
is that we can't have global priorities from userspace!
The background here is that this can trivially be abused.
I see your point when apps are allowed to prioritize themselves above
other apps, and I agree that should probably be disallowed at least
for
unprivileged apps.
Disallowing this is a pretty trivial change though, and I don't really
see the abuse potential in being able to downgrade your own priority?
Yeah, I know what you mean and I'm also leaning towards that
argumentation. But another good point is also that it doesn't actually
help.
For example when you have desktop apps fighting with a game, you
probably don't want to use static priorities, but rather evict the
apps which are inactive and keep the apps which are active in the
background.
Sadly things are not as simple as "evict everything from app 1, keep
everything from app 2 active". The simplest failure case of this is
games that already oversubscribe VRAM on their own. Keeping the whole
app inside VRAM is literally impossible there, and it helps a lot to
know which buffers the app is most happy with evicting.
In other words the priority just tells you which stuff from each app
to evict first, but not which app to globally throw out.
Yeah, but per-buffer priority system could do both of these.
Yeah, but we already have that. See amdgpu_bo_list_entry_cmp() and
amdgpu_bo_list_create().
This is the per application priority which can be used by userspace to
give priority to each BO in a submission (or application wide).
The problem is rather that amdgpu/TTM never really made good use of
that information.
I think it's nigh impossible to make good use of priority information if
you wrap it in the BO list which you only know on submit. For example,
you don't know when priorities change unless you duplicate all the
tracking work (that the application has to do too!) in the kernel. You
also have no way of knowing the priority changed until right when the
app wants to submit work using that BO, and starting to move BOs around
at that point is bad for submission latency. That's why I didn't go
forward with tracking priorities on a BO-list basis.
Also, the priorities being local to a submission is actually not that
great when talking about lowering priorities. Consider a case where an
app's working set fits into VRAM completely, but combined with the
working set of other apps running in parallel, VRAM is oversubscribed.
The app recognizes this and asks the kernel to evict one of its
rarely-used buffers by setting the priority to the lowest possible, to
make space for the other applications.
Without global priorities, the kernel can't honor that request, even
though it would solve the oversubscription with minimal performance
impact. Even with per-app priorities, the kernel isn't likely to evict
buffers from the requesting application unless all the other
applications have a higher priority.
Regards,
Friedrich
Regards,
Christian.
Regards,
Friedrich
Regards,
Christian.
Regards,
Friedrich
What we can do is to have per process priorities, but that needs
to be
in the VM subsystem.
That's also the reason why I personally think that the handling
shouldn't be inside TTM at all.
Regards,
Christian.
Signed-off-by: Friedrich Vock <friedrich.vock@xxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 20 ++++++++++++++++++++
include/uapi/drm/amdgpu_drm.h | 1 +
2 files changed, 21 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 5ca13e2e50f50..6107810a9c205 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -836,8 +836,10 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
void *data,
{
struct amdgpu_device *adev = drm_to_adev(dev);
struct drm_amdgpu_gem_op *args = data;
+ struct ttm_resource_manager *man;
struct drm_gem_object *gobj;
struct amdgpu_vm_bo_base *base;
+ struct ttm_operation_ctx ctx;
struct amdgpu_bo *robj;
int r;
@@ -851,6 +853,9 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
void *data,
if (unlikely(r))
goto out;
+ memset(&ctx, 0, sizeof(ctx));
+ ctx.interruptible = true;
+
switch (args->op) {
case AMDGPU_GEM_OP_GET_GEM_CREATE_INFO: {
struct drm_amdgpu_gem_create_in info;
@@ -898,6 +903,21 @@ int amdgpu_gem_op_ioctl(struct drm_device *dev,
void *data,
amdgpu_bo_unreserve(robj);
break;
+ case AMDGPU_GEM_OP_SET_PRIORITY:
+ if (args->value > AMDGPU_BO_PRIORITY_MAX_USER)
+ args->value = AMDGPU_BO_PRIORITY_MAX_USER;
+ ttm_bo_update_priority(&robj->tbo, args->value);
+ if (robj->tbo.evicted_type != TTM_NUM_MEM_TYPES) {
+ ttm_bo_try_unevict(&robj->tbo, &ctx);
+ amdgpu_bo_unreserve(robj);
+ } else {
+ amdgpu_bo_unreserve(robj);
+ man = ttm_manager_type(robj->tbo.bdev,
+ robj->tbo.resource->mem_type);
+ ttm_mem_unevict_evicted(robj->tbo.bdev, man,
+ true);
+ }
+ break;
default:
amdgpu_bo_unreserve(robj);
r = -EINVAL;
diff --git a/include/uapi/drm/amdgpu_drm.h
b/include/uapi/drm/amdgpu_drm.h
index bdbe6b262a78d..53552dd489b9b 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -531,6 +531,7 @@ union drm_amdgpu_wait_fences {
#define AMDGPU_GEM_OP_GET_GEM_CREATE_INFO 0
#define AMDGPU_GEM_OP_SET_PLACEMENT 1
+#define AMDGPU_GEM_OP_SET_PRIORITY 2
/* Sets or returns a value associated with a buffer. */
struct drm_amdgpu_gem_op {
--
2.44.0