Am 01.06.23 um 13:14 schrieb Chong Li:
enforce process isolation between graphics and compute via using the same reserved vmid.
Signed-off-by: Chong Li <chongli2@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +++++++++
drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 13 ++++++++++++-
3 files changed, 22 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index ce196badf42d..48c5c547d85a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -215,6 +215,7 @@ extern int amdgpu_force_asic_type;
extern int amdgpu_smartshift_bias;
extern int amdgpu_use_xgmi_p2p;
extern int amdgpu_mtype_local;
+extern int enforce_isolation;
#ifdef CONFIG_HSA_AMD
extern int sched_policy;
extern bool debug_evictions;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 3d91e123f9bd..2e0ebd92b4cf 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -973,6 +973,15 @@ MODULE_PARM_DESC(
4 = AMDGPU_CPX_PARTITION_MODE)");
module_param_named(user_partt_mode, amdgpu_user_partt_mode, uint, 0444);
+
+/**
+ * DOC: enforce_isolation (int)
+ * enforce process isolation between graphics and compute via using the same reserved vmid.
+ */
+int enforce_isolation = 0;
Please move that to the other declarations above.
+module_param(enforce_isolation, int, 0444);
IIRC you can also use bool here.
+MODULE_PARM_DESC(enforce_isolation, "enforce process isolation between graphics and compute . 1 = On, 0 = Off");
This way you can drop the "1 = On, 0 = Off" part from the description
because "enforce_isolation=on" should then be accepted on the kernel
commandline as well.
+
/* These devices are not supported by amdgpu.
* They are supported by the mach64, r128, radeon drivers
*/
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
index c991ca0b7a1c..33efa17d08ff 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
@@ -409,7 +409,7 @@ int amdgpu_vmid_grab(struct amdgpu_vm *vm, struct amdgpu_ring *ring,
if (r || !idle)
goto error;
- if (vm->reserved_vmid[vmhub]) {
+ if (vm->reserved_vmid[vmhub] || (enforce_isolation && (vmhub == AMDGPU_GFXHUB(0)))) {
r = amdgpu_vmid_grab_reserved(vm, ring, job, &id, fence);
if (r || !id)
goto error;
@@ -578,6 +578,17 @@ void amdgpu_vmid_mgr_init(struct amdgpu_device *adev)
list_add_tail(&id_mgr->ids[j].list, &id_mgr->ids_lru);
}
}
+
+ if (enforce_isolation) {
+ struct amdgpu_vmid_mgr *id_mgr = &adev->vm_manager.id_mgr[AMDGPU_GFXHUB(0)];
+ struct amdgpu_vmid *id = NULL;
Empty line between declaration and code please.
+ ++id_mgr->reserved_use_count;
+ id = list_first_entry(&id_mgr->ids_lru, struct amdgpu_vmid,
+ list);
+ /* Remove from normal round robin handling */
+ list_del_init(&id->list);
+ id_mgr->reserved = id;
It would be good if we don't duplicate this hunk here and in
amdgpu_vmid_alloc_reserved().
We should probably cleanup amdgpu_vmid_alloc_reserved() a bit and move
the check for vm->reserved_vmid into amdgpu_vm_ioctl().
This way we could call amdgpu_vmid_alloc_reserved() here as well.
Apart from that looks good from the technical side.
Regards,
Christian.
+ }
}
/**