Re: [PATCH] drm/amdkfd: explicitly create/destroy queue attributes under /sys

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 09.12.21 um 23:27 schrieb Felix Kuehling:
Am 2021-12-09 um 5:14 p.m. schrieb Chen, Xiaogang:
On 12/9/2021 12:40 PM, Felix Kuehling wrote:
Am 2021-12-09 um 2:49 a.m. schrieb Xiaogang.Chen:
From: Xiaogang Chen <xiaogang.chen@xxxxxxx>

When application is about finish it destroys queues it has created by
an ioctl. Driver deletes queue
entry(/sys/class/kfd/kfd/proc/pid/queues/queueid/)
which is directory including this queue all attributes. Low level
kernel
code deletes all attributes under this directory. The lock from
kernel is
on queue entry, not its attributes. At meantime another user space
application
can read the attributes. There is possibility that the application can
hold/read the attributes while kernel is deleting the queue entry,
cause
the application have invalid memory access, then killed by kernel.

Driver changes: explicitly create/destroy each attribute for each
queue,
let kernel put lock on each attribute too.
Is this working around a bug in kobject_del? Shouldn't that code take
care of the necessary locking itself?

Regards,
    Felix
The patches do not change kobject/kernfs that are too low level and
would involve deeper discussions.
Made changes at higher level(kfd) instead.

Have tested with MSF tool overnight.
OK. I'm OK with your changes. The patch is

Reviewed-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>

But I think we should let the kernfs folks know that there is a problem
anyway. It might save someone else a lot of time and headaches down the
line. Ideally we'd come up with a small reproducer (dummy driver and a
user mode tool (could just be a bash script)) that doesn't require
special AMD hardware and the whole ROCm stack.

I think we could do this in the DKMS/release branches, but for upstream we should rather fix the underlying problem.

Additional to that this is explicitely what we should not do if I understood Greg correctly in previous discussions, but take that with a grain of salt since I'm not an expert on the topic.

Regards,
Christian.


Regards,
   Felix


Thanks
Xiaogang

Signed-off-by: Xiaogang Chen <xiaogang.chen@xxxxxxx>
---
   drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  3 +++
   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 33
+++++++-----------------
   2 files changed, 13 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 0c3f911e3bf4..045da300749e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -546,6 +546,9 @@ struct queue {
         /* procfs */
       struct kobject kobj;
+    struct attribute attr_guid;
+    struct attribute attr_size;
+    struct attribute attr_type;
   };
     enum KFD_MQD_TYPE {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 9158f9754a24..04a5638f9196 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -73,6 +73,8 @@ static void evict_process_worker(struct
work_struct *work);
   static void restore_process_worker(struct work_struct *work);
     static void kfd_process_device_destroy_cwsr_dgpu(struct
kfd_process_device *pdd);
+static void kfd_sysfs_create_file(struct kobject *kobj, struct
attribute *attr,
+                char *name);
     struct kfd_procfs_tree {
       struct kobject *kobj;
@@ -441,35 +443,12 @@ static ssize_t kfd_sysfs_counters_show(struct
kobject *kobj,
       return 0;
   }
   -static struct attribute attr_queue_size = {
-    .name = "size",
-    .mode = KFD_SYSFS_FILE_MODE
-};
-
-static struct attribute attr_queue_type = {
-    .name = "type",
-    .mode = KFD_SYSFS_FILE_MODE
-};
-
-static struct attribute attr_queue_gpuid = {
-    .name = "gpuid",
-    .mode = KFD_SYSFS_FILE_MODE
-};
-
-static struct attribute *procfs_queue_attrs[] = {
-    &attr_queue_size,
-    &attr_queue_type,
-    &attr_queue_gpuid,
-    NULL
-};
-
   static const struct sysfs_ops procfs_queue_ops = {
       .show = kfd_procfs_queue_show,
   };
     static struct kobj_type procfs_queue_type = {
       .sysfs_ops = &procfs_queue_ops,
-    .default_attrs = procfs_queue_attrs,
   };
     static const struct sysfs_ops procfs_stats_ops = {
@@ -511,6 +490,10 @@ int kfd_procfs_add_queue(struct queue *q)
           return ret;
       }
   +    kfd_sysfs_create_file(&q->kobj, &q->attr_guid, "guid");
+    kfd_sysfs_create_file(&q->kobj, &q->attr_size, "size");
+    kfd_sysfs_create_file(&q->kobj, &q->attr_type, "type");
+
       return 0;
   }
   @@ -655,6 +638,10 @@ void kfd_procfs_del_queue(struct queue *q)
       if (!q)
           return;
   +    sysfs_remove_file(&q->kobj, &q->attr_guid);
+    sysfs_remove_file(&q->kobj, &q->attr_size);
+    sysfs_remove_file(&q->kobj, &q->attr_type);
+
       kobject_del(&q->kobj);
       kobject_put(&q->kobj);
   }




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux