Am 2021-12-09 um 5:14 p.m. schrieb Chen, Xiaogang: > > On 12/9/2021 12:40 PM, Felix Kuehling wrote: >> Am 2021-12-09 um 2:49 a.m. schrieb Xiaogang.Chen: >>> From: Xiaogang Chen <xiaogang.chen@xxxxxxx> >>> >>> When application is about finish it destroys queues it has created by >>> an ioctl. Driver deletes queue >>> entry(/sys/class/kfd/kfd/proc/pid/queues/queueid/) >>> which is directory including this queue all attributes. Low level >>> kernel >>> code deletes all attributes under this directory. The lock from >>> kernel is >>> on queue entry, not its attributes. At meantime another user space >>> application >>> can read the attributes. There is possibility that the application can >>> hold/read the attributes while kernel is deleting the queue entry, >>> cause >>> the application have invalid memory access, then killed by kernel. >>> >>> Driver changes: explicitly create/destroy each attribute for each >>> queue, >>> let kernel put lock on each attribute too. >> Is this working around a bug in kobject_del? Shouldn't that code take >> care of the necessary locking itself? >> >> Regards, >> Felix > > The patches do not change kobject/kernfs that are too low level and > would involve deeper discussions. > Made changes at higher level(kfd) instead. > > Have tested with MSF tool overnight. OK. I'm OK with your changes. The patch is Reviewed-by: Felix Kuehling <Felix.Kuehling@xxxxxxx> But I think we should let the kernfs folks know that there is a problem anyway. It might save someone else a lot of time and headaches down the line. Ideally we'd come up with a small reproducer (dummy driver and a user mode tool (could just be a bash script)) that doesn't require special AMD hardware and the whole ROCm stack. Regards, Felix > > Thanks > Xiaogang > >> >>> Signed-off-by: Xiaogang Chen <xiaogang.chen@xxxxxxx> >>> --- >>> drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 3 +++ >>> drivers/gpu/drm/amd/amdkfd/kfd_process.c | 33 >>> +++++++----------------- >>> 2 files changed, 13 insertions(+), 23 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h >>> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h >>> index 0c3f911e3bf4..045da300749e 100644 >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h >>> @@ -546,6 +546,9 @@ struct queue { >>> /* procfs */ >>> struct kobject kobj; >>> + struct attribute attr_guid; >>> + struct attribute attr_size; >>> + struct attribute attr_type; >>> }; >>> enum KFD_MQD_TYPE { >>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c >>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c >>> index 9158f9754a24..04a5638f9196 100644 >>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c >>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c >>> @@ -73,6 +73,8 @@ static void evict_process_worker(struct >>> work_struct *work); >>> static void restore_process_worker(struct work_struct *work); >>> static void kfd_process_device_destroy_cwsr_dgpu(struct >>> kfd_process_device *pdd); >>> +static void kfd_sysfs_create_file(struct kobject *kobj, struct >>> attribute *attr, >>> + char *name); >>> struct kfd_procfs_tree { >>> struct kobject *kobj; >>> @@ -441,35 +443,12 @@ static ssize_t kfd_sysfs_counters_show(struct >>> kobject *kobj, >>> return 0; >>> } >>> -static struct attribute attr_queue_size = { >>> - .name = "size", >>> - .mode = KFD_SYSFS_FILE_MODE >>> -}; >>> - >>> -static struct attribute attr_queue_type = { >>> - .name = "type", >>> - .mode = KFD_SYSFS_FILE_MODE >>> -}; >>> - >>> -static struct attribute attr_queue_gpuid = { >>> - .name = "gpuid", >>> - .mode = KFD_SYSFS_FILE_MODE >>> -}; >>> - >>> -static struct attribute *procfs_queue_attrs[] = { >>> - &attr_queue_size, >>> - &attr_queue_type, >>> - &attr_queue_gpuid, >>> - NULL >>> -}; >>> - >>> static const struct sysfs_ops procfs_queue_ops = { >>> .show = kfd_procfs_queue_show, >>> }; >>> static struct kobj_type procfs_queue_type = { >>> .sysfs_ops = &procfs_queue_ops, >>> - .default_attrs = procfs_queue_attrs, >>> }; >>> static const struct sysfs_ops procfs_stats_ops = { >>> @@ -511,6 +490,10 @@ int kfd_procfs_add_queue(struct queue *q) >>> return ret; >>> } >>> + kfd_sysfs_create_file(&q->kobj, &q->attr_guid, "guid"); >>> + kfd_sysfs_create_file(&q->kobj, &q->attr_size, "size"); >>> + kfd_sysfs_create_file(&q->kobj, &q->attr_type, "type"); >>> + >>> return 0; >>> } >>> @@ -655,6 +638,10 @@ void kfd_procfs_del_queue(struct queue *q) >>> if (!q) >>> return; >>> + sysfs_remove_file(&q->kobj, &q->attr_guid); >>> + sysfs_remove_file(&q->kobj, &q->attr_size); >>> + sysfs_remove_file(&q->kobj, &q->attr_type); >>> + >>> kobject_del(&q->kobj); >>> kobject_put(&q->kobj); >>> }