Re: [PATCH] drm/amdkfd: explicitly create/destroy queue attributes under /sys

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 2021-12-09 um 5:14 p.m. schrieb Chen, Xiaogang:
>
> On 12/9/2021 12:40 PM, Felix Kuehling wrote:
>> Am 2021-12-09 um 2:49 a.m. schrieb Xiaogang.Chen:
>>> From: Xiaogang Chen <xiaogang.chen@xxxxxxx>
>>>
>>> When application is about finish it destroys queues it has created by
>>> an ioctl. Driver deletes queue
>>> entry(/sys/class/kfd/kfd/proc/pid/queues/queueid/)
>>> which is directory including this queue all attributes. Low level
>>> kernel
>>> code deletes all attributes under this directory. The lock from
>>> kernel is
>>> on queue entry, not its attributes. At meantime another user space
>>> application
>>> can read the attributes. There is possibility that the application can
>>> hold/read the attributes while kernel is deleting the queue entry,
>>> cause
>>> the application have invalid memory access, then killed by kernel.
>>>
>>> Driver changes: explicitly create/destroy each attribute for each
>>> queue,
>>> let kernel put lock on each attribute too.
>> Is this working around a bug in kobject_del? Shouldn't that code take
>> care of the necessary locking itself?
>>
>> Regards,
>>    Felix
>
> The patches do not change kobject/kernfs that are too low level and
> would involve deeper discussions.
> Made changes at higher level(kfd) instead.
>
> Have tested with MSF tool overnight.

OK. I'm OK with your changes. The patch is

Reviewed-by: Felix Kuehling <Felix.Kuehling@xxxxxxx>

But I think we should let the kernfs folks know that there is a problem
anyway. It might save someone else a lot of time and headaches down the
line. Ideally we'd come up with a small reproducer (dummy driver and a
user mode tool (could just be a bash script)) that doesn't require
special AMD hardware and the whole ROCm stack.

Regards,
  Felix


>
> Thanks
> Xiaogang
>
>>
>>> Signed-off-by: Xiaogang Chen <xiaogang.chen@xxxxxxx>
>>> ---
>>>   drivers/gpu/drm/amd/amdkfd/kfd_priv.h    |  3 +++
>>>   drivers/gpu/drm/amd/amdkfd/kfd_process.c | 33
>>> +++++++-----------------
>>>   2 files changed, 13 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> index 0c3f911e3bf4..045da300749e 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
>>> @@ -546,6 +546,9 @@ struct queue {
>>>         /* procfs */
>>>       struct kobject kobj;
>>> +    struct attribute attr_guid;
>>> +    struct attribute attr_size;
>>> +    struct attribute attr_type;
>>>   };
>>>     enum KFD_MQD_TYPE {
>>> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> index 9158f9754a24..04a5638f9196 100644
>>> --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
>>> @@ -73,6 +73,8 @@ static void evict_process_worker(struct
>>> work_struct *work);
>>>   static void restore_process_worker(struct work_struct *work);
>>>     static void kfd_process_device_destroy_cwsr_dgpu(struct
>>> kfd_process_device *pdd);
>>> +static void kfd_sysfs_create_file(struct kobject *kobj, struct
>>> attribute *attr,
>>> +                char *name);
>>>     struct kfd_procfs_tree {
>>>       struct kobject *kobj;
>>> @@ -441,35 +443,12 @@ static ssize_t kfd_sysfs_counters_show(struct
>>> kobject *kobj,
>>>       return 0;
>>>   }
>>>   -static struct attribute attr_queue_size = {
>>> -    .name = "size",
>>> -    .mode = KFD_SYSFS_FILE_MODE
>>> -};
>>> -
>>> -static struct attribute attr_queue_type = {
>>> -    .name = "type",
>>> -    .mode = KFD_SYSFS_FILE_MODE
>>> -};
>>> -
>>> -static struct attribute attr_queue_gpuid = {
>>> -    .name = "gpuid",
>>> -    .mode = KFD_SYSFS_FILE_MODE
>>> -};
>>> -
>>> -static struct attribute *procfs_queue_attrs[] = {
>>> -    &attr_queue_size,
>>> -    &attr_queue_type,
>>> -    &attr_queue_gpuid,
>>> -    NULL
>>> -};
>>> -
>>>   static const struct sysfs_ops procfs_queue_ops = {
>>>       .show = kfd_procfs_queue_show,
>>>   };
>>>     static struct kobj_type procfs_queue_type = {
>>>       .sysfs_ops = &procfs_queue_ops,
>>> -    .default_attrs = procfs_queue_attrs,
>>>   };
>>>     static const struct sysfs_ops procfs_stats_ops = {
>>> @@ -511,6 +490,10 @@ int kfd_procfs_add_queue(struct queue *q)
>>>           return ret;
>>>       }
>>>   +    kfd_sysfs_create_file(&q->kobj, &q->attr_guid, "guid");
>>> +    kfd_sysfs_create_file(&q->kobj, &q->attr_size, "size");
>>> +    kfd_sysfs_create_file(&q->kobj, &q->attr_type, "type");
>>> +
>>>       return 0;
>>>   }
>>>   @@ -655,6 +638,10 @@ void kfd_procfs_del_queue(struct queue *q)
>>>       if (!q)
>>>           return;
>>>   +    sysfs_remove_file(&q->kobj, &q->attr_guid);
>>> +    sysfs_remove_file(&q->kobj, &q->attr_size);
>>> +    sysfs_remove_file(&q->kobj, &q->attr_type);
>>> +
>>>       kobject_del(&q->kobj);
>>>       kobject_put(&q->kobj);
>>>   }



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux