Re: [PATCH] drm/amdkfd: Have kfd driver use same PASID values from graphic driver

Felix Kuehling <felix.kuehling@xxxxxxx> · Thu, 21 Nov 2024 09:22:00 -0600

On 2024-11-20 22:58, Chen, Xiaogang wrote:
>>>   @@ -1822,15 +1804,20 @@ struct kfd_process *kfd_lookup_process_by_pasid(u32 pasid)
>>>   {
>>>       struct kfd_process *p, *ret_p = NULL;
>>>       unsigned int temp;
>>> +    int i;
>>>         int idx = srcu_read_lock(&kfd_processes_srcu);
>>>         hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) {
>>> -        if (p->pasid == pasid) {
>>> -            kref_get(&p->ref);
>>> -            ret_p = p;
>>> -            break;
>>> +        for (i = 0; i < p->n_pdds; i++) {
>>> +            if (p->pdds[i]->pasid == pasid) {
>>> +                kref_get(&p->ref);
>>> +                ret_p = p;
>>> +                break;
>>> +            }
>> I think this won't work correctly. The same PASID can be used for different processes on different GPUs because each adev manages its own PASID->amdgpu_vm lookup table. So kfd_lookup_process_by_pasid needs a new parameter that identifies the GPU adev, and you should only compare pasids, if the adev matches.
> 
> I think it is the main concern here: the pasid used here is global in driver by amdgpu_pasid_alloc(16) at amdgpu_driver_open_kms.  Each time a render node(partition) got opened, a new pasid value is generated. Its lifetime is until render node got closed. A pdd just uses this pasid.  And each adev has its own xarray which saves pasids for this adev.

I think I had a misunderstanding here. I saw the xarray in adev and assumed that each adev allocated PASIDs independently. But there is also a global amdgpu_pasid_ida that I missed. If the PASID allocation is global in the amdgpu_pasid_ida, then each PASID uniquely identifies a VM your code should be fine.

@Christian, we discussed the number of PASIDs consumed on systems with many GPUs and partitions. If the PASIDs are managed globally, then running out of PASIDs is a concern. Do you think we should change this?

Regards,
  Felix

> 
> Regards
> 
> Xiaogang