On 2024-11-20 22:58, Chen, Xiaogang wrote: >>> @@ -1822,15 +1804,20 @@ struct kfd_process *kfd_lookup_process_by_pasid(u32 pasid) >>> { >>> struct kfd_process *p, *ret_p = NULL; >>> unsigned int temp; >>> + int i; >>> int idx = srcu_read_lock(&kfd_processes_srcu); >>> hash_for_each_rcu(kfd_processes_table, temp, p, kfd_processes) { >>> - if (p->pasid == pasid) { >>> - kref_get(&p->ref); >>> - ret_p = p; >>> - break; >>> + for (i = 0; i < p->n_pdds; i++) { >>> + if (p->pdds[i]->pasid == pasid) { >>> + kref_get(&p->ref); >>> + ret_p = p; >>> + break; >>> + } >> I think this won't work correctly. The same PASID can be used for different processes on different GPUs because each adev manages its own PASID->amdgpu_vm lookup table. So kfd_lookup_process_by_pasid needs a new parameter that identifies the GPU adev, and you should only compare pasids, if the adev matches. > > I think it is the main concern here: the pasid used here is global in driver by amdgpu_pasid_alloc(16) at amdgpu_driver_open_kms. Each time a render node(partition) got opened, a new pasid value is generated. Its lifetime is until render node got closed. A pdd just uses this pasid. And each adev has its own xarray which saves pasids for this adev. I think I had a misunderstanding here. I saw the xarray in adev and assumed that each adev allocated PASIDs independently. But there is also a global amdgpu_pasid_ida that I missed. If the PASID allocation is global in the amdgpu_pasid_ida, then each PASID uniquely identifies a VM your code should be fine. @Christian, we discussed the number of PASIDs consumed on systems with many GPUs and partitions. If the PASIDs are managed globally, then running out of PASIDs is a concern. Do you think we should change this? Regards, Felix > > Regards > > Xiaogang