Hi Felix,

On 2/17/2023 1:29 AM, Felix Kuehling wrote:
>> Feb 16 13:22:32 kernel: kfd kfd: amdgpu: Failed to resume IOMMU for device
>> 1002:9874
>> Feb 16 13:22:32 kernel: kfd kfd: amdgpu: device 1002:9874 NOT added due to errors
> This look like IOMMU device initialization still fails (but more gracefully
> now). Vasant, is that expected?

My fix is to handle the failure paths in IOMMU gracefully, so the above logs
are expected. Basically it means the IOMMU couldn't attach devices to the new
domain (because it couldn't enable PASID on the AMD GPU, as the ACS RR/UF flags
are missing; see commit 201007ef707) and we fell back to the old domain
properly.

It also means that the GPU will not be able to use PASID/PRI. If you need these
features, then you have to look into commit 201007ef707 and see how we can
enable PASID for the GPU (without the ACS UF/RR flags?).

>
> This would lead to KFD not being available on Carrizo with this kernel, which is
> probably not a big limitation in practice. It would only affect compute
> applications using the ROCm user mode stack. I don't think anyone does that
> these days on these old APUs.
>
> The SMU errors seem unrelated to this unless there is some subtle interaction
> I'm missing.

I have no idea about the GPU warning. All I can say is that the IOMMU side
looks good, but PASID/PRI is not enabled for the GPU.

-Vasant