Re: [PATCH v3 1/1] PCI: Add translated request only flag for pci_enable_pasid()

Baolu Lu <baolu.lu@xxxxxxxxxxxxxxx> · Tue, 31 Jan 2023 20:56:13 +0800

On 2023/1/31 2:38, Bjorn Helgaas wrote:
PCI: Add translated request only flag for pci_enable_pasid()

The PCIe fabric routes Memory Requests based on the TLP address, ignoring
the PASID. In order to ensure system integrity, commit 201007ef707a ("PCI:
Enable PASID only when ACS RR & UF enabled on upstream path") requires
some ACS features to be supported on the device's upstream path when
enabling PCI/PASID.

However, the above change causes the Linux kernel to boot to a black
screen on a system with the graphics device below:
We need a PCIe concept-level description of the issue first, i.e., in
terms of DMA, PASID, ACS, etc.  Then we can mention the AMD GPU issue
as an instance.

How about the description below?

PCIe endpoints can use ATS to ask the DMA remapping hardware to
translate an IOVA to its mapped physical address. If the translation is
missing or the permissions are insufficient, PRI is used to trigger
an I/O page fault. The IOMMU driver will fill in the mapping with the
desired permissions and return the translated address to the device.

The translated address is specified by the IOMMU driver. The IOMMU
driver ensures that the address is a DMA buffer address instead of any
P2P address in the PCI fabric. Therefore, any translated memory request
will eventually be routed to the IOMMU regardless of whether there is
ACS control in the upstream path.

The AMD GPU is one of those devices. Furthermore, it always uses translated
memory requests for PASID.
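
To make the reasoning above concrete, here is a rough sketch in C of the
decision it implies. It is only an illustration: struct sketch_dev,
sketch_enable_pasid() and SKETCH_PASID_XLATED_REQ_ONLY are made-up names,
not the kernel's struct pci_dev or the actual pci_enable_pasid() code, and
the single boolean stands in for whatever check the kernel does on the
upstream path.

```c
#include <stdbool.h>
#include <errno.h>

/* Hypothetical flag: the caller promises that the device only ever
 * issues translated memory requests when it uses PASID. */
#define SKETCH_PASID_XLATED_REQ_ONLY	(1u << 0)

/* Hypothetical, heavily simplified device model (not struct pci_dev). */
struct sketch_dev {
	/* true if ACS RR and UF are enabled on the whole upstream path */
	bool upstream_acs_rr_uf;
};

/* Returns 0 if PASID may be enabled, -EINVAL otherwise. */
static int sketch_enable_pasid(const struct sketch_dev *dev,
			       unsigned int flags)
{
	/*
	 * Translated requests carry an address chosen by the IOMMU
	 * driver, so they are assumed to reach the IOMMU even without
	 * ACS on the upstream path. Skip the ACS requirement for them.
	 */
	if (flags & SKETCH_PASID_XLATED_REQ_ONLY)
		return 0;

	/* Untranslated requests with PASID still need ACS RR and UF. */
	if (!dev->upstream_acs_rr_uf)
		return -EINVAL;

	return 0;
}
```

With this shape, a device behind a path without ACS is rejected by
default and accepted only when the caller passes the flag.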

00:01.0 VGA compatible controller: Advanced Micro Devices, Inc.
         [AMD/ATI] Wani [Radeon R5/R6/R7 Graphics] (rev ca)
         (prog-if 00 [VGA controller])
         DeviceName: ATI EG BROADWAY
         Subsystem: Hewlett-Packard Company Device 8332

The kernel trace looks like this:

  Call Trace:
   <TASK>
   amd_iommu_attach_device+0x2e0/0x300
   __iommu_attach_device+0x1b/0x90
   iommu_attach_group+0x65/0xa0
   amd_iommu_init_device+0x16b/0x250 [iommu_v2]
   kfd_iommu_resume+0x4c/0x1a0 [amdgpu]
   kgd2kfd_resume_iommu+0x12/0x30 [amdgpu]
   kgd2kfd_device_init.cold+0x346/0x49a [amdgpu]
   amdgpu_amdkfd_device_init+0x142/0x1d0 [amdgpu]
   amdgpu_device_init.cold+0x19f5/0x1e21 [amdgpu]
   ? _raw_spin_lock_irqsave+0x23/0x50
   amdgpu_driver_load_kms+0x15/0x110 [amdgpu]
   amdgpu_pci_probe+0x161/0x370 [amdgpu]
   local_pci_probe+0x41/0x80
   pci_device_probe+0xb3/0x220
   really_probe+0xde/0x380
   ? pm_runtime_barrier+0x50/0x90
   __driver_probe_device+0x78/0x170
   driver_probe_device+0x1f/0x90
   __driver_attach+0xce/0x1c0
   ? __pfx___driver_attach+0x10/0x10
   bus_for_each_dev+0x73/0xa0
   bus_add_driver+0x1ae/0x200
   driver_register+0x89/0xe0
   ? __pfx_init_module+0x10/0x10 [amdgpu]
   do_one_initcall+0x59/0x230
   do_init_module+0x4a/0x200
   __do_sys_init_module+0x157/0x180
   do_syscall_64+0x5b/0x80
   ? handle_mm_fault+0xff/0x2f0
   ? do_user_addr_fault+0x1ef/0x690
   ? exc_page_fault+0x70/0x170
   entry_SYSCALL_64_after_hwframe+0x72/0xdc
The stack trace doesn't seem like it shows a failure, so I'm not sure
it's useful this time.  If it is, we can at least strip out the
irrelevant pieces.

I will drop the above from the commit message.

The AMD iommu driver allocates a new domain (called v2 domain) for the
"v2 domain" needs to be something greppable -- an identifier,
filename, etc.

The code reads:

        if (iommu_feature(iommu, FEATURE_GT) &&
            iommu_feature(iommu, FEATURE_PPR)) {
                iommu->is_iommu_v2   = true;

So, how about

..The AMD GPU has a private interface to its own AMD IOMMU, which can
be detected by the FEATURE_GT && FEATURE_PPR features. The AMD iommu
driver allocates a special domain for the GPU device ..

?

Best regards,
baolu