Re: AMD IO_PAGE_FAULT w/NTB on Write ops?

On 2019-04-20 3:06 a.m., Eric Pilmore wrote:
> What we have found is that the Xeon based host can successfully ioread
> to this mapped shared buffer, but whenever it attempts an iowrite to
> this region, it results in an IO_PAGE_FAULT on the AMD based host:
> 
> AMD-Vi: Event logged [IO_PAGE_FAULT device=23:01.2 domain=0x0000
> address=0x00000000fde1c18c flags=0x0070]
> 
> Going in the opposite direction there are no issues, i.e. the AMD
> based host can successfully ioread/iowrite to the mapped in buffer
> exported by the Xeon host.  Or if both hosts are Xeon's, then
> everything works fine also.
> 
> I have looked high and low, and have not been able to interpret what
> the "flags=0x0070" represent. I assume they are indicating some write
> permission error, but was wondering if anybody here might know?

See Figure 51 of the AMD IOMMU spec[1]. A value of 0x0070 indicates that
the PE, RW and PR bits are set, which means a write request to a present
page was denied because the peripheral did not have permission.
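
As a rough illustration of that decoding, here is a minimal Python sketch.
It covers only the three bits named above. The bit positions (PR at bit 4,
RW at bit 5, PE at bit 6) are an assumption that happens to be consistent
with 0x0070 and should be checked against the figure in the spec; the
function name decode_flags is hypothetical.

```python
# Assumed bit positions within the "flags" value from the kernel log line.
# Only PR, RW and PE are named in the message; which bit maps to which
# name is an assumption to verify against the AMD IOMMU spec.
ASSUMED_FLAG_BITS = {
    4: "PR",  # page was present
    5: "RW",  # request was a write
    6: "PE",  # peripheral lacked permission
}


def decode_flags(flags):
    """Return (names of assumed bits that are set, leftover undecoded bits)."""
    names = [name for bit, name in sorted(ASSUMED_FLAG_BITS.items())
             if flags & (1 << bit)]
    known_mask = sum(1 << bit for bit in ASSUMED_FLAG_BITS)
    return names, flags & ~known_mask


if __name__ == "__main__":
    names, leftover = decode_flags(0x0070)
    print(names, hex(leftover))
```

Any bits outside the assumed three are returned as a leftover mask rather
than being given invented names.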

> More importantly, does anybody know why the AMD IOMMU might seemingly
> default to not allow Write operations to the exported memory? Is there
> some additional BIOS or kernel boot parameter setting that needs to be
> set?

Yeah, I don't think the IOMMU defaults to allow write operations to
exported memory. That would be extremely broken....

> lspci on the AMD hosts of the external PCI-e switch:
>    23:00.0 PCI bridge: PMC-Sierra Inc. Device 8536
>    23:00.1 Bridge: PMC-Sierra Inc. Device 8536
> 
> The 23:00.1 BDF is the NTB bridge. The BDF (23:01.2) in the error
> message represents the "NTB translated" BDF of the request that came
> from the peer, i.e. the 01.2 is the proxy-id. Is there a chance that
> this proxy-id is causing some confusion for the AMD IOMMU?

I suspect the proxy IDs are the problem. On Intel hardware, we had to
add support so that it allowed requests for all proxy IDs for a given
device. We probably have to do something similar to the AMD IOMMU driver.
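
To make the suspected mismatch concrete, the sketch below converts the two
BDFs from this thread into 16-bit PCI requester IDs (bus in bits 15:8,
device in bits 7:3, function in bits 2:0) and checks them against a toy
"permitted" set. The set is purely a hypothetical stand-in for whatever
per-device state an IOMMU driver keeps; it is not how the AMD or Intel
driver is actually implemented, and the helper names are made up.

```python
def bdf_to_requester_id(bdf):
    """Convert a 'bus:dev.fn' string (hex fields, as in lspci) to a 16-bit ID."""
    bus_str, devfn = bdf.split(":")
    dev_str, fn_str = devfn.split(".")
    bus, dev, fn = int(bus_str, 16), int(dev_str, 16), int(fn_str, 16)
    return (bus << 8) | (dev << 3) | fn


def is_permitted(permitted_ids, bdf):
    """Toy lookup: a request passes only if its requester ID is in the set."""
    return bdf_to_requester_id(bdf) in permitted_ids


if __name__ == "__main__":
    ntb = bdf_to_requester_id("23:00.1")    # 0x2301, the NTB function itself
    proxy = bdf_to_requester_id("23:01.2")  # 0x230a, the ID in the fault log

    permitted = {ntb}                        # hypothetical: only the real BDF known
    print(is_permitted(permitted, "23:01.2"))  # proxy ID is not in the set

    permitted.add(proxy)                     # hypothetical: proxy ID also allowed
    print(is_permitted(permitted, "23:01.2"))
```

In this toy model, a request tagged with the proxy ID is rejected until
that ID is added alongside the NTB function's own ID.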

My guess is that the reason writes work and not reads is because the
write TLPs are posted and thus the switch doesn't apply the Proxy ID
seeing it doesn't expect a completion. Thus the IOMMU sees the TLPs as
coming from a permitted peripheral and doesn't complain.

Logan

[1] https://www.amd.com/system/files/TechDocs/48882_IOMMU.pdf


