Hi Folks,

Before I ask my questions, here is a little background on the environment I have:

- 2 hosts: 1 Xeon based (Intel(R) Xeon(R) Gold 5115 CPU @ 2.40GHz), 1 AMD based (AMD EPYC 7401 24-Core Processor).
- The hosts are interconnected via an external PCI-e (switchtec) switch.
- The two hosts are exporting memory to each other via NTB.
- IOMMU is enabled in both hosts. The Xeon platform requires some BIOS settings and a kernel parameter (intel_iommu=on). As far as I have been able to determine, the AMD platform only requires the IOMMU BIOS setting to be enabled, with no special kernel boot parameters. Does that sound right for AMD?
- The region of memory exported to each host is allocated/mapped via dma_alloc_coherent() using the "device" of the respective external PCI-e switch.
- The dma_addr returned from dma_alloc_coherent() is relayed to the peer host, which adds that value (i.e. the IOVA offset) to its local PCI BAR representing the switch, and then ioremap()'s the resulting address to get a CPU virtual address on which it can perform ioread/iowrite operations.

What we have found is that the Xeon based host can successfully ioread from this mapped shared buffer, but whenever it attempts an iowrite to this region, it results in an IO_PAGE_FAULT on the AMD based host:

  AMD-Vi: Event logged [IO_PAGE_FAULT device=23:01.2 domain=0x0000 address=0x00000000fde1c18c flags=0x0070]

Going in the opposite direction there are no issues, i.e. the AMD based host can successfully ioread/iowrite to the mapped buffer exported by the Xeon host. If both hosts are Xeons, everything works fine as well.

I have looked high and low, and have not been able to interpret what "flags=0x0070" represents. I assume it indicates some write permission error, but was wondering if anybody here might know?

More importantly, does anybody know why the AMD IOMMU might seemingly default to not allowing write operations to the exported memory?
Is there some additional BIOS or kernel boot parameter setting that needs to be set?

lspci on the AMD host of the external PCI-e switch:

  23:00.0 PCI bridge: PMC-Sierra Inc. Device 8536
  23:00.1 Bridge: PMC-Sierra Inc. Device 8536

The 23:00.1 BDF is the NTB bridge. The BDF (23:01.2) in the error message represents the "NTB translated" BDF of the request that came from the peer, i.e. the 01.2 is the proxy-id. Is there a chance that this proxy-id is causing some confusion for the AMD IOMMU?

Would greatly appreciate any assistance!

Thanks!

--
Eric Pilmore
epilmore@xxxxxxxxxx
http://gigaio.com
Phone: (858) 775 2514