On Fri, Jun 16, 2023 at 08:39:46AM +0000, Tian, Kevin wrote:
> +Alex
>
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Tuesday, June 13, 2023 11:54 PM
> >
> > On Thu, Jun 08, 2023 at 04:28:24PM +0100, Robin Murphy wrote:
> >
> > > > The iova_reserve_pci_windows() you've seen is for kernel DMA interfaces
> > > > which is not related to peer-to-peer accesses.
> > >
> > > Right, in general the IOMMU driver cannot be held responsible for
> > > whatever might happen upstream of the IOMMU input.
> >
> > The driver yes, but..
> >
> > > The DMA layer carves PCI windows out of its IOVA space
> > > unconditionally because we know that they *might* be problematic,
> > > and we don't have any specific constraints on our IOVA layout so
> > > it's no big deal to just sacrifice some space for simplicity.
> >
> > This is a problem for everything using UNMANAGED domains. If the iommu
> > API user picks an IOVA it should be able to expect it to work. If the
> > interconnect fails to allow it to work then this has to be discovered
> > otherwise UNMANAGED domains are not usable at all.
> >
> > Eg vfio and iommufd are also in trouble on these configurations.
> >
>
> If those PCI windows are problematic e.g. due to ACS they belong to
> a single iommu group. If a vfio user opens all the devices in that group
> then it can discover and reserve those windows in its IOVA space.

How? We don't even exclude the single device's BAR if there is no ACS?

> The problem is that the user may not open all the devices then
> currently there is no way for it to know the windows on those
> unopened devices.
>
> Curious why nobody complains about this gap before this thread...

Probably because it only matters if you have a real PCIe switch in the
system, which is pretty rare.

Jason