On Mon, Jun 03, 2024 at 07:50:59AM +0000, Vidya Sagar wrote:
> Hi Bjorn,
> Could you let me know if Jason's reply answers your question?
> Please let me know if you are looking for any more information.

I think we should add some of that content to the commit log.  It
needs:

  - Subject line that advertises some good thing.

  - A description of why users want this.  I have no idea what the
    actual benefit is, but I'm looking for something at the level of
    "The default ACS settings put A and B in different IOMMU groups,
    preventing P2PDMA between them.  If we disable ACS X, A and B will
    be put in the same group and P2PDMA will work".

  - A primer on how users can affect IOMMU groups by
    enabling/disabling ACS settings so they can use this without just
    blind trial and error.  A note that this is immutable except at
    boot time.

  - A pointer to the code that determines IOMMU groups based on the
    ACS settings.  Similar to the above, but more useful for
    developers.

If we assert "for iommu_groups to form correctly ...", a hint about
why/where this is so would be helpful.

"Correctly" is not quite the right word here; it's just a fact that
the ACS settings determined at boot time result in certain IOMMU
groups.  If the user desires different groups, it's not that something
is "incorrect"; it's just that the user may have to accept less
isolation to get the desired IOMMU groups.

> > -----Original Message-----
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > ...
> >
> > On Thu, May 23, 2024 at 09:59:36AM -0500, Bjorn Helgaas wrote:
> > > [+cc iommu folks]
> > >
> > > On Thu, May 23, 2024 at 12:05:28PM +0530, Vidya Sagar wrote:
> > > > For iommu_groups to form correctly, the ACS settings in the PCIe
> > > > fabric need to be setup early in the boot process, either via the
> > > > BIOS or via the kernel disable_acs_redir parameter.
> > >
> > > Can you point to the iommu code that is involved here?  It sounds like
> > > the iommu_groups are built at boot time and are immutable after that?
> >
> > They are created when the struct device is plugged in. pci_device_group() does the
> > logic.
> >
> > Notably groups can't/don't change if details like ACS change after the groups are
> > set up.
> >
> > There are a lot of instructions out there telling people to boot their servers and then
> > manually change the ACS flags with set_pci or something, and these are not good
> > instructions since it defeats the VFIO group based security mechanisms.
> >
> > > If we need per-device ACS config that depends on the workload, it
> > > seems kind of problematic to only be able to specify this at boot
> > > time.  I guess we would need to reboot if we want to run a workload
> > > that needs a different config?
> >
> > Basically. The main difference I'd see is if the server is a VM host or running bare
> > metal apps. You can get more efficiency if you change things for the bare metal case,
> > and often bare metal will want to turn the iommu off while a VM host often wants
> > more of it turned on.
> >
> > > Is this the iommu usage model we want in the long term?
> >
> > There is some path to more dynamic behavior here, but it would require separating
> > groups into two components - devices that are together because they are physically
> > sharing translation (aliases and things) from devices that are together because they
> > share a security boundary (ACS).
> >
> > It is more believable we could dynamically change security group assignments for VFIO
> > than translation group assignment. I don't know anyone interested in this right now -
> > Alex and I have only talked about it as a possibility a while back.
> >
> > FWIW I don't view this patch as excluding more dynamism in the future, but it is the best
> > way to work with the current state of affairs, and definitely better than set_pci
> > instructions.
> >
> > Thanks,
> > Jason
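
As a rough illustration of the point that groups are fixed once the
devices are added, the resulting groups can be inspected from userspace
after boot. The sketch below is not part of the patch under discussion;
it assumes the kernel exposes groups under /sys/kernel/iommu_groups
(one numbered directory per group, each with a "devices" subdirectory),
and the helper name list_iommu_groups() is made up for this example.

```python
from pathlib import Path


def list_iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map each IOMMU group number to the sorted device names in it.

    Assumes the sysfs layout <root>/<group number>/devices/<device>.
    Returns an empty dict if the directory does not exist, e.g. when
    the IOMMU is disabled.
    """
    groups = {}
    base = Path(root)
    if not base.is_dir():
        return groups
    for group_dir in base.iterdir():
        devices_dir = group_dir / "devices"
        # Skip anything that does not look like a numbered group.
        if not group_dir.name.isdigit() or not devices_dir.is_dir():
            continue
        groups[int(group_dir.name)] = sorted(
            entry.name for entry in devices_dir.iterdir()
        )
    return groups


if __name__ == "__main__":
    for group, devices in sorted(list_iommu_groups().items()):
        print(f"group {group}: {' '.join(devices)}")
```

Running this before and after a reboot with different ACS settings is
one way to see which devices ended up sharing a group; changing ACS
bits at runtime would not be expected to alter the output, per Jason's
note above.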