On Tue, 28 Jan 2020 07:13:01 +0000 "Skidanov, Alexey" <alexey.skidanov@xxxxxxxxx> wrote: > >On 2020-01-27 12:12 p.m., Christian König wrote: > >> Am 27.01.20 um 17:58 schrieb Logan Gunthorpe: > >>> > >>> On 2020-01-27 1:18 a.m., Christian König wrote: > >>>> Am 27.01.20 um 08:18 schrieb Skidanov, Alexey: > >>>>> Hello, > >>>>> > >>>>> I have recently found the below commit to disabling ACS bits. Using kernel parameter > >is pretty simple but requires to know in advance which devices might be participated in > >peer-to-peer sessions. > >>>>> > >>>>> Why we can't disable the ACS bits *after* the driver is initialized (and there is a > >request to connect between two peers) and not *during* device discovering?. > >>>> That's exactly what was initially proposed but we have seen hardware > >>>> which reacts allergic to disabling those bits on the fly. > >>> I wasn't aware of that and haven't seen anything like that. > >>> > >>>> Please read up the discussion on the mailing list leading to this patch. > >>> The issue was the IOMMU code does not allow for any kind of dynamic > >>> changes in the groups devices are assigned in. In theory, this could be > >>> possible but you'd still at least have to unbind the devices from their > >>> driver because you definitely can't change the IOMMU group while there > >>> are DMA requests in flight. Ultimately it's easier for most use cases to > >>> just disable it on boot. > >> > >> As far as I know you can't change the ACS bit either when there are > >> transactions in flight on the affected devices/bridges. > > > >No, I think the ACS bits are fine to change at any time. I've never had > >any issue changing them. The problem is the act of changing them changes > >the isolation between the devices which means the IOMMU groups have to > >change. > > > >It's certainly possible today to just use setpci to adjust those bits at > >any time. It just means the isolation the IOMMU is expecting will be > >wrong and that may mean you broke the security between VMs on your machine. > > > > According to the PCIe spec, there are two mechanisms to deal with isolation: > - Redirected Request Validation logic within the RC and > - ACS P2P Egress Control > So anyone who cares about the isolation must use at least one of these mechanisms. > I would expect that on VM creation, the above mechanisms will be configured appropriately. Redirected Request Validation within the RC presumes that transactions make it to the RC, and thus implies things like Upstream Forwarding, Request Redirection, and Completion Redirection. I think much of the desire for P2P wants to avoid the RC entirely. Also, this Redirected Request Validation logic and control of it is not part of the PCIe spec. P2P Egress Control only allows blocking of P2P between select downstream ports. AIUI this is used in conjunction with P2P Direct Translation to limit which direct translated paths are available to a device. DT on its own implies ATS, which has questionable isolation security (ie. susceptible to an exploitable endpoint). IIRC, it's also been rejected in previous discussions that we could simply wait for hardware that supports ATS and DT to build the direct P2P paths desired. There are also some arguable benefits to IOMMU isolation for non-VM use cases, so let's not put blinders on to those aspects. > >> Otherwise what could happen is that the response of an transaction takes > >> a different path than the request. That in turn can result in quite a > >> bunch of ordering problem on the PCIe bus. > >> > >> But the idea of unbinding a device, changing the bit and rebinding it > >> would probably work. > > > >Well, no, you can't just change the bit, you have to change the IOMMU > >group the device belongs to. Right now, we don't have any interface to > >do that except during scanning at boot. Right, and groups potentially contain multiple devices and multiple groups might be dependent on the isolation of a given interconnect, which means the scope of modifying that grouping may involve unrelated devices and drivers, which feeds into that boot-time restriction. We could do runtime modification, but it seems to imply quiescing DMA among unrelated drivers, which probably means the simplest solution is unbinding an entire hierarchy of drivers, soft unplugging the devices, and re-scanning with a different ACS policy, which in practice may not have a lot of benefit vs rebooting. Thanks, Alex