On Thu, 2018-06-07 at 16:15 -0600, Alex Williamson wrote:
> On Fri, 08 Jun 2018 07:54:02 +1000
> Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Thu, 2018-06-07 at 11:04 -0600, Alex Williamson wrote:
> > >
> > > Can we back up and discuss whether the IOMMU grouping of NVLink
> > > connected devices makes sense? AIUI we have a PCI view of these
> > > devices and from that perspective they're isolated. That's the view of
> > > the device used to generate the grouping. However, not visible to us,
> > > these devices are interconnected via NVLink. What isolation properties
> > > does NVLink provide given that its entire purpose for existing seems to
> > > be to provide a high performance link for p2p between devices?
> >
> > Not entire. On POWER chips, we also have an nvlink between the device
> > and the CPU which is running significantly faster than PCIe.
> >
> > But yes, there are cross-links and those should probably be accounted
> > for in the grouping.
>
> Then after we fix the grouping, can we just let the host driver manage
> this coherent memory range and expose vGPUs to guests? The use case of
> assigning all 6 GPUs to one VM seems pretty limited. (Might need to
> convince NVIDIA to support more than a single vGPU per VM though)
> Thanks,

I don't know about "vGPUs" and what nVidia may be cooking in that area.

The patches from Alexey allow for passing through the full thing, but
they aren't trivial (there are additional issues, I'm not sure how
covered they are, as we need to play with the mapping attributes of
portions of the GPU memory on the host side...).

Note: The cross-links are only per-socket, so that would be 2 groups of 3.

We *can* allow individual GPUs to be passed through, either if somebody
designs a system without cross-links, or if the user is ok with the
security risk, as the guest driver will not enable the cross-links if
it doesn't "find" both sides of them.

Cheers,
Ben.