On Thu, May 10, 2018 at 01:10:15PM -0600, Alex Williamson wrote:
> On Thu, 10 May 2018 18:41:09 +0000
> "Stephen Bates" <sbates@xxxxxxxxxxxx> wrote:
>
> > > The reason is that GPUs are giving up on PCIe (see all the
> > > specialized links like NVLink that are popping up in the GPU
> > > space). So for fast GPU interconnect we have these new links.
> >
> > I look forward to Nvidia open-licensing NVLink to anyone who wants
> > to use it ;-).
>
> No doubt, the marketing for it is quick to point out the mesh topology
> of NVLink, but I haven't seen any technical documents that describe the
> isolation capabilities or IOMMU interaction. Whether this is included
> or an afterthought, I have no idea.

AFAIK there is no IOMMU on NVLink between devices; walking a page table
while sustaining 80GB/s or 160GB/s is hard to achieve :)

I think the idea behind those interconnects is that devices in the mesh
are inherently secure, i.e. each individual device is supposed to make
sure that no one can abuse it. GPUs, with their virtual address spaces
and contextualized program execution units, are supposed to be secure (a
Spectre-like bug might be lurking in there, but I doubt it).

So for those interconnects you program physical addresses directly into
the page tables of the devices, and those physical addresses are
untranslated from the hardware's perspective. Note that the kernel
driver that does the actual GPU page table programming can do sanity
checks on the values it is setting, so checks can also happen at setup
time. But after that, the assumption is that the hardware is secure and
no one can abuse it, AFAICT.

> > > Also the IOMMU isolation does matter a lot to us. Think of someone
> > > using this peer to peer to gain control of a server in the cloud.
>
> From that perspective, do we have any idea what NVLink means for
> topology and IOMMU provided isolation and translation?
> I've seen a device assignment user report that seems to suggest it
> might pretend to be PCIe compatible, but the assigned GPU ultimately
> doesn't work correctly in a VM, so perhaps the software compatibility
> is only so deep. Thanks,

Note that each individual GPU (in the configurations I am aware of) also
has a PCIe link to the CPU/main memory. So from that point of view they
very much behave like regular PCIe devices. It is just that each GPU in
the mesh can access the other GPUs' memory through the high-bandwidth
interconnect.

I am not sure how much is public beyond that; I will ask NVidia to try
to have someone chime in on this thread and shed light on this, if
possible.

Cheers,
Jérôme