On Tue, Mar 23, 2021 at 03:30:41PM +0000, Christoph Hellwig wrote:
> On Fri, Mar 19, 2021 at 10:57:20PM +0000, Rimmer, Todd wrote:
> > We'd like advice on a challenging situation. Some customers
> > desire NICs to support nVidia GPUs in some environments.
> > Unfortunately the nVidia GPU drivers are not upstream, and have
> > not been for years. So we are forced to have both out of tree and
> > upstream versions of the code. We need the same applications to
> > be able to work over both, so we would like the GPU enabled
> > versions of the code to have the same ABI as the upstream code as
> > this greatly simplifies things. We have removed all GPU specific
> > code from the upstream submission, but used both the "alignment
> > holes" and the "reserved" mechanisms to hold places for GPU
> > specific fields which can't be upstreamed.
>
> NVIDIA GPUs are supported by drivers/gpu/drm/nouveau/, and you are
> encouraged to support them just like all the other in-tree GPU
> drivers. Not sure what support a network protocol would need for a
> specific GPU. You're probably trying to do something amazingly
> stupid here instead of relying on proper kernel subsystem use.

The kernel building block for what they are trying to do with the GPU
is the recently merged DMABUF MR support in the RDMA subsystem.

I'd like to think that since Daniel's team at Intel got the DMABUF
stuff merged to support the applications Todd's Intel team is
building, this RV stuff is already fully ready for dmabuf...
(hint hint)

What Todd is alluding to here is the hacky DMABUF alternative that is
in the NVIDIA GPU driver - which HPC networking companies must support
if they want to interwork with the NVIDIA GPU.

Every RDMA vendor playing in the HPC space has some out-of-tree driver
to enable this. :(

Jason