On Mon, Aug 23, 2021 at 1:31 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Sun, Aug 22, 2021 at 12:40:26PM +0300, Oded Gabbay wrote:
> > Hi Jason,
> >
> > I think that about a year ago we talked about the custom RDMA code of
> > habanalabs. I tried to upstream it and you, rightfully, rejected that.
> >
> > Now that I have enough b/w to do this work, I want to start writing a
> > proper RDMA driver for the habanalabs Gaudi device, which I will be
> > able to upstream to the infiniband subsystem.
> >
> > I don't know if you remember but the Gaudi h/w is somewhat limited in
> > its RDMA capabilities. We are not selling a stand-alone NIC :) We just
> > use RDMA (or more precisely, ROCEv2) to connect between Gaudi devices.
> >
> > I'm sure I will have more specific questions down the line, but I had
> > hoped you could point me to a basic/not-too-complex existing driver
> > that I can use as a modern template. I'm also aware that I will need
> > to write matching code in rdma-core.
> >
> > Also, I would like to add we will use the auxiliary bus feature to
> > connect between this driver, the main (compute) driver and the
> > Ethernet driver (which we are going to publish soon I hope).
>
> It sounds fine, as Leon mentions EFA is a good starting point for
> something simple but non-spec compliant
>
> If I recall properly you'll want to have some special singular PD for
> the HW and some specialty QPs?
>
> Jason

Yes, we will have a singular PD. Regarding the QPs, I don't think we
have anything special there, but I might be proven wrong. I was worried
about reg_mr but I think we found a solution for that.

I may be ahead of myself a little, but one of the issues I will need
help with is how to handle ports that are not exposed to the
Networking/Ethernet subsystem.

In a box with Gaudis, some ports are connected back-to-back (between
Gaudi devices) and some are exposed externally.
The ports that are exposed externally will be registered as an Ethernet
device and will also be handled by the Ethernet driver. I think that is
pretty much standard.

However, the "internal" ports won't be registered as an Ethernet device,
as we don't want to expose them to the user as an interface. They are
used only for back-to-back communication between Gaudi devices inside
the same box. You can imagine them to be similar to NVLink, but instead
of a proprietary protocol, they run RoCEv2. Registering them with netdev
creates a very poor user experience and potentially degrades the host
CPU performance (I can elaborate more on that).

For those ports, we want to prevent the user from sending raw Ethernet
data (opening a socket). We also want to avoid the need for the user to
handle them with ifconfig/ethtool/etc. We only want to expose the
IB verbs interface on those ports.

Do you see any issue with that?

Thanks,
Oded