Re: Creating new RDMA driver for habanalabs

On Mon, Aug 23, 2021 at 1:31 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Sun, Aug 22, 2021 at 12:40:26PM +0300, Oded Gabbay wrote:
> > Hi Jason,
> >
> > I think that about a year ago we talked about the custom RDMA code of
> > habanalabs. I tried to upstream it and you, rightfully, rejected that.
> >
> > Now that I have enough b/w to do this work, I want to start writing a
> > proper RDMA driver for the habanalabs Gaudi device, which I will be
> > able to upstream to the infiniband subsystem.
> >
> > I don't know if you remember but the Gaudi h/w is somewhat limited in
> > its RDMA capabilities. We are not selling a stand-alone NIC :) We just
> > use RDMA (or more precisely, ROCEv2) to connect between Gaudi devices.
> >
> > I'm sure I will have more specific questions down the line, but I had
> > hoped you could point me to a basic/not-too-complex existing driver
> > that I can use as a modern template. I'm also aware that I will need
> > to write matching code in rdma-core.
> >
> > Also, I would like to add we will use the auxiliary bus feature to
> > connect between this driver, the main (compute) driver and the
> > Ethernet driver (which we are going to publish soon I hope).
>
> It sounds fine, as Leon mentions EFA is a good starting point for
> something simple but non-spec compliant
>
> If I recall properly you'll want to have some special singular PD for
> the HW and some specialty QPs?
>
> Jason

Yes, we will have a singular PD.
Regarding the QPs, I don't think we have anything special there, but I
might be proven wrong.
I was worried about reg_mr but I think we found a solution for that.

I may be getting ahead of myself a little, but one of the issues I will
need help with is how to handle ports that are not exposed to the
Networking/Ethernet subsystem.
In a box with Gaudis, some ports are connected back-to-back (between
Gaudi devices) and some are exposed externally.

The ports that are exposed externally will be registered as an
Ethernet device and will also be handled by the Ethernet driver. I
think that is pretty much standard.

However, the "internal" ports won't be registered as an Ethernet
device, as we don't want to expose them to the user as an interface.
They are used only for back-to-back communication between Gaudi
devices inside the same box. You can imagine them to be similar to
NVLink, but instead of a proprietary protocol, they run RoCEv2.
Registering them with netdev creates a very poor user experience and
potentially degrades the host CPU performance (I can elaborate more on
that).

For those ports, we want to prevent the user from sending raw Ethernet
data (opening a socket). We also want to avoid the need for the user
to handle them with ifconfig/ethtool/etc. We only want to expose the
IB verbs interface for those ports.

Do you see any issue with that?

Thanks,
Oded


