Re: [net-next PATCH 00/15] eth: fbnic: Add network driver for Meta Platforms Host Network Interface

On Fri, Apr 5, 2024 at 8:17 AM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> On Fri, Apr 05, 2024 at 07:24:32AM -0700, Alexander Duyck wrote:
> > > Alex already indicated new features are coming, changes to the core
> > > code will be proposed. How should those be evaluated? Hypothetically
> > > should fbnic be allowed to be the first implementation of something
> > > invasive like Mina's DMABUF work? Google published an open userspace
> > > for NCCL that people can (in theory at least) actually run. Meta would
> > > not be able to do that. I would say that clearly crosses the line and
> > > should not be accepted.
> >
> > Why not? Just because we are not commercially selling it doesn't mean
> > we couldn't look at other solutions such as QEMU. If we were to
> > provide a github repo with an emulation of the NIC would that be
> > enough to satisfy the "commercial" requirement?
>
> My test is not "commercial", it is enabling open source ecosystem vs
> benefiting only proprietary software.

Sorry, that is where this started: Jiri was stating that we had
to be selling this.

> In my hypothetical you'd need to do something like open source Meta's
> implementation of the AI networking that the DMABUF patches enable,
> and even then since nobody could run it at performance the thing is
> pretty questionable.
>
> IMHO publishing a qemu chip emulator would not advance the open source
> ecosystem around building a DMABUF AI networking scheme.

Well not too many will be able to afford getting the types of systems
and hardware needed for this in the first place. Primarily just your
large data center companies can afford it.

I never said this hardware is about enabling DMABUF. You implied that.
The fact is that this driver is meant to be a pretty basic speeds and
feeds device. We support header split and network flow classification
so I suppose it could be used for DMABUF but by that logic so could a
number of other drivers.

> > > So I think there should be an expectation that technically sound things
> > > Meta may propose must not be accepted because they cross the
> > > ideological red line into enabling only proprietary software.
> >
> > That is a faulty argument. That is like saying we should kick the
> > nouveau driver out of Linux just because it supports Nvidia graphics
> > cards that happen to also have a proprietary out-of-tree driver out
> > there,
>
> Huh? nouveau supports a fully open source mesa graphics stack in
> Linux. How is that remotely similar to what I said? No issue.

Right, nouveau is fully open source. That is what I am trying to do
with fbnic. That is what I am getting at. This isn't connecting to
some proprietary stack or engaging in any sort of bypass. It is going
through the standard networking stack. If there were some other
out-of-tree driver for this to support some other use case how would
that impact the upstream patch submission?

This driver is being NAKed for enabling stuff that hasn't even been
presented. It is barely enough driver to handle PXE booting which is
needed to be able to even load an OS on the system. Yet somehow
because you are expecting a fork to come in at some point to support
DMABUF you are wanting to block it outright. How about rather than
doing that we wait until there is something there that is
objectionable before we start speculating on what may be coming.

> You pointed at two things that I would consider to be exemplar open
> source projects and said their existence somehow means we should be
> purging drivers from the kernel???
>
> I really don't understand what you are trying to say at all.

I'm trying to say that both those projects are essentially doing the
same thing you are accusing fbnic of doing, even though I am exposing
no non-standard API(s) and everything is open source. You are
projecting future changes onto this driver that don't currently and
may never exist.

> The kernel standard is that good quality open source *does* exist, we
> tend to not care what proprietary things people create beyond that.

Now I am confused. You say you don't care what happens later, but you
seem to be insisting you care about what proprietary things will be
done with it after it is upstreamed.

> > I can't think of many NIC vendors that don't have their own
> > out-of-tree drivers floating around with their own kernel bypass
> > solutions to support proprietary software.
>
> Most of those are also open source, and we can't say much about what
> people do out of tree, obviously.

Isn't that exactly what you are doing though with all your
"proprietary" comments?

> > I agree. We need a consistent set of standards. I just strongly
> > believe commercial availability shouldn't be one of them.
>
> I never said commercial availability. I talked about open source vs
> proprietary userspace. This is very standard kernel stuff.
>
> You have an unavailable NIC, so we know it is only ever operated with
> Meta's proprietary kernel fork, supporting Meta's proprietary
> userspace software. Where exactly is the open source?

It depends on your definition of "unavailable". I could argue that for
many, most of the Mellanox NICs also have limited availability as
they aren't exactly easy to get a hold of without paying a hefty
ransom.

The NIC is currently available to developers within Meta. As such I
know there is a sizable number of kernel developers who could get
access to it if they asked for a login to one of our test and
development systems. Also I offered to provide the QEMU repo, but you
said you had no interest in that option.

> Why should someone working to improve only their proprietary
> environment be welcomed in the same way as someone working to improve
> the open source ecosystem? That has never been the kernel community's
> position.

To quote Linus `I do not see open source as some big goody-goody
"let's all sing kumbaya around the campfire and make the world a
better place". No, open source only really works if everybody is
contributing for their own selfish reasons.`[1]

How is us using our own NIC any different than if one of the vendors
were to make a NIC exclusively for us or any other large data center?
The only reason why this is coming up is because Meta is not a typical
NIC vendor but normally a consumer. The fact that we will be
dogfooding our own NIC seems to be at the heart of the issue here.

Haven't there been a number of maintainers who end up maintaining code
bases in the kernel for platforms and/or devices where they own one of
the few devices available in the world? How would this be any
different? Given enough time it is likely this will end up in the
hands of those outside Meta anyway, at which point the argument would
be moot.

> If you want to propose things to the kernel that can only be
> meaningfully used by your proprietary software then you should not
> expect to succeed. No one should be surprised to hear this.

If the whole idea is to get us to run a non-proprietary stack nothing
sends the exact opposite message like telling us we cannot upstream a
simple network driver because of a "what if" about some DMABUF patch
set from Google. All I am asking for is the ability to net install a
system with this device. That requires the driver to be available in
the provisioning kernel image, which is why I am working to upstream it,
as I would rather not have to maintain an out-of-tree kernel driver.

The argument here isn't about proprietary software. It is proprietary
hardware that seems to be the issue, or at least that is where it
started. Anyone could build the driver itself, load it, or even run it on
QEMU as I mentioned. It is open source and not exposing any new APIs.
The issue seems to be with the fact that the NIC can't be bought from
a vendor and instead Meta is building the NIC for its own
consumption.

As far as the software stack goes, the concern about DMABUF seems like an
orthogonal argument that should be had at the userspace/API level and
doesn't directly relate to any specific driver. As has been pointed
out, enabling anything like that wouldn't be a single NIC solution and
to be accepted upstream it should be implemented on at least 2
different vendor drivers. Additionally, there isn't anything unique
about this hardware that would make it more capable of enabling that
than any other device.

Thanks,

- Alex

[1]: https://www.bbc.com/news/technology-18419231




