Re: pull-request: mlx5-next 2023-01-24 V2

Jason Gunthorpe <jgg@xxxxxxxxxx> · Tue, 7 Feb 2023 15:52:59 -0400

On Mon, Feb 06, 2023 at 04:38:41PM -0800, Jakub Kicinski wrote:
> On Mon, 6 Feb 2023 10:58:56 -0400 Jason Gunthorpe wrote:
> > On Fri, Feb 03, 2023 at 05:45:31PM -0800, Jakub Kicinski wrote:
> > > Perfectly irrelevant comparisons :/ How many times do I have to say
> > > that all I'm asking is that you stay away from us and our APIs?  
> > 
> > What I'm reacting to is your remarks that came across as trying to
> > say that the particular netdev subsystem approach to open-ness was
> > in fact the same as the larger Linux values on open source and
> > community.
> >
> > netdev is clearly more restrictive, so is DRM, and that's fine. But it
> > should stay in netdev and not be exported to the rest of the
> > kernel. Eg don't lock away APIs for what are really shared resources.
> 
> I think you're misrepresenting. The DRM example is pertinent.
> The DRM disagreement as I recall it was whether Dave gets to nack
> random drivers in misc/ which are implementing GPU-like functionality
> but do _not_ use DRM APIs.

That isn't what I was thinking about.

The DRM specialness is they are very demanding about having an open
user space. More so than most places in the kernel.

The misc/ argument was about drivers trying to avoid the strict DRM
open user space requirement. In the end Greg agreed that open
userspace was something he wanted for misc too.

DRM tried to use DMABUF as some kind of API wedge, but it didn't
really work out too well.

In the end the fight was ideological around what is open enough to be
inside Linux because the GPU devices were skirting around something of
a grey area in the project's philosophy on how much open user space is
actually required.

> Whether one subsystem can use another subsystem's API over maintainer's
> NACK has a pretty obvious answer.

I would say not, I've never seen this actually aside from netdev vs
rdma. If the APIs are being used wrong, sure, but not for ideological
reasons.

> Good fences make good neighbors so I'd like to build a fence and avoid
> having to discuss this over and over.

I also would like to not discuss this :)

> Everyone is familiar with the term "vendor lock-in". The principles
> I listed are hardly hyperscaler driven.

The hyperscalers brought it to a whole new level. Previously we'd see
industry consortiums try to hammer out some consolidation, now we
quite often see hyperscalers make their own private purchasing
standards and have vendors use them. I have mixed feelings about
the ecosystem value of private label standardization, especially if
the standard itself is kept secret.

Then of course we see the private standards get quietly implemented in
Linux.

An open source kernel implementation of a private standard for HW that
only one company can purchase that is only usable with a proprietary
userspace. Not exactly what I'd like to see.

> > I'd say here things are more like "lets innovate!" "lets
> > differentiate!" "customers pay a premium for uniqueness"
> 
> Which favors complex and hard-to-copy offloads, over
> iterating on incremental common sense improvements.

I wouldn't use such a broad brush, but sure sometimes that is a
direction. More often complex is due to lack of better ideas, nobody
actually wants it to be complex, that just makes it more expensive to
build and more likely to fail.

> FWIW the "sides of the purchasing table" phrasing brings to mind
> industry forums rather than open source communities... Whether Linux
> is turning into an industry forum, and what Joreen would have to say
> about that*.. discussion for another time.

Well, Linux is an industry forum for sure, and it varies how much power
it projects. DRM's principled stand has undoubtedly had a large
impact, for instance.

> > I don't like what I see as a dangerous
> > trend of large cloud operators pushing things into the kernel where
> > the gold standard userspace is kept as some internal proprietary
> > application.
> 
> Curious what you mean here.

Ah, I stumble across stuff from time to time - KVM and related has
some interesting things. Especially with this new confidential compute
stuff. AMD just tried to get something into their mainline iommu
driver to support their out of tree kernel, for instance.

People try to bend the rules all the time.

> > I'm interested in the Linux software - and maintaining the open source
> > ecosystem. I've spent almost my whole career in this kind of space.
> > 
> > So I feel much closer to what I see as Linus's perspective: Bring your
> > open drivers, bring your open userspace, everyone is welcome.
> 
> (*as long as they are on a side of the purchasing table) ?

Naw, "hobbyists" are welcome of course, but I get the feeling that is
getting rarer.

> > Port your essential argument over to the storage world - what would
> > you say if the MTD developers insisted that proprietary NVMe shouldn't
> > be allowed to use "their" block APIs in Linux?
> > 
> > Or the MD/DM developers said no RAID controller drivers were allowed
> > to use "their" block stack?
> > 
> > I think as an overall community we would lose more than we gain.
> > 
> > So, why in your mind is networking so different from storage?
> 
> Networking is about connecting devices. It requires standards,
> interoperability and backward compatibility.
> 
> I'm not an expert on storage but my understanding is that the
> standardization of the internals is limited and seen as unnecessary.
> So there is no real potential for open source implementations of
> disk FW. Movement of data from point (a) to point (b) is not interesting
> either so NVMe is perfectly fine. Developers innovate in filesystems 
> instead.
>
> In networking we have strong standards so you can (and do) write
> open source software all the way down to the PHYs (serdes is where
> things get quite tricky). At the same time movement of data from point
> a to point b is _the_ problem so we need the ability to innovate in
> the transport space.
> 
> Now we have strayed quite far from the initial problem under discussion,
> but you can't say "networking is just like storage" and not expect
> a tirade from a networking guy :-D 

Heh, well, I don't agree with your characterization - from an open
source perspective I wouldn't call any FW "uninteresting", and the
storage device SW internals are super interesting/complicated and full
of incredible innovation.

Even PHYs, at slow speeds, are mostly closed FW running in proprietary
DSPs. netdev has a line they want to innovate at the packet level, but
everything underneath is still basically closed/proprietary.

I think that is great for netdev, but moving the line one OSI level
higher doesn't suddenly create an open source problem either, IMHO.

> > > > You've made it very clear you don't like the RDMA technology, but you
> > > > have no right to try and use your position as a kernel maintainer to
> > > > try and kill it by refusing PRs to shared driver code.  
> > > 
> > > For the n-th time, not my intention. RDMA may be more open than NVMe.
> > > Do your thing. Just do it with your own APIs.  
> > 
> > The standards being implemented broadly require the use of the APIs -
> > particularly the shared IP address.
> 
> No point talking about IP addresses, that ship has sailed.
> I bet the size of both communities was also orders of magnitude
> smaller back then. Different conditions different outcomes.

So, like I said, IP comes with baggage. Where do you draw the line?
What facets of the IP are we allowed to mirror and what are not? How
are you making this seemingly arbitrary decision?

The ipsec patches here have almost 0 impact on netdev because it is a
tiny steering engine configuration. I'd have more sympathy to the
argument if it was consuming a huge API surface to do this.

> We don't support black-box transport offloads in netdev. I thought that
> it'd come across but maybe I should spell it out - just because you
> are welcome in Linux does not mean RDMA devices are welcome in netdev.

Which is why they are not in netdev :) Nobody doubts this.

> As much as we got distracted by our ideological differences over the
> course of this thread - the issue is that I believe we had an agreement
> which was not upheld.
>
> I thought we compromised that to make the full offload sensible in
> netdev world nVidia would implement forwarding to xfrm tunnels using 
> tc rules. You want to add a feature in netdev, it needs to be usable
> in a non-trivial way in netdev. Seems fair.

Yes, and it is on Leon's work list. Notice Leon didn't do these RDMA
IPSEC patches. This is a huge journey for us, there are lots of parts
and several people working on it.

I understood the agreement was that we would do it, not that it would be
done as the very next thing. Stephen also asked for stuff and Leon is
working on that too.

> The simplest way forward would be to commit to when mlx5 will support
> redirects to xfrm tunnel via tc...

He needs to fix the bugs he created and found first :)

As far as I'm concerned TC will stay on his list until it is done.

Jason