On 3/21/2021 12:45 PM, Jason Gunthorpe wrote:
On Sun, Mar 21, 2021 at 12:24:39PM -0400, Dennis Dalessandro wrote:
On 3/21/2021 4:56 AM, Leon Romanovsky wrote:
On Sat, Mar 20, 2021 at 12:39:46PM -0400, Dennis Dalessandro wrote:
On 3/19/2021 6:57 PM, Rimmer, Todd wrote:
[Wan, Kaike] Incorrect. The rv module works with hfi1.
<...>
We have removed all GPU specific code from the upstream submission, but used both the "alignment holes" and the "reserved"
mechanisms to hold places for GPU specific fields which can't be upstreamed.
This problem extends to other drivers as well. I'm also interested in advice
on the situation. I don't particularly like this either, but we need a way
to accomplish the goal. We owe it to users to be flexible. Please offer
suggestions.
Sorry to interrupt, but it seems that solution was said here [1].
It wasn't said directly, but between the lines it means that you need
two things:
1. Upstream everything.
Completely agree. However the GPU code from nvidia is not upstream. I don't
see that issue getting resolved in this code review. Let's move on.
2. Find another vendor that jumps on this RV bandwagon.
That's not a valid argument. Clearly this works over multiple vendors HW.
At a certain point we have to decide if this is a generic code of some
kind or a driver-specific thing like HFI has.
There are also obvious technial problems designing it as a ULP, so it
is a very important question to answer. If it will only ever be used
by one Intel ethernet chip then maybe it isn't really generic code.
Todd/Kaike, is there something in here that is specific to the Intel
ethernet chip?
On the other hand it really looks like it overlaps in various ways
with both RDS and the qib/hfi1 cdev, so why isn't there any effort to
have some commonality??
Maybe that's something that should be explored. Isn't this along the
lines of stuff we talked about with the verbs 2.0 stuff, or whatever we
ended up calling it.
We should be trying to get things upstream, not putting up walls when people
want to submit new drivers. Calling code "garbage" [1] is not productive.
Putting a bunch of misaligned structures and random reserved fields
*is* garbage by the upstream standard and if I send that to Linus I'll
get yelled at.
Not saying you should send this to Linus. I'm saying we should figure
out a way to make it better and insulting people and their hard work
isn't helping. This is the kind of culture we are trying to get away
from in the kernel world.
And you certainly can't say "we are already shipping this ABI so we
won't change it" either.
You can't square this circle by compromising the upstream world in any
way, it is simply not accepted by the overall community.
All of you should know this, I shouldn't have to lecture on this!
No one is suggesting to compromise the upstream world. There is a bigger
picture here. The answer for this driver may just be take out the
reserved stuff. That's pretty simple. The bigger question is how do we
deal with non-upstream code. It can't be a problem unique to the RDMA
subsystem. How do others deal with it?
Also no, we should not be "trying to get things upstream" as some goal
in itself. Upstream is not a trashcan to dump stuff into, someone has
to maintain all of this long after it stops being interesting, so
there better be good reasons to put it here in the first place.
That is completely not what I meant at all. I mean we should be trying
to get rid of the proprietary, and out of tree stuff. It doesn't at all
imply to fling crap against the wall and hope it sticks. We should be
encouraging vendors to submit their code and work with them to get it in
shape. We clearly have a problem with vendor proprietary code not being
open. Let's not encourage that behavior. Vendors should say I want to
submit my code to the Linux kernel. Not eh, it's too much of a hassle
and kernel people are jerks, so we'll just post it on our website.
If it isn't obvious, I'll repeat again: I'm highly annoyed that Intel
is sending something like this RV, in the state it is in, to support
their own out of tree driver, that they themselves have been dragging
their feet on responding to review comments so it can be upstream for
*years*.
To be fair it is sent as RFC. So to me that says they know it's a ways
off from being ready to be included and are starting the process early.
-Denny