I hope and expect the nova and vgpu_mgr efforts to ultimately converge.

First, for the fw ABI debacle: yes, it is unfortunate that we still don't
have a stable ABI from GSP. We /are/ working on it, though there isn't
anything to show, yet. FWIW, I expect the end result will be a much
simpler interface than what is there today, and a stable interface that
NVIDIA can guarantee.

But, for now, we have a timing problem like Jason described:

- We have customers eager for upstream vfio support in the near term,
  and that seems like something NVIDIA can develop/contribute/maintain in
  the near term, as an incremental step forward.

- Nova is still early in its development, relative to nouveau/nvkm.

- From NVIDIA's perspective, we're nervous about the backportability of
  rust-based components to enterprise kernels in the near term.

- The stable GSP ABI is not going to be ready in the near term.

I agree with what Dave said in one of the forks of this thread, in the
context of NV2080_CTRL_VGPU_MGR_INTERNAL_BOOTLOAD_GSP_VGPU_PLUGIN_TASK_PARAMS:

> The GSP firmware interfaces are not guaranteed stable. Exposing these
> interfaces outside the nvkm core is unacceptable, as otherwise we
> would have to adapt the whole kernel depending on the loaded firmware.
>
> You cannot use any nvidia sdk headers, these all have to be abstracted
> behind things that have no bearing on the API.

Agreed. Though not infinitely scalable, and not as clean as in rust, it
seems possible to abstract
NV2080_CTRL_VGPU_MGR_INTERNAL_BOOTLOAD_GSP_VGPU_PLUGIN_TASK_PARAMS behind
a C-implemented abstraction layer in nvkm, at least for the short term.

Is there a potential compromise where vgpu_mgr starts its life with a
dependency on nvkm, and as things mature we migrate it to instead depend
on nova?
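
To make the shape of such a C abstraction layer a bit more concrete, here
is a rough userspace sketch. It is not nvkm code, and every name in it is
hypothetical: struct vgpu_plugin_boot_args, the two fw_*_boot_params
layouts, enum fw_abi and core_pack_plugin_boot() are all made up for
illustration, and the fields are placeholders rather than the real
contents of the SDK structure. The only point is that the caller sees one
firmware-neutral type, while the per-firmware layouts stay private to the
core.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical firmware-neutral arguments: the only type a client such
 * as vgpu_mgr would see. Field names are illustrative placeholders.
 */
struct vgpu_plugin_boot_args {
	uint32_t gfid;
	uint64_t heap_addr;
	uint64_t heap_size;
};

/*
 * Hypothetical per-firmware wire layouts, private to the core. They
 * stand in for firmware-specific parameter structs and are NOT the
 * real SDK definitions.
 */
struct fw_a_boot_params {
	uint32_t gfid;
	uint32_t reserved;
	uint64_t heap_addr;
	uint64_t heap_size;
};

struct fw_b_boot_params {
	uint64_t heap_size;
	uint64_t heap_addr;
	uint32_t gfid;
	uint32_t flags;
};

enum fw_abi { FW_ABI_A, FW_ABI_B };

/*
 * Translate the neutral arguments into the layout of the selected
 * firmware ABI. Returns the number of bytes written to buf, or -1 if
 * buf is too small or the ABI is unknown.
 */
int core_pack_plugin_boot(enum fw_abi abi,
			  const struct vgpu_plugin_boot_args *args,
			  void *buf, size_t len)
{
	assert(args != NULL && buf != NULL);

	switch (abi) {
	case FW_ABI_A: {
		struct fw_a_boot_params p = {
			.gfid = args->gfid,
			.heap_addr = args->heap_addr,
			.heap_size = args->heap_size,
		};
		if (len < sizeof(p))
			return -1;
		memcpy(buf, &p, sizeof(p));
		return (int)sizeof(p);
	}
	case FW_ABI_B: {
		struct fw_b_boot_params p = {
			.heap_size = args->heap_size,
			.heap_addr = args->heap_addr,
			.gfid = args->gfid,
		};
		if (len < sizeof(p))
			return -1;
		memcpy(buf, &p, sizeof(p));
		return (int)sizeof(p);
	}
	}
	return -1;
}
```

In a layout like this, a change in the firmware ABI would only add or
modify a case inside the core, and the caller-facing struct could in
principle stay the same if the client later moved to a different core
driver. Whether that holds up for the real parameter set is exactly the
"not infinitely scalable" caveat above.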

On Thu, Sep 26, 2024 at 11:40:57AM -0300, Jason Gunthorpe wrote:
> On Thu, Sep 26, 2024 at 02:54:38PM +0200, Greg KH wrote:
>
> > That's fine, but again, do NOT make design decisions based on what you
> > can, and can not, feel you can slide by one of these companies to get it
> > into their old kernels. That's what I take objection to here.
>
> It is not slide by. It is a recognition that participating in the
> community gives everyone value. If you excessively deny value from one
> side they will have no reason to participate.
>
> In this case the value is that, with enough light work, the
> kernel-fork community can deploy this code to their users. This has
> been the accepted bargain for a long time now.
>
> There is a great big question mark over Rust regarding what impact it
> actually has on this dynamic. It is definitely not just backport a few
> hundred upstream patches. There is clearly new upstream development
> work needed still - arch support being a very obvious one.
>
> > Also always remember please, that the % of overall Linux kernel
> > installs, even counting out Android and embedded, is VERY tiny for these
> > companies. The huge % overall is doing the "right thing" by using
> > upstream kernels. And with the laws in place now that % is only going
> > to grow and those older kernels will rightfully fall away into even
> > smaller %.
>
> Who is "doing the right thing"? That is not what I see, we sell
> server HW to *everyone*. There are a couple sites that are "near"
> upstream, but that is not too common. Everyone is running some kind of
> kernel fork.
>
> I dislike this generalization you do with % of users. Almost 100% of
> NVIDIA server HW are running forks. I would estimate around 10% is
> above a 6.0 baseline. It is not tiny either, NVIDIA sold like $60B of
> server HW running Linux last year with this kind of demographic. So
> did Intel, AMD, etc.
>
> I would not describe this as "VERY tiny". Maybe you mean RHEL-alike
> specifically, and yes, they are a diminishing install share. However,
> the hyperscale companies more than make up for that with their
> internal secret proprietary forks :(
>
> > > Otherwise, let's slow down here. Nova is still years away from being
> > > finished. Nouveau is the in-tree driver for this HW. This series
> > > improves on Nouveau. We are definitely not at the point of refusing
> > > new code because it is not written in Rust, RIGHT?
> >
> > No, I do object to "we are ignoring the driver being proposed by the
> > developers involved for this hardware by adding to the old one instead"
> > which it seems like is happening here.
>
> That is too harsh. We've consistently taken a community position that
> OOT stuff doesn't matter, and yes that includes OOT stuff that people
> we trust and respect are working on. Until it is ready for submission,
> and ideally merged, it is an unknown quantity. Good well meaning
> people routinely drop their projects, good projects run into
> unexpected roadblocks, and life happens.
>
> Nova is not being ignored, there is dialog, and yes some disagreement.
>
> Again, nobody here is talking about disrupting Nova. We just want to
> keep going as-is until we can all agree together it is ready to make a
> change.
>
> Jason