On Fri, Mar 07, 2025 at 01:01:50AM +0200, Nikolay Aleksandrov wrote:
> Hi all,
> This patch-set introduces minimal Ultra Ethernet driver infrastructure and
> the lowest Ultra Ethernet sublayer - the Packet Delivery Sublayer (PDS),
> which underpins the entire communication model of the Ultra Ethernet
> Transport[1] (UET). Ultra Ethernet is a new RDMA transport designed for
> efficient AI and HPC communication.

I was away while this discussion happened so I've gone through and read
the threads, looked at the patches and I don't think I've changed my
view since I talked to Enfabrica privately on this topic almost a year
ago.

I do not agree with creating a new subsystem (or whatever you are
calling drivers/ultraeth) for a single RDMA protocol and see nothing new
here to change my mind.

I would likely NAK the direction I see in this RFC, as I have other past
attempts to build RDMA HW interfaces outside of the RDMA subsystem.

Since none of that past discussion seems to have been acknowledged or
rebutted in this series I will repeat the main points:

1) I'm aware of something like 5-7 new protocols that are competing for
the same market as Ultra Ethernet. We can't give everyone and their dog
a new subsystem (or whatever) and all the maintainability negatives that
come with that. As a matter of maintainability we need to see
consolidation here, not fragmentation!

Yes, UE is a consortium driven standard, which is unique and a big
positive, but I don't believe anyone can say for certain what direction
the industry is going to go in. Many consortium standards have failed to
get adoption in the past even with a large number of member companies.

Nor can we know what concepts in UE are going to be copied into other
competing RDMA transports. See my other remarks on job key for an
example. Prematurely siloing stuff in drivers/ultraeth is very much the
wrong technical direction for maintainability.
That said, I think UE should be in the kernel and have a fair chance to
compete for market share. Just in a maintainable and appropriate way
while the industry evolves.

2) Due to the above, I'm pretty confident we will see RDMA NICs
supporting a lot of different protocols. In fact they already do. From a
kernel maintainability perspective we really want one RDMA driver
leveraging as much common infrastructure between the protocols as
possible. We do not want to see a single HW driver further split up
needlessly to other subsystems, that would be a big maintainability
downside.

To put a clear point on this, mlx5 has been gaining new protocols and
fitting into the existing driver model for a number of years now. In
fact there is speculation that UE could be implemented in mlx5 RDMA with
minimal kernel changes. There would be no reason to try to mess up the
driver to also interact with this stuff in drivers/ultraeth as seems to
be proposed here.

I think other HW will be similar. UE isn't so radically different that
every HW path will need to diverge from classical RDMA. Nor is it so
dissimilar to other competing proposals. We don't want artificial
differences, we want to create things that can be re-used when
appropriate.

Leon's response to Bart is correct, we already have similar examples of
almost everything UE does. Bart is also correct that verbs would be a
PITA, but RDMA userspace has moved beyond verbs limitations years ago
now. A lot of mlx5 stuff is not using verbs today, for instance. EFA and
other examples use extensive stuff beyond verbs.

3) Building a user/kernel split HW driver model is very hard. RDMA has
spent 20 years learning how to do this and making a lot of mistakes
along the way. I think we are in a good place now as a lot of new
functionality has been rolled out with very little stress in the past
few years. I see no reason to believe UE would not follow that same
pattern.

Frankly, I see no evidence in this RFC of any of that learning.
Probably because it doesn't actually show any HW or even seem to
contemplate what HW would even look like. There isn't even a call to
pin_user_pages() in this RFC. You can't call yourself *RDMA* if you are
not doing direct access to userspace memory!

So, this RFC is woefully incomplete. I think you greatly underestimate
how much work you are looking at to duplicate and re-invent the existing
RDMA infrastructure. Frankly I'm not even sure why you sent this RFC
when it doesn't show enough to even evaluate.

4) For example, I get the feeling this RFC is repeating the original
cardinal sin of RDMA by biasing the UAPI design toward a single
philosophy. I.e. you said:

> I should've been more specific - it is not an issue for UEC and the way
> our driver's netlink API is designed. We fully understand the pros and
> cons of our approach.

Which is exactly the kind of narrow thinking that creates long term
trouble in uAPI design. Do your choices actually work for *ALL* future
HW designs and other drivers, not just "our drivers netlink"? I think
not. Given the UE spec doesn't even have something pretending to be a
kernel/user interface standard I think we will see an extreme variety of
HW implementations here.

The proven modern RDMA approach to uAPI design is the right way to solve
this problem. It is shown to work. It already implements multi-protocol
RDMA and has a lot of drivers demonstrating it now.

5) RDMA actually has pretty good infrastructure. It has a lot of complex
infrastructure features, for example see the long threads I recently
wrote on how its hot plug architecture works. Even "basic" things like
mmapping a doorbell page have thousands of lines of support
infrastructure to make the drivers work well and support enterprise
level HA features.

You get to have these features if you write an RDMA driver. Otherwise
you have to clone them all.
From what I can tell in this RFC the implementations of basic things
like the object model are worse than what we have in RDMA already.
Things like a device model don't even exist. Let alone advanced stuff
like hot plug, namespace, cgroups, DMA operations and all the stuff
needed for HW bindings.

It has a *long* way to go to even reach feature parity in terms of what
the core RDMA device model and object model provides a HW driver, let
alone complex things like uverbs :\

This whole RFC reeks of NIH: it is more fun to go off and do something
greenfield than do the maintenance work to evolve an existing code base.

6) I offered many things, including not having to use libibverbs, adding
someone to maintain the UE specific portions, and helping to architect
the solution within RDMA. So it is not like there is some blocker that
is forcing a drivers/ultraeth, or that someone has even said no to any
proposal made.

For instance I spent a lot of time with the Habana Labs guys to work out
how to fit their almost-RDMA stuff into RDMA. It required some careful
thinking to accommodate their limited HW, but in the end it did manage
to fit in fine. They also started as you did here with some weird thing.
In the end we all agreed that RDMA HW support belongs in the RDMA
subsystem, using normal RDMA APIs. We are trying not to proliferate
these things.

I feel like this is repeating the drivers/accel vs DRM debate from a few
years ago. All the points DaveA made apply here just as well, arguably
even more so as RDMA has even more robust shared infrastructure that
should be used instead of re-invented. At least Habana had a reason for
accel - they wanted to skip some DRM rules. This RFC doesn't even have
that.

Thus, I don't expect you will get support for something like this to be
merged; you should change direction.

Jason