On Thu, 2024-07-11 at 10:50 -0300, Jason Gunthorpe wrote:
> On Tue, Jul 09, 2024 at 09:02:25AM -0700, James Bottomley wrote:
>
> > For NVMe and net we do have SPDK and DPDK. What I find is that
> > people tend to use them for niche use cases (like the NVMe KV
> > command set) or obscure network routers. Even though the claim
> > they both make is to get the kernel out of the way and do stuff
> > "way faster" the difficulty they create by bypassing everything is
> > quite a high burden.
>
> [..]
>
> > What all of the prior pass-throughs taught us is that if the use
> > case is big enough it will get pulled into the kernel and the
> > kernel will usually manage it better (DB users). If it remains a
> > niche use case it will likely remain out of the kernel, but we
> > won't be hurt by it (NVME KV protocol) and sometimes it doesn't
> > really matter and the device manufacturers will sort it out on
> > their own (USB tokens).
>
> I don't see it as being linked to big enough use case at all.

'it' being fwctl? I'm happy to take a wait and see approach with that.
I'm in the camp that doesn't squash a novel proposal because the kernel
should be controlling it. I'm confident that if the use case becomes
big enough the kernel will likely do it in the end.

> The kernel gets involved if there are good technical reasons to do
> so. Databases running over real filesystems with O_DIRECT is really
> technically better than raw block devices.

Exactly, for a whole host of performance and, more particularly,
management reasons.

> While DPDK shows the opposite, userspace is the technically better
> option. This is now shown at scale. DPDK is not some niche. A big
> chunk of internet traffic is going through DPDKs, especially for
> mobile. Many ORAN solutions include DPDK on Linux.

ORAN being Open Radio Access Network? But that's a case in point: the
kernel doesn't have an LTE stack or APN handling for networking.
RAN hardware is not very widespread outside of cell providers, meaning
it doesn't get much exposure. However, I believe Osmocom is trying to
change this (giving Linux a native stack instead of DPDK) and they're
on record as referring to DPDK as the "Rabbit Hole".

To look at a counter example: how many Linux based router boxes (i.e.
hardware based not cloud based) actually use DPDK?

I have a huge list of cloud projects that overran and got cancelled
because they decided to use DPDK to replace a function the kernel was
already doing (because faster) and then found that if you replace
function X in the kernel generally the rest of the alphabet needs
replacing as well, which blows your project deadlines.

That's not to say there aren't a whole host of uses for DPDK: novel
protocols, traffic classification experiments, etc. It's just that it's
like this honeytrap for the unwary and any project that comes along
with DPDK somewhere in its spec immediately gets extra scrutiny (it's
not that we don't do them, just that we make sure there's a genuine use
case that isn't reinventing what the kernel already does).

> What has been improved kernel-side is the integration. DPDK
> deployments now often use RDMA raw queue pairs instead of VFIO, which
> largely eliminates the "high burden".
>
> There are many other cases, like DPDK, where the right answer is to
> reduce the kernel involvement. It is not so simple that things always
> get pulled into the kernel.

I don't disagree: there are many novel protocols and other use cases
that will never make it into the kernel simply because they won't get
the adoption; they're all ideal candidates for DPDK. However, I do take
issue with "reduce kernel involvement": that's what gives rise to
projects that start by rewriting a piece of kernel networking in DPDK
and get sucked down the Rabbit Hole and never come out the other side.

James