[ add kvm@xxxxxxxxxxxxxxx for VFIO discussion ]

On Tue, Mar 16, 2021 at 2:01 AM Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
[..]
> > Ioctl interface
> > Kernel driver provides ioctl interface for user applications to setup and configure dlb domains, ports, queues, scheduling types, credits,
> > sequence numbers, and links between ports and queues. Applications also use the interface to start, stop and inquire the dlb operations.
>
> What applications use any of this? What userspace implementation today
> interacts with this? Where is that code located?
>
> Too many TLAs here, I have even less of an understanding of what this
> driver is supposed to be doing, and what this hardware is now than
> before.
>
> And here I thought I understood hardware devices, and if I am confused,
> I pity anyone else looking at this code...
>
> You all need to get some real documentation together to explain
> everything here in terms that anyone can understand. Without that, this
> code is going nowhere.

Hi Greg,

So, for the last few weeks Mike and company have patiently waded
through my questions and now I think we are at a point to work through
the upstream driver architecture options and tradeoffs.

You were not alone in struggling to understand what this device does,
because it is unlike any other accelerator Linux has ever considered.
It shards / load balances a data stream for processing by CPU threads.
This is typically a network appliance function / protocol, but it could
also be any other generic thread pool, like the kernel's padata. It
saves the CPU cycles spent load balancing work items and marshaling
them through a thread pool pipeline (the P.S. at the end of this mail
sketches what that loop looks like in software). For example, in DPDK
applications, DLB2 frees up entire cores that would otherwise be
consumed with scheduling and work distribution. A separate
proof-of-concept, using DLB2 to accelerate the kernel's "padata" thread
pool for a crypto workload, demonstrated ~150% higher throughput with
hardware employed to manage work distribution and result ordering. Yes,
you need a sufficiently high touch / high throughput protocol before
the software overhead of load balancing across CPU threads starts to
dominate the performance, but there are some specific workloads willing
to switch to this regime.

The primary consumer to date has been as a backend for the event
handling in the userspace networking stack, DPDK, and DLB2 has an
existing polled-mode userspace driver for that use case. So I said,
"great, just add more features to that userspace driver and you're
done". In fact there was DLB1 hardware that also had a polled-mode
userspace driver. So, the next question is "what's changed in DLB2 such
that a userspace driver is no longer suitable?".

The new use case for DLB2 is hardware support for a host driver to
carve up device resources into smaller sets (vfio-mdevs) that can be
assigned to guests (Intel calls this new hardware capability SIOV:
Scalable IO Virtualization). Hardware resource management is difficult
to handle in userspace, especially when bare-metal hardware events need
to coordinate with guest-VM device instances. This includes a mailbox
interface for the guest VM to negotiate resources with the host driver.
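To make "hardware resource management" concrete, the host driver ends
up owning bookkeeping along the following lines whenever a guest's
mailbox request asks for a slice of the device. This is a hypothetical
sketch written for this mail; the names and layout are invented and are
not the actual dlb2 data structures:

#include <linux/errno.h>
#include <linux/lockdep.h>
#include <linux/mutex.h>
#include <linux/types.h>

/* A subset of the resource types from the documentation quoted above. */
struct dlb_resources {
        u32 num_ports;
        u32 num_queues;
        u32 num_credits;
        u32 num_seq_numbers;
};

struct dlb_hw {
        struct mutex resource_lock;     /* serializes host and guest requests */
        struct dlb_resources avail;     /* free pool left to hand out */
};

/*
 * Carve a guest's requested resources out of the shared free pool, or
 * reject the request.  The same pool backs bare-metal users of the
 * device, which is why this arbitration wants to live in exactly one
 * place, i.e. a host kernel driver.
 */
static int dlb_reserve_for_guest(struct dlb_hw *hw,
                                 const struct dlb_resources *req)
{
        lockdep_assert_held(&hw->resource_lock);

        if (req->num_ports > hw->avail.num_ports ||
            req->num_queues > hw->avail.num_queues ||
            req->num_credits > hw->avail.num_credits ||
            req->num_seq_numbers > hw->avail.num_seq_numbers)
                return -ENOSPC;

        hw->avail.num_ports -= req->num_ports;
        hw->avail.num_queues -= req->num_queues;
        hw->avail.num_credits -= req->num_credits;
        hw->avail.num_seq_numbers -= req->num_seq_numbers;

        return 0;
}

The arithmetic is trivial; the awkward part is that the same pool is
shared by bare-metal users and every guest instance, and it has to be
reclaimed reliably whenever a guest instance goes away, which is the
coordination that gets painful outside the kernel.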
Another, more practical, roadblock for a "DLB2 in userspace" proposal
is the fact that it implements what are in effect software-defined
interrupts to go beyond the scalability limits of PCI MSI-X (Intel
calls this Interrupt Message Store: IMS). So even if hardware resource
management were awkwardly plumbed into a userspace daemon, there would
still need to be kernel enabling for device-specific extensions to
drivers/vfio/pci/vfio_pci_intrs.c so that it understands the IMS
interrupts of DLB2 in addition to PCI MSI-X.

While that still might be solvable in userspace if you squint at it, I
don't think Linux end users are served by pushing all of hardware
resource management to userspace. VFIO is mostly built to pass entire
PCI devices to guests, or, in coordination with a kernel driver, to
describe a subset of the hardware to a virtual-device (vfio-mdev)
interface. The rub here is that, to date, kernel drivers using VFIO to
provision mdevs have had some existing responsibility to the core
kernel, like a network driver or a DMA offload driver. The DLB2 driver
offers no such service to the kernel; its primary role is accelerating
a userspace data-plane. I am assuming here that the padata
proof-of-concept is interesting, but not a compelling reason to ship a
driver compared to giving end users competent kernel-driven
hardware-resource assignment for deploying DLB2 virtual instances into
guest VMs. My "just continue in userspace" suggestion has no answer for
the IMS interrupt and reliable hardware resource management
requirements.

If you're with me so far we can go deeper into the details, but in
answer to your previous questions, most of the TLAs were from the land
of "SIOV", where the VFIO community should be brought in to review. The
driver is mostly a configuration plane; the fast-path data-plane is
entirely in userspace. That configuration plane needs to manage
hardware events and resourcing on behalf of guest VMs running on a
partitioned subset of the device. There are worthwhile questions about
whether some of the uapi can be refactored into common modules like
uacce, but I think we need to get to a first-order understanding of
what DLB2 is and why the kernel has a role before diving into the uapi
discussion.

Any clearer?

So, in summary, drivers/misc/ appears to be the first stop in the
review, since a host driver needs to be established to start the VFIO
enabling campaign. With my community hat on, I think requiring
standalone host drivers is healthier for Linux than broaching the
subject of VFIO-only drivers, even if, as in this case, the initial
host driver is mostly implementing a capability that could be achieved
with a userspace driver.
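P.S. Since "load balancing work items across a thread pool" is carrying
a lot of weight above, here is a rough sketch of the kind of software
distributor loop that DLB2 moves into hardware. This is illustrative C
written for this mail, not DPDK code and not anything from the dlb2
driver; rx_burst(), the ring layout, and the worker count are all
invented for the example, and the worker side of the rings is omitted:

#include <stddef.h>
#include <stdint.h>

#define NR_WORKERS      8
#define RING_SIZE       1024            /* power of two */

struct work_item {
        uint32_t flow_id;               /* items in a flow must stay ordered */
        void *data;
};

struct spsc_ring {                      /* single-producer/single-consumer */
        struct work_item slot[RING_SIZE];
        uint32_t head;                  /* written by the distributor */
        uint32_t tail;                  /* written by the worker */
};

static struct spsc_ring worker_ring[NR_WORKERS];

static int ring_enqueue(struct spsc_ring *r, struct work_item item)
{
        uint32_t head = r->head;
        uint32_t tail = __atomic_load_n(&r->tail, __ATOMIC_ACQUIRE);

        if (head - tail == RING_SIZE)
                return -1;              /* worker is backed up */
        r->slot[head & (RING_SIZE - 1)] = item;
        __atomic_store_n(&r->head, head + 1, __ATOMIC_RELEASE);
        return 0;
}

/* Assumed to exist elsewhere: pulls a burst of new work off the wire. */
extern size_t rx_burst(struct work_item *items, size_t max);

/*
 * The distributor loop: every cycle spent here is pure overhead that a
 * load-balancing device can absorb.  Pinning a flow to one worker
 * (same flow_id -> same ring) is the cheap way to keep ordering without
 * a reorder buffer, at the cost of load imbalance when flows are skewed.
 */
void distributor_core(void)
{
        struct work_item burst[32];

        for (;;) {
                size_t n = rx_burst(burst, 32);

                for (size_t i = 0; i < n; i++) {
                        unsigned int w = burst[i].flow_id % NR_WORKERS;

                        while (ring_enqueue(&worker_ring[w], burst[i]))
                                ;       /* spin on backpressure */
                }
        }
}

DLB2 does that flow-to-worker assignment, the credit / backpressure
handling, and the re-ordering of completed work in the device, which is
where the "frees up entire cores" claim above comes from.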