> From: Jason Gunthorpe <jgg@xxxxxxxxxxxx>
> Sent: Monday, April 27, 2020 3:14 AM
[...]
> > technically Scalable IOV is definitely different from SR-IOV. It's
> > simpler in hardware. And we're not emulating SR-IOV. The point
> > is just in usage-wise we want to present a consistent user
> > experience just like passing through a PCI endpoint (PF or VF) device
> > through vfio eco-system, including various userspace VMMs (Qemu,
> > firecracker, rust-vmm, etc.), middleware (Libvirt), and higher level
> > management stacks.
>
> Yes, I understand your desire, but at the same time we have not been
> doing device emulation in the kernel. You should at least be
> forthwright about that major change in the cover letters/etc.

I searched 'emulate' in kernel/Documentation:

Documentation/sound/alsa-configuration.rst (emulate oss on alsa)
Documentation/security/tpm/tpm_vtpm_proxy.rst (emulate virtual TPM)
Documentation/networking/generic-hdlc.txt (emulate eth on HDLC)
Documentation/gpu/todo.rst (generic fbdev emulation)
...

I believe the main reason for putting such emulations in the kernel is
that those emulated device interfaces have established eco-systems and
values which the kernel shouldn't break. As you emphasized earlier,
they have good reasons for getting into the kernel.

Then back to this context. Almost every newly-born Linux VMM
(firecracker, crosvm, cloud hypervisor, and some proprietary
implementations) supports only two types of devices: virtio and vfio,
because they want to stay simple and slim. Virtio provides the basic
set of I/O capabilities required by most VMs, while vfio brings a
unified interface for gaining added value or higher performance from
assigned devices. Even Qemu now supports a minimal configuration
('microvm') for a similar reason. So the vfio eco-system is significant
and represents a major trend in the virtualization space.

Supporting the vfio eco-system is therefore the usage GOAL of this
patch series, not merely an optional technique we happened to pick.
vfio-pci is there for passing through standalone PCI endpoints (PF or
VF), and vfio-mdev is there for passing through a smaller portion of
device resources while sharing the same VFIO interface, to gain uniform
support in this eco-system. I believe the above is a good reason for
putting emulation in the idxd driver by using vfio-mdev.

Yes, it does imply that there will be more emulation in the kernel as
more Scalable-IOV (or similar) devices are introduced. But as explained
earlier, the PCI config space emulation can be largely consolidated and
reused, and the remaining device-specific MMIO emulation is relatively
simple because we define the virtual device interface to be the same
as, or even simpler than, a VF interface. Only a small set of registers
is emulated after the fast-path resources are passed through, and that
small set of course needs to meet the normal quality requirements for
getting into the kernel (a rough sketch of what I mean is appended
after the quote below). We'll definitely highlight this part in future
cover letters. 😊

> > > The only thing we get out of this is someone doesn't have to write a
> > > idxd emulation driver in qemu, instead they have to write it in the
> > > kernel. I don't see how that is a win for the ecosystem.
> >
> > No. The clear win is on leveraging classic VFIO iommu and its eco-system
> > as explained above.
>
> vdpa had no problem implementing iommu support without VFIO. This was
> their original argument too, it turned out to be erroneous.
> Every wheel can be re-invented...
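To make the "small set of registers" point above a bit more concrete,
here is a rough, illustrative sketch (not the actual idxd mdev code;
the structure, field names, and offsets below are made up for this
mail) of how the slow-path MMIO trap could look, with the fast-path
portals mmap'ed straight through to the guest:

#include <linux/types.h>

/* Illustrative per-mdev state; field names are invented for this sketch. */
struct vdev_sketch {
        u64 gencfg;     /* emulated general configuration register */
        u64 cmdsts;     /* emulated command/status register */
        /* ... only a handful of slow-path registers in total ... */
};

/*
 * Slow-path MMIO read: only this small, well-defined register set is
 * emulated in the kernel. The fast-path work-submission portals are
 * mmap'ed to the guest directly, so accesses to them never trap here.
 * The offsets below are examples, not real idxd register offsets.
 */
static int vdev_mmio_read_sketch(struct vdev_sketch *vdev,
                                 u64 offset, u64 *val)
{
        switch (offset) {
        case 0x00:                      /* e.g. capability/config */
                *val = vdev->gencfg;
                return 0;
        case 0x90:                      /* e.g. command status */
                *val = vdev->cmdsts;
                return 0;
        default:                        /* reserved -> reads as zero */
                *val = 0;
                return 0;
        }
}

The write side is symmetrical, and the PCI config space emulation that
sits on top of this is the part that can be consolidated and shared
across Scalable-IOV drivers rather than duplicated per device.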
My gut feeling is that vdpa is for offloading fast-path vhost
operations to the underlying accelerators; it is a welcome and
reasonable extension to the existing virtio/vhost eco-system. For
other types of devices such as idxd, we rely on the vfio eco-system
to keep up with the fast-evolving VMM spectrum.

Thanks
Kevin