On Sun, 17 Jan 2021 18:15:31 +0000
Max Gurtovoy <mgurtovoy@xxxxxxxxxx> wrote:

> Hi Alex and Cornelia,
>
> This series split the vfio_pci driver into 2 parts: pci driver and a
> subsystem driver that will also be library of code. The pci driver,
> vfio_pci.ko will be used as before and it will bind to the subsystem
> driver vfio_pci_core.ko to register to the VFIO subsystem. This patchset
> is fully backward compatible. This is a typical Linux subsystem
> framework behaviour. This framework can be also adopted by vfio_mdev
> devices as we'll see in the below sketch.
>
> This series is coming to solve the issues that were raised in the
> previous attempt for extending vfio-pci for vendor specific
> functionality: https://lkml.org/lkml/2020/5/17/376 by Yan Zhao.
>
> This solution is also deterministic in a sense that when a user will
> bind to a vendor specific vfio-pci driver, it will get all the special
> goodies of the HW.
>
> This subsystem framework will also ease on adding vendor specific
> functionality to VFIO devices in the future by allowing another module
> to provide the pci_driver that can setup number of details before
> registering to VFIO subsystem (such as inject its own operations).
>
> Below we can see the proposed changes (this patchset only deals with
> VFIO_PCI subsystem but it can easily be extended to VFIO_MDEV subsystem
> as well):
>
> +----------------------------------------------------------------------+
> |                                                                      |
> |                                 VFIO                                 |
> |                                                                      |
> +----------------------------------------------------------------------+
>
> +--------------------------------+    +--------------------------------+
> |                                |    |                                |
> |         VFIO_PCI_CORE          |    |         VFIO_MDEV_CORE         |
> |                                |    |                                |
> +--------------------------------+    +--------------------------------+
>
> +---------------+ +--------------+    +---------------+ +--------------+
> |               | |              |    |               | |              |
> |               | |              |    |               | |              |
> |   VFIO_PCI    | | MLX5_VFIO_PCI|    |   VFIO_MDEV   | |MLX5_VFIO_MDEV|
> |               | |              |    |               | |              |
> |               | |              |    |               | |              |
> +---------------+ +--------------+    +---------------+ +--------------+
>
> First 2 patches introduce the above changes for vfio_pci and
> vfio_pci_core.
>
> Patch (3/3) introduces a new mlx5 vfio-pci module that registers to VFIO
> subsystem using vfio_pci_core. It also registers to Auxiliary bus for
> binding to mlx5_core that is the parent of mlx5-vfio-pci devices. This
> will allow extending mlx5-vfio-pci devices with HW specific features
> such as Live Migration (mlx5_core patches are not part of this series
> that comes for proposing the changes needed for the vfio pci subsystem).
>
> These devices will be seen on the Auxiliary bus as:
> mlx5_core.vfio_pci.2048 -> ../../../devices/pci0000:00/0000:00:02.0/0000:05:00.0/0000:06:00.0/0000:07:00.0/mlx5_core.vfio_pci.2048
> mlx5_core.vfio_pci.2304 -> ../../../devices/pci0000:00/0000:00:02.0/0000:05:00.0/0000:06:00.0/0000:07:00.1/mlx5_core.vfio_pci.2304
>
> 2048 represents BDF 08:00.0 and 2304 represents BDF 09:00.0 in decimal
> view. In this manner, the administrator will be able to locate the
> correct vfio-pci module it should bind the desired BDF to (by finding
> the pointer to the module according to the Auxiliary driver of that
> BDF).
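
For reference, the two decimal suffixes quoted above are consistent with the usual 16-bit PCI packing (bus << 8 | device << 3 | function). The cover letter does not say how mlx5_core actually derives the IDs, so the following is only a sketch under that assumption; the helper names bdf_to_devid and parse_bdf are hypothetical and exist just for this check.

```python
def bdf_to_devid(bus: int, device: int, function: int) -> int:
    """Pack bus/device/function into a 16-bit ID (assumed encoding)."""
    if not (0 <= bus <= 0xFF and 0 <= device <= 0x1F and 0 <= function <= 0x7):
        raise ValueError("bus/device/function out of range")
    return (bus << 8) | (device << 3) | function


def parse_bdf(text: str) -> int:
    """Parse 'bb:dd.f' (hex fields, no PCI domain) and return the packed ID."""
    bus, rest = text.split(":")
    device, function = rest.split(".")
    return bdf_to_devid(int(bus, 16), int(device, 16), int(function, 16))


# The two BDFs named in the cover letter.
print(parse_bdf("08:00.0"))  # 2048
print(parse_bdf("09:00.0"))  # 2304
```

Under that assumption, an administrator could map an auxiliary device name back to its BDF by reversing the same shifts.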

I'm not familiar with that auxiliary framework (it seems to be fairly
new?); but can you maybe create an auxiliary device unconditionally and
contain all hardware-specific things inside a driver for it? Or is that
not flexible enough?

>
> In this way, we'll use the HW vendor driver core to manage the lifecycle
> of these devices. This is reasonable since only the vendor driver knows
> exactly about the status on its internal state and the capabilities of
> its accelerators, for example.
>
> TODOs:
> 1. For this RFC we still haven't cleaned all vendor specific stuff that
>    were merged in the past into vfio_pci (such as VFIO_PCI_IG and
>    VFIO_PCI_NVLINK2).
> 2. Create subsystem module for VFIO_MDEV. This can be used for vendor
>    specific scalable functions for example (SFs).
> 3. Add Live migration functionality for mlx5 SNAP devices
>    (NVMe/Virtio-BLK).
> 4. Add Live migration functionality for mlx5 VFs
> 5. Add the needed functionality for mlx5_core
>
> I would like to thank the great team that was involved in this
> development, design and internal review:
> Oren, Liran, Jason, Leon, Aviad, Shahaf, Gary, Artem, Kirti, Neo, Andy
> and others.
>
> This series applies cleanly on top of kernel 5.11-rc2+ commit 2ff90100ace8:
> "Merge tag 'hwmon-for-v5.11-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging"
> from Linus.
>
> Note: Live migration for MLX5 SNAP devices is WIP and will be the first
>       example for adding vendor extension to vfio-pci devices. As the
>       changes to the subsystem must be defined as a pre-condition for
>       this work, we've decided to split the submission for now.
>
> Max Gurtovoy (3):
>   vfio-pci: rename vfio_pci.c to vfio_pci_core.c
>   vfio-pci: introduce vfio_pci_core subsystem driver
>   mlx5-vfio-pci: add new vfio_pci driver for mlx5 devices
>
>  drivers/vfio/pci/Kconfig            |   22 +-
>  drivers/vfio/pci/Makefile           |   16 +-
>  drivers/vfio/pci/mlx5_vfio_pci.c    |  253 +++
>  drivers/vfio/pci/vfio_pci.c         | 2386 +--------------------------
>  drivers/vfio/pci/vfio_pci_core.c    | 2311 ++++++++++++++++++++++++++

Especially regarding this diffstat... from a quick glance at patch 3,
it mostly forwards to vfio_pci_core anyway. Do you expect a huge amount
of device-specific callback invocations?

[I have not looked at this in detail yet.]

>  drivers/vfio/pci/vfio_pci_private.h |   21 +
>  include/linux/mlx5/vfio_pci.h       |   36 +
>  7 files changed, 2734 insertions(+), 2311 deletions(-)
>  create mode 100644 drivers/vfio/pci/mlx5_vfio_pci.c
>  create mode 100644 drivers/vfio/pci/vfio_pci_core.c
>  create mode 100644 include/linux/mlx5/vfio_pci.h
>
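
To make the question about patch 3 concrete: if a vendor driver mostly forwards to vfio_pci_core, its callback table would look like a copy of the core defaults with only a few entries overridden. The toy model below illustrates that shape and nothing more. It is written in Python for brevity, and every name in it (CORE_OPS, VENDOR_OPS, vendor_ioctl, and so on) is hypothetical and not taken from the patches.

```python
# Hypothetical model only: none of these names come from the series.

def core_open(dev):
    return f"core open {dev}"

def core_release(dev):
    return f"core release {dev}"

def core_ioctl(dev):
    return f"core ioctl {dev}"

def core_mmap(dev):
    return f"core mmap {dev}"

# Default callbacks, standing in for what a core library might export.
CORE_OPS = {
    "open": core_open,
    "release": core_release,
    "ioctl": core_ioctl,
    "mmap": core_mmap,
}

def vendor_ioctl(dev):
    # Device-specific step first, then fall through to the core handler.
    return "vendor pre-ioctl; " + core_ioctl(dev)

# The vendor table reuses every core callback except the one it overrides.
VENDOR_OPS = {**CORE_OPS, "ioctl": vendor_ioctl}

def overridden(ops):
    """Return the names of callbacks that differ from the core defaults."""
    return sorted(name for name, fn in ops.items() if fn is not CORE_OPS[name])

print(overridden(VENDOR_OPS))  # ['ioctl']
```

If the real vendor drivers end up overriding only one or two entries like this, a thinner hook mechanism in the core might be enough; if they override most of the table, a separate pci_driver per vendor is easier to justify.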