Hi Kevin,

On 4/18/22 10:49 AM, Tian, Kevin wrote:
>> From: Liu, Yi L <yi.l.liu@xxxxxxxxx>
>> Sent: Thursday, April 14, 2022 6:47 PM
>>
>> With the introduction of iommufd[1], the linux kernel provides a generic
>> interface for userspace drivers to propagate their DMA mappings to kernel
>> for assigned devices. This series does the porting of the VFIO devices
>> onto the /dev/iommu uapi and let it coexist with the legacy implementation.
>> Other devices like vpda, vfio mdev and etc. are not considered yet.
> vfio mdev has no special support in Qemu. Just that it's not supported
> by iommufd yet thus can only be operated in legacy container interface at
> this point. Later once it's supported by the kernel suppose no additional
> enabling work is required for mdev in Qemu.
>
>> For vfio devices, the new interface is tied with device fd and iommufd
>> as the iommufd solution is device-centric. This is different from legacy
>> vfio which is group-centric. To support both interfaces in QEMU, this
>> series introduces the iommu backend concept in the form of different
>> container classes. The existing vfio container is named legacy container
>> (equivalent with legacy iommu backend in this series), while the new
>> iommufd based container is named as iommufd container (may also be mentioned
>> as iommufd backend in this series). The two backend types have their own
>> way to setup secure context and dma management interface. Below diagram
>> shows how it looks like with both BEs.
>>
>>                     VFIO                           AddressSpace/Memory
>>     +-------+  +----------+  +-----+  +-----+
>>     |  pci  |  | platform |  |  ap |  | ccw |
>>     +---+---+  +----+-----+  +--+--+  +--+--+     +----------------------+
>>         |           |           |        |        |   AddressSpace       |
>>         |           |           |        |        +------------+---------+
>>     +---V-----------V-----------V--------V----+               /
>>     |           VFIOAddressSpace              | <------------+
>>     |                  |                      |  MemoryListener
>>     |          VFIOContainer list             |
>>     +-------+----------------------------+----+
>>             |                            |
>>             |                            |
>>     +-------V------+            +--------V----------+
>>     |   iommufd    |            |    vfio legacy    |
>>     |  container   |            |     container     |
>>     +-------+------+            +--------+----------+
>>             |                            |
>>             | /dev/iommu                 | /dev/vfio/vfio
>>             | /dev/vfio/devices/vfioX    | /dev/vfio/$group_id
>>  Userspace  |                            |
>>  ===========+============================+================================
>>  Kernel     |  device fd                 |
>>             +---------------+            | group/container fd
>>             | (BIND_IOMMUFD |            | (SET_CONTAINER/SET_IOMMU)
>>             |  ATTACH_IOAS) |            | device fd
>>             |               |            |
>>             |       +-------V------------V-----------------+
>>     iommufd |       |                vfio                  |
>> (map/unmap  |       +---------+--------------------+-------+
>>  ioas_copy) |                 |                    | map/unmap
>>             |                 |                    |
>>      +------V------+    +-----V------+      +------V--------+
>>      | iommfd core |    |  device    |      |  vfio iommu   |
>>      +-------------+    +------------+      +---------------+
> last row: s/iommfd/iommufd/
>
> overall this sounds a reasonable abstraction. Later when vdpa starts
> supporting iommufd probably the iommufd BE will become even
> smaller with more logic shareable between vfio and vdpa.
>
>> [Secure Context setup]
>> - iommufd BE: uses device fd and iommufd to setup secure context
>>               (bind_iommufd, attach_ioas)
>> - vfio legacy BE: uses group fd and container fd to setup secure context
>>                   (set_container, set_iommu)
>> [Device access]
>> - iommufd BE: device fd is opened through /dev/vfio/devices/vfioX
>> - vfio legacy BE: device fd is retrieved from group fd ioctl
>> [DMA Mapping flow]
>> - VFIOAddressSpace receives MemoryRegion add/del via MemoryListener
>> - VFIO populates DMA map/unmap via the container BEs
>>   *) iommufd BE: uses iommufd
>>   *) vfio legacy BE: uses container fd
>>
>> This series qomifies the VFIOContainer object which acts as a base class
> what does 'qomify' mean? I didn't find this word from dictionary...

Sorry, this is pure QEMU terminology. QOM stands for "QEMU Object Model",
so "qomify" means converting something into a QOM object. Additional info at:
https://qemu.readthedocs.io/en/latest/devel/qom.html

Eric
>
>> for a container. This base class is derived into the legacy VFIO container
>> and the new iommufd based container. The base class implements generic
>> code such as code related to memory_listener and address space
>> management whereas the derived class implements callbacks that depend on
>> the kernel user space
> 'the kernel user space'?
>
>> being used.
>>
>> The selection of the backend is made on a device basis using the new
>> iommufd option (on/off/auto). By default the iommufd backend is selected
>> if supported by the host and by QEMU (iommufd KConfig). This option is
>> currently available only for the vfio-pci device. For other types of
>> devices, it does not yet exist and the legacy BE is chosen by default.
>>
>> Test done:
>> - PCI and Platform device were tested
> In this case PCI uses iommufd while platform device uses legacy?
>
>> - ccw and ap were only compile-tested
>> - limited device hotplug test
>> - vIOMMU test run for both legacy and iommufd backends (limited tests)
>>
>> This series was co-developed by Eric Auger and me based on the exploration
>> iommufd kernel[2], complete code of this series is available in[3]. As
>> iommufd kernel is in the early step (only iommufd generic interface is in
>> mailing list), so this series hasn't made the iommufd backend fully on par
>> with legacy backend w.r.t. features like p2p mappings, coherency tracking,
> what does 'coherency tracking' mean here? if related to iommu enforce
> snoop it is fully handled by the kernel so far. I didn't find any use of
> VFIO_DMA_CC_IOMMU in current Qemu.
>
>> live migration, etc. This series hasn't supported PCI devices without FLR
>> neither as the kernel doesn't support VFIO_DEVICE_PCI_HOT_RESET when
>> userspace is using iommufd. The kernel needs to be updated to accept
>> device fd list for reset when userspace is using iommufd. Related work is
>> in progress by Jason[4].
>>
>> TODOs:
>> - Add DMA alias check for iommufd BE (group level)
>> - Make pci.c to be BE agnostic. Needs kernel change as well to fix the
>>   VFIO_DEVICE_PCI_HOT_RESET gap
>> - Cleanup the VFIODevice fields as it's used in both BEs
>> - Add locks
>> - Replace list with g_tree
>> - More tests
>>
>> Patch Overview:
>>
>> - Preparation:
>>   0001-scripts-update-linux-headers-Add-iommufd.h.patch
>>   0002-linux-headers-Import-latest-vfio.h-and-iommufd.h.patch
>>   0003-hw-vfio-pci-fix-vfio_pci_hot_reset_result-trace-poin.patch
>>   0004-vfio-pci-Use-vbasedev-local-variable-in-vfio_realize.patch
>>   0005-vfio-common-Rename-VFIOGuestIOMMU-iommu-into-iommu_m.patch
> 3-5 are pure cleanups which could be sent out separately
>
>>   0006-vfio-common-Split-common.c-into-common.c-container.c.patch
>>
>> - Introduce container object and covert existing vfio to use it:
>>   0007-vfio-Add-base-object-for-VFIOContainer.patch
>>   0008-vfio-container-Introduce-vfio_attach-detach_device.patch
>>   0009-vfio-platform-Use-vfio_-attach-detach-_device.patch
>>   0010-vfio-ap-Use-vfio_-attach-detach-_device.patch
>>   0011-vfio-ccw-Use-vfio_-attach-detach-_device.patch
>>   0012-vfio-container-obj-Introduce-attach-detach-_device-c.patch
>>   0013-vfio-container-obj-Introduce-VFIOContainer-reset-cal.patch
>>
>> - Introduce iommufd based container:
>>   0014-hw-iommufd-Creation.patch
>>   0015-vfio-iommufd-Implement-iommufd-backend.patch
>>   0016-vfio-iommufd-Add-IOAS_COPY_DMA-support.patch
>>
>> - Add backend selection for vfio-pci:
>>   0017-vfio-as-Allow-the-selection-of-a-given-iommu-backend.patch
>>   0018-vfio-pci-Add-an-iommufd-option.patch
>>
>> [1] https://lore.kernel.org/kvm/0-v1-e79cd8d168e8+6-iommufd_jgg@xxxxxxxxxx/
>> [2] https://github.com/luxis1999/iommufd/tree/iommufd-v5.17-rc6
>> [3] https://github.com/luxis1999/qemu/tree/qemu-for-5.17-rc6-vm-rfcv1
>> [4] https://lore.kernel.org/kvm/0-v1-a8faf768d202+125dd-vfio_mdev_no_group_jgg@xxxxxxxxxx/
> Following is probably more relevant to [4]:
>
> https://lore.kernel.org/all/10-v1-33906a626da1+16b0-vfio_kvm_no_group_jgg@xxxxxxxxxx/
>
> Thanks
> Kevin
>
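
For readers not used to the pattern, the base-class/derived-class split
described in the cover letter (generic code in the base, backend-specific
callbacks in the legacy and iommufd containers) can be sketched in plain C
roughly as below. This is only an illustration under stated assumptions: it
does not use the real QOM API, the callbacks are stubs that issue no ioctls,
and every name in it (Container, ContainerOps, container_dma_map, ...) is
hypothetical rather than taken from the series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names are hypothetical; this is neither QOM nor the series' code. */
typedef struct Container Container;

/* Backend-specific callbacks supplied by each derived container. */
typedef struct ContainerOps {
    const char *name;
    int (*dma_map)(Container *c, uint64_t iova, uint64_t size);
    int (*dma_unmap)(Container *c, uint64_t iova, uint64_t size);
} ContainerOps;

/* "Base class": state and logic shared by every backend. */
struct Container {
    const ContainerOps *ops;
    unsigned int nr_mappings;   /* generic bookkeeping kept in the base */
};

/* Derived containers embed the base as their first member. */
typedef struct LegacyContainer {
    Container parent;
    int container_fd;           /* stands in for the /dev/vfio/vfio fd */
} LegacyContainer;

typedef struct IommufdContainer {
    Container parent;
    int iommufd;                /* stands in for the /dev/iommu fd */
    uint32_t ioas_id;
} IommufdContainer;

/* Stub callbacks: a real backend would issue ioctls on its fd here. */
static int legacy_dma_op(Container *c, uint64_t iova, uint64_t size)
{
    LegacyContainer *lc = (LegacyContainer *)c;
    (void)iova; (void)size;
    return lc->container_fd >= 0 ? 0 : -1;
}

static int iommufd_dma_op(Container *c, uint64_t iova, uint64_t size)
{
    IommufdContainer *ic = (IommufdContainer *)c;
    (void)iova; (void)size;
    return ic->iommufd >= 0 ? 0 : -1;
}

static const ContainerOps legacy_ops = {
    .name = "legacy", .dma_map = legacy_dma_op, .dma_unmap = legacy_dma_op,
};

static const ContainerOps iommufd_ops = {
    .name = "iommufd", .dma_map = iommufd_dma_op, .dma_unmap = iommufd_dma_op,
};

void legacy_container_init(LegacyContainer *lc, int container_fd)
{
    lc->parent.ops = &legacy_ops;
    lc->parent.nr_mappings = 0;
    lc->container_fd = container_fd;
}

void iommufd_container_init(IommufdContainer *ic, int iommufd, uint32_t ioas_id)
{
    ic->parent.ops = &iommufd_ops;
    ic->parent.nr_mappings = 0;
    ic->iommufd = iommufd;
    ic->ioas_id = ioas_id;
}

/* Generic code: callers never need to know which backend is behind 'c'. */
int container_dma_map(Container *c, uint64_t iova, uint64_t size)
{
    int ret;

    assert(c && c->ops && c->ops->dma_map);
    ret = c->ops->dma_map(c, iova, size);
    if (ret == 0) {
        c->nr_mappings++;
    }
    return ret;
}

int container_dma_unmap(Container *c, uint64_t iova, uint64_t size)
{
    int ret;

    assert(c && c->ops && c->ops->dma_unmap);
    ret = c->ops->dma_unmap(c, iova, size);
    if (ret == 0 && c->nr_mappings > 0) {
        c->nr_mappings--;
    }
    return ret;
}
```

A caller would initialise one of the derived containers and then only pass
&lc.parent or &ic.parent to container_dma_map()/container_dma_unmap(), which
is the point of the split: the listener-side code stays the same whichever
backend was selected for the device.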