Hi, On 4/18/22 2:09 PM, Yi Liu wrote: > Hi Kevin, > > On 2022/4/18 16:49, Tian, Kevin wrote: >>> From: Liu, Yi L <yi.l.liu@xxxxxxxxx> >>> Sent: Thursday, April 14, 2022 6:47 PM >>> >>> With the introduction of iommufd[1], the linux kernel provides a >>> generic >>> interface for userspace drivers to propagate their DMA mappings to >>> kernel >>> for assigned devices. This series does the porting of the VFIO devices >>> onto the /dev/iommu uapi and let it coexist with the legacy >>> implementation. >>> Other devices like vpda, vfio mdev and etc. are not considered yet. >> >> vfio mdev has no special support in Qemu. Just that it's not supported >> by iommufd yet thus can only be operated in legacy container >> interface at >> this point. Later once it's supported by the kernel suppose no >> additional >> enabling work is required for mdev in Qemu. > > yes. will make it more precise in next version. > >>> >>> For vfio devices, the new interface is tied with device fd and iommufd >>> as the iommufd solution is device-centric. This is different from >>> legacy >>> vfio which is group-centric. To support both interfaces in QEMU, this >>> series introduces the iommu backend concept in the form of different >>> container classes. The existing vfio container is named legacy >>> container >>> (equivalent with legacy iommu backend in this series), while the new >>> iommufd based container is named as iommufd container (may also be >>> mentioned >>> as iommufd backend in this series). The two backend types have their >>> own >>> way to setup secure context and dma management interface. Below diagram >>> shows how it looks like with both BEs. >>> >>> VFIO AddressSpace/Memory >>> +-------+ +----------+ +-----+ +-----+ >>> | pci | | platform | | ap | | ccw | >>> +---+---+ +----+-----+ +--+--+ +--+--+ >>> +----------------------+ >>> | | | | | >>> AddressSpace | >>> | | | | >>> +------------+---------+ >>> +---V-----------V-----------V--------V----+ / >>> | VFIOAddressSpace | <------------+ >>> | | | MemoryListener >>> | VFIOContainer list | >>> +-------+----------------------------+----+ >>> | | >>> | | >>> +-------V------+ +--------V----------+ >>> | iommufd | | vfio legacy | >>> | container | | container | >>> +-------+------+ +--------+----------+ >>> | | >>> | /dev/iommu | /dev/vfio/vfio >>> | /dev/vfio/devices/vfioX | /dev/vfio/$group_id >>> Userspace | | >>> >>> ===========+============================+======================= >>> ========= >>> Kernel | device fd | >>> +---------------+ | group/container fd >>> | (BIND_IOMMUFD | | (SET_CONTAINER/SET_IOMMU) >>> | ATTACH_IOAS) | | device fd >>> | | | >>> | +-------V------------V-----------------+ >>> iommufd | | vfio | >>> (map/unmap | +---------+--------------------+-------+ >>> ioas_copy) | | | map/unmap >>> | | | >>> +------V------+ +-----V------+ +------V--------+ >>> | iommfd core | | device | | vfio iommu | >>> +-------------+ +------------+ +---------------+ >> >> last row: s/iommfd/iommufd/ > > thanks. a typo. > >> overall this sounds a reasonable abstraction. Later when vdpa starts >> supporting iommufd probably the iommufd BE will become even >> smaller with more logic shareable between vfio and vdpa. > > let's see if Jason Wang will give some idea. :-) > >>> >>> [Secure Context setup] >>> - iommufd BE: uses device fd and iommufd to setup secure context >>> (bind_iommufd, attach_ioas) >>> - vfio legacy BE: uses group fd and container fd to setup secure >>> context >>> (set_container, set_iommu) >>> [Device access] >>> - iommufd BE: device fd is opened through /dev/vfio/devices/vfioX >>> - vfio legacy BE: device fd is retrieved from group fd ioctl >>> [DMA Mapping flow] >>> - VFIOAddressSpace receives MemoryRegion add/del via MemoryListener >>> - VFIO populates DMA map/unmap via the container BEs >>> *) iommufd BE: uses iommufd >>> *) vfio legacy BE: uses container fd >>> >>> This series qomifies the VFIOContainer object which acts as a base >>> class >> >> what does 'qomify' mean? I didn't find this word from dictionary... >> >>> for a container. This base class is derived into the legacy VFIO >>> container >>> and the new iommufd based container. The base class implements generic >>> code >>> such as code related to memory_listener and address space management >>> whereas >>> the derived class implements callbacks that depend on the kernel >>> user space >> >> 'the kernel user space'? > > aha, just want to express different BE callbacks will use different > user interface exposed by kernel. will refine the wording. > >> >>> being used. >>> >>> The selection of the backend is made on a device basis using the new >>> iommufd option (on/off/auto). By default the iommufd backend is >>> selected >>> if supported by the host and by QEMU (iommufd KConfig). This option is >>> currently available only for the vfio-pci device. For other types of >>> devices, it does not yet exist and the legacy BE is chosen by default. >>> >>> Test done: >>> - PCI and Platform device were tested >> >> In this case PCI uses iommufd while platform device uses legacy? > > For PCI, both legacy and iommufd were tested. The exploration kernel > branch doesn't have the new device uapi for platform device, so I > didn't test it. > But I remember Eric should have tested it with iommufd. Eric? No I just ran non regression tests for vfio-platform, in legacy mode. I did not integrate with the new device uapi for platform device. > >>> - ccw and ap were only compile-tested >>> - limited device hotplug test >>> - vIOMMU test run for both legacy and iommufd backends (limited tests) >>> >>> This series was co-developed by Eric Auger and me based on the >>> exploration >>> iommufd kernel[2], complete code of this series is available in[3]. As >>> iommufd kernel is in the early step (only iommufd generic interface >>> is in >>> mailing list), so this series hasn't made the iommufd backend fully >>> on par >>> with legacy backend w.r.t. features like p2p mappings, coherency >>> tracking, >> >> what does 'coherency tracking' mean here? if related to iommu enforce >> snoop it is fully handled by the kernel so far. I didn't find any use of >> VFIO_DMA_CC_IOMMU in current Qemu. > > It's the kvm_group add/del stuffs.perhaps say kvm_group add/del > equivalence > would be better? > >>> live migration, etc. This series hasn't supported PCI devices >>> without FLR >>> neither as the kernel doesn't support VFIO_DEVICE_PCI_HOT_RESET when >>> userspace >>> is using iommufd. The kernel needs to be updated to accept device fd >>> list for >>> reset when userspace is using iommufd. Related work is in progress by >>> Jason[4]. >>> >>> TODOs: >>> - Add DMA alias check for iommufd BE (group level) >>> - Make pci.c to be BE agnostic. Needs kernel change as well to fix the >>> VFIO_DEVICE_PCI_HOT_RESET gap >>> - Cleanup the VFIODevice fields as it's used in both BEs >>> - Add locks >>> - Replace list with g_tree >>> - More tests >>> >>> Patch Overview: >>> >>> - Preparation: >>> 0001-scripts-update-linux-headers-Add-iommufd.h.patch >>> 0002-linux-headers-Import-latest-vfio.h-and-iommufd.h.patch >>> 0003-hw-vfio-pci-fix-vfio_pci_hot_reset_result-trace-poin.patch >>> 0004-vfio-pci-Use-vbasedev-local-variable-in-vfio_realize.patch >>> 0005-vfio-common-Rename-VFIOGuestIOMMU-iommu-into- >>> iommu_m.patch >> >> 3-5 are pure cleanups which could be sent out separately > > yes. may send later after checking with Eric. :-) yes makes sense to send them separately. Thanks Eric > >>> 0006-vfio-common-Split-common.c-into-common.c-container.c.patch >>> >>> - Introduce container object and covert existing vfio to use it: >>> 0007-vfio-Add-base-object-for-VFIOContainer.patch >>> 0008-vfio-container-Introduce-vfio_attach-detach_device.patch >>> 0009-vfio-platform-Use-vfio_-attach-detach-_device.patch >>> 0010-vfio-ap-Use-vfio_-attach-detach-_device.patch >>> 0011-vfio-ccw-Use-vfio_-attach-detach-_device.patch >>> 0012-vfio-container-obj-Introduce-attach-detach-_device-c.patch >>> 0013-vfio-container-obj-Introduce-VFIOContainer-reset-cal.patch >>> >>> - Introduce iommufd based container: >>> 0014-hw-iommufd-Creation.patch >>> 0015-vfio-iommufd-Implement-iommufd-backend.patch >>> 0016-vfio-iommufd-Add-IOAS_COPY_DMA-support.patch >>> >>> - Add backend selection for vfio-pci: >>> 0017-vfio-as-Allow-the-selection-of-a-given-iommu-backend.patch >>> 0018-vfio-pci-Add-an-iommufd-option.patch >>> >>> [1] https://lore.kernel.org/kvm/0-v1-e79cd8d168e8+6- >>> iommufd_jgg@xxxxxxxxxx/ >>> [2] https://github.com/luxis1999/iommufd/tree/iommufd-v5.17-rc6 >>> [3] https://github.com/luxis1999/qemu/tree/qemu-for-5.17-rc6-vm-rfcv1 >>> [4] https://lore.kernel.org/kvm/0-v1-a8faf768d202+125dd- >>> vfio_mdev_no_group_jgg@xxxxxxxxxx/ >> >> Following is probably more relevant to [4]: >> >> https://lore.kernel.org/all/10-v1-33906a626da1+16b0-vfio_kvm_no_group_jgg@xxxxxxxxxx/ >> > > absolutely.:-) thanks. > >> Thanks >> Kevin >