We didn't settle the open question of how to get this merged at LPC because
of the audio issue, so let's continue over mail.

Overall there are three options on the table:

1) Require vfio-compat to be 100% compatible with vfio-type1

   Probably not a good choice given the amount of work needed to fix the
   remaining gaps. It would also block support for new IOMMU features for
   longer.

2) Leave vfio-compat as it is in this series

   Treat it as a vehicle to validate the iommufd logic instead of
   immediately replacing vfio-type1. Functionally, most vfio applications
   can work without change, putting aside the differences in locked mm
   accounting, p2p, etc.

   Then work on new features and 100% vfio-type1 compat in parallel.

3) Focus on the iommufd native uAPI first

   Require vfio_device cdev and adoption in Qemu. Only for new vfio
   applications.

   Then work on new features and vfio-compat in parallel.

I'm fine with either 2) or 3). Per a quick chat with Alex, he prefers 3).

Jason, what is your opinion?

Thanks
Kevin

> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> Sent: Saturday, September 3, 2022 3:59 AM
>
> iommufd is the user API to control the IOMMU subsystem as it relates to
> managing IO page tables that point at user space memory.
>
> It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
> container) which is the VFIO specific interface for a similar idea.
>
> We see a broad need for extended features, some being highly IOMMU device
> specific:
>  - Binding iommu_domain's to PASID/SSID
>  - Userspace page tables, for ARM, x86 and S390
>  - Kernel bypass'd invalidation of user page tables
>  - Re-use of the KVM page table in the IOMMU
>  - Dirty page tracking in the IOMMU
>  - Runtime Increase/Decrease of IOPTE size
>  - PRI support with faults resolved in userspace
>
> As well as a need to access these features beyond just VFIO, from VDPA for
> instance. Other classes of accelerator HW are touching on these areas now
> too.
>
> The pre-v1 series proposed re-using the VFIO type 1 data structure,
> however it was suggested that if we are doing this big update then we
> should also come with an improved data structure that solves the
> limitations that VFIO type1 has. Notably this addresses:
>
>  - Multiple IOAS/'containers' and multiple domains inside a single FD
>
>  - Single-pin operation no matter how many domains and containers use
>    a page
>
>  - A fine grained locking scheme supporting user managed concurrency for
>    multi-threaded map/unmap
>
>  - A pre-registration mechanism to optimize vIOMMU use cases by
>    pre-pinning pages
>
>  - Extended ioctl API that can manage these new objects and exposes
>    domains directly to user space
>
>  - domains are sharable between subsystems, eg VFIO and VDPA
>
> The bulk of this code is a new data structure design to track how the
> IOVAs are mapped to PFNs.
>
> iommufd intends to be general and consumable by any driver that wants to
> DMA to userspace. From a driver perspective it can largely be dropped in
> in place of iommu_attach_device() and provides a uniform full feature set
> to all consumers.
>
> As this is a larger project this series is the first step. This series
> provides the iommufd "generic interface" which is designed to be suitable
> for applications like DPDK and VMM flows that are not optimized to
> specific HW scenarios. It is close to being a drop in replacement for the
> existing VFIO type 1.
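
As an aside on the single-pin point above, here is a toy model of the idea,
written in Python rather than kernel C so it can be run standalone. It is
only a sketch under my own assumptions and not how the series implements it:
PinTracker, map_page and unmap_page are made-up names. It just shows a page
being pinned by its first user and unpinned by its last, however many
domains map it.

```python
from collections import Counter


class PinTracker:
    """Toy model: pin each page once, however many domains map it."""

    def __init__(self):
        self.users = Counter()  # pfn -> number of domains mapping it
        self.pin_calls = 0      # how many times a page was actually pinned
        self.unpin_calls = 0    # how many times a page was actually unpinned

    def map_page(self, pfn):
        # Only the first user of a page pays for the pin.
        if self.users[pfn] == 0:
            self.pin_calls += 1
        self.users[pfn] += 1

    def unmap_page(self, pfn):
        if self.users[pfn] == 0:
            raise ValueError(f"pfn {pfn:#x} is not mapped")
        self.users[pfn] -= 1
        # The last user going away drops the single pin.
        if self.users[pfn] == 0:
            del self.users[pfn]
            self.unpin_calls += 1


if __name__ == "__main__":
    tracker = PinTracker()
    tracker.map_page(0x1000)  # first domain maps the page: pinned
    tracker.map_page(0x1000)  # second domain maps it: no extra pin
    print(tracker.pin_calls)
```

With two domains mapping the same page the pin count stays at one, which is
the property the bullet describes. The real code also has to deal with
accounting and concurrency, which this sketch ignores.
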
>
> Several follow-on series are being prepared:
>
>  - Patches integrating with qemu in native mode:
>    https://github.com/yiliu1765/qemu/commits/qemu-iommufd-6.0-rc2
>
>  - A completed integration with VFIO now exists that covers "emulated" mdev
>    use cases, and can pass testing with qemu/etc in compatibility mode:
>    https://github.com/jgunthorpe/linux/commits/vfio_iommufd
>
>  - A draft providing system iommu dirty tracking on top of iommufd,
>    including iommu driver implementations:
>    https://github.com/jpemartins/linux/commits/x86-iommufd
>
>    This pairs with patches for providing a similar API to support VFIO-device
>    tracking to give a complete vfio solution:
>    https://lore.kernel.org/kvm/20220901093853.60194-1-yishaih@xxxxxxxxxx/
>
>  - Userspace page tables aka 'nested translation' for ARM and Intel iommu
>    drivers:
>    https://github.com/nicolinc/iommufd/commits/iommufd_nesting
>
>  - "device centric" vfio series to expose the vfio_device FD directly as a
>    normal cdev, and provide an extended API allowing dynamically changing
>    the IOAS binding:
>    https://github.com/yiliu1765/iommufd/commits/iommufd-v6.0-rc2-nesting-0901
>
>  - Drafts for PASID and PRI interfaces are included above as well
>
> Overall enough work is done now to show the merit of the new API design
> and at least draft solutions to many of the main problems.
>
> Several people have contributed directly to this work: Eric Auger, Joao
> Martins, Kevin Tian, Lu Baolu, Nicolin Chen, Yi L Liu. Many more have
> participated in the discussions that led here, and provided ideas. Thanks
> to all!
>
> The v1 iommufd series has been used to guide a large amount of preparatory
> work that has now been merged. The general theme is to organize things in
> a way that makes injecting iommufd natural:
>
>  - VFIO live migration support with mlx5 and hisi_acc drivers.
>    These series need a dirty tracking solution to be really usable.
>    https://lore.kernel.org/kvm/20220224142024.147653-1-yishaih@xxxxxxxxxx/
>    https://lore.kernel.org/kvm/20220308184902.2242-1-shameerali.kolothum.thodi@xxxxxxxxxx/
>
>  - Significantly rework the VFIO gvt mdev and remove struct
>    mdev_parent_ops
>    https://lore.kernel.org/lkml/20220411141403.86980-1-hch@xxxxxx/
>
>  - Rework how PCIe no-snoop blocking works
>    https://lore.kernel.org/kvm/0-v3-2cf356649677+a32-intel_no_snoop_jgg@xxxxxxxxxx/
>
>  - Consolidate dma ownership into the iommu core code
>    https://lore.kernel.org/linux-iommu/20220418005000.897664-1-baolu.lu@xxxxxxxxxxxxxxx/
>
>  - Make all vfio driver interfaces use struct vfio_device consistently
>    https://lore.kernel.org/kvm/0-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@xxxxxxxxxx/
>
>  - Remove the vfio_group from the kvm/vfio interface
>    https://lore.kernel.org/kvm/0-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@xxxxxxxxxx/
>
>  - Simplify locking in vfio
>    https://lore.kernel.org/kvm/0-v2-d035a1842d81+1bf-vfio_group_locking_jgg@xxxxxxxxxx/
>
>  - Remove the vfio notifier scheme that faces drivers
>    https://lore.kernel.org/kvm/0-v4-681e038e30fd+78-vfio_unmap_notif_jgg@xxxxxxxxxx/
>
>  - Improve the driver facing API for vfio pin/unpin pages to make the
>    presence of struct page clear
>    https://lore.kernel.org/kvm/20220723020256.30081-1-nicolinc@xxxxxxxxxx/
>
>  - Clean up in the Intel IOMMU driver
>    https://lore.kernel.org/linux-iommu/20220301020159.633356-1-baolu.lu@xxxxxxxxxxxxxxx/
>    https://lore.kernel.org/linux-iommu/20220510023407.2759143-1-baolu.lu@xxxxxxxxxxxxxxx/
>    https://lore.kernel.org/linux-iommu/20220514014322.2927339-1-baolu.lu@xxxxxxxxxxxxxxx/
>    https://lore.kernel.org/linux-iommu/20220706025524.2904370-1-baolu.lu@xxxxxxxxxxxxxxx/
>    https://lore.kernel.org/linux-iommu/20220702015610.2849494-1-baolu.lu@xxxxxxxxxxxxxxx/
>
>  - Rework s390 vfio drivers
>    https://lore.kernel.org/kvm/20220707135737.720765-1-farman@xxxxxxxxxxxxx/
>
>  - Normalize vfio ioctl handling
>    https://lore.kernel.org/kvm/0-v2-0f9e632d54fb+d6-vfio_ioctl_split_jgg@xxxxxxxxxx/
>
> This is about 168 patches applied since March, thank you to everyone
> involved in all this work!
>
> Currently there are a number of supporting series still in progress:
>  - Simplify and consolidate iommu_domain/device compatibility checking
>    https://lore.kernel.org/linux-iommu/20220815181437.28127-1-nicolinc@xxxxxxxxxx/
>
>  - Align iommu SVA support with the domain-centric model
>    https://lore.kernel.org/linux-iommu/20220826121141.50743-1-baolu.lu@xxxxxxxxxxxxxxx/
>
>  - VFIO API for dirty tracking (aka dma logging) managed inside a PCI
>    device, with mlx5 implementation
>    https://lore.kernel.org/kvm/20220901093853.60194-1-yishaih@xxxxxxxxxx
>
>  - Introduce a struct device sysfs presence for struct vfio_device
>    https://lore.kernel.org/kvm/20220901143747.32858-1-kevin.tian@xxxxxxxxx/
>
>  - Complete restructuring the vfio mdev model
>    https://lore.kernel.org/kvm/20220822062208.152745-1-hch@xxxxxx/
>
>  - DMABUF exporter support for VFIO to allow PCI P2P with VFIO
>    https://lore.kernel.org/r/0-v2-472615b3877e+28f7-vfio_dma_buf_jgg@xxxxxxxxxx
>
>  - Isolate VFIO container code in preparation for iommufd to provide an
>    alternative implementation of it all
>    https://lore.kernel.org/kvm/0-v1-a805b607f1fb+17b-vfio_container_split_jgg@xxxxxxxxxx
>
>  - Start to provide iommu_domain ops for power
>    https://lore.kernel.org/all/20220714081822.3717693-1-aik@xxxxxxxxx/
>
> Right now there is no more preparatory work sketched out, so this is the
> last of it.
>
> This series remains RFC as there are still several important FIXME's to
> deal with first, but things are on track for non-RFC in the near future.
>
> This is on github: https://github.com/jgunthorpe/linux/commits/iommufd
>
> v2:
>  - Rebase to v6.0-rc3
>  - Improve comments
>  - Change to an iterative destruction approach to avoid cycles
>  - Near rewrite of the vfio facing implementation, supported by a complete
>    implementation on the vfio side
>  - New IOMMU_IOAS_ALLOW_IOVAS API as discussed. Allows userspace to
>    assert that ranges of IOVA must always be mappable. To be used by a VMM
>    that has promised a guest a certain availability of IOVA. May help
>    guide PPC's multi-window implementation.
>  - Rework how unmap_iova works, user can unmap the whole ioas now
>  - The no-snoop / wbinvd support is implemented
>  - Bug fixes
>  - Test suite improvements
>  - Lots of smaller changes (the interdiff is 3k lines)
> v1: https://lore.kernel.org/r/0-v1-e79cd8d168e8+6-iommufd_jgg@xxxxxxxxxx
>
> # S390 in-kernel page table walker
> Cc: Niklas Schnelle <schnelle@xxxxxxxxxxxxx>
> Cc: Matthew Rosato <mjrosato@xxxxxxxxxxxxx>
> # AMD Dirty page tracking
> Cc: Joao Martins <joao.m.martins@xxxxxxxxxx>
> # ARM SMMU Dirty page tracking
> Cc: Keqian Zhu <zhukeqian1@xxxxxxxxxx>
> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>
> # ARM SMMU nesting
> Cc: Eric Auger <eric.auger@xxxxxxxxxx>
> Cc: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
> # Map/unmap performance
> Cc: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>
> # VDPA
> Cc: "Michael S. Tsirkin" <mst@xxxxxxxxxx>
> Cc: Jason Wang <jasowang@xxxxxxxxxx>
> # Power
> Cc: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx>
> # vfio
> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Cc: Cornelia Huck <cohuck@xxxxxxxxxx>
> Cc: kvm@xxxxxxxxxxxxxxx
> # iommu
> Cc: iommu@xxxxxxxxxxxxxxx
> # Collaborators
> Cc: "Chaitanya Kulkarni" <chaitanyak@xxxxxxxxxx>
> Cc: Nicolin Chen <nicolinc@xxxxxxxxxx>
> Cc: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> Cc: Kevin Tian <kevin.tian@xxxxxxxxx>
> Cc: Yi Liu <yi.l.liu@xxxxxxxxx>
> # s390
> Cc: Eric Farman <farman@xxxxxxxxxxxxx>
> Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
>
> Jason Gunthorpe (12):
>   interval-tree: Add a utility to iterate over spans in an interval tree
>   iommufd: File descriptor, context, kconfig and makefiles
>   kernel/user: Allow user::locked_vm to be usable for iommufd
>   iommufd: PFN handling for iopt_pages
>   iommufd: Algorithms for PFN storage
>   iommufd: Data structure to provide IOVA to PFN mapping
>   iommufd: IOCTLs for the io_pagetable
>   iommufd: Add a HW pagetable object
>   iommufd: Add kAPI toward external drivers for physical devices
>   iommufd: Add kAPI toward external drivers for kernel access
>   iommufd: vfio container FD ioctl compatibility
>   iommufd: Add a selftest
>
> Kevin Tian (1):
>   iommufd: Overview documentation
>
>  .clang-format                            |    1 +
>  Documentation/userspace-api/index.rst    |    1 +
>  .../userspace-api/ioctl/ioctl-number.rst |    1 +
>  Documentation/userspace-api/iommufd.rst  |  224 +++
>  MAINTAINERS                              |   10 +
>  drivers/iommu/Kconfig                    |    1 +
>  drivers/iommu/Makefile                   |    2 +-
>  drivers/iommu/iommufd/Kconfig            |   22 +
>  drivers/iommu/iommufd/Makefile           |   13 +
>  drivers/iommu/iommufd/device.c           |  580 +++++++
>  drivers/iommu/iommufd/hw_pagetable.c     |   68 +
>  drivers/iommu/iommufd/io_pagetable.c     |  984 ++++++++++++
>  drivers/iommu/iommufd/io_pagetable.h     |  186 +++
>  drivers/iommu/iommufd/ioas.c             |  338 ++++
>  drivers/iommu/iommufd/iommufd_private.h  |  266 ++++
>  drivers/iommu/iommufd/iommufd_test.h     |   74 +
>  drivers/iommu/iommufd/main.c             |  392 +++++
>  drivers/iommu/iommufd/pages.c            | 1301 +++++++++++++++
>  drivers/iommu/iommufd/selftest.c         |  626 ++++++++
>  drivers/iommu/iommufd/vfio_compat.c      |  423 +++++
>  include/linux/interval_tree.h            |   47 +
>  include/linux/iommufd.h                  |  101 ++
>  include/linux/sched/user.h               |    2 +-
>  include/uapi/linux/iommufd.h             |  279 ++++
>  kernel/user.c                            |    1 +
>  lib/interval_tree.c                      |   98 ++
>  tools/testing/selftests/Makefile         |    1 +
>  tools/testing/selftests/iommu/.gitignore |    2 +
>  tools/testing/selftests/iommu/Makefile   |   11 +
>  tools/testing/selftests/iommu/config     |    2 +
>  tools/testing/selftests/iommu/iommufd.c  | 1396 +++++++++++++++++
>  31 files changed, 7451 insertions(+), 2 deletions(-)
>  create mode 100644 Documentation/userspace-api/iommufd.rst
>  create mode 100644 drivers/iommu/iommufd/Kconfig
>  create mode 100644 drivers/iommu/iommufd/Makefile
>  create mode 100644 drivers/iommu/iommufd/device.c
>  create mode 100644 drivers/iommu/iommufd/hw_pagetable.c
>  create mode 100644 drivers/iommu/iommufd/io_pagetable.c
>  create mode 100644 drivers/iommu/iommufd/io_pagetable.h
>  create mode 100644 drivers/iommu/iommufd/ioas.c
>  create mode 100644 drivers/iommu/iommufd/iommufd_private.h
>  create mode 100644 drivers/iommu/iommufd/iommufd_test.h
>  create mode 100644 drivers/iommu/iommufd/main.c
>  create mode 100644 drivers/iommu/iommufd/pages.c
>  create mode 100644 drivers/iommu/iommufd/selftest.c
>  create mode 100644 drivers/iommu/iommufd/vfio_compat.c
>  create mode 100644 include/linux/iommufd.h
>  create mode 100644 include/uapi/linux/iommufd.h
>  create mode 100644 tools/testing/selftests/iommu/.gitignore
>  create mode 100644 tools/testing/selftests/iommu/Makefile
>  create mode 100644 tools/testing/selftests/iommu/config
>  create mode 100644 tools/testing/selftests/iommu/iommufd.c
>
>
> base-commit: b90cb1053190353cc30f0fef0ef1f378ccc063c5
> --
> 2.37.3
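
For anyone who has not looked at the first patch in the list ("iterate over
spans in an interval tree"), the following toy model may help picture what
span iteration means. It is a sketch under my own assumptions, written in
Python over a plain sorted list rather than the kernel's interval tree, and
iter_spans is a made-up name. Given a set of inclusive ranges and a query
window, it reports the window as alternating "used" and "hole" spans.
Merging overlapping or touching ranges into one used span is a choice made
for this sketch.

```python
def iter_spans(intervals, first, last):
    """Yield (kind, start, end) spans covering [first, last], inclusive.

    intervals is an iterable of inclusive (start, end) ranges that may
    overlap. kind is "used" where at least one range covers the index and
    "hole" where none does.
    """
    pos = first         # next index that has not been reported yet
    used_start = None   # used span currently being accumulated
    used_end = None
    for start, end in sorted(intervals):
        if end < first or start > last:
            continue    # entirely outside the query window
        start = max(start, first)
        end = min(end, last)
        if used_start is not None and start <= used_end + 1:
            # Overlaps or touches the current used span: extend it.
            used_end = max(used_end, end)
            continue
        if used_start is not None:
            yield ("used", used_start, used_end)
            pos = used_end + 1
        if start > pos:
            yield ("hole", pos, start - 1)
        used_start, used_end = start, end
    if used_start is not None:
        yield ("used", used_start, used_end)
        pos = used_end + 1
    if pos <= last:
        yield ("hole", pos, last)


if __name__ == "__main__":
    for span in iter_spans([(10, 19), (15, 29), (40, 49)], 0, 59):
        print(span)
```

Walking a window this way lets a caller handle mapped and unmapped stretches
of IOVA in one pass instead of probing each index separately.
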