Hi,

On 9/13/22 03:55, Tian, Kevin wrote:
> We didn't close the open question of how to get this merged at LPC due
> to the audio issue, so let's use email.
>
> Overall there are three options on the table:
>
> 1) Require vfio-compat to be 100% compatible with vfio-type1
>
> Probably not a good choice given the amount of work to fix the remaining
> gaps. It would also block support of new IOMMU features for a longer time.
>
> 2) Leave vfio-compat as it is in this series
>
> Treat it as a vehicle to validate the iommufd logic instead of immediately
> replacing vfio-type1. Functionally, most vfio applications can work without
> change, setting aside differences in locked mm accounting, p2p, etc.
>
> Then work on new features and 100% vfio-type1 compat. in parallel.
>
> 3) Focus on iommufd native uAPI first
>
> Require vfio_device cdev and adoption in Qemu. Only for new vfio apps.
>
> Then work on new features and vfio-compat in parallel.
>
> I'm fine with either 2) or 3). Per a quick chat with Alex, he prefers 3).

I am also inclined to pursue 3), as this was Jason's initial guidance and a
prerequisite to integrating new features. In the past we concluded that
vfio-compat would mostly be used for testing purposes. Our QEMU integration
is fully based on the device-based API.

Thanks

Eric
>
> Jason, what is your opinion?
>
> Thanks
> Kevin
>
>> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
>> Sent: Saturday, September 3, 2022 3:59 AM
>>
>> iommufd is the user API to control the IOMMU subsystem as it relates to
>> managing IO page tables that point at user space memory.
>>
>> It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
>> container) which is the VFIO specific interface for a similar idea.
>>
>> We see a broad need for extended features, some being highly IOMMU
>> device specific:
>> - Binding iommu_domains to PASID/SSID
>> - Userspace page tables, for ARM, x86 and S390
>> - Kernel bypassed invalidation of user page tables
>> - Re-use of the KVM page table in the IOMMU
>> - Dirty page tracking in the IOMMU
>> - Runtime Increase/Decrease of IOPTE size
>> - PRI support with faults resolved in userspace
>>
>> There is also a need to access these features beyond just VFIO, from VDPA
>> for instance. Other classes of accelerator HW are touching on these areas
>> now too.
>>
>> The pre-v1 series proposed re-using the VFIO type 1 data structure;
>> however, it was suggested that if we are doing this big update then we
>> should also come up with an improved data structure that solves the
>> limitations that VFIO type1 has. Notably this addresses:
>>
>> - Multiple IOAS/'containers' and multiple domains inside a single FD
>>
>> - Single-pin operation no matter how many domains and containers use
>>   a page
>>
>> - A fine-grained locking scheme supporting user-managed concurrency for
>>   multi-threaded map/unmap
>>
>> - A pre-registration mechanism to optimize vIOMMU use cases by
>>   pre-pinning pages
>>
>> - Extended ioctl API that can manage these new objects and exposes
>>   domains directly to user space
>>
>> - Domains are sharable between subsystems, e.g. VFIO and VDPA
>>
>> The bulk of this code is a new data structure design to track how the
>> IOVAs are mapped to PFNs.
>>
>> iommufd intends to be general and consumable by any driver that wants to
>> DMA to userspace. From a driver perspective it can largely be dropped in
>> in place of iommu_attach_device() and provides a uniform full feature set
>> to all consumers.
>>
>> As this is a larger project, this series is the first step. This series
>> provides the iommufd "generic interface" which is designed to be suitable
>> for applications like DPDK and VMM flows that are not optimized to
>> specific HW scenarios.
>> It is close to being a drop-in replacement for the existing VFIO type 1.
>>
>> Several follow-on series are being prepared:
>>
>> - Patches integrating with qemu in native mode:
>>   https://github.com/yiliu1765/qemu/commits/qemu-iommufd-6.0-rc2
>>
>> - A completed integration with VFIO now exists that covers "emulated" mdev
>>   use cases, and can pass testing with qemu/etc in compatibility mode:
>>   https://github.com/jgunthorpe/linux/commits/vfio_iommufd
>>
>> - A draft providing system iommu dirty tracking on top of iommufd,
>>   including iommu driver implementations:
>>   https://github.com/jpemartins/linux/commits/x86-iommufd
>>
>>   This pairs with patches for providing a similar API to support
>>   VFIO-device tracking to give a complete vfio solution:
>>   https://lore.kernel.org/kvm/20220901093853.60194-1-yishaih@xxxxxxxxxx/
>>
>> - Userspace page tables aka 'nested translation' for ARM and Intel iommu
>>   drivers:
>>   https://github.com/nicolinc/iommufd/commits/iommufd_nesting
>>
>> - "device centric" vfio series to expose the vfio_device FD directly as a
>>   normal cdev, and provide an extended API allowing dynamically changing
>>   the IOAS binding:
>>   https://github.com/yiliu1765/iommufd/commits/iommufd-v6.0-rc2-nesting-0901
>>
>> - Drafts for PASID and PRI interfaces are included above as well
>>
>> Overall enough work is done now to show the merit of the new API design
>> and at least draft solutions to many of the main problems.
>>
>> Several people have contributed directly to this work: Eric Auger, Joao
>> Martins, Kevin Tian, Lu Baolu, Nicolin Chen, Yi L Liu. Many more have
>> participated in the discussions that led here, and provided ideas. Thanks
>> to all!
>>
>> The v1 iommufd series has been used to guide a large amount of preparatory
>> work that has now been merged. The general theme is to organize things in
>> a way that makes injecting iommufd natural:
>>
>> - VFIO live migration support with mlx5 and hisi_acc drivers.
>>   These series need a dirty tracking solution to be really usable.
>>   https://lore.kernel.org/kvm/20220224142024.147653-1-yishaih@xxxxxxxxxx/
>>   https://lore.kernel.org/kvm/20220308184902.2242-1-shameerali.kolothum.thodi@xxxxxxxxxx/
>>
>> - Significantly rework the VFIO gvt mdev and remove struct
>>   mdev_parent_ops
>>   https://lore.kernel.org/lkml/20220411141403.86980-1-hch@xxxxxx/
>>
>> - Rework how PCIe no-snoop blocking works
>>   https://lore.kernel.org/kvm/0-v3-2cf356649677+a32-intel_no_snoop_jgg@xxxxxxxxxx/
>>
>> - Consolidate dma ownership into the iommu core code
>>   https://lore.kernel.org/linux-iommu/20220418005000.897664-1-baolu.lu@xxxxxxxxxxxxxxx/
>>
>> - Make all vfio driver interfaces use struct vfio_device consistently
>>   https://lore.kernel.org/kvm/0-v4-8045e76bf00b+13d-vfio_mdev_no_group_jgg@xxxxxxxxxx/
>>
>> - Remove the vfio_group from the kvm/vfio interface
>>   https://lore.kernel.org/kvm/0-v3-f7729924a7ea+25e33-vfio_kvm_no_group_jgg@xxxxxxxxxx/
>>
>> - Simplify locking in vfio
>>   https://lore.kernel.org/kvm/0-v2-d035a1842d81+1bf-vfio_group_locking_jgg@xxxxxxxxxx/
>>
>> - Remove the vfio notifier scheme that faces drivers
>>   https://lore.kernel.org/kvm/0-v4-681e038e30fd+78-vfio_unmap_notif_jgg@xxxxxxxxxx/
>>
>> - Improve the driver-facing API for vfio pin/unpin pages to make the
>>   presence of struct page clear
>>   https://lore.kernel.org/kvm/20220723020256.30081-1-nicolinc@xxxxxxxxxx/
>>
>> - Clean up in the Intel IOMMU driver
>>   https://lore.kernel.org/linux-iommu/20220301020159.633356-1-baolu.lu@xxxxxxxxxxxxxxx/
>>   https://lore.kernel.org/linux-iommu/20220510023407.2759143-1-baolu.lu@xxxxxxxxxxxxxxx/
>>   https://lore.kernel.org/linux-iommu/20220514014322.2927339-1-baolu.lu@xxxxxxxxxxxxxxx/
>>   https://lore.kernel.org/linux-iommu/20220706025524.2904370-1-baolu.lu@xxxxxxxxxxxxxxx/
>>   https://lore.kernel.org/linux-iommu/20220702015610.2849494-1-baolu.lu@xxxxxxxxxxxxxxx/
>>
>> - Rework s390 vfio drivers
>>   https://lore.kernel.org/kvm/20220707135737.720765-1-farman@xxxxxxxxxxxxx/
>>
>> - Normalize vfio ioctl handling
>>   https://lore.kernel.org/kvm/0-v2-0f9e632d54fb+d6-vfio_ioctl_split_jgg@xxxxxxxxxx/
>>
>> This is about 168 patches applied since March; thank you to everyone
>> involved in all this work!
>>
>> Currently there are a number of supporting series still in progress:
>> - Simplify and consolidate iommu_domain/device compatibility checking
>>   https://lore.kernel.org/linux-iommu/20220815181437.28127-1-nicolinc@xxxxxxxxxx/
>>
>> - Align iommu SVA support with the domain-centric model
>>   https://lore.kernel.org/linux-iommu/20220826121141.50743-1-baolu.lu@xxxxxxxxxxxxxxx/
>>
>> - VFIO API for dirty tracking (aka dma logging) managed inside a PCI
>>   device, with mlx5 implementation
>>   https://lore.kernel.org/kvm/20220901093853.60194-1-yishaih@xxxxxxxxxx
>>
>> - Introduce a struct device sysfs presence for struct vfio_device
>>   https://lore.kernel.org/kvm/20220901143747.32858-1-kevin.tian@xxxxxxxxx/
>>
>> - Complete the restructuring of the vfio mdev model
>>   https://lore.kernel.org/kvm/20220822062208.152745-1-hch@xxxxxx/
>>
>> - DMABUF exporter support for VFIO to allow PCI P2P with VFIO
>>   https://lore.kernel.org/r/0-v2-472615b3877e+28f7-vfio_dma_buf_jgg@xxxxxxxxxx
>>
>> - Isolate VFIO container code in preparation for iommufd to provide an
>>   alternative implementation of it all
>>   https://lore.kernel.org/kvm/0-v1-a805b607f1fb+17b-vfio_container_split_jgg@xxxxxxxxxx
>>
>> - Start to provide iommu_domain ops for power
>>   https://lore.kernel.org/all/20220714081822.3717693-1-aik@xxxxxxxxx/
>>
>> Right now there is no more preparatory work sketched out, so this is the
>> last of it.
>>
>> This series remains RFC as there are still several important FIXMEs to
>> deal with first, but things are on track for non-RFC in the near future.
>>
>> This is on github: https://github.com/jgunthorpe/linux/commits/iommufd
>>
>> v2:
>> - Rebase to v6.0-rc3
>> - Improve comments
>> - Change to an iterative destruction approach to avoid cycles
>> - Near rewrite of the vfio facing implementation, supported by a complete
>>   implementation on the vfio side
>> - New IOMMU_IOAS_ALLOW_IOVAS API as discussed. Allows userspace to
>>   assert that ranges of IOVA must always be mappable. To be used by a VMM
>>   that has promised a guest a certain availability of IOVA. May help
>>   guide PPC's multi-window implementation.
>> - Rework how unmap_iova works, user can unmap the whole ioas now
>> - The no-snoop / wbinvd support is implemented
>> - Bug fixes
>> - Test suite improvements
>> - Lots of smaller changes (the interdiff is 3k lines)
>> v1: https://lore.kernel.org/r/0-v1-e79cd8d168e8+6-iommufd_jgg@xxxxxxxxxx
>>
>> # S390 in-kernel page table walker
>> Cc: Niklas Schnelle <schnelle@xxxxxxxxxxxxx>
>> Cc: Matthew Rosato <mjrosato@xxxxxxxxxxxxx>
>> # AMD Dirty page tracking
>> Cc: Joao Martins <joao.m.martins@xxxxxxxxxx>
>> # ARM SMMU Dirty page tracking
>> Cc: Keqian Zhu <zhukeqian1@xxxxxxxxxx>
>> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>
>> # ARM SMMU nesting
>> Cc: Eric Auger <eric.auger@xxxxxxxxxx>
>> Cc: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
>> # Map/unmap performance
>> Cc: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>
>> # VDPA
>> Cc: "Michael S. Tsirkin" <mst@xxxxxxxxxx>
>> Cc: Jason Wang <jasowang@xxxxxxxxxx>
>> # Power
>> Cc: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx>
>> # vfio
>> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
>> Cc: Cornelia Huck <cohuck@xxxxxxxxxx>
>> Cc: kvm@xxxxxxxxxxxxxxx
>> # iommu
>> Cc: iommu@xxxxxxxxxxxxxxx
>> # Collaborators
>> Cc: "Chaitanya Kulkarni" <chaitanyak@xxxxxxxxxx>
>> Cc: Nicolin Chen <nicolinc@xxxxxxxxxx>
>> Cc: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
>> Cc: Kevin Tian <kevin.tian@xxxxxxxxx>
>> Cc: Yi Liu <yi.l.liu@xxxxxxxxx>
>> # s390
>> Cc: Eric Farman <farman@xxxxxxxxxxxxx>
>> Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
>>
>> Jason Gunthorpe (12):
>>   interval-tree: Add a utility to iterate over spans in an interval tree
>>   iommufd: File descriptor, context, kconfig and makefiles
>>   kernel/user: Allow user::locked_vm to be usable for iommufd
>>   iommufd: PFN handling for iopt_pages
>>   iommufd: Algorithms for PFN storage
>>   iommufd: Data structure to provide IOVA to PFN mapping
>>   iommufd: IOCTLs for the io_pagetable
>>   iommufd: Add a HW pagetable object
>>   iommufd: Add kAPI toward external drivers for physical devices
>>   iommufd: Add kAPI toward external drivers for kernel access
>>   iommufd: vfio container FD ioctl compatibility
>>   iommufd: Add a selftest
>>
>> Kevin Tian (1):
>>   iommufd: Overview documentation
>>
>>  .clang-format                            |    1 +
>>  Documentation/userspace-api/index.rst    |    1 +
>>  .../userspace-api/ioctl/ioctl-number.rst |    1 +
>>  Documentation/userspace-api/iommufd.rst  |  224 +++
>>  MAINTAINERS                              |   10 +
>>  drivers/iommu/Kconfig                    |    1 +
>>  drivers/iommu/Makefile                   |    2 +-
>>  drivers/iommu/iommufd/Kconfig            |   22 +
>>  drivers/iommu/iommufd/Makefile           |   13 +
>>  drivers/iommu/iommufd/device.c           |  580 +++++++
>>  drivers/iommu/iommufd/hw_pagetable.c     |   68 +
>>  drivers/iommu/iommufd/io_pagetable.c     |  984 ++++++++++++
>>  drivers/iommu/iommufd/io_pagetable.h     |  186 +++
>>  drivers/iommu/iommufd/ioas.c             |  338 ++++
>>  drivers/iommu/iommufd/iommufd_private.h  |  266 ++++
>>  drivers/iommu/iommufd/iommufd_test.h     |   74 +
>>  drivers/iommu/iommufd/main.c             |  392 +++++
>>  drivers/iommu/iommufd/pages.c            | 1301 +++++++++++++++
>>  drivers/iommu/iommufd/selftest.c         |  626 ++++++++
>>  drivers/iommu/iommufd/vfio_compat.c      |  423 +++++
>>  include/linux/interval_tree.h            |   47 +
>>  include/linux/iommufd.h                  |  101 ++
>>  include/linux/sched/user.h               |    2 +-
>>  include/uapi/linux/iommufd.h             |  279 ++++
>>  kernel/user.c                            |    1 +
>>  lib/interval_tree.c                      |   98 ++
>>  tools/testing/selftests/Makefile         |    1 +
>>  tools/testing/selftests/iommu/.gitignore |    2 +
>>  tools/testing/selftests/iommu/Makefile   |   11 +
>>  tools/testing/selftests/iommu/config     |    2 +
>>  tools/testing/selftests/iommu/iommufd.c  | 1396 +++++++++++++++++
>>  31 files changed, 7451 insertions(+), 2 deletions(-)
>>  create mode 100644 Documentation/userspace-api/iommufd.rst
>>  create mode 100644 drivers/iommu/iommufd/Kconfig
>>  create mode 100644 drivers/iommu/iommufd/Makefile
>>  create mode 100644 drivers/iommu/iommufd/device.c
>>  create mode 100644 drivers/iommu/iommufd/hw_pagetable.c
>>  create mode 100644 drivers/iommu/iommufd/io_pagetable.c
>>  create mode 100644 drivers/iommu/iommufd/io_pagetable.h
>>  create mode 100644 drivers/iommu/iommufd/ioas.c
>>  create mode 100644 drivers/iommu/iommufd/iommufd_private.h
>>  create mode 100644 drivers/iommu/iommufd/iommufd_test.h
>>  create mode 100644 drivers/iommu/iommufd/main.c
>>  create mode 100644 drivers/iommu/iommufd/pages.c
>>  create mode 100644 drivers/iommu/iommufd/selftest.c
>>  create mode 100644 drivers/iommu/iommufd/vfio_compat.c
>>  create mode 100644 include/linux/iommufd.h
>>  create mode 100644 include/uapi/linux/iommufd.h
>>  create mode 100644 tools/testing/selftests/iommu/.gitignore
>>  create mode 100644 tools/testing/selftests/iommu/Makefile
>>  create mode 100644 tools/testing/selftests/iommu/config
>>  create mode 100644 tools/testing/selftests/iommu/iommufd.c
>>
>>
>> base-commit: b90cb1053190353cc30f0fef0ef1f378ccc063c5
>> --
>> 2.37.3