Hi,

On 3/18/22 6:27 PM, Jason Gunthorpe wrote:
> iommufd is the user API to control the IOMMU subsystem as it relates to
> managing IO page tables that point at user space memory.
>
> It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
> container) which is the VFIO specific interface for a similar idea.
>
> We see a broad need for extended features, some being highly IOMMU device
> specific:
>  - Binding iommu_domain's to PASID/SSID
>  - Userspace page tables, for ARM, x86 and S390
>  - Kernel bypass'd invalidation of user page tables
>  - Re-use of the KVM page table in the IOMMU
>  - Dirty page tracking in the IOMMU
>  - Runtime Increase/Decrease of IOPTE size
>  - PRI support with faults resolved in userspace

This series no longer has any concept of group fds and the API is
device oriented. I have a question about the PCI bus reset capability.

Commit 8b27ee60bfd6 ("vfio-pci: PCI hot reset interface") introduced
VFIO_DEVICE_GET_PCI_HOT_RESET_INFO and VFIO_DEVICE_PCI_HOT_RESET.

Maybe we can reuse VFIO_DEVICE_GET_PCI_HOT_RESET_INFO to retrieve the
devices and iommu groups that need to be checked and involved in the
bus reset. If I understand correctly, we now need to make sure the
devices are handled in the same security context (bound to the same
iommufd). However, VFIO_DEVICE_PCI_HOT_RESET operates on a collection
of group fds.

How do you see the porting of this functionality onto /dev/iommu?

Thanks

Eric

>
> As well as a need to access these features beyond just VFIO, VDPA for
> instance, but other classes of accelerator HW are touching on these areas
> now too.
>
> The v1 series proposed re-using the VFIO type 1 data structure, however it
> was suggested that if we are doing this big update then we should also
> come with a data structure that solves the limitations that VFIO type1
> has.
> Notably this addresses:
>
>  - Multiple IOAS/'containers' and multiple domains inside a single FD
>
>  - Single-pin operation no matter how many domains and containers use
>    a page
>
>  - A fine grained locking scheme supporting user managed concurrency for
>    multi-threaded map/unmap
>
>  - A pre-registration mechanism to optimize vIOMMU use cases by
>    pre-pinning pages
>
>  - Extended ioctl API that can manage these new objects and exposes
>    domains directly to user space
>
>  - domains are sharable between subsystems, eg VFIO and VDPA
>
> The bulk of this code is a new data structure design to track how the
> IOVAs are mapped to PFNs.
>
> iommufd intends to be general and consumable by any driver that wants to
> DMA to userspace. From a driver perspective it can largely be dropped in
> in-place of iommu_attach_device() and provides a uniform full feature set
> to all consumers.
>
> As this is a larger project this series is the first step. This series
> provides the iommufd "generic interface" which is designed to be suitable
> for applications like DPDK and VMM flows that are not optimized to
> specific HW scenarios. It is close to being a drop in replacement for the
> existing VFIO type 1.
>
> This is part two of three for an initial sequence:
>  - Move IOMMU Group security into the iommu layer
>    https://lore.kernel.org/linux-iommu/20220218005521.172832-1-baolu.lu@xxxxxxxxxxxxxxx/
>  * Generic IOMMUFD implementation
>  - VFIO ability to consume IOMMUFD
>    An early exploration of this is available here:
>    https://github.com/luxis1999/iommufd/commits/iommufd-v5.17-rc6
>
> Various parts of the above extended features are in WIP stages currently
> to define how their IOCTL interface should work.
>
> At this point, using the draft VFIO series, unmodified qemu has been
> tested to operate using iommufd on x86 and ARM systems.
>
> Several people have contributed directly to this work: Eric Auger, Kevin
> Tian, Lu Baolu, Nicolin Chen, Yi L Liu.
> Many more have participated in the discussions that led here, and
> provided ideas. Thanks to all!
>
> This is on github: https://github.com/jgunthorpe/linux/commits/iommufd
>
> # S390 in-kernel page table walker
> Cc: Niklas Schnelle <schnelle@xxxxxxxxxxxxx>
> Cc: Matthew Rosato <mjrosato@xxxxxxxxxxxxx>
> # AMD Dirty page tracking
> Cc: Joao Martins <joao.m.martins@xxxxxxxxxx>
> # ARM SMMU Dirty page tracking
> Cc: Keqian Zhu <zhukeqian1@xxxxxxxxxx>
> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>
> # ARM SMMU nesting
> Cc: Eric Auger <eric.auger@xxxxxxxxxx>
> Cc: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
> # Map/unmap performance
> Cc: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>
> # VDPA
> Cc: "Michael S. Tsirkin" <mst@xxxxxxxxxx>
> Cc: Jason Wang <jasowang@xxxxxxxxxx>
> # Power
> Cc: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx>
> # vfio
> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
> Cc: Cornelia Huck <cohuck@xxxxxxxxxx>
> Cc: kvm@xxxxxxxxxxxxxxx
> # iommu
> Cc: iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx
> # Collaborators
> Cc: "Chaitanya Kulkarni" <chaitanyak@xxxxxxxxxx>
> Cc: Nicolin Chen <nicolinc@xxxxxxxxxx>
> Cc: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
> Cc: Kevin Tian <kevin.tian@xxxxxxxxx>
> Cc: Yi Liu <yi.l.liu@xxxxxxxxx>
> Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
>
> Jason Gunthorpe (11):
>   interval-tree: Add a utility to iterate over spans in an interval tree
>   iommufd: File descriptor, context, kconfig and makefiles
>   kernel/user: Allow user::locked_vm to be usable for iommufd
>   iommufd: PFN handling for iopt_pages
>   iommufd: Algorithms for PFN storage
>   iommufd: Data structure to provide IOVA to PFN mapping
>   iommufd: IOCTLs for the io_pagetable
>   iommufd: Add a HW pagetable object
>   iommufd: Add kAPI toward external drivers
>   iommufd: vfio container FD ioctl compatibility
>   iommufd: Add a selftest
>
> Kevin Tian (1):
>   iommufd: Overview documentation
>
>  Documentation/userspace-api/index.rst    |    1 +
>  .../userspace-api/ioctl/ioctl-number.rst |    1 +
>  Documentation/userspace-api/iommufd.rst  |  224 +++
>  MAINTAINERS                              |   10 +
>  drivers/iommu/Kconfig                    |    1 +
>  drivers/iommu/Makefile                   |    2 +-
>  drivers/iommu/iommufd/Kconfig            |   22 +
>  drivers/iommu/iommufd/Makefile           |   13 +
>  drivers/iommu/iommufd/device.c           |  274 ++++
>  drivers/iommu/iommufd/hw_pagetable.c     |  142 ++
>  drivers/iommu/iommufd/io_pagetable.c     |  890 +++++++++++
>  drivers/iommu/iommufd/io_pagetable.h     |  170 +++
>  drivers/iommu/iommufd/ioas.c             |  252 ++++
>  drivers/iommu/iommufd/iommufd_private.h  |  231 +++
>  drivers/iommu/iommufd/iommufd_test.h     |   65 +
>  drivers/iommu/iommufd/main.c             |  346 +++++
>  drivers/iommu/iommufd/pages.c            | 1321 +++++++++++++++++
>  drivers/iommu/iommufd/selftest.c         |  495 ++++++
>  drivers/iommu/iommufd/vfio_compat.c      |  401 +++++
>  include/linux/interval_tree.h            |   41 +
>  include/linux/iommufd.h                  |   50 +
>  include/linux/sched/user.h               |    2 +-
>  include/uapi/linux/iommufd.h             |  223 +++
>  kernel/user.c                            |    1 +
>  lib/interval_tree.c                      |   98 ++
>  tools/testing/selftests/Makefile         |    1 +
>  tools/testing/selftests/iommu/.gitignore |    2 +
>  tools/testing/selftests/iommu/Makefile   |   11 +
>  tools/testing/selftests/iommu/config     |    2 +
>  tools/testing/selftests/iommu/iommufd.c  | 1225 +++++++++++++++
>  30 files changed, 6515 insertions(+), 2 deletions(-)
>  create mode 100644 Documentation/userspace-api/iommufd.rst
>  create mode 100644 drivers/iommu/iommufd/Kconfig
>  create mode 100644 drivers/iommu/iommufd/Makefile
>  create mode 100644 drivers/iommu/iommufd/device.c
>  create mode 100644 drivers/iommu/iommufd/hw_pagetable.c
>  create mode 100644 drivers/iommu/iommufd/io_pagetable.c
>  create mode 100644 drivers/iommu/iommufd/io_pagetable.h
>  create mode 100644 drivers/iommu/iommufd/ioas.c
>  create mode 100644 drivers/iommu/iommufd/iommufd_private.h
>  create mode 100644 drivers/iommu/iommufd/iommufd_test.h
>  create mode 100644 drivers/iommu/iommufd/main.c
>  create mode 100644 drivers/iommu/iommufd/pages.c
>  create mode 100644 drivers/iommu/iommufd/selftest.c
>  create mode 100644 drivers/iommu/iommufd/vfio_compat.c
>  create mode 100644 include/linux/iommufd.h
>  create mode 100644 include/uapi/linux/iommufd.h
>  create mode 100644 tools/testing/selftests/iommu/.gitignore
>  create mode 100644 tools/testing/selftests/iommu/Makefile
>  create mode 100644 tools/testing/selftests/iommu/config
>  create mode 100644 tools/testing/selftests/iommu/iommufd.c
>
>
> base-commit: d1c716ed82a6bf4c35ba7be3741b9362e84cd722
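
To make the hot reset question in the reply more concrete, here is a rough sketch of what a "same security context" check might look like. It is only an illustration and not code from this series or from VFIO: `struct dep_device`, its `iommufd_ctx` field and `hot_reset_allowed()` are hypothetical names. The sketch assumes each device affected by the bus reset can be tagged with an identifier of the iommufd it is bound to (-1 when unbound), and it permits the reset only when every affected device is bound to the caller's iommufd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical model of one device affected by a PCI bus reset.
 * Neither the struct nor its fields exist in the kernel; they only
 * illustrate the ownership check discussed in the reply.
 */
struct dep_device {
	unsigned int group_id; /* IOMMU group the device belongs to */
	int iommufd_ctx;       /* assumed id of the bound iommufd, -1 if unbound */
};

/*
 * Return true only if every affected device is bound to the caller's
 * iommufd, i.e. all of them live in the same security context.
 */
bool hot_reset_allowed(int caller_ctx, const struct dep_device *devs, size_t n)
{
	assert(devs != NULL || n == 0);

	if (caller_ctx < 0)
		return false; /* caller itself is not bound to any iommufd */

	for (size_t i = 0; i < n; i++) {
		if (devs[i].iommufd_ctx != caller_ctx)
			return false; /* unbound or owned by another context */
	}
	return true;
}
```

With a check of this shape the reset request would no longer need to carry a collection of group fds. Whether the real interface should work this way is exactly the open question above.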