Re: [PATCH RFC v2 00/13] IOMMUFD Generic interface

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On 9/13/22 03:55, Tian, Kevin wrote:
> We didn't close the open of how to get this merged in LPC due to the
> audio issue. Then let's use mails.
>
> Overall there are three options on the table:
>
> 1) Require vfio-compat to be 100% compatible with vfio-type1
>
>    Probably not a good choice given the amount of work to fix the remaining
>    gaps. And this will block support of new IOMMU features for a longer time.
>
> 2) Leave vfio-compat as what it is in this series
>
>    Treat it as a vehicle to validate the iommufd logic instead of immediately
>    replacing vfio-type1. Functionally most vfio applications can work w/o
>    change if putting aside the difference on locked mm accounting, p2p, etc.
>
>    Then work on new features and 100% vfio-type1 compat. in parallel.
>
> 3) Focus on iommufd native uAPI first
>
>    Require vfio_device cdev and adoption in Qemu. Only for new vfio app.
>
>    Then work on new features and vfio-compat in parallel.
>
> I'm fine with either 2) or 3). Per a quick chat with Alex he prefers to 3).

I am also inclined to pursue 3) as this was the initial Jason's guidance
and pre-requisite to integrate new features. In the past we concluded
vfio-compat would mostly be used for testing purpose. Our QEMU
integration fully is based on device based API.

Thanks

Eric
>
> Jason, how about your opinion?
>
> Thanks
> Kevin
>
>> From: Jason Gunthorpe <jgg@xxxxxxxxxx>
>> Sent: Saturday, September 3, 2022 3:59 AM
>>
>> iommufd is the user API to control the IOMMU subsystem as it relates to
>> managing IO page tables that point at user space memory.
>>
>> It takes over from drivers/vfio/vfio_iommu_type1.c (aka the VFIO
>> container) which is the VFIO specific interface for a similar idea.
>>
>> We see a broad need for extended features, some being highly IOMMU
>> device
>> specific:
>>  - Binding iommu_domain's to PASID/SSID
>>  - Userspace page tables, for ARM, x86 and S390
>>  - Kernel bypass'd invalidation of user page tables
>>  - Re-use of the KVM page table in the IOMMU
>>  - Dirty page tracking in the IOMMU
>>  - Runtime Increase/Decrease of IOPTE size
>>  - PRI support with faults resolved in userspace
>>
>> As well as a need to access these features beyond just VFIO, from VDPA for
>> instance. Other classes of accelerator HW are touching on these areas now
>> too.
>>
>> The pre-v1 series proposed re-using the VFIO type 1 data structure,
>> however it was suggested that if we are doing this big update then we
>> should also come with an improved data structure that solves the
>> limitations that VFIO type1 has. Notably this addresses:
>>
>>  - Multiple IOAS/'containers' and multiple domains inside a single FD
>>
>>  - Single-pin operation no matter how many domains and containers use
>>    a page
>>
>>  - A fine grained locking scheme supporting user managed concurrency for
>>    multi-threaded map/unmap
>>
>>  - A pre-registration mechanism to optimize vIOMMU use cases by
>>    pre-pinning pages
>>
>>  - Extended ioctl API that can manage these new objects and exposes
>>    domains directly to user space
>>
>>  - domains are sharable between subsystems, eg VFIO and VDPA
>>
>> The bulk of this code is a new data structure design to track how the
>> IOVAs are mapped to PFNs.
>>
>> iommufd intends to be general and consumable by any driver that wants to
>> DMA to userspace. From a driver perspective it can largely be dropped in
>> in-place of iommu_attach_device() and provides a uniform full feature set
>> to all consumers.
>>
>> As this is a larger project this series is the first step. This series
>> provides the iommfd "generic interface" which is designed to be suitable
>> for applications like DPDK and VMM flows that are not optimized to
>> specific HW scenarios. It is close to being a drop in replacement for the
>> existing VFIO type 1.
>>
>> Several follow-on series are being prepared:
>>
>> - Patches integrating with qemu in native mode:
>>   https://github.com/yiliu1765/qemu/commits/qemu-iommufd-6.0-rc2
>>
>> - A completed integration with VFIO now exists that covers "emulated" mdev
>>   use cases now, and can pass testing with qemu/etc in compatability mode:
>>   https://github.com/jgunthorpe/linux/commits/vfio_iommufd
>>
>> - A draft providing system iommu dirty tracking on top of iommufd,
>>   including iommu driver implementations:
>>   https://github.com/jpemartins/linux/commits/x86-iommufd
>>
>>   This pairs with patches for providing a similar API to support VFIO-device
>>   tracking to give a complete vfio solution:
>>   https://lore.kernel.org/kvm/20220901093853.60194-1-yishaih@xxxxxxxxxx/
>>
>> - Userspace page tables aka 'nested translation' for ARM and Intel iommu
>>   drivers:
>>   https://github.com/nicolinc/iommufd/commits/iommufd_nesting
>>
>> - "device centric" vfio series to expose the vfio_device FD directly as a
>>   normal cdev, and provide an extended API allowing dynamically changing
>>   the IOAS binding:
>>   https://github.com/yiliu1765/iommufd/commits/iommufd-v6.0-rc2-
>> nesting-0901
>>
>> - Drafts for PASID and PRI interfaces are included above as well
>>
>> Overall enough work is done now to show the merit of the new API design
>> and at least draft solutions to many of the main problems.
>>
>> Several people have contributed directly to this work: Eric Auger, Joao
>> Martins, Kevin Tian, Lu Baolu, Nicolin Chen, Yi L Liu. Many more have
>> participated in the discussions that lead here, and provided ideas. Thanks
>> to all!
>>
>> The v1 iommufd series has been used to guide a large amount of preparatory
>> work that has now been merged. The general theme is to organize things in
>> a way that makes injecting iommufd natural:
>>
>>  - VFIO live migration support with mlx5 and hisi_acc drivers.
>>    These series need a dirty tracking solution to be really usable.
>>    https://lore.kernel.org/kvm/20220224142024.147653-1-
>> yishaih@xxxxxxxxxx/
>>    https://lore.kernel.org/kvm/20220308184902.2242-1-
>> shameerali.kolothum.thodi@xxxxxxxxxx/
>>
>>  - Significantly rework the VFIO gvt mdev and remove struct
>>    mdev_parent_ops
>>    https://lore.kernel.org/lkml/20220411141403.86980-1-hch@xxxxxx/
>>
>>  - Rework how PCIe no-snoop blocking works
>>    https://lore.kernel.org/kvm/0-v3-2cf356649677+a32-
>> intel_no_snoop_jgg@xxxxxxxxxx/
>>
>>  - Consolidate dma ownership into the iommu core code
>>    https://lore.kernel.org/linux-iommu/20220418005000.897664-1-
>> baolu.lu@xxxxxxxxxxxxxxx/
>>
>>  - Make all vfio driver interfaces use struct vfio_device consistently
>>    https://lore.kernel.org/kvm/0-v4-8045e76bf00b+13d-
>> vfio_mdev_no_group_jgg@xxxxxxxxxx/
>>
>>  - Remove the vfio_group from the kvm/vfio interface
>>    https://lore.kernel.org/kvm/0-v3-f7729924a7ea+25e33-
>> vfio_kvm_no_group_jgg@xxxxxxxxxx/
>>
>>  - Simplify locking in vfio
>>    https://lore.kernel.org/kvm/0-v2-d035a1842d81+1bf-
>> vfio_group_locking_jgg@xxxxxxxxxx/
>>
>>  - Remove the vfio notifiter scheme that faces drivers
>>    https://lore.kernel.org/kvm/0-v4-681e038e30fd+78-
>> vfio_unmap_notif_jgg@xxxxxxxxxx/
>>
>>  - Improve the driver facing API for vfio pin/unpin pages to make the
>>    presence of struct page clear
>>    https://lore.kernel.org/kvm/20220723020256.30081-1-
>> nicolinc@xxxxxxxxxx/
>>
>>  - Clean up in the Intel IOMMU driver
>>    https://lore.kernel.org/linux-iommu/20220301020159.633356-1-
>> baolu.lu@xxxxxxxxxxxxxxx/
>>    https://lore.kernel.org/linux-iommu/20220510023407.2759143-1-
>> baolu.lu@xxxxxxxxxxxxxxx/
>>    https://lore.kernel.org/linux-iommu/20220514014322.2927339-1-
>> baolu.lu@xxxxxxxxxxxxxxx/
>>    https://lore.kernel.org/linux-iommu/20220706025524.2904370-1-
>> baolu.lu@xxxxxxxxxxxxxxx/
>>    https://lore.kernel.org/linux-iommu/20220702015610.2849494-1-
>> baolu.lu@xxxxxxxxxxxxxxx/
>>
>>  - Rework s390 vfio drivers
>>    https://lore.kernel.org/kvm/20220707135737.720765-1-
>> farman@xxxxxxxxxxxxx/
>>
>>  - Normalize vfio ioctl handling
>>    https://lore.kernel.org/kvm/0-v2-0f9e632d54fb+d6-
>> vfio_ioctl_split_jgg@xxxxxxxxxx/
>>
>> This is about 168 patches applied since March, thank you to everyone
>> involved in all this work!
>>
>> Currently there are a number of supporting series still in progress:
>>  - Simplify and consolidate iommu_domain/device compatability checking
>>    https://lore.kernel.org/linux-iommu/20220815181437.28127-1-
>> nicolinc@xxxxxxxxxx/
>>
>>  - Align iommu SVA support with the domain-centric model
>>    https://lore.kernel.org/linux-iommu/20220826121141.50743-1-
>> baolu.lu@xxxxxxxxxxxxxxx/
>>
>>  - VFIO API for dirty tracking (aka dma logging) managed inside a PCI
>>    device, with mlx5 implementation
>>    https://lore.kernel.org/kvm/20220901093853.60194-1-yishaih@xxxxxxxxxx
>>
>>  - Introduce a struct device sysfs presence for struct vfio_device
>>    https://lore.kernel.org/kvm/20220901143747.32858-1-
>> kevin.tian@xxxxxxxxx/
>>
>>  - Complete restructuring the vfio mdev model
>>    https://lore.kernel.org/kvm/20220822062208.152745-1-hch@xxxxxx/
>>
>>  - DMABUF exporter support for VFIO to allow PCI P2P with VFIO
>>    https://lore.kernel.org/r/0-v2-472615b3877e+28f7-
>> vfio_dma_buf_jgg@xxxxxxxxxx
>>
>>  - Isolate VFIO container code in preperation for iommufd to provide an
>>    alternative implementation of it all
>>    https://lore.kernel.org/kvm/0-v1-a805b607f1fb+17b-
>> vfio_container_split_jgg@xxxxxxxxxx
>>
>>  - Start to provide iommu_domain ops for power
>>    https://lore.kernel.org/all/20220714081822.3717693-1-aik@xxxxxxxxx/
>>
>> Right now there is no more preperatory work sketched out, so this is the
>> last of it.
>>
>> This series remains RFC as there are still several important FIXME's to
>> deal with first, but things are on track for non-RFC in the near future.
>>
>> This is on github: https://github.com/jgunthorpe/linux/commits/iommufd
>>
>> v2:
>>  - Rebase to v6.0-rc3
>>  - Improve comments
>>  - Change to an iterative destruction approach to avoid cycles
>>  - Near rewrite of the vfio facing implementation, supported by a complete
>>    implementation on the vfio side
>>  - New IOMMU_IOAS_ALLOW_IOVAS API as discussed. Allows userspace to
>>    assert that ranges of IOVA must always be mappable. To be used by a
>> VMM
>>    that has promised a guest a certain availability of IOVA. May help
>>    guide PPC's multi-window implementation.
>>  - Rework how unmap_iova works, user can unmap the whole ioas now
>>  - The no-snoop / wbinvd support is implemented
>>  - Bug fixes
>>  - Test suite improvements
>>  - Lots of smaller changes (the interdiff is 3k lines)
>> v1: https://lore.kernel.org/r/0-v1-e79cd8d168e8+6-
>> iommufd_jgg@xxxxxxxxxx
>>
>> # S390 in-kernel page table walker
>> Cc: Niklas Schnelle <schnelle@xxxxxxxxxxxxx>
>> Cc: Matthew Rosato <mjrosato@xxxxxxxxxxxxx>
>> # AMD Dirty page tracking
>> Cc: Joao Martins <joao.m.martins@xxxxxxxxxx>
>> # ARM SMMU Dirty page tracking
>> Cc: Keqian Zhu <zhukeqian1@xxxxxxxxxx>
>> Cc: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@xxxxxxxxxx>
>> # ARM SMMU nesting
>> Cc: Eric Auger <eric.auger@xxxxxxxxxx>
>> Cc: Jean-Philippe Brucker <jean-philippe@xxxxxxxxxx>
>> # Map/unmap performance
>> Cc: Daniel Jordan <daniel.m.jordan@xxxxxxxxxx>
>> # VDPA
>> Cc: "Michael S. Tsirkin" <mst@xxxxxxxxxx>
>> Cc: Jason Wang <jasowang@xxxxxxxxxx>
>> # Power
>> Cc: David Gibson <david@xxxxxxxxxxxxxxxxxxxxx>
>> # vfio
>> Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
>> Cc: Cornelia Huck <cohuck@xxxxxxxxxx>
>> Cc: kvm@xxxxxxxxxxxxxxx
>> # iommu
>> Cc: iommu@xxxxxxxxxxxxxxx
>> # Collaborators
>> Cc: "Chaitanya Kulkarni" <chaitanyak@xxxxxxxxxx>
>> Cc: Nicolin Chen <nicolinc@xxxxxxxxxx>
>> Cc: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
>> Cc: Kevin Tian <kevin.tian@xxxxxxxxx>
>> Cc: Yi Liu <yi.l.liu@xxxxxxxxx>
>> # s390
>> Cc: Eric Farman <farman@xxxxxxxxxxxxx>
>> Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
>>
>> Jason Gunthorpe (12):
>>   interval-tree: Add a utility to iterate over spans in an interval tree
>>   iommufd: File descriptor, context, kconfig and makefiles
>>   kernel/user: Allow user::locked_vm to be usable for iommufd
>>   iommufd: PFN handling for iopt_pages
>>   iommufd: Algorithms for PFN storage
>>   iommufd: Data structure to provide IOVA to PFN mapping
>>   iommufd: IOCTLs for the io_pagetable
>>   iommufd: Add a HW pagetable object
>>   iommufd: Add kAPI toward external drivers for physical devices
>>   iommufd: Add kAPI toward external drivers for kernel access
>>   iommufd: vfio container FD ioctl compatibility
>>   iommufd: Add a selftest
>>
>> Kevin Tian (1):
>>   iommufd: Overview documentation
>>
>>  .clang-format                                 |    1 +
>>  Documentation/userspace-api/index.rst         |    1 +
>>  .../userspace-api/ioctl/ioctl-number.rst      |    1 +
>>  Documentation/userspace-api/iommufd.rst       |  224 +++
>>  MAINTAINERS                                   |   10 +
>>  drivers/iommu/Kconfig                         |    1 +
>>  drivers/iommu/Makefile                        |    2 +-
>>  drivers/iommu/iommufd/Kconfig                 |   22 +
>>  drivers/iommu/iommufd/Makefile                |   13 +
>>  drivers/iommu/iommufd/device.c                |  580 +++++++
>>  drivers/iommu/iommufd/hw_pagetable.c          |   68 +
>>  drivers/iommu/iommufd/io_pagetable.c          |  984 ++++++++++++
>>  drivers/iommu/iommufd/io_pagetable.h          |  186 +++
>>  drivers/iommu/iommufd/ioas.c                  |  338 ++++
>>  drivers/iommu/iommufd/iommufd_private.h       |  266 ++++
>>  drivers/iommu/iommufd/iommufd_test.h          |   74 +
>>  drivers/iommu/iommufd/main.c                  |  392 +++++
>>  drivers/iommu/iommufd/pages.c                 | 1301 +++++++++++++++
>>  drivers/iommu/iommufd/selftest.c              |  626 ++++++++
>>  drivers/iommu/iommufd/vfio_compat.c           |  423 +++++
>>  include/linux/interval_tree.h                 |   47 +
>>  include/linux/iommufd.h                       |  101 ++
>>  include/linux/sched/user.h                    |    2 +-
>>  include/uapi/linux/iommufd.h                  |  279 ++++
>>  kernel/user.c                                 |    1 +
>>  lib/interval_tree.c                           |   98 ++
>>  tools/testing/selftests/Makefile              |    1 +
>>  tools/testing/selftests/iommu/.gitignore      |    2 +
>>  tools/testing/selftests/iommu/Makefile        |   11 +
>>  tools/testing/selftests/iommu/config          |    2 +
>>  tools/testing/selftests/iommu/iommufd.c       | 1396 +++++++++++++++++
>>  31 files changed, 7451 insertions(+), 2 deletions(-)
>>  create mode 100644 Documentation/userspace-api/iommufd.rst
>>  create mode 100644 drivers/iommu/iommufd/Kconfig
>>  create mode 100644 drivers/iommu/iommufd/Makefile
>>  create mode 100644 drivers/iommu/iommufd/device.c
>>  create mode 100644 drivers/iommu/iommufd/hw_pagetable.c
>>  create mode 100644 drivers/iommu/iommufd/io_pagetable.c
>>  create mode 100644 drivers/iommu/iommufd/io_pagetable.h
>>  create mode 100644 drivers/iommu/iommufd/ioas.c
>>  create mode 100644 drivers/iommu/iommufd/iommufd_private.h
>>  create mode 100644 drivers/iommu/iommufd/iommufd_test.h
>>  create mode 100644 drivers/iommu/iommufd/main.c
>>  create mode 100644 drivers/iommu/iommufd/pages.c
>>  create mode 100644 drivers/iommu/iommufd/selftest.c
>>  create mode 100644 drivers/iommu/iommufd/vfio_compat.c
>>  create mode 100644 include/linux/iommufd.h
>>  create mode 100644 include/uapi/linux/iommufd.h
>>  create mode 100644 tools/testing/selftests/iommu/.gitignore
>>  create mode 100644 tools/testing/selftests/iommu/Makefile
>>  create mode 100644 tools/testing/selftests/iommu/config
>>  create mode 100644 tools/testing/selftests/iommu/iommufd.c
>>
>>
>> base-commit: b90cb1053190353cc30f0fef0ef1f378ccc063c5
>> --
>> 2.37.3




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux