Currently each of the iommu page table formats duplicates all of the logic
to maintain the page table and perform map/unmap/etc operations. There are
several different versions of the algorithms between all the different
formats. The io-pgtable system provides an interface to help isolate the
page table code from the iommu driver, but doesn't provide tools to
implement the common algorithms.

This makes it very hard to improve the state of the pagetable code under
the iommu domains, as any proposed improvement needs to alter a large
number of different driver code paths. Combined with a lack of software
based testing, this makes improvement in this area very hard.

iommufd wants several new page table operations:
 - More efficient map/unmap operations, using iommufd's batching approach
 - unmap that returns the physical addresses into a batch as it progresses
 - cut that allows splitting areas so large pages can have holes poked in
   them dynamically
 - More aggressive freeing of table memory to avoid waste
 - Fragmenting large pages so that dirty tracking can run efficiently
 - Reassembling large pages so that VMs can run at full IO performance in
   error flows

In addition there are possibilities like directly mapping a bvec or
sg_list in more efficient ways, and perhaps even optimizations for the GPU
drivers using the io-pgtable code as well.

Together these are algorithmically complex enough to be a very significant
task to go and implement in all the page table formats we support. Just
the "server" focused drivers use almost all the formats (ARMv8 S1&S2 / x86
PAE / AMDv1 / VT-D SS / RISCV).

Instead of doing the duplicated work, this series takes the first step to
consolidate the algorithms into one place. In spirit it is similar to the
work Christoph did a few years back to pull the redundant get_user_pages()
implementations out of the arch code into core MM. This unlocked a great
deal of improvement in that space in the following years.
I would like to see the same benefit in iommu as well.

The approach is split into three deliberate layers:

 - The truly generic page table components. These are very application
   neutral and could conceivably be used in the MM or KVM if there was a
   reason. A DRM driver may also be interested in this layer as it could
   be more efficient than working through the iommu focused ops.

 - The per format functions. These are a set of small inline functions
   that abstract the details of how the page table is laid out in memory
   and what bits do what things. Like the MM, these functions share the
   same name so the same code can be compiled against different formats by
   including the appropriate format header.

 - An iommu implementation. This is intended to create ops that can take
   over from the iommu_domain ops. There is a single set of C routines
   that compile against all the formats generically.

On top of this are two kunit tests. The first directly exercises the iommu
implementation across all the different formats. The second does an A/B
comparison between the iommupt and the io-pgtable implementations to
ensure things are identical.

Sort of like MM, this uses multi-compilation where the common code
includes format specific headers that implement the same C API. Unlike the
MM we need to build multiple page table formats into the same kernel, so
each combination of format/parameters/iommu implementation is compiled in
a single compilation unit and into a module. This results in compiling the
same C code multiple times in a single kernel build, using different
combinations of header files.

The approach is designed to be able to provide either mm-like fully
inlined performance or, as is typical for iommu, a recursive non-inlined
version with smaller .text. As the implementation is now shared it will be
worthwhile to do some performance work and fine tune this as appropriate.
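To make the multi-compilation idea concrete, here is a minimal standalone
sketch. It is not code from this series: the format names (fmt_a, fmt_b),
the helper names, and the DEFINE_GENERIC_PT() macro are all hypothetical.
In the series the generic code lives in headers that each format's .c file
includes; here a macro stamps out the same generic code once per format so
the whole thing fits in one file.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Two hypothetical toy formats. Each provides the same three per-format
 * helpers, differing only in prefix. None of these names are from the series.
 */

/* Toy format A: 4 KiB pages, 9 index bits per level, 4 levels */
static inline unsigned int fmt_a_page_shift(void) { return 12; }
static inline unsigned int fmt_a_bits_per_level(void) { return 9; }
static inline unsigned int fmt_a_num_levels(void) { return 4; }

/* Toy format B: 16 KiB pages, 11 index bits per level, 3 levels */
static inline unsigned int fmt_b_page_shift(void) { return 14; }
static inline unsigned int fmt_b_bits_per_level(void) { return 11; }
static inline unsigned int fmt_b_num_levels(void) { return 3; }

/*
 * The "generic" code, written once against the per-format helpers.
 * Level 0 is the leaf level in this sketch.
 */
#define DEFINE_GENERIC_PT(fmt)                                              \
	/* Index into the table at 'level' that translates 'va' */         \
	static inline unsigned int fmt##_index(uint64_t va,                 \
					       unsigned int level)          \
	{                                                                   \
		unsigned int shift;                                         \
									    \
		assert(level < fmt##_num_levels());                         \
		shift = fmt##_page_shift() +                                \
			level * fmt##_bits_per_level();                     \
		return (va >> shift) &                                      \
		       ((1u << fmt##_bits_per_level()) - 1);                \
	}                                                                   \
	/* Bytes of address space covered by one entry at 'level' */       \
	static inline uint64_t fmt##_entry_size(unsigned int level)        \
	{                                                                   \
		assert(level < fmt##_num_levels());                         \
		return 1ull << (fmt##_page_shift() +                        \
				level * fmt##_bits_per_level());            \
	}

/* Compile the same generic code twice, once per format */
DEFINE_GENERIC_PT(fmt_a)
DEFINE_GENERIC_PT(fmt_b)
```

With this, fmt_a_entry_size(1) evaluates to 2 MiB and fmt_b_entry_size(1)
to 32 MiB, even though both come from the same source text. Because the
per-format helpers are inline and constant, the compiler can fold them into
each instantiation, which is the property the fully inlined mode relies on.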
I've CC'd a few people from outside iommu who may have some interest in
the generic part of this, or ideas on how to better build the abstraction
and helpers.

For this RFC I've provided draft formats for nearly everything (S390 and
RISCV are notably not included). The formats all pass the compare test and
thus, to a significant degree, produce the same memory layouts for the
radix tree. The primary purpose of this breadth is to prove the common API
is suitable for the job. Completing these to be fully usable in their
respective drivers is still to be done.

I'm expecting to show maybe another RFC round with all the formats and
then pivot to a more focused series, likely just for AMD, that brings the
minimum necessary. From there we can work in parallel to add the new
iommufd features and convert more of the drivers. From an iommufd
perspective I would like the "server" drivers (AMD / SMMUv3 / VT-D) to be
converted as a minimum.

This general concept was brought up and discussed a few times during LPC
last year, and I have a formal session on the schedule for this series at
LPC Vienna.
There are many additional support patches required to run the kunits,
everything is on github:

https://github.com/jgunthorpe/linux/commits/iommu_pt

FIXME:
 - Improve the two kunit tests
 - Implement additional new iommufd ops
 - Implement the debugfs with the RCU safety
 - Do a performance study vs the io-pgtable versions
 - Implement the flush callbacks, iommu core hookups, etc
 - Look at possible bvec and sg optimizations
 - Link it up to the iommu drivers and test it in HW as an iommu
   implementation

Cc: Joao Martins <joao.m.martins@xxxxxxxxxx>
Cc: Alejandro Jimenez <alejandro.j.jimenez@xxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Cc: Pasha Tatashin <pasha.tatashin@xxxxxxxxxx>
Cc: Peter Xu <peterx@xxxxxxxxxx>
Cc: David Hildenbrand <david@xxxxxxxxxx>
Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
Cc: Ryan Roberts <ryan.roberts@xxxxxxx>
Cc: iommu@xxxxxxxxxxxxxxx
Cc: kvm@xxxxxxxxxxxxxxx
Cc: linux-mm@xxxxxxxxx
Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>

Jason Gunthorpe (16):
  genpt: Generic Page Table base API
  genpt: Add a specialized allocator for page table levels
  iommupt: Add the basic structure of the iommu implementation
  iommupt: Add iova_to_phys op
  iommupt: Add unmap_pages op
  iommupt: Add map_pages op
  iommupt: Add cut_mapping op
  iommupt: Add read_and_clear_dirty op
  iommupt: Add a kunit test for Generic Page Table and the IOMMU
    implementation
  iommupt: Add a kunit test to compare against iopt
  iommupt: Add the 64 bit ARMv8 page table format
  iommupt: Add the AMD IOMMU v1 page table format
  iommupt: Add the x86 PAE page table format
  iommupt: Add the DART v1/v2 page table format
  iommupt: Add the 32 bit ARMv7s page table format
  iommupt: Add the Intel VT-D second stage page table format

 .clang-format                                 |    1 +
 drivers/iommu/Kconfig                         |    2 +
 drivers/iommu/Makefile                        |    1 +
 drivers/iommu/generic_pt/.kunitconfig         |   23 +
 drivers/iommu/generic_pt/Kconfig              |  117 ++
 drivers/iommu/generic_pt/Makefile             |    7 +
 drivers/iommu/generic_pt/fmt/Makefile         |   35 +
 drivers/iommu/generic_pt/fmt/amdv1.h          |  372 ++++++
 drivers/iommu/generic_pt/fmt/armv7s.h         |  529 +++++++++
 drivers/iommu/generic_pt/fmt/armv8.h          |  621 ++++++++++
 drivers/iommu/generic_pt/fmt/dart.h           |  371 ++++++
 drivers/iommu/generic_pt/fmt/defs_amdv1.h     |   21 +
 drivers/iommu/generic_pt/fmt/defs_armv7s.h    |   23 +
 drivers/iommu/generic_pt/fmt/defs_armv8.h     |   28 +
 drivers/iommu/generic_pt/fmt/defs_dart.h      |   21 +
 drivers/iommu/generic_pt/fmt/defs_vtdss.h     |   21 +
 drivers/iommu/generic_pt/fmt/defs_x86pae.h    |   21 +
 drivers/iommu/generic_pt/fmt/iommu_amdv1.c    |    9 +
 drivers/iommu/generic_pt/fmt/iommu_armv7s.c   |   11 +
 .../iommu/generic_pt/fmt/iommu_armv8_16k.c    |   13 +
 drivers/iommu/generic_pt/fmt/iommu_armv8_4k.c |   13 +
 .../iommu/generic_pt/fmt/iommu_armv8_64k.c    |   13 +
 drivers/iommu/generic_pt/fmt/iommu_dart.c     |    8 +
 drivers/iommu/generic_pt/fmt/iommu_template.h |   49 +
 drivers/iommu/generic_pt/fmt/iommu_vtdss.c    |    8 +
 drivers/iommu/generic_pt/fmt/iommu_x86pae.c   |    8 +
 drivers/iommu/generic_pt/fmt/vtdss.h          |  276 +++++
 drivers/iommu/generic_pt/fmt/x86pae.h         |  283 +++++
 drivers/iommu/generic_pt/iommu_pt.h           | 1030 +++++++++++++++++
 drivers/iommu/generic_pt/kunit_generic_pt.h   |  576 +++++++++
 drivers/iommu/generic_pt/kunit_iommu.h        |  105 ++
 drivers/iommu/generic_pt/kunit_iommu_cmp.h    |  272 +++++
 drivers/iommu/generic_pt/kunit_iommu_pt.h     |  352 ++++++
 drivers/iommu/generic_pt/pt_alloc.c           |  174 +++
 drivers/iommu/generic_pt/pt_alloc.h           |   98 ++
 drivers/iommu/generic_pt/pt_common.h          |  311 +++++
 drivers/iommu/generic_pt/pt_defs.h            |  276 +++++
 drivers/iommu/generic_pt/pt_fmt_defaults.h    |  109 ++
 drivers/iommu/generic_pt/pt_iter.h            |  468 ++++++++
 drivers/iommu/generic_pt/pt_log2.h            |  131 +++
 include/linux/generic_pt/common.h             |  156 +++
 include/linux/generic_pt/iommu.h              |  344 ++++++
 42 files changed, 7307 insertions(+)
 create mode 100644 drivers/iommu/generic_pt/.kunitconfig
 create mode 100644 drivers/iommu/generic_pt/Kconfig
 create mode 100644 drivers/iommu/generic_pt/Makefile
 create mode 100644 drivers/iommu/generic_pt/fmt/Makefile
 create mode 100644 drivers/iommu/generic_pt/fmt/amdv1.h
 create mode 100644 drivers/iommu/generic_pt/fmt/armv7s.h
 create mode 100644 drivers/iommu/generic_pt/fmt/armv8.h
 create mode 100644 drivers/iommu/generic_pt/fmt/dart.h
 create mode 100644 drivers/iommu/generic_pt/fmt/defs_amdv1.h
 create mode 100644 drivers/iommu/generic_pt/fmt/defs_armv7s.h
 create mode 100644 drivers/iommu/generic_pt/fmt/defs_armv8.h
 create mode 100644 drivers/iommu/generic_pt/fmt/defs_dart.h
 create mode 100644 drivers/iommu/generic_pt/fmt/defs_vtdss.h
 create mode 100644 drivers/iommu/generic_pt/fmt/defs_x86pae.h
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_amdv1.c
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_armv7s.c
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_armv8_16k.c
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_armv8_4k.c
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_armv8_64k.c
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_dart.c
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_template.h
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_vtdss.c
 create mode 100644 drivers/iommu/generic_pt/fmt/iommu_x86pae.c
 create mode 100644 drivers/iommu/generic_pt/fmt/vtdss.h
 create mode 100644 drivers/iommu/generic_pt/fmt/x86pae.h
 create mode 100644 drivers/iommu/generic_pt/iommu_pt.h
 create mode 100644 drivers/iommu/generic_pt/kunit_generic_pt.h
 create mode 100644 drivers/iommu/generic_pt/kunit_iommu.h
 create mode 100644 drivers/iommu/generic_pt/kunit_iommu_cmp.h
 create mode 100644 drivers/iommu/generic_pt/kunit_iommu_pt.h
 create mode 100644 drivers/iommu/generic_pt/pt_alloc.c
 create mode 100644 drivers/iommu/generic_pt/pt_alloc.h
 create mode 100644 drivers/iommu/generic_pt/pt_common.h
 create mode 100644 drivers/iommu/generic_pt/pt_defs.h
 create mode 100644 drivers/iommu/generic_pt/pt_fmt_defaults.h
 create mode 100644 drivers/iommu/generic_pt/pt_iter.h
 create mode 100644 drivers/iommu/generic_pt/pt_log2.h
 create mode 100644 include/linux/generic_pt/common.h
 create mode 100644 include/linux/generic_pt/iommu.h

base-commit: fdc4344ef3ee7741df149967893fb61240520ab3
-- 
2.46.0