On Mon, Dec 04, 2023 at 09:00:55AM -0800, Sean Christopherson wrote:

> There are more approaches beyond having IOMMUFD and KVM be
> completely separate entities. E.g. extract the bulk of KVM's "TDP
> MMU" implementation to common code so that IOMMUFD doesn't need to
> reinvent the wheel.

We've pretty much done this already, it is called "hmm" and it is what
the IO world uses.

Merging/splitting huge pages is just something that needs some coding
in the page table code, and people want it for other reasons anyhow.

> - Subjects IOMMUFD to all of KVM's historical baggage, e.g. the memslot deletion
>   mess, the truly nasty MTRR emulation (which I still hope to delete), the NX
>   hugepage mitigation, etc.

Does it? I think that just remains isolated in kvm. The output from
KVM is only a radix table top pointer; it is still up to KVM how to
manage it.

> I'm not convinced that memory consumption is all that interesting. If a VM is
> mapping the majority of memory into a device, then odds are good that the guest
> is backed with at least 2MiB pages, if not 1GiB pages, at which point the memory
> overhead for page tables is quite small, especially relative to the total amount
> of memory overheads for such systems.

AFAIK the main argument is performance. It is similar to why we want
to do IOMMU SVA with MM page table sharing.

If the IOMMU mirrors/shadows/copies a page table using something like
HMM techniques, then invalidations will mark ranges of IOVA as
non-present and faults will occur to trigger hmm_range_fault() to do
the shadowing (a rough sketch of that pattern is appended below).

This means that pretty much all IO will encounter a non-present fault,
certainly at the start and possibly repeatedly while ongoing.

On the other hand, if we share the exact page table then natural CPU
touches will usually make the page present before an IO happens in
almost all cases, and we don't have to take the horribly expensive IO
page fault at all.

We were not able to make bi-directional notifiers work with the CPU
mm; I'm not sure that is "relatively easy" :(

Jason
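
For reference, here is a rough sketch of that fault-driven mirroring
pattern, loosely following the usage described in
Documentation/mm/hmm.rst. The my_* names, the device page-table update
steps, and the pt_lock are placeholders for illustration, not a real
driver:

/*
 * Sketch: mirror a CPU address range into a device page table with
 * hmm_range_fault().  The mmu_interval_notifier is assumed to have
 * been registered with mmu_interval_notifier_insert().
 */
#include <linux/hmm.h>
#include <linux/mmu_notifier.h>
#include <linux/mm.h>
#include <linux/mutex.h>
#include <linux/sched/mm.h>

struct my_mirror {
	struct mmu_interval_notifier notifier;
	struct mutex pt_lock;		/* protects the device page table */
};

/*
 * CPU-side invalidation: the range goes non-present in the device
 * table, so the next DMA to it takes an IO page fault.
 */
static bool my_mirror_invalidate(struct mmu_interval_notifier *mni,
				 const struct mmu_notifier_range *range,
				 unsigned long cur_seq)
{
	struct my_mirror *mirror = container_of(mni, struct my_mirror,
						notifier);

	if (mmu_notifier_range_blockable(range))
		mutex_lock(&mirror->pt_lock);
	else if (!mutex_trylock(&mirror->pt_lock))
		return false;

	mmu_interval_set_seq(mni, cur_seq);
	/* ... clear device PTEs for [range->start, range->end) ... */
	mutex_unlock(&mirror->pt_lock);
	return true;
}

static const struct mmu_interval_notifier_ops my_mirror_ops = {
	.invalidate = my_mirror_invalidate,
};

/* IO page fault side: re-populate the shadow for one faulting range. */
static int my_mirror_fault(struct my_mirror *mirror, unsigned long start,
			   unsigned long end, unsigned long *pfns)
{
	struct mm_struct *mm = mirror->notifier.mm;
	struct hmm_range range = {
		.notifier	= &mirror->notifier,
		.start		= start,
		.end		= end,
		.hmm_pfns	= pfns,
		.default_flags	= HMM_PFN_REQ_FAULT | HMM_PFN_REQ_WRITE,
	};
	int ret;

	if (!mmget_not_zero(mm))
		return -EFAULT;
again:
	range.notifier_seq = mmu_interval_read_begin(&mirror->notifier);
	mmap_read_lock(mm);
	ret = hmm_range_fault(&range);
	mmap_read_unlock(mm);
	if (ret) {
		if (ret == -EBUSY)
			goto again;
		goto out;
	}

	mutex_lock(&mirror->pt_lock);
	if (mmu_interval_read_retry(&mirror->notifier, range.notifier_seq)) {
		mutex_unlock(&mirror->pt_lock);
		goto again;
	}
	/* ... write pfns[] into the device page table ... */
	mutex_unlock(&mirror->pt_lock);
out:
	mmput(mm);
	return ret;
}

The retry loop against mmu_interval_read_retry() is what keeps the
shadow coherent: every CPU-side invalidation forces the next DMA to
that range back through this path, which is the IO page fault cost
being contrasted with sharing the CPU page table directly.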