Re: [RFC v2 1/1] memory: Delete assertion in memory_region_unregister_iommu_notifier

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2020/7/10 下午9:30, Peter Xu wrote:
On Fri, Jul 10, 2020 at 02:34:11PM +0800, Jason Wang wrote:
On 2020/7/9 下午10:10, Peter Xu wrote:
On Thu, Jul 09, 2020 at 01:58:33PM +0800, Jason Wang wrote:
- If we care the performance, it's better to implement the MAP event for
vhost, otherwise it could be a lot of IOTLB miss
I feel like these are two things.

So far what we are talking about is whether vt-d should have knowledge about
what kind of events one iommu notifier is interested in.  I still think we
should keep this as answered in question 1.

The other question is whether we want to switch vhost from UNMAP to MAP/UNMAP
events even without vDMA, so that vhost can establish the mapping even before
IO starts.  IMHO it's doable, but only if the guest runs DPDK workloads.  When
the guest is using dynamic iommu page mappings, I feel like that can be even
slower, because then the worst case is for each IO we'll need to vmexit twice:

     - The first vmexit caused by an invalidation to MAP the page tables, so vhost
       will setup the page table before IO starts

     - IO/DMA triggers and completes

     - The second vmexit caused by another invalidation to UNMAP the page tables

So it seems to be worse than when vhost only uses UNMAP like right now.  At
least we only have one vmexit (when UNMAP).  We'll have a vhost translate()
request from kernel to userspace, but IMHO that's cheaper than the vmexit.
Right but then I would still prefer to have another notifier.

Since vtd_page_walk has nothing to do with device IOTLB. IOMMU have a
dedicated command for flushing device IOTLB. But the check for
vtd_as_has_map_notifier is used to skip the device which can do demand
paging via ATS or device specific way. If we have two different notifiers,
vhost will be on the device iotlb notifier so we don't need it at all?
But we can still have iommu notifier that only registers to UNMAP even after we
introduce dev-iotlb notifier?  We don't want to do page walk for them as well.
TCG should be the only one so far, but I don't know.. maybe there can still be
new ones?

I think you're right. But looking at the codes, it looks like the check of
vtd_as_has_map_notifier() was only used in:

1) vtd_iommu_replay()
2) vtd_iotlb_page_invalidate_notify() (PSI)

For the replay, it's expensive anyhow. For PSI, I think it's just about one
or few mappings, not sure it will have obvious performance impact.

And I had two questions:

1) The codes doesn't check map for DSI or GI, does this match what spec
said? (It looks to me the spec is unclear in this part)
Both DSI/GI should cover maps too?  E.g. vtd_sync_shadow_page_table() in
vtd_iotlb_domain_invalidate().


I meant the code doesn't check whether there's an MAP notifier :)



2) for the replay() I don't see other implementations (either spapr or
generic one) that did unmap (actually they skip unmap explicitly), any
reason for doing this in intel IOMMU?
I could be wrong, but I'd guess it's because vt-d implemented the caching mode
by leveraging the same invalidation strucuture, so it's harder to make all
things right (IOW, we can't clearly identify MAP with UNMAP when we receive an
invalidation request, because MAP/UNMAP requests look the same).

I didn't check others, but I believe spapr is doing it differently by using
some hypercalls to deliver IOMMU map/unmap requests, which seems a bit close to
what virtio-iommu is doing.  Anyway, the point is if we have explicit MAP/UNMAP
from the guest, logically the replay indeed does not need to do any unmap
because we don't need to call replay() on an already existing device but only
for e.g. hot plug.


But this looks conflict with what memory_region_iommu_replay( ) did, for IOMMU that doesn't have a replay method, it skips UNMAP request:

    for (addr = 0; addr < memory_region_size(mr); addr += granularity) {
        iotlb = imrc->translate(iommu_mr, addr, IOMMU_NONE, n->iommu_idx);
        if (iotlb.perm != IOMMU_NONE) {
            n->notify(n, &iotlb);
        }

I guess there's no knowledge of whether guest have an explicit MAP/UMAP for this generic code. Or replay implies that guest doesn't have explicit MAP/UNMAP?

(btw, the code shortcut the memory_region_notify_one(), not sure the reason)


  VT-d does not have that clear interface, so VT-d needs to
maintain its own mapping structures, and also vt-d is using the same replay &
page_walk operations to sync all these structures, which complicated the vt-d
replay a bit.  With that, we assume replay() can be called anytime on a device,
and we won't notify duplicated MAPs to lower layer like vfio if it is mapped
before.  At the meantime, since we'll compare the latest mapping with the one
we cached in the iova tree, UNMAP becomes possible too.


AFAIK vtd_iommu_replay() did a completely UNMAP:

    /*
     * The replay can be triggered by either a invalidation or a newly
     * created entry. No matter what, we release existing mappings
     * (it means flushing caches for UNMAP-only registers).
     */
    vtd_address_space_unmap(vtd_as, n);

Since it doesn't do any comparison with iova tree. Will this cause unnecessary UNMAP to be sent to VFIO?

Thanks



Thanks,





[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux