On Thu, May 09, 2024 at 12:10:49PM -0600, Alex Williamson wrote:
> On Tue, 7 May 2024 14:21:38 +0800
> Yan Zhao <yan.y.zhao@xxxxxxxxx> wrote:
...
> >  drivers/vfio/vfio_iommu_type1.c | 51 +++++++++++++++++++++++++++++++++
> >  1 file changed, 51 insertions(+)
> > 
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index b5c15fe8f9fc..ce873f4220bf 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -74,6 +74,7 @@ struct vfio_iommu {
> >  	bool			v2;
> >  	bool			nesting;
> >  	bool			dirty_page_tracking;
> > +	bool			has_noncoherent_domain;
> >  	struct list_head	emulated_iommu_groups;
> >  };
> >  
> > @@ -99,6 +100,7 @@ struct vfio_dma {
> >  	unsigned long		*bitmap;
> >  	struct mm_struct	*mm;
> >  	size_t			locked_vm;
> > +	bool			cache_flush_required; /* For noncoherent domain */
> 
> Poor packing, minimally this should be grouped with the other bools in
> the structure, longer term they should likely all be converted to
> bit fields.
Yes. Will do!

> 
> >  };
> >  
> >  struct vfio_batch {
> > @@ -716,6 +718,9 @@ static long vfio_unpin_pages_remote(struct vfio_dma *dma, dma_addr_t iova,
> >  	long unlocked = 0, locked = 0;
> >  	long i;
> >  
> > +	if (dma->cache_flush_required)
> > +		arch_clean_nonsnoop_dma(pfn << PAGE_SHIFT, npage << PAGE_SHIFT);
> > +
> >  	for (i = 0; i < npage; i++, iova += PAGE_SIZE) {
> >  		if (put_pfn(pfn++, dma->prot)) {
> >  			unlocked++;
> > @@ -1099,6 +1104,8 @@ static long vfio_unmap_unpin(struct vfio_iommu *iommu, struct vfio_dma *dma,
> >  					    &iotlb_gather);
> >  	}
> >  
> > +	dma->cache_flush_required = false;
> > +
> >  	if (do_accounting) {
> >  		vfio_lock_acct(dma, -unlocked, true);
> >  		return 0;
> > @@ -1120,6 +1127,21 @@ static void vfio_remove_dma(struct vfio_iommu *iommu, struct vfio_dma *dma)
> >  	iommu->dma_avail++;
> >  }
> >  
> > +static void vfio_update_noncoherent_domain_state(struct vfio_iommu *iommu)
> > +{
> > +	struct vfio_domain *domain;
> > +	bool has_noncoherent = false;
> > +
> > +	list_for_each_entry(domain, &iommu->domain_list, next) {
> > +		if (domain->enforce_cache_coherency)
> > +			continue;
> > +
> > +		has_noncoherent = true;
> > +		break;
> > +	}
> > +	iommu->has_noncoherent_domain = has_noncoherent;
> > +}
> 
> This should be merged with vfio_domains_have_enforce_cache_coherency()
> and the VFIO_DMA_CC_IOMMU extension (if we keep it, see below).
Will convert it to a counter and do the merge. Thanks for pointing it out!
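
Roughly what I have in mind after the merge (a sketch only, assuming the bool
is replaced by the noncoherent_domains counter discussed further below; not
final code):

static bool vfio_domains_have_enforce_cache_coherency(struct vfio_iommu *iommu)
{
	bool ret;

	mutex_lock(&iommu->lock);
	/* Coherency is enforced only if no non-coherent domain is attached. */
	ret = !iommu->noncoherent_domains;
	mutex_unlock(&iommu->lock);

	return ret;
}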

> 
> > +
> >  static void vfio_update_pgsize_bitmap(struct vfio_iommu *iommu)
> >  {
> >  	struct vfio_domain *domain;
> > @@ -1455,6 +1477,12 @@ static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
> >  
> >  	vfio_batch_init(&batch);
> >  
> > +	/*
> > +	 * Record necessity to flush CPU cache to make sure CPU cache is flushed
> > +	 * for both pin & map and unmap & unpin (for unwind) paths.
> > +	 */
> > +	dma->cache_flush_required = iommu->has_noncoherent_domain;
> > +
> >  	while (size) {
> >  		/* Pin a contiguous chunk of memory */
> >  		npage = vfio_pin_pages_remote(dma, vaddr + dma->size,
> > @@ -1466,6 +1494,10 @@ static int vfio_pin_map_dma(struct vfio_iommu *iommu, struct vfio_dma *dma,
> >  			break;
> >  		}
> >  
> > +		if (dma->cache_flush_required)
> > +			arch_clean_nonsnoop_dma(pfn << PAGE_SHIFT,
> > +						npage << PAGE_SHIFT);
> > +
> >  		/* Map it! */
> >  		ret = vfio_iommu_map(iommu, iova + dma->size, pfn, npage,
> >  				     dma->prot);
> > @@ -1683,9 +1715,14 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
> >  	for (; n; n = rb_next(n)) {
> >  		struct vfio_dma *dma;
> >  		dma_addr_t iova;
> > +		bool cache_flush_required;
> >  
> >  		dma = rb_entry(n, struct vfio_dma, node);
> >  		iova = dma->iova;
> > +		cache_flush_required = !domain->enforce_cache_coherency &&
> > +				       !dma->cache_flush_required;
> > +		if (cache_flush_required)
> > +			dma->cache_flush_required = true;
> 
> The variable name here isn't accurate and the logic is confusing.  If
> the domain does not enforce coherency and the mapping is not tagged as
> requiring a cache flush, then we need to mark the mapping as requiring
> a cache flush.  So the variable state is something more akin to
> set_cache_flush_required.  But all we're saving with this is a
> redundant set if the mapping is already tagged as requiring a cache
> flush, so it could really be simplified to:
> 
> 	dma->cache_flush_required = !domain->enforce_cache_coherency;
Sorry about the confusion.
Once dma->cache_flush_required has been set to true by a domain that does not
enforce cache coherency, we don't want it reset to false by a later attach of
a coherency-enforcing domain, because of the lazy flushing design.

> It might add more clarity to just name the mapping flag
> dma->mapped_noncoherent.

dma->cache_flush_required marks whether pages in a vfio_dma require a cache
flush both when they are subsequently mapped into the first non-coherent
domain and when they are unpinned. So, mapped_noncoherent may not be accurate.
Do you think it's better to add a comment for explanation?

struct vfio_dma {
	...
	bool			iommu_mapped;
	bool			lock_cap;	/* capable(CAP_IPC_LOCK) */
	bool			vaddr_invalid;
	/*
	 * Mark whether it is required to flush CPU caches when mapping pages
	 * of the vfio_dma to the first non-coherent domain and when unpinning
	 * pages of the vfio_dma
	 */
	bool			cache_flush_required;
	...
};

> > 
> >  		while (iova < dma->iova + dma->size) {
> >  			phys_addr_t phys;
> > @@ -1737,6 +1774,9 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
> >  			size = npage << PAGE_SHIFT;
> >  		}
> >  
> > +		if (cache_flush_required)
> > +			arch_clean_nonsnoop_dma(phys, size);
> > +
> 
> I agree with others as well that this arch callback should be named
> something relative to the cache-flush/write-back operation that it
> actually performs instead of the overall reason for us requiring it.
> 
Ok. If there are no objections, I'll rename it to arch_flush_cache_phys() as
suggested by Kevin.
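
For reference, the rough shape I have in mind for the renamed helper on the
x86 side (illustrative only; whether it ends up using clflush_cache_range()
and kmap_local_pfn() as below is an open detail of the arch patch):

static void arch_flush_cache_phys(phys_addr_t phys, size_t size)
{
	while (size) {
		size_t off = offset_in_page(phys);
		size_t len = min_t(size_t, size, PAGE_SIZE - off);
		void *va = kmap_local_pfn(PHYS_PFN(phys));

		/* Write back (and invalidate) CPU cache lines for this chunk. */
		clflush_cache_range(va + off, len);
		kunmap_local(va);

		phys += len;
		size -= len;
	}
}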

> >  		ret = iommu_map(domain->domain, iova, phys, size,
> >  				dma->prot | IOMMU_CACHE,
> >  				GFP_KERNEL_ACCOUNT);
> > @@ -1801,6 +1841,7 @@ static int vfio_iommu_replay(struct vfio_iommu *iommu,
> >  			vfio_unpin_pages_remote(dma, iova, phys >> PAGE_SHIFT,
> >  						size >> PAGE_SHIFT, true);
> >  		}
> > +		dma->cache_flush_required = false;
> >  	}
> >  
> >  	vfio_batch_fini(&batch);
> > @@ -1828,6 +1869,9 @@ static void vfio_test_domain_fgsp(struct vfio_domain *domain, struct list_head *
> >  	if (!pages)
> >  		return;
> >  
> > +	if (!domain->enforce_cache_coherency)
> > +		arch_clean_nonsnoop_dma(page_to_phys(pages), PAGE_SIZE * 2);
> > +
> >  	list_for_each_entry(region, regions, list) {
> >  		start = ALIGN(region->start, PAGE_SIZE * 2);
> >  		if (start >= region->end || (region->end - start < PAGE_SIZE * 2))
> > @@ -1847,6 +1891,9 @@ static void vfio_test_domain_fgsp(struct vfio_domain *domain, struct list_head *
> >  			break;
> >  		}
> >  
> > +	if (!domain->enforce_cache_coherency)
> > +		arch_clean_nonsnoop_dma(page_to_phys(pages), PAGE_SIZE * 2);
> > +
> 
> Seems like this use case isn't subject to the unmap aspect since these
> are kernel allocated and freed pages rather than userspace pages.
> There's not an "ongoing use of the page" concern.
> 
> The window of opportunity for a device to discover and exploit the
> mapping side issue appears almost impossibly small.
> 
The concern is about a malicious device attempting DMAs automatically.
Do you think this concern is valid?
Since these are only extra flushes for 4 pages, what about keeping them for
safety?

> >  	__free_pages(pages, order);
> >  }
> >  
> > @@ -2308,6 +2355,8 @@ static int vfio_iommu_type1_attach_group(void *iommu_data,
> >  
> >  	list_add(&domain->next, &iommu->domain_list);
> >  	vfio_update_pgsize_bitmap(iommu);
> > +	if (!domain->enforce_cache_coherency)
> > +		vfio_update_noncoherent_domain_state(iommu);
> 
> Why isn't this simply:
> 
> 	if (!domain->enforce_cache_coherency)
> 		iommu->has_noncoherent_domain = true;
Yes, it's simpler during attach.

> Or maybe:
> 
> 	if (!domain->enforce_cache_coherency)
> 		iommu->noncoherent_domains++;
Yes, this counter is better. I previously thought a bool could save some space.

> >  done:
> >  	/* Delete the old one and insert new iova list */
> >  	vfio_iommu_iova_insert_copy(iommu, &iova_copy);
> > @@ -2508,6 +2557,8 @@ static void vfio_iommu_type1_detach_group(void *iommu_data,
> >  		}
> >  		iommu_domain_free(domain->domain);
> >  		list_del(&domain->next);
> > +		if (!domain->enforce_cache_coherency)
> > +			vfio_update_noncoherent_domain_state(iommu);
> 
> If we were to just track the number of noncoherent domains, this could
> simply be iommu->noncoherent_domains-- and VFIO_DMA_CC_DMA could be:
> 
> 	return iommu->noncoherent_domains ? 1 : 0;
> 
> Maybe there should be wrappers for list_add() and list_del() relative
> to the iommu domain list to make it just be a counter.  Thanks,
Do you think we can skip the "iommu->noncoherent_domains--" in
vfio_iommu_type1_release() when the iommu is about to be freed?
I'm asking also because it's hard for me to find a good name for the wrapper
around list_del() (a rough sketch of such wrappers follows at the end of this
mail). :)
It follows vfio_release_domain() in vfio_iommu_type1_release(), but not in
vfio_iommu_type1_detach_group().

> > 
> >  		kfree(domain);
> >  		vfio_iommu_aper_expand(iommu, &iova_copy);
> >  		vfio_update_pgsize_bitmap(iommu);
> 
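
For the wrapper idea, a minimal sketch of what I could do (the names
vfio_iommu_link_domain()/vfio_iommu_unlink_domain() are placeholders, not
settled):

/* Add a domain to the iommu's domain list and account non-coherent domains. */
static void vfio_iommu_link_domain(struct vfio_iommu *iommu,
				   struct vfio_domain *domain)
{
	list_add(&domain->next, &iommu->domain_list);
	if (!domain->enforce_cache_coherency)
		iommu->noncoherent_domains++;
}

/* Remove a domain from the list and drop its non-coherent accounting. */
static void vfio_iommu_unlink_domain(struct vfio_iommu *iommu,
				     struct vfio_domain *domain)
{
	list_del(&domain->next);
	if (!domain->enforce_cache_coherency)
		iommu->noncoherent_domains--;
}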