Hi Baolu, > -----Original Message----- > From: Lu Baolu [mailto:baolu.lu@xxxxxxxxxxxxxxx] > Sent: Thursday, April 8, 2021 12:32 PM > To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.) > <longpeng2@xxxxxxxxxx>; iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; > linux-kernel@xxxxxxxxxxxxxxx > Cc: baolu.lu@xxxxxxxxxxxxxxx; David Woodhouse <dwmw2@xxxxxxxxxxxxx>; Nadav > Amit <nadav.amit@xxxxxxxxx>; Alex Williamson <alex.williamson@xxxxxxxxxx>; > Kevin Tian <kevin.tian@xxxxxxxxx>; Gonglei (Arei) <arei.gonglei@xxxxxxxxxx>; > stable@xxxxxxxxxxxxxxx > Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating superpage > > Hi Longpeng, > > On 4/7/21 2:35 PM, Longpeng (Mike, Cloud Infrastructure Service Product > Dept.) wrote: > > Hi Baolu, > > > >> -----Original Message----- > >> From: Lu Baolu [mailto:baolu.lu@xxxxxxxxxxxxxxx] > >> Sent: Friday, April 2, 2021 12:44 PM > >> To: Longpeng (Mike, Cloud Infrastructure Service Product Dept.) > >> <longpeng2@xxxxxxxxxx>; iommu@xxxxxxxxxxxxxxxxxxxxxxxxxx; > >> linux-kernel@xxxxxxxxxxxxxxx > >> Cc: baolu.lu@xxxxxxxxxxxxxxx; David Woodhouse <dwmw2@xxxxxxxxxxxxx>; > >> Nadav Amit <nadav.amit@xxxxxxxxx>; Alex Williamson > >> <alex.williamson@xxxxxxxxxx>; Kevin Tian <kevin.tian@xxxxxxxxx>; > >> Gonglei (Arei) <arei.gonglei@xxxxxxxxxx>; stable@xxxxxxxxxxxxxxx > >> Subject: Re: [PATCH] iommu/vt-d: Force to flush iotlb before creating > >> superpage > >> > >> Hi Longpeng, > >> > >> On 4/1/21 3:18 PM, Longpeng(Mike) wrote: > >>> diff --git a/drivers/iommu/intel/iommu.c > >>> b/drivers/iommu/intel/iommu.c index ee09323..cbcb434 100644 > >>> --- a/drivers/iommu/intel/iommu.c > >>> +++ b/drivers/iommu/intel/iommu.c > >>> @@ -2342,9 +2342,20 @@ static inline int > >>> hardware_largepage_caps(struct > >> dmar_domain *domain, > >>> * removed to make room for superpage(s). > >>> * We're adding new large pages, so make sure > >>> * we don't remove their parent tables. > >>> + * > >>> + * We also need to flush the iotlb before creating > >>> + * superpage to ensure it does not perserves any > >>> + * obsolete info. > >>> */ > >>> - dma_pte_free_pagetable(domain, iov_pfn, end_pfn, > >>> - largepage_lvl + 1); > >>> + if (dma_pte_present(pte)) { > >> > >> The dma_pte_free_pagetable() clears a batch of PTEs. So checking > >> current PTE is insufficient. How about removing this check and always > >> performing cache invalidation? > >> > > > > Um...the PTE here may be present( e.g. 4K mapping --> superpage mapping ) > orNOT-present ( e.g. create a totally new superpage mapping ), but we only need to > call free_pagetable and flush_iotlb in the former case, right ? > > But this code covers multiple PTEs and perhaps crosses the page boundary. > > How about moving this code into a separated function and check PTE presence > there. A sample code could look like below: [compiled but not tested!] > > diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c index > d334f5b4e382..0e04d450c38a 100644 > --- a/drivers/iommu/intel/iommu.c > +++ b/drivers/iommu/intel/iommu.c > @@ -2300,6 +2300,41 @@ static inline int hardware_largepage_caps(struct > dmar_domain *domain, > return level; > } > > +/* > + * Ensure that old small page tables are removed to make room for > superpage(s). > + * We're going to add new large pages, so make sure we don't remove > their parent > + * tables. The IOTLB/devTLBs should be flushed if any PDE/PTEs are cleared. > + */ > +static void switch_to_super_page(struct dmar_domain *domain, > + unsigned long start_pfn, > + unsigned long end_pfn, int level) { Maybe "swith_to" will lead people to think "remove old and then setup new", so how about something like "remove_room_for_super_page" or "prepare_for_super_page" ? > + unsigned long lvl_pages = lvl_to_nr_pages(level); > + struct dma_pte *pte = NULL; > + int i; > + > + while (start_pfn <= end_pfn) { start_pfn < end_pfn ? > + if (!pte) > + pte = pfn_to_dma_pte(domain, start_pfn, &level); > + > + if (dma_pte_present(pte)) { > + dma_pte_free_pagetable(domain, start_pfn, > + start_pfn + lvl_pages - 1, > + level + 1); > + > + for_each_domain_iommu(i, domain) > + iommu_flush_iotlb_psi(g_iommus[i], > domain, > + start_pfn, > lvl_pages, > + 0, 0); > + } > + > + pte++; > + start_pfn += lvl_pages; > + if (first_pte_in_page(pte)) > + pte = NULL; > + } > +} > + > static int > __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, > unsigned long phys_pfn, unsigned long nr_pages, int prot) > @@ -2341,22 +2376,11 @@ __domain_mapping(struct dmar_domain *domain, > unsigned long iov_pfn, > return -ENOMEM; > /* It is large page*/ > if (largepage_lvl > 1) { > - unsigned long nr_superpages, end_pfn; > + unsigned long end_pfn; > > pteval |= DMA_PTE_LARGE_PAGE; > - lvl_pages = lvl_to_nr_pages(largepage_lvl); > - > - nr_superpages = nr_pages / lvl_pages; > - end_pfn = iov_pfn + nr_superpages * > lvl_pages - 1; > - > - /* > - * Ensure that old small page tables are > - * removed to make room for superpage(s). > - * We're adding new large pages, so make > sure > - * we don't remove their parent tables. > - */ > - dma_pte_free_pagetable(domain, iov_pfn, > end_pfn, > - largepage_lvl + > 1); > + end_pfn = ((iov_pfn + nr_pages) & > level_mask(largepage_lvl)) - 1; > + switch_to_super_page(domain, iov_pfn, > end_pfn, largepage_lvl); > } else { > pteval &= > ~(uint64_t)DMA_PTE_LARGE_PAGE; > } > > I will send you the diff patch off list. Any thoughts? > The solution looks good to me. It's free for you to send this patch if you want, I'll send V2 if you have no plan to send it :) > Best regards, > baolu > > > > >>> + int i; > >>> + > >>> + dma_pte_free_pagetable(domain, iov_pfn, end_pfn, > >>> + largepage_lvl + 1); > >>> + for_each_domain_iommu(i, domain) > >>> + iommu_flush_iotlb_psi(g_iommus[i], domain, > >>> + iov_pfn, nr_pages, 0, 0); > >>> + > >> > >> Best regards, > >> baolu