Hi Longpeng,
On 4/1/21 3:18 PM, Longpeng(Mike) wrote:
The translation caches may preserve obsolete data when the
mapping size is changed, suppose the following sequence which
can reveal the problem with high probability.
1.mmap(4GB,MAP_HUGETLB)
2.
while (1) {
(a) DMA MAP 0,0xa0000
(b) DMA UNMAP 0,0xa0000
(c) DMA MAP 0,0xc0000000
* DMA read IOVA 0 may failure here (Not present)
* if the problem occurs.
(d) DMA UNMAP 0,0xc0000000
}
The page table(only focus on IOVA 0) after (a) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x1a30a72003 entry:0xffff89b39cacb000
PTE: 0x21d200803 entry:0xffff89b3b0a72000
The page table after (b) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x1a30a72003 entry:0xffff89b39cacb000
PTE: 0x0 entry:0xffff89b3b0a72000
The page table after (c) is:
PML4: 0x19db5c1003 entry:0xffff899bdcd2f000
PDPE: 0x1a1cacb003 entry:0xffff89b35b5c1000
PDE: 0x21d200883 entry:0xffff89b39cacb000 (*)
Because the PDE entry after (b) is present, it won't be
flushed even if the iommu driver flush cache when unmap,
so the obsolete data may be preserved in cache, which
would cause the wrong translation at end.
However, we can see the PDE entry is finally switch to
2M-superpage mapping, but it does not transform
to 0x21d200883 directly:
1. PDE: 0x1a30a72003
2. __domain_mapping
dma_pte_free_pagetable
Set the PDE entry to ZERO
Set the PDE entry to 0x21d200883
So we must flush the cache after the entry switch to ZERO
to avoid the obsolete info be preserved.
Cc: David Woodhouse <dwmw2@xxxxxxxxxxxxx>
Cc: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
Cc: Nadav Amit <nadav.amit@xxxxxxxxx>
Cc: Alex Williamson <alex.williamson@xxxxxxxxxx>
Cc: Kevin Tian <kevin.tian@xxxxxxxxx>
Cc: Gonglei (Arei) <arei.gonglei@xxxxxxxxxx>
Fixes: 6491d4d02893 ("intel-iommu: Free old page tables before creating superpage")
Cc: <stable@xxxxxxxxxxxxxxx> # v3.0+
Link: https://lore.kernel.org/linux-iommu/670baaf8-4ff8-4e84-4be3-030b95ab5a5e@xxxxxxxxxx/
Suggested-by: Lu Baolu <baolu.lu@xxxxxxxxxxxxxxx>
Signed-off-by: Longpeng(Mike) <longpeng2@xxxxxxxxxx>
---
drivers/iommu/intel/iommu.c | 15 +++++++++++++--
1 file changed, 13 insertions(+), 2 deletions(-)
diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee09323..cbcb434 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -2342,9 +2342,20 @@ static inline int hardware_largepage_caps(struct dmar_domain *domain,
* removed to make room for superpage(s).
* We're adding new large pages, so make sure
* we don't remove their parent tables.
+ *
+ * We also need to flush the iotlb before creating
+ * superpage to ensure it does not perserves any
+ * obsolete info.
*/
- dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
- largepage_lvl + 1);
+ if (dma_pte_present(pte)) {
+ int i;
+
+ dma_pte_free_pagetable(domain, iov_pfn, end_pfn,
+ largepage_lvl + 1);
+ for_each_domain_iommu(i, domain)
+ iommu_flush_iotlb_psi(g_iommus[i], domain,
+ iov_pfn, nr_pages, 0, 0);
Thanks for patch!
How about making the flushed page size accurate? For example,
@@ -2365,8 +2365,8 @@ __domain_mapping(struct dmar_domain *domain,
unsigned long iov_pfn,
dma_pte_free_pagetable(domain,
iov_pfn, end_pfn,
largepage_lvl + 1);
for_each_domain_iommu(i, domain)
-
iommu_flush_iotlb_psi(g_iommus[i], domain,
-
iov_pfn, nr_pages, 0, 0);
+
iommu_flush_iotlb_psi(g_iommus[i], domain, iov_pfn,
+
ALIGN_DOWN(nr_pages, lvl_pages), 0, 0);
+ }
} else {
pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE;
}
Best regards,
baolu