Re: [PATCH v4 11/18] iommu/amd: Access/Dirty bit support in IOPTEs

On 10/19/23 7:58 PM, Joao Martins wrote:
> On 19/10/2023 01:17, Joao Martins wrote:
>> On 19/10/2023 00:11, Jason Gunthorpe wrote:
>>> On Wed, Oct 18, 2023 at 09:27:08PM +0100, Joao Martins wrote:
>>>> +static int iommu_v1_read_and_clear_dirty(struct io_pgtable_ops *ops,
>>>> +					 unsigned long iova, size_t size,
>>>> +					 unsigned long flags,
>>>> +					 struct iommu_dirty_bitmap *dirty)
>>>> +{
>>>> +	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
>>>> +	unsigned long end = iova + size - 1;
>>>> +
>>>> +	do {
>>>> +		unsigned long pgsize = 0;
>>>> +		u64 *ptep, pte;
>>>> +
>>>> +		ptep = fetch_pte(pgtable, iova, &pgsize);
>>>> +		if (ptep)
>>>> +			pte = READ_ONCE(*ptep);
>>> It is fine for now, but this is so slow for something that is such a
>>> fast path. We are optimizing away a TLB invalidation but leaving
>>> this???
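[Editor's aside: the READ_ONCE() in the quoted hunk is doing real work here. The PTE can change underneath the walker (for instance, the IOMMU hardware setting Accessed/Dirty bits), so the code takes one stable snapshot of the entry and then tests bits on that copy. A userspace sketch of the idea follows; the READ_ONCE macro below is a simplified stand-in for the kernel's, and fake_pte/pte_is_present are invented names, not anything from the patch:]

```c
#include <assert.h>
#include <stdint.h>

/*
 * Minimal model of the kernel's READ_ONCE(): the volatile cast forces
 * the compiler to emit exactly one load, never re-reading or caching
 * the value. That matters when the entry can be updated concurrently
 * by another CPU or by hardware.
 */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* A fake page-table entry that "hardware" could be updating. */
static uint64_t fake_pte = 0x8000000000000ULL;

static int pte_is_present(void)
{
	/* One stable snapshot of the entry, then test bits on the copy. */
	uint64_t pte = READ_ONCE(fake_pte);

	return pte != 0;
}
```

Without the single-load guarantee, the compiler would be free to reload the entry for each bit test, and two tests could see two different values of the same PTE.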

>> The more obvious reason is that I'm still working towards the 'faster'
>> page-table walker. The map/unmap code needs to do similar lookups, so I
>> thought of reusing the same functions as map/unmap initially, and
>> improving it afterwards or when introducing the splitting.

>>> It is a radix tree; you walk trees by retaining your position at each
>>> level as you go (e.g. in a function per-level call chain or something),
>>> then ++ is cheap. Re-searching the entire tree every time is madness.
>> I'm aware -- I have an improved page-table walker for AMD[0] (not yet for
>> Intel; still in the works),
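[Editor's aside: Jason's "retain your position at each level" suggestion can be sketched with a toy two-level table. Everything below (TBL_ENTRIES, toy_pgtable, toy_fetch_pte, toy_count_set) is hypothetical and far simpler than the real AMD io-pgtable; it only contrasts re-searching from the root for every entry with walking each leaf table once:]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TBL_ENTRIES 8

typedef uint64_t pte_t;

/* Toy two-level "page table": a root table of pointers to leaf tables. */
struct toy_pgtable {
	pte_t *root[TBL_ENTRIES];	/* NULL entry = not present */
};

/*
 * Slow pattern: a full top-down lookup for every index, O(levels) per
 * visited entry. This is roughly what calling fetch_pte() once per
 * iteration amounts to.
 */
static pte_t *toy_fetch_pte(struct toy_pgtable *pt, unsigned long idx)
{
	pte_t *leaf = pt->root[idx / TBL_ENTRIES];

	return leaf ? &leaf[idx % TBL_ENTRIES] : NULL;
}

/*
 * Fast pattern: the position at each level is retained, each leaf table
 * is walked once with ++, and the walker only goes back up when a table
 * is exhausted, never re-searching from the root.
 */
static unsigned long toy_count_set(struct toy_pgtable *pt, pte_t bit)
{
	unsigned long i, j, count = 0;

	for (i = 0; i < TBL_ENTRIES; i++) {
		pte_t *leaf = pt->root[i];

		if (!leaf)
			continue;
		for (j = 0; j < TBL_ENTRIES; j++)
			if (leaf[j] & bit)
				count++;
	}
	return count;
}
```

For a table with L levels and N leaf entries, the per-entry lookup does O(N * L) descents while the retained-position walk touches each table entry once, which is the difference Jason is pointing at.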
> Sigh, I realized that Intel's pfn_to_dma_pte() (the main lookup function for
> map/unmap/iova_to_phys) does something a little off when it finds a non-present
> PTE: it allocates a page table for it, which is not OK in this specific case (I
> would argue it's not OK for iova_to_phys either, but maybe I misunderstand the
> expectations of that API).

pfn_to_dma_pte() doesn't allocate a page for a non-present PTE if the
target_level parameter is set to 0. See line 932 below.

 913 static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
 914                           unsigned long pfn, int *target_level,
 915                           gfp_t gfp)
 916 {

[...]

 927         while (1) {
 928                 void *tmp_page;
 929
 930                 offset = pfn_level_offset(pfn, level);
 931                 pte = &parent[offset];
 932                 if (!*target_level && (dma_pte_superpage(pte) || !dma_pte_present(pte)))
 933                         break;

So both iova_to_phys() and read_and_clear_dirty() are doing things
right:

	struct dma_pte *pte;
	int level = 0;

	pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT,
			     &level, GFP_KERNEL);
	if (pte && dma_pte_present(pte)) {
		/* The PTE is valid, check anything you want! */
		... ...
	}

Or am I overlooking something else?
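[Editor's aside: the *target_level contract Baolu describes (level 0 means lookup-only, never allocate; the walk breaks out at the first non-present or superpage entry) can be modeled with a toy walker. The names below (toy_domain, toy_pfn_to_pte, the DMA_PTE_* flags) are invented for illustration, and the layout is a drastically simplified two-level table, not the real VT-d one:]

```c
#include <assert.h>
#include <stdint.h>

#define DMA_PTE_PRESENT	0x1ULL
#define DMA_PTE_LARGE	0x2ULL	/* superpage marker, illustrative only */

typedef uint64_t dma_pte;

/* Toy two-level table: l2 entries "point at" the l1 row of same index. */
struct toy_domain {
	dma_pte l2[4];
	dma_pte l1[4][4];
};

/*
 * Mimics the contract quoted above: with *target_level == 0 the walk
 * never allocates; it stops at the first non-present (or superpage)
 * entry and reports the level it reached in *target_level.
 */
static dma_pte *toy_pfn_to_pte(struct toy_domain *d, unsigned pfn,
			       int *target_level)
{
	unsigned l2_idx = (pfn >> 2) & 0x3, l1_idx = pfn & 0x3;
	dma_pte *pte = &d->l2[l2_idx];
	int level = 2;

	while (1) {
		if (!*target_level &&
		    ((*pte & DMA_PTE_LARGE) || !(*pte & DMA_PTE_PRESENT)))
			break;		/* lookup-only: never allocate */
		if (level == 1)
			break;		/* reached the leaf level */
		pte = &d->l1[l2_idx][l1_idx];
		level--;
	}
	*target_level = level;
	return pte;
}
```

The caller then checks presence on the returned entry, exactly as in the iova_to_phys()/read_and_clear_dirty() usage shown above.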

Best regards,
baolu


