RE: [PATCH v4 11/18] iommu/amd: Access/Dirty bit support in IOPTEs

> From: Baolu Lu <baolu.lu@xxxxxxxxxxxxxxx>
> Sent: Friday, October 20, 2023 10:22 AM
> 
> On 10/19/23 7:58 PM, Joao Martins wrote:
> > On 19/10/2023 01:17, Joao Martins wrote:
> >> On 19/10/2023 00:11, Jason Gunthorpe wrote:
> >>> On Wed, Oct 18, 2023 at 09:27:08PM +0100, Joao Martins wrote:
> >>>> +static int iommu_v1_read_and_clear_dirty(struct io_pgtable_ops *ops,
> >>>> +					 unsigned long iova, size_t size,
> >>>> +					 unsigned long flags,
> >>>> +					 struct iommu_dirty_bitmap *dirty)
> >>>> +{
> >>>> +	struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
> >>>> +	unsigned long end = iova + size - 1;
> >>>> +
> >>>> +	do {
> >>>> +		unsigned long pgsize = 0;
> >>>> +		u64 *ptep, pte;
> >>>> +
> >>>> +		ptep = fetch_pte(pgtable, iova, &pgsize);
> >>>> +		if (ptep)
> >>>> +			pte = READ_ONCE(*ptep);
> >>> It is fine for now, but this is so slow for something that is such a
> >>> fast path. We are optimizing away a TLB invalidation but leaving
> >>> this???
> >>>
> >> More obvious reason is that I'm still working towards the 'faster' page table
> >> walker. Then map/unmap code needs to do similar lookups so thought of reusing
> >> the same functions as map/unmap initially. And improve it afterwards or when
> >> introducing the splitting.
> >>
> >>> It is a radix tree, you walk trees by retaining your position at each
> >>> level as you go (eg in a function per-level call chain or something)
> >>> then ++ is cheap. Re-searching the entire tree every time is madness.
> >> I'm aware -- I have an improved page-table walker for AMD[0] (not yet for
> >> Intel; still in the works),
> > Sigh, I realized that Intel's pfn_to_dma_pte() (main lookup function for
> > map/unmap/iova_to_phys) does something a little off when it finds a non-present
> > PTE. It allocates a page table to it, which is not OK in this specific case (I
> > would argue it's neither for iova_to_phys, but well, maybe I misunderstand the
> > expectation of that API).
> 
> pfn_to_dma_pte() doesn't allocate a page for a non-present PTE if the
> target_level parameter is set to 0. See line 932 below.
> 
>   913 static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
>   914                           unsigned long pfn, int *target_level,
>   915                           gfp_t gfp)
>   916 {
> 
> [...]
> 
>   927         while (1) {
>   928                 void *tmp_page;
>   929
>   930                 offset = pfn_level_offset(pfn, level);
>   931                 pte = &parent[offset];
>   932                 if (!*target_level && (dma_pte_superpage(pte) || !dma_pte_present(pte)))
>   933                         break;
> 
> So both iova_to_phys() and read_and_clear_dirty() are doing things
> right:
> 
> 	struct dma_pte *pte;
> 	int level = 0;
> 
> 	pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT,
>                               &level, GFP_KERNEL);
> 	if (pte && dma_pte_present(pte)) {
> 		/* The PTE is valid, check anything you want! */
> 		... ...
> 	}
> 
> Or am I overlooking something else?
> 

This is correct.
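
As an aside on Jason's comment above about retaining the walker's position at
each level: below is a rough, untested sketch of what such a level-retaining
walk could look like for the dirty-bit path, just to illustrate the shape.
Apart from READ_ONCE(), IOMMU_PTE_PRESENT() and iommu_dirty_bitmap_record(),
all helpers and constants here (walk_dirty_level(), lvl_shift(),
next_lvl_boundary(), pte_is_leaf(), pte_test_and_clear_dirty(),
pte_to_table(), PTES_PER_TABLE) are made-up placeholders, not the actual AMD
driver code; 'end' is exclusive in this sketch.

/*
 * Sketch only: descend once per table and then visit sibling PTEs with a
 * plain index increment, instead of re-searching from the root for every
 * IOVA as fetch_pte() does.
 */
static void walk_dirty_level(u64 *table, int level, unsigned long iova,
			     unsigned long end,
			     struct iommu_dirty_bitmap *dirty)
{
	unsigned int idx = (iova >> lvl_shift(level)) & (PTES_PER_TABLE - 1);

	while (idx < PTES_PER_TABLE && iova < end) {
		/* End of the IOVA range covered by this entry at this level */
		unsigned long next = min(end, next_lvl_boundary(iova, level));
		u64 pte = READ_ONCE(table[idx]);

		if (!IOMMU_PTE_PRESENT(pte)) {
			/* Nothing mapped here, skip the whole sub-range */
			iova = next;
			idx++;
			continue;
		}

		if (level == 0 || pte_is_leaf(pte)) {
			/* Leaf PTE: test-and-clear dirty and record it */
			if (pte_test_and_clear_dirty(&table[idx]))
				iommu_dirty_bitmap_record(dirty, iova,
							  next - iova);
		} else {
			/* Descend once; the child loop does the ++ walk */
			walk_dirty_level(pte_to_table(pte), level - 1,
					 iova, next, dirty);
		}

		iova = next;
		idx++;
	}
}

Crossing a level boundary then only re-reads one table pointer per level in
the call chain, rather than doing a full root-to-leaf lookup per IOVA, which
is the cheap '++' Jason is referring to.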



