Re: [PATCH v4 11/18] iommu/amd: Access/Dirty bit support in IOPTEs

On 20/10/2023 03:21, Baolu Lu wrote:
> On 10/19/23 7:58 PM, Joao Martins wrote:
>> On 19/10/2023 01:17, Joao Martins wrote:
>>> On 19/10/2023 00:11, Jason Gunthorpe wrote:
>>>> On Wed, Oct 18, 2023 at 09:27:08PM +0100, Joao Martins wrote:
>>>>> +static int iommu_v1_read_and_clear_dirty(struct io_pgtable_ops *ops,
>>>>> +                     unsigned long iova, size_t size,
>>>>> +                     unsigned long flags,
>>>>> +                     struct iommu_dirty_bitmap *dirty)
>>>>> +{
>>>>> +    struct amd_io_pgtable *pgtable = io_pgtable_ops_to_data(ops);
>>>>> +    unsigned long end = iova + size - 1;
>>>>> +
>>>>> +    do {
>>>>> +        unsigned long pgsize = 0;
>>>>> +        u64 *ptep, pte;
>>>>> +
>>>>> +        ptep = fetch_pte(pgtable, iova, &pgsize);
>>>>> +        if (ptep)
>>>>> +            pte = READ_ONCE(*ptep);
>>>> It is fine for now, but this is so slow for something that is such a
>>>> fast path. We are optimizing away a TLB invalidation but leaving
>>>> this???
>>>>
>>> The more obvious reason is that I'm still working towards the 'faster'
>>> page-table walker. The map/unmap code needs to do similar lookups, so I
>>> thought of reusing the same functions as map/unmap initially, and improving
>>> this afterwards or when introducing the splitting.
>>>
>>>> It is a radix tree, you walk trees by retaining your position at each
>>>> level as you go (eg in a function per-level call chain or something)
>>>> then ++ is cheap. Re-searching the entire tree every time is madness.
>>> I'm aware -- I have an improved page-table walker for AMD[0] (not yet for
>>> Intel; still in the works).
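>>>
>>> For the sake of discussion, the rough shape would be something like the
>>> below (an illustrative sketch only, not the actual patch: it assumes an
>>> aligned range, and level_size() / pte_present() / pte_is_leaf() /
>>> pte_deref() / pte_test_and_clear_dirty() are made-up helpers):
>>>
>>> static void walk_level(u64 *ptep, int level, unsigned long iova,
>>>                        unsigned long end,
>>>                        struct iommu_dirty_bitmap *dirty)
>>> {
>>>     unsigned long sz = level_size(level); /* bytes covered per PTE */
>>>
>>>     for (; iova < end; iova += sz, ptep++) {
>>>         u64 pte = READ_ONCE(*ptep);
>>>
>>>         if (!pte_present(pte))
>>>             continue;
>>>         if (pte_is_leaf(pte, level)) {
>>>             /* Leaf: test-and-clear the dirty bit, record it */
>>>             if (pte_test_and_clear_dirty(ptep))
>>>                 iommu_dirty_bitmap_record(dirty, iova, sz);
>>>         } else {
>>>             /* Descend once; each level's loop keeps its position,
>>>              * so moving to the next PTE is just ptep++ */
>>>             walk_level(pte_deref(pte), level - 1, iova,
>>>                        min(end, iova + sz), dirty);
>>>         }
>>>     }
>>> }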
>> Sigh, I realized that Intel's pfn_to_dma_pte() (the main lookup function for
>> map/unmap/iova_to_phys) does something a little off when it finds a non-present
>> PTE: it allocates a page table for it. That is not OK in this specific case (I
>> would argue it isn't for iova_to_phys either, though maybe I misunderstand the
>> expectation of that API).
> 
> pfn_to_dma_pte() doesn't allocate a page for a non-present PTE if the
> target_level parameter is set to 0. See line 932 below.
> 
>  913 static struct dma_pte *pfn_to_dma_pte(struct dmar_domain *domain,
>  914                           unsigned long pfn, int *target_level,
>  915                           gfp_t gfp)
>  916 {
> 
> [...]
> 
>  927         while (1) {
>  928                 void *tmp_page;
>  929
>  930                 offset = pfn_level_offset(pfn, level);
>  931                 pte = &parent[offset];
>  932                 if (!*target_level && (dma_pte_superpage(pte) || !dma_pte_present(pte)))
>  933                         break;
> 
> So both iova_to_phys() and read_and_clear_dirty() are doing the right
> thing:
> 
>     struct dma_pte *pte;
>     int level = 0;
> 
>     pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT,
>                              &level, GFP_KERNEL);
>     if (pte && dma_pte_present(pte)) {
>         /* The PTE is valid, check anything you want! */
>         ... ...
>     }
> 
> Or am I overlooking something else?

You're right, thanks for keeping me straight -- I was already doing the
right thing. I'd forgotten about it in the midst of the other code. Probably
worth a comment in the caller to make it obvious; something like the sketch
below, perhaps.
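
(Illustrative only, reusing the names from the snippet above:)

	struct dma_pte *pte;
	/*
	 * level == 0 turns pfn_to_dma_pte() into a pure lookup: it
	 * breaks out on a non-present PTE instead of allocating one.
	 */
	int level = 0;

	pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT,
			     &level, GFP_KERNEL);
	if (pte && dma_pte_present(pte)) {
		/* walk the mapping and collect the dirty bits */
	}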

	Joao


