On 4/29/22 10:03, Tian, Kevin wrote:
>> From: Joao Martins <joao.m.martins@xxxxxxxxxx>
>> Sent: Friday, April 29, 2022 5:10 AM
>>
>> IOMMU advertises Access/Dirty bits if the extended capability
>> DMAR register reports it (ECAP, mnemonic ECAP.SSADS). The first
>> stage table, though, has no bit for advertising it, unless referenced via
> 
> first-stage is compatible to CPU page table thus a/d bit support is
> implied.

Ah! That clarifies something which the manual wasn't quite so clear about :)
I mean, I understood what you just said from reading the manual, but wasn't
/really 100% sure/.

> But for dirty tracking I'm fine with only supporting it
> with second-stage as first-stage will be used only for guest in the
> nesting case (though in concept first-stage could also be used for
> IOVA when nesting is disabled, there is no plan to do so on Intel
> platforms).
> 

Cool.

>> a scalable-mode PASID Entry. Relevant Intel IOMMU SDM ref for first stage
>> table "3.6.2 Accessed, Extended Accessed, and Dirty Flags" and second
>> stage table "3.7.2 Accessed and Dirty Flags".
>>
>> To enable it, scalable-mode for the second-stage table is required,
>> so limit the use of the dirty bit to scalable-mode, discarding
>> first-stage configured DMAR domains. To use SSADS, we set a bit in
> 
> above is inaccurate. dirty bit is only supported in scalable mode so
> there is no limit per se.
> 

OK.

>> the scalable-mode PASID Table entry, by setting bit 9 (SSADE). When
> 
> "To use SSADS, we set bit 9 (SSADE) in the scalable-mode PASID table
> entry"
> 

/me nods

>> doing so, flush all iommu caches. Relevant SDM refs:
>>
>> "3.7.2 Accessed and Dirty Flags"
>> "6.5.3.3 Guidance to Software for Invalidations,
>>  Table 23. Guidance to Software for Invalidations"
>>
>> Dirty bit on the PTE is located in the same location (bit 9). The IOTLB
> 
> I'm not sure what information 'same location' here tries to convey...
> 

The SSADE bit in the PASID table entry *and* the dirty bit in the PTE are
both bit 9. (On AMD, for example, they are on different bits.) That's what
'location' meant, not the actual storage of those bits, of course :)

>> caches some attributes when SSADE is enabled and dirty-ness information,
> 
> be direct that the dirty bit is cached in IOTLB thus any change of that
> bit requires flushing IOTLB
> 

OK, will make it clearer.

>> so we also need to flush the IOTLB to make sure the IOMMU attempts to
>> set the dirty bit again. The relevant manual coverage of the hardware
>> translation is chapter 6, with special mention of:
>>
>> "6.2.3.1 Scalable-Mode PASID-Table Entry Programming Considerations"
>> "6.2.4 IOTLB"
>>
>> Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx>
>> ---
>> Probably shouldn't be as aggressive as to flush all; needs checking
>> with hardware (and the invalidations guidance) to understand what
>> exactly needs flushing.
> 
> yes, definitely not required to flush all. You can follow table 23
> for software guidance for invalidations.
> 

/me nods.
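Concretely, for an SSADE change my reading of Table 23 is: a PASID-cache
invalidation, a domain-selective IOTLB invalidation, and a device-TLB
invalidation when not in caching mode -- rather than flushing everything.
Something along these lines inside intel_pasid_setup_dirty_tracking() is
what I have in mind; just a sketch on my end reusing the existing pasid.c
helpers (pte here being the PASID table entry, did/pasid/dev as in the
surrounding code), still to be double-checked against the spec and
hardware:

	/* After flipping SSADE in the scalable-mode PASID table entry: */
	if (!ecap_coherent(iommu->ecap))
		clflush_cache_range(pte, sizeof(*pte));

	/* PASID-selective-within-domain PASID-cache invalidation */
	pasid_cache_invalidation_with_pasid(iommu, did, pasid);

	/* Domain-selective IOTLB invalidation */
	iommu->flush.flush_iotlb(iommu, did, 0, 0, DMA_TLB_DSI_FLUSH);

	/* Device-TLB invalidation, not needed in caching mode */
	if (!cap_caching_mode(iommu->cap))
		devtlb_invalidation_with_pasid(iommu, dev, pasid);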
>> ---
>>  drivers/iommu/intel/iommu.c | 109 ++++++++++++++++++++++++++++++++++++
>>  drivers/iommu/intel/pasid.c |  76 +++++++++++++++++++++++++
>>  drivers/iommu/intel/pasid.h |   7 +++
>>  include/linux/intel-iommu.h |  14 +++++
>>  4 files changed, 206 insertions(+)
>>
>> diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
>> index ce33f85c72ab..92af43f27241 100644
>> --- a/drivers/iommu/intel/iommu.c
>> +++ b/drivers/iommu/intel/iommu.c
>> @@ -5089,6 +5089,113 @@ static void intel_iommu_iotlb_sync_map(struct iommu_domain *domain,
>>  	}
>>  }
>>
>> +static int intel_iommu_set_dirty_tracking(struct iommu_domain *domain,
>> +					  bool enable)
>> +{
>> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>> +	struct device_domain_info *info;
>> +	unsigned long flags;
>> +	int ret = -EINVAL;
>> +
>> +	spin_lock_irqsave(&device_domain_lock, flags);
>> +	if (list_empty(&dmar_domain->devices)) {
>> +		spin_unlock_irqrestore(&device_domain_lock, flags);
>> +		return ret;
>> +	}
> 
> or return success here and just don't set any dirty bitmap in
> read_and_clear_dirty()?
> 

Yeap.

> btw I think every iommu driver needs to record the tracking status
> so that later, if a device which doesn't claim dirty tracking support
> is attached to a domain which already has dirty_tracking enabled,
> the attach request is rejected, once the capability uAPI is
> introduced.
> 

Good point.

>> +
>> +	list_for_each_entry(info, &dmar_domain->devices, link) {
>> +		if (!info->dev || (info->domain != dmar_domain))
>> +			continue;
> 
> why would there be a device linked under a dmar_domain but its
> internal domain pointer doesn't point to that dmar_domain?
> 

I think I got a little confused when using this list with something else.
Let me fix that.

>> +
>> +		/* Dirty tracking is second-stage level SM only */
>> +		if ((info->domain && domain_use_first_level(info->domain)) ||
>> +		    !ecap_slads(info->iommu->ecap) ||
>> +		    !sm_supported(info->iommu) || !intel_iommu_sm) {
> 
> sm_supported() already covers the check on intel_iommu_sm.
> 

/me nods, removed it.
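i.e. folding this and the list-walk comment above in, the loop becomes
roughly (a quick sketch, untested):

	list_for_each_entry(info, &dmar_domain->devices, link) {
		/* Dirty tracking is second-stage level SM only */
		if (domain_use_first_level(dmar_domain) ||
		    !sm_supported(info->iommu) ||
		    !ecap_slads(info->iommu->ecap)) {
			ret = -EOPNOTSUPP;
			continue;
		}

		ret = intel_pasid_setup_dirty_tracking(info->iommu, dmar_domain,
						       info->dev, PASID_RID2PASID,
						       enable);
		if (ret)
			break;
	}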
>> +			ret = -EOPNOTSUPP;
>> +			continue;
>> +		}
>> +
>> +		ret = intel_pasid_setup_dirty_tracking(info->iommu, info->domain,
>> +						       info->dev, PASID_RID2PASID,
>> +						       enable);
>> +		if (ret)
>> +			break;
>> +	}
>> +	spin_unlock_irqrestore(&device_domain_lock, flags);
>> +
>> +	/*
>> +	 * We need to flush context TLB and IOTLB with any cached
>> +	 * translations to force the incoming DMA requests to have their
>> +	 * IOTLB entries tagged with A/D bits.
>> +	 */
>> +	intel_flush_iotlb_all(domain);
>> +	return ret;
>> +}
>> +
>> +static int intel_iommu_get_dirty_tracking(struct iommu_domain *domain)
>> +{
>> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>> +	struct device_domain_info *info;
>> +	unsigned long flags;
>> +	int ret = 0;
>> +
>> +	spin_lock_irqsave(&device_domain_lock, flags);
>> +	list_for_each_entry(info, &dmar_domain->devices, link) {
>> +		if (!info->dev || (info->domain != dmar_domain))
>> +			continue;
>> +
>> +		/* Dirty tracking is second-stage level SM only */
>> +		if ((info->domain && domain_use_first_level(info->domain)) ||
>> +		    !ecap_slads(info->iommu->ecap) ||
>> +		    !sm_supported(info->iommu) || !intel_iommu_sm) {
>> +			ret = -EOPNOTSUPP;
>> +			continue;
>> +		}
>> +
>> +		if (!intel_pasid_dirty_tracking_enabled(info->iommu, info->domain,
>> +							info->dev, PASID_RID2PASID)) {
>> +			ret = -EINVAL;
>> +			break;
>> +		}
>> +	}
>> +	spin_unlock_irqrestore(&device_domain_lock, flags);
>> +
>> +	return ret;
>> +}
> 
> All above can be translated to a single status bit in dmar_domain.
> 

Yes. I wrestled a bit over making this a domain op, which would then tie
into a tracking bit in the iommu domain (or the driver's representation
of it) -- which is why you see a get_dirty_tracking() helper here and in
the AMD IOMMU counterpart. But I figured I would tie it into the
capability part.

>> +
>> +static int intel_iommu_read_and_clear_dirty(struct iommu_domain *domain,
>> +					    unsigned long iova, size_t size,
>> +					    struct iommu_dirty_bitmap *dirty)
>> +{
>> +	struct dmar_domain *dmar_domain = to_dmar_domain(domain);
>> +	unsigned long end = iova + size - 1;
>> +	unsigned long pgsize;
>> +	int ret;
>> +
>> +	ret = intel_iommu_get_dirty_tracking(domain);
>> +	if (ret)
>> +		return ret;
>> +
>> +	do {
>> +		struct dma_pte *pte;
>> +		int lvl = 0;
>> +
>> +		pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, &lvl);
> 
> it's probably fine as the starting point but moving forward this could
> be further optimized so there is no need to walk from L4->L3->L2->L1
> for every pte.
> 

Yes. This is actually part of my TODO on Performance (in the cover
letter). Both AMD and Intel could use their own dedicated lookups.
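For the Intel side, the direction I had in mind is something like the
below -- a rough sketch under my own assumptions (bit 9 as the SSAD dirty
bit per "3.7.2 Accessed and Dirty Flags"; level_size(), dma_pte_present()
and SZ_2M as used elsewhere in the driver), not what this patch does:
resolve the leaf once per level-1 table and then walk the sibling PTEs
linearly, instead of re-walking from the root for every IOVA page:

	do {
		struct dma_pte *pte;
		int lvl = 0;

		pte = pfn_to_dma_pte(dmar_domain, iova >> VTD_PAGE_SHIFT, &lvl);
		if (!pte)	/* IOVA beyond what the page table covers */
			break;

		pgsize = level_size(lvl) << VTD_PAGE_SHIFT;
		if (!dma_pte_present(pte) || lvl != 1) {
			/* Hole, or superpage leaf: one test at lvl granularity */
			if (dma_pte_present(pte) &&
			    test_and_clear_bit(9, (unsigned long *)&pte->val))
				iommu_dirty_bitmap_record(dirty, iova, pgsize);
			iova += pgsize;
			continue;
		}

		/*
		 * 4K leaves: the PTEs of one level-1 table are contiguous
		 * and cover a 2M-aligned IOVA range, so advance the PTE
		 * pointer until the next 2M boundary before re-walking.
		 */
		do {
			if (dma_pte_present(pte) &&
			    test_and_clear_bit(9, (unsigned long *)&pte->val))
				iommu_dirty_bitmap_record(dirty, iova,
							  VTD_PAGE_SIZE);
			pte++;
			iova += VTD_PAGE_SIZE;
		} while (iova < end && (iova & (SZ_2M - 1)));
	} while (iova < end);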