Re: [PATCH v5 04/13] iommu/vt-d: Add pasid replace helpers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2024/11/7 16:04, Baolu Lu wrote:
On 2024/11/7 15:57, Yi Liu wrote:
On 2024/11/7 14:53, Baolu Lu wrote:
On 2024/11/7 14:46, Yi Liu wrote:
On 2024/11/7 13:46, Tian, Kevin wrote:
From: Liu, Yi L <yi.l.liu@xxxxxxxxx>
Sent: Thursday, November 7, 2024 12:21 PM

On 2024/11/7 10:52, Baolu Lu wrote:
On 11/6/24 23:45, Yi Liu wrote:
+int intel_pasid_replace_first_level(struct intel_iommu *iommu,
+                    struct device *dev, pgd_t *pgd,
+                    u32 pasid, u16 did, u16 old_did,
+                    int flags)
+{
+    struct pasid_entry *pte;
+
+    if (!ecap_flts(iommu->ecap)) {
+        pr_err("No first level translation support on %s\n",
+               iommu->name);
+        return -EINVAL;
+    }
+
+    if ((flags & PASID_FLAG_FL5LP) && !cap_fl5lp_support(iommu- >cap)) {
+        pr_err("No 5-level paging support for first-level on %s\n",
+               iommu->name);
+        return -EINVAL;
+    }
+
+    spin_lock(&iommu->lock);
+    pte = intel_pasid_get_entry(dev, pasid);
+    if (!pte) {
+        spin_unlock(&iommu->lock);
+        return -ENODEV;
+    }
+
+    if (!pasid_pte_is_present(pte)) {
+        spin_unlock(&iommu->lock);
+        return -EINVAL;
+    }
+
+    WARN_ON(old_did != pasid_get_domain_id(pte));
+
+    pasid_pte_config_first_level(iommu, pte, pgd, did, flags);
+    spin_unlock(&iommu->lock);
+
+    intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
+    intel_iommu_drain_pasid_prq(dev, pasid);
+
+    return 0;
+}

pasid_pte_config_first_level() causes the pasid entry to transition from
present to non-present and then to present. In this case, calling
intel_pasid_flush_present() is not accurate, as it is only intended for
pasid entries transitioning from present to present, according to the
specification.

It's recommended to move pasid_clear_entry(pte) and
pasid_set_present(pte) out to the caller, so ...

For setup case (pasid from non-present to present):

- pasid_clear_entry(pte)
- pasid_pte_config_first_level(pte)
- pasid_set_present(pte)
- cache invalidations

For replace case (pasid from present to present)

- pasid_pte_config_first_level(pte)
- cache invalidations

The same applies to other types of setup and replace.

hmmm. Here is the reason I did it in the way of this patch:
1) pasid_clear_entry() can clear all the fields that are not supposed to
     be used by the new domain. For example, converting a nested domain to
SS
     only domain, if no pasid_clear_entry() then the FSPTR would be there.
     Although spec seems not enforce it, it might be good to clear it.
2) We don't support atomic replace yet, so the whole pasid entry transition
     is not done in one shot, so it looks to be ok to do this stepping
     transition.
3) It seems to be even worse if keep the Present bit during the transition.      The pasid entry might be broken while the Present bit indicates this is      a valid pasid entry. Say if there is in-flight DMA, the result may be
     unpredictable.

Based on the above, I chose the current way. But I admit if we are going to
support atomic replace, then we should refactor a bit. I believe at that
time we need to construct the new pasid entry first and try to exchange it to the pasid table. I can see some transition can be done in that way as we
can do atomic exchange with 128bits. thoughts? 🙂


yes 128bit cmpxchg is necessary to support atomic replacement.

Actually vt-d spec clearly says so e.g. SSPTPTR/DID must be updated
together in a present entry to not break in-flight DMA.

but... your current way (clear entry then update it) also break in- flight
DMA. So let's admit that as the 1st step it's not aimed to support
atomic replacement. With that Baolu's suggestion makes more sense
toward future extension with less refactoring required (otherwise
you should not use intel_pasid_flush_present() then the earlier
refactoring for that helper is also meaningless).

I see. The pasid entry might have some filed that is not supposed to be
used after replacement. Should we have a comment about it?

I guess all fields except SSADE and P of a pasid table entry should be
cleared in pasid_pte_config_first_level()?

perhaps we can take one more step forward. We can construct the new pte in a local variable first and then push it to the pte in the pasid table. :)


That sounds better! The entry is composed on the stack and then copied
over to the pasid table as a whole.

that's it. Like the below.

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 6841b9892d55..0f2a926d3bd5 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -389,6 +389,50 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
 	return 0;
 }

+int intel_pasid_replace_first_level(struct intel_iommu *iommu,
+				    struct device *dev, pgd_t *pgd,
+				    u32 pasid, u16 did, u16 old_did,
+				    int flags)
+{
+	struct pasid_entry *pte, new_pte;
+
+	if (!ecap_flts(iommu->ecap)) {
+		pr_err("No first level translation support on %s\n",
+		       iommu->name);
+		return -EINVAL;
+	}
+
+	if ((flags & PASID_FLAG_FL5LP) && !cap_fl5lp_support(iommu->cap)) {
+		pr_err("No 5-level paging support for first-level on %s\n",
+		       iommu->name);
+		return -EINVAL;
+	}
+
+	pasid_pte_config_first_level(iommu, &new_pte, pgd, did, flags);
+
+	spin_lock(&iommu->lock);
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (!pte) {
+		spin_unlock(&iommu->lock);
+		return -ENODEV;
+	}
+
+	if (!pasid_pte_is_present(pte)) {
+		spin_unlock(&iommu->lock);
+		return -EINVAL;
+	}
+
+	WARN_ON(old_did != pasid_get_domain_id(pte));
+
+	*pte = new_pte;
+	spin_unlock(&iommu->lock);
+
+	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
+	intel_iommu_drain_pasid_prq(dev, pasid);
+
+	return 0;
+}
+
 /*
  * Set up the scalable mode pasid entry for second only translation type.
  */
@@ -456,6 +500,57 @@ int intel_pasid_setup_second_level(struct intel_iommu *iommu,
 	return 0;
 }

+int intel_pasid_replace_second_level(struct intel_iommu *iommu,
+				     struct dmar_domain *domain,
+				     struct device *dev, u16 old_did,
+				     u32 pasid)
+{
+	struct pasid_entry *pte, new_pte;
+	struct dma_pte *pgd;
+	u64 pgd_val;
+	u16 did;
+
+	/*
+	 * If hardware advertises no support for second level
+	 * translation, return directly.
+	 */
+	if (!ecap_slts(iommu->ecap)) {
+		pr_err("No second level translation support on %s\n",
+		       iommu->name);
+		return -EINVAL;
+	}
+
+	pgd = domain->pgd;
+	pgd_val = virt_to_phys(pgd);
+	did = domain_id_iommu(domain, iommu);
+
+	pasid_pte_config_second_level(iommu, &new_pte, pgd_val,
+				      domain->agaw, did,
+				      domain->dirty_tracking);
+
+	spin_lock(&iommu->lock);
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (!pte) {
+		spin_unlock(&iommu->lock);
+		return -ENODEV;
+	}
+
+	if (!pasid_pte_is_present(pte)) {
+		spin_unlock(&iommu->lock);
+		return -EINVAL;
+	}
+
+	WARN_ON(old_did != pasid_get_domain_id(pte));
+
+	*pte = new_pte;
+	spin_unlock(&iommu->lock);
+
+	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
+	intel_iommu_drain_pasid_prq(dev, pasid);
+
+	return 0;
+}
+
 /*
  * Set up dirty tracking on a second only or nested translation type.
  */
@@ -568,6 +663,38 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 	return 0;
 }

+int intel_pasid_replace_pass_through(struct intel_iommu *iommu,
+				     struct device *dev, u16 old_did,
+				     u32 pasid)
+{
+	struct pasid_entry *pte, new_pte;
+	u16 did = FLPT_DEFAULT_DID;
+
+	pasid_pte_config_pass_through(iommu, &new_pte, did);
+
+	spin_lock(&iommu->lock);
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (!pte) {
+		spin_unlock(&iommu->lock);
+		return -ENODEV;
+	}
+
+	if (!pasid_pte_is_present(pte)) {
+		spin_unlock(&iommu->lock);
+		return -EINVAL;
+	}
+
+	WARN_ON(old_did != pasid_get_domain_id(pte));
+
+	*pte = new_pte;
+	spin_unlock(&iommu->lock);
+
+	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
+	intel_iommu_drain_pasid_prq(dev, pasid);
+
+	return 0;
+}
+
 /*
  * Set the page snoop control for a pasid entry which has been set up.
  */
@@ -698,6 +825,69 @@ int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
 	return 0;
 }

+int intel_pasid_replace_nested(struct intel_iommu *iommu,
+			       struct device *dev, u32 pasid,
+			       u16 old_did, struct dmar_domain *domain)
+{
+	struct iommu_hwpt_vtd_s1 *s1_cfg = &domain->s1_cfg;
+	struct dmar_domain *s2_domain = domain->s2_domain;
+	u16 did = domain_id_iommu(domain, iommu);
+	struct pasid_entry *pte, new_pte;
+
+	/* Address width should match the address width supported by hardware */
+	switch (s1_cfg->addr_width) {
+	case ADDR_WIDTH_4LEVEL:
+		break;
+	case ADDR_WIDTH_5LEVEL:
+		if (!cap_fl5lp_support(iommu->cap)) {
+			dev_err_ratelimited(dev,
+					    "5-level paging not supported\n");
+			return -EINVAL;
+		}
+		break;
+	default:
+		dev_err_ratelimited(dev, "Invalid stage-1 address width %d\n",
+				    s1_cfg->addr_width);
+		return -EINVAL;
+	}
+
+	if ((s1_cfg->flags & IOMMU_VTD_S1_SRE) && !ecap_srs(iommu->ecap)) {
+		pr_err_ratelimited("No supervisor request support on %s\n",
+				   iommu->name);
+		return -EINVAL;
+	}
+
+	if ((s1_cfg->flags & IOMMU_VTD_S1_EAFE) && !ecap_eafs(iommu->ecap)) {
+		pr_err_ratelimited("No extended access flag support on %s\n",
+				   iommu->name);
+		return -EINVAL;
+	}
+
+	pasid_pte_config_nestd(iommu, &new_pte, s1_cfg, s2_domain, did);
+
+	spin_lock(&iommu->lock);
+	pte = intel_pasid_get_entry(dev, pasid);
+	if (!pte) {
+		spin_unlock(&iommu->lock);
+		return -ENODEV;
+	}
+
+	if (!pasid_pte_is_present(pte)) {
+		spin_unlock(&iommu->lock);
+		return -EINVAL;
+	}
+
+	WARN_ON(old_did != pasid_get_domain_id(pte));
+
+	*pte = new_pte;
+	spin_unlock(&iommu->lock);
+
+	intel_pasid_flush_present(iommu, dev, pasid, old_did, pte);
+	intel_iommu_drain_pasid_prq(dev, pasid);
+
+	return 0;
+}
+
 /*
  * Interfaces to setup or teardown a pasid table to the scalable-mode
  * context table entry:
diff --git a/drivers/iommu/intel/pasid.h b/drivers/iommu/intel/pasid.h
index dde6d3ba5ae0..082f4fe20216 100644
--- a/drivers/iommu/intel/pasid.h
+++ b/drivers/iommu/intel/pasid.h
@@ -303,6 +296,21 @@ int intel_pasid_setup_pass_through(struct intel_iommu *iommu,
 				   struct device *dev, u32 pasid);
 int intel_pasid_setup_nested(struct intel_iommu *iommu, struct device *dev,
 			     u32 pasid, struct dmar_domain *domain);
+int intel_pasid_replace_first_level(struct intel_iommu *iommu,
+				    struct device *dev, pgd_t *pgd,
+				    u32 pasid, u16 did, u16 old_did,
+				    int flags);
+int intel_pasid_replace_second_level(struct intel_iommu *iommu,
+				     struct dmar_domain *domain,
+				     struct device *dev, u16 old_did,
+				     u32 pasid);
+int intel_pasid_replace_pass_through(struct intel_iommu *iommu,
+				     struct device *dev, u16 old_did,
+				     u32 pasid);
+int intel_pasid_replace_nested(struct intel_iommu *iommu,
+			       struct device *dev, u32 pasid,
+			       u16 old_did, struct dmar_domain *domain);
+
 void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
 				 struct device *dev, u32 pasid,
 				 bool fault_ignore);

With these two issues addressed, do u mind sending a new version? Let's
try to catch the pull request window. There are other series (iommufd
and vfio) about user space PASID support depending on this.

yes, for sure.

--
Regards,
Yi Liu




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux