Re: [PATCH 7/8] iommu/arm-smmu-v3: Support IOMMU_DOMAIN_NESTED

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Aug 09, 2024 at 05:05:36PM +0100, Robin Murphy wrote:
> > +static void arm_smmu_make_nested_domain_ste(
> > +	struct arm_smmu_ste *target, struct arm_smmu_master *master,
> > +	struct arm_smmu_nested_domain *nested_domain, bool ats_enabled)
> > +{
> > +	/*
> > +	 * Userspace can request a non-valid STE through the nesting interface.
> > +	 * We relay that into a non-valid physical STE with the intention that
> > +	 * C_BAD_STE for this SID can be delivered to userspace.
> 
> NAK, that is a horrible idea. If userspace really wants to emulate that it
> can install a disabled S1 context or move the device to an empty s2 domain,
> get translation faults signalled through the normal path, and synthesise
> C_BAD_STE for itself because it knows what it's done. 

The main point is that we need the VIOMMU to become linked to the SID
though a IOMMU_DOMAIN_NESTED attachment so we know how to route events
to userspace. Some of these options won't allow that.

> Otherwise, how do you propose we would actually tell whether a real
> C_BAD_STE is due to a driver

It is the same as every other SID based event, you lookup the SID, see
there is an IOMMU_DOMAIN_NESTED attached, extract the VIOMMU and route
the whole event to the VIOMMU's event queue.

For C_BAD_STE you'd want to also check that the STE is all zeros
before doing this to detect hypervisor driver bugs. It is not perfect,
but it is not wildly unworkable either.

> Yes, userspace can spam up the event queue with translation/permission etc.
> faults, but those are at least clearly attributable and an expected part of
> normal operation; giving it free reign to spam up the event queue with what
> are currently considered *host kernel errors*, with no handling or
> mitigation, is another thing entirely.

Let's use arm_smmu_make_abort_ste():

	if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V))) {
		arm_smmu_make_abort_ste(target);
		return;
	}

We can look into how to transform that into a virtual C_BAD_STE as
part of the event infrastructure patches?

> > @@ -2192,6 +2255,16 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
> >   	}
> >   	__arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain);
> > +	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2 &&
> > +	    smmu_domain->nesting_parent) {
> 
> Surely nesting_parent must never be set on anything other than S2 domains in
> the first place?

Done

> > @@ -2608,13 +2681,15 @@ arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain,
> >   			    ioasid_t ssid)
> >   {
> >   	struct arm_smmu_master_domain *master_domain;
> > +	bool nested_parent = smmu_domain->domain.type == IOMMU_DOMAIN_NESTED;
> >   	lockdep_assert_held(&smmu_domain->devices_lock);
> >   	list_for_each_entry(master_domain, &smmu_domain->devices,
> >   			    devices_elm) {
> >   		if (master_domain->master == master &&
> > -		    master_domain->ssid == ssid)
> > +		    master_domain->ssid == ssid &&
> > +		    master_domain->nested_parent == nested_parent)
> 
> As if nested_parent vs. nesting parent wasn't bad enough, 

Done - we used IOMMU_HWPT_ALLOC_NEST_PARENT so lets call them all nest_parent

> why would we need additional disambiguation here?

Oh there is mistake here, that is why it looks so weird, the
smmu_domain here is the S2 always we are supposed to be testing the
attaching domain:

--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2677,11 +2677,10 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
 
 static struct arm_smmu_master_domain *
 arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain,
-                           struct arm_smmu_master *master,
-                           ioasid_t ssid)
+                           struct arm_smmu_master *master, ioasid_t ssid,
+                           bool nest_parent)
 {
        struct arm_smmu_master_domain *master_domain;
-       bool nested_parent = smmu_domain->domain.type == IOMMU_DOMAIN_NESTED;
 
        lockdep_assert_held(&smmu_domain->devices_lock);
 
@@ -2689,7 +2688,7 @@ arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain,
                            devices_elm) {
                if (master_domain->master == master &&
                    master_domain->ssid == ssid &&
-                   master_domain->nest_parent == nested_parent)
+                   master_domain->nest_parent == nest_parent)
                        return master_domain;
        }
        return NULL;
@@ -2727,7 +2726,8 @@ static void arm_smmu_remove_master_domain(struct arm_smmu_master *master,
                return;
 
        spin_lock_irqsave(&smmu_domain->devices_lock, flags);
-       master_domain = arm_smmu_find_master_domain(smmu_domain, master, ssid);
+       master_domain = arm_smmu_find_master_domain(
+               smmu_domain, master, ssid, domain->type == IOMMU_DOMAIN_NESTED);
        if (master_domain) {
                list_del(&master_domain->devices_elm);
                kfree(master_domain);

> How could more than one attachment to the same SID:SSID exist at the
> same time?

The attachment logic puts both the new and old domain in this list
while it works on invalidating caches. This ensures we don't loose any
invalidation. We also directly put the S2 into the list when attaching
an IOMMU_DOMAIN_NESTED.

Thus, it is possible for the same S2 to be in the list twice for a
short time as switching between the S2 to an IOMMU_DOMAIN_NESTED will
cause it. They are not the same as one will have nest_parent set to do
heavier ATC invalidation.

It is an optimization to allow the naked S2 to be used as an identity
translation with less expensive ATC invalidation.

> > @@ -792,6 +803,14 @@ struct arm_smmu_domain {
> >   	u8				enforce_cache_coherency;
> >   	struct mmu_notifier		mmu_notifier;
> > +	bool				nesting_parent : 1;
> 
> Erm, please use bool consistently, or use integer bitfields consistently,
> but not a deranged mess of bool bitfields while also assigning true/false to
> full u8s... :/

I made it like this:

	struct list_head		devices;
	spinlock_t			devices_lock;
	bool				enforce_cache_coherency : 1;
	bool				nest_parent : 1;

Thanks,
Jason




[Index of Archives]     [Linux IBM ACPI]     [Linux Power Management]     [Linux Kernel]     [Linux Laptop]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Video 4 Linux]     [Device Mapper]     [Linux Resources]
  Powered by Linux