On Fri, Aug 09, 2024 at 05:05:36PM +0100, Robin Murphy wrote:
> > +static void arm_smmu_make_nested_domain_ste(
> > +	struct arm_smmu_ste *target, struct arm_smmu_master *master,
> > +	struct arm_smmu_nested_domain *nested_domain, bool ats_enabled)
> > +{
> > +	/*
> > +	 * Userspace can request a non-valid STE through the nesting interface.
> > +	 * We relay that into a non-valid physical STE with the intention that
> > +	 * C_BAD_STE for this SID can be delivered to userspace.
>
> NAK, that is a horrible idea. If userspace really wants to emulate that
> it can install a disabled S1 context or move the device to an empty s2
> domain, get translation faults signalled through the normal path, and
> synthesise C_BAD_STE for itself because it knows what it's done.

The main point is that we need the VIOMMU to become linked to the SID
through an IOMMU_DOMAIN_NESTED attachment so that we know how to route
events to userspace. Some of these options won't allow that.

> Otherwise, how do you propose we would actually tell whether a real
> C_BAD_STE is due to a driver

It is the same as every other SID-based event: you look up the SID, see
that there is an IOMMU_DOMAIN_NESTED attached, extract the VIOMMU and
route the whole event to the VIOMMU's event queue.

For C_BAD_STE you'd want to also check that the STE is all zeros before
doing this, to detect hypervisor driver bugs.

It is not perfect, but it is not wildly unworkable either.
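To sketch the routing I have in mind (nothing below exists yet;
master->vmaster, arm_smmu_sid_ste_is_zero() and
iommufd_viommu_report_event() are invented names for the future event
infrastructure, only arm_smmu_find_master() and the EVTQ_0_* fields are
real today):

	static int arm_smmu_route_evt_to_viommu(struct arm_smmu_device *smmu,
						u64 *evt)
	{
		u32 sid = FIELD_GET(EVTQ_0_SID, evt[0]);
		struct arm_smmu_master *master;

		/* Existing SID -> master lookup, under smmu->streams_mutex */
		master = arm_smmu_find_master(smmu, sid);
		if (!master || !master->vmaster)
			return -EOPNOTSUPP; /* No VIOMMU, host handles it */

		/*
		 * Only forward C_BAD_STE (0x4) when the physical STE is
		 * the all-zeros one userspace asked for; anything else is
		 * a hypervisor driver bug and must stay a host error.
		 */
		if (FIELD_GET(EVTQ_0_ID, evt[0]) == 0x4 &&
		    !arm_smmu_sid_ste_is_zero(smmu, sid))
			return -EOPNOTSUPP;

		/* Push the raw event onto the VIOMMU's event queue */
		return iommufd_viommu_report_event(master->vmaster, evt);
	}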
> Yes, userspace can spam up the event queue with translation/permission
> etc. faults, but those are at least clearly attributable and an
> expected part of normal operation; giving it free reign to spam up the
> event queue with what are currently considered *host kernel errors*,
> with no handling or mitigation, is another thing entirely.

Let's use arm_smmu_make_abort_ste():

	if (!(nested_domain->ste[0] & cpu_to_le64(STRTAB_STE_0_V))) {
		arm_smmu_make_abort_ste(target);
		return;
	}

We can look into how to transform that into a virtual C_BAD_STE as part
of the event infrastructure patches?

> > @@ -2192,6 +2255,16 @@ static void arm_smmu_tlb_inv_range_domain(unsigned long iova, size_t size,
> >  	}
> >  	__arm_smmu_tlb_inv_range(&cmd, iova, size, granule, smmu_domain);
> >
> > +	if (smmu_domain->stage == ARM_SMMU_DOMAIN_S2 &&
> > +	    smmu_domain->nesting_parent) {
>
> Surely nesting_parent must never be set on anything other than S2
> domains in the first place?

Done

> > @@ -2608,13 +2681,15 @@ arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain,
> >  			    ioasid_t ssid)
> >  {
> >  	struct arm_smmu_master_domain *master_domain;
> > +	bool nested_parent = smmu_domain->domain.type == IOMMU_DOMAIN_NESTED;
> >
> >  	lockdep_assert_held(&smmu_domain->devices_lock);
> >
> >  	list_for_each_entry(master_domain, &smmu_domain->devices,
> >  			    devices_elm) {
> >  		if (master_domain->master == master &&
> > -		    master_domain->ssid == ssid)
> > +		    master_domain->ssid == ssid &&
> > +		    master_domain->nested_parent == nested_parent)
>
> As if nested_parent vs. nesting parent wasn't bad enough,

Done - we used IOMMU_HWPT_ALLOC_NEST_PARENT, so let's call them all
nest_parent.

> why would we need additional disambiguation here?

Oh, there is a mistake here, which is why it looks so weird. The
smmu_domain here is always the S2; we are supposed to be testing the
attaching domain:

--- a/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
+++ b/drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.c
@@ -2677,11 +2677,10 @@ static void arm_smmu_disable_pasid(struct arm_smmu_master *master)
 
 static struct arm_smmu_master_domain *
 arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain,
-			    struct arm_smmu_master *master,
-			    ioasid_t ssid)
+			    struct arm_smmu_master *master, ioasid_t ssid,
+			    bool nest_parent)
 {
 	struct arm_smmu_master_domain *master_domain;
-	bool nested_parent = smmu_domain->domain.type == IOMMU_DOMAIN_NESTED;
 
 	lockdep_assert_held(&smmu_domain->devices_lock);
 
@@ -2689,7 +2688,7 @@ arm_smmu_find_master_domain(struct arm_smmu_domain *smmu_domain,
 			    devices_elm) {
 		if (master_domain->master == master &&
 		    master_domain->ssid == ssid &&
-		    master_domain->nest_parent == nested_parent)
+		    master_domain->nest_parent == nest_parent)
 			return master_domain;
 	}
 	return NULL;
@@ -2727,7 +2726,8 @@ static void arm_smmu_remove_master_domain(struct arm_smmu_master *master,
 		return;
 
 	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
-	master_domain = arm_smmu_find_master_domain(smmu_domain, master, ssid);
+	master_domain = arm_smmu_find_master_domain(
+		smmu_domain, master, ssid, domain->type == IOMMU_DOMAIN_NESTED);
 	if (master_domain) {
 		list_del(&master_domain->devices_elm);
 		kfree(master_domain);

> How could more than one attachment to the same SID:SSID exist at the
> same time?

The attachment logic puts both the new and the old domain in this list
while it works on invalidating caches. This ensures we don't lose any
invalidations.

We also directly put the S2 into the list when attaching an
IOMMU_DOMAIN_NESTED. Thus, it is possible for the same S2 to be in the
list twice for a short time, since switching from the naked S2 to an
IOMMU_DOMAIN_NESTED will cause that. The two entries are not the same,
as one will have nest_parent set to do the heavier ATC invalidation.

It is an optimization to allow the naked S2 to be used as an identity
translation with less expensive ATC invalidation.

> > @@ -792,6 +803,14 @@ struct arm_smmu_domain {
> >  	u8 enforce_cache_coherency;
> >  	struct mmu_notifier mmu_notifier;
> >
> > +	bool nesting_parent : 1;
>
> Erm, please use bool consistently, or use integer bitfields
> consistently, but not a deranged mess of bool bitfields while also
> assigning true/false to full u8s... :/

I made it like this:

	struct list_head		devices;
	spinlock_t			devices_lock;
	bool				enforce_cache_coherency : 1;
	bool				nest_parent : 1;

Thanks,
Jason
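PS: To make the transient duplicate above concrete, here is roughly
what the S2's devices list looks like mid-switch (sketch only, the SID
and SSID values are invented):

	/*
	 * SID 7 / SSID 0 moving from the naked S2 to an
	 * IOMMU_DOMAIN_NESTED whose parent is that same S2. For a short
	 * window the S2's devices list holds both:
	 *
	 *   { master, ssid = 0, nest_parent = false }   old naked-S2 attach
	 *   { master, ssid = 0, nest_parent = true }    new nested attach
	 *
	 * Invalidations walk both entries, so nothing is lost during the
	 * switch, and the nest_parent flag lets the remove path delete
	 * exactly the entry its own attach added, eg for the old one:
	 */
	master_domain = arm_smmu_find_master_domain(smmu_domain, master, 0,
						    false);
	if (master_domain) {
		list_del(&master_domain->devices_elm);
		kfree(master_domain);
	}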