On Tue, Apr 30, 2024 at 01:01:57PM -0700, Tomasz Jeznach wrote: > +#define iommu_domain_to_riscv(iommu_domain) \ > + container_of(iommu_domain, struct riscv_iommu_domain, domain) > + > +#define dev_to_domain(dev) \ > + iommu_domain_to_riscv(dev_iommu_priv_get(dev)) Please use the priv properly and put a struct around it, you'll certainly need this eventually to do the rest of the advanced features. > +static void riscv_iommu_bond_unlink(struct riscv_iommu_domain *domain, struct device *dev) > +{ > + struct riscv_iommu_bond *bond, *found = NULL; > + unsigned long flags; > + > + if (!domain) > + return; > + > + spin_lock_irqsave(&domain->lock, flags); This is never locked from an irq, you don't need to use the irqsave variations. > + list_for_each_entry_rcu(bond, &domain->bonds, list) { > + if (bond->dev == dev) { > + list_del_rcu(&bond->list); > + found = bond; > + } > + } > + spin_unlock_irqrestore(&domain->lock, flags); > + > + /* Release and wait for all read-rcu critical sections have completed. */ > + kfree_rcu(found, rcu); > + synchronize_rcu(); Please no, synchronize_rcu() on a path like this is not so reasonable.. Also you don't need kfree_rcu() if you write it like this. This still looks better to do what I said before, put the iommu not the dev in the bond struct. > +static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev) > +{ > + struct riscv_iommu_domain *domain; > + struct riscv_iommu_device *iommu; > + > + iommu = dev ? dev_to_iommu(dev) : NULL; > + domain = kzalloc(sizeof(*domain), GFP_KERNEL); > + if (!domain) > + return ERR_PTR(-ENOMEM); > + > + INIT_LIST_HEAD_RCU(&domain->bonds); > + spin_lock_init(&domain->lock); > + domain->numa_node = NUMA_NO_NODE; > + > + /* > + * Follow system address translation mode. > + * RISC-V IOMMU ATP mode values match RISC-V CPU SATP mode values. > + */ > + domain->pgd_mode = satp_mode >> SATP_MODE_SHIFT; This seems really strange, the iommu paging domains should be unrelated to what the CPU is doing. There is no connection between these two concepts. Just pick a size that the iommu supports. The number of radix levels is a tunable alot of iommus have that we haven't really exposed to anything else yet. > + /* > + * Note: RISC-V Privilege spec mandates that virtual addresses > + * need to be sign-extended, so if (VA_BITS - 1) is set, all > + * bits >= VA_BITS need to also be set or else we'll get a > + * page fault. However the code that creates the mappings > + * above us (e.g. iommu_dma_alloc_iova()) won't do that for us > + * for now, so we'll end up with invalid virtual addresses > + * to map. As a workaround until we get this sorted out > + * limit the available virtual addresses to VA_BITS - 1. > + */ > + domain->domain.geometry.aperture_start = 0; > + domain->domain.geometry.aperture_end = DMA_BIT_MASK(VA_BITS - 1); > + domain->domain.geometry.force_aperture = true; Yikes.. This is probably the best solution long term anyhow, unless you need to use the last page in VFIO for some reason. > static int riscv_iommu_device_domain_type(struct device *dev) > { > - return IOMMU_DOMAIN_IDENTITY; > + struct riscv_iommu_device *iommu = dev_to_iommu(dev); > + > + if (iommu->ddt_mode == RISCV_IOMMU_DDTP_MODE_BARE) > + return IOMMU_DOMAIN_IDENTITY; > + Is there even a point of binding an iommu driver if the HW can't support a DDT table? Just return -ENODEV from probe_device? Logically a iommu block that can't decode the RID has no association at all with a Linux struct device :) Jason