On Wed, May 1, 2024 at 7:56 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote: > > On Tue, Apr 30, 2024 at 01:01:57PM -0700, Tomasz Jeznach wrote: > > > +#define iommu_domain_to_riscv(iommu_domain) \ > > + container_of(iommu_domain, struct riscv_iommu_domain, domain) > > + > > +#define dev_to_domain(dev) \ > > + iommu_domain_to_riscv(dev_iommu_priv_get(dev)) > > Please use the priv properly and put a struct around it, you'll > certainly need this eventually to do the rest of the advanced > features. > Done. Yes, indeed, I was going to introduce proper struct in follow up patches anyway. Pulled this change sooner. > > +static void riscv_iommu_bond_unlink(struct riscv_iommu_domain *domain, struct device *dev) > > +{ > > + struct riscv_iommu_bond *bond, *found = NULL; > > + unsigned long flags; > > + > > + if (!domain) > > + return; > > + > > + spin_lock_irqsave(&domain->lock, flags); > > This is never locked from an irq, you don't need to use the irqsave > variations. > Good point. done in v4. > > + list_for_each_entry_rcu(bond, &domain->bonds, list) { > > + if (bond->dev == dev) { > > + list_del_rcu(&bond->list); > > + found = bond; > > + } > > + } > > + spin_unlock_irqrestore(&domain->lock, flags); > > + > > + /* Release and wait for all read-rcu critical sections have completed. */ > > + kfree_rcu(found, rcu); > > + synchronize_rcu(); > > Please no, synchronize_rcu() on a path like this is not so > reasonable.. Also you don't need kfree_rcu() if you write it like this. > > This still looks better to do what I said before, put the iommu not > the dev in the bond struct. > > I was trying not to duplicate data in bond struct and use whatever is available to be referenced from dev pointer (eg iommu / ids / private iommu dev data). If I'm reading core iommu code correctly, device pointer and iommu pointers should be valid between _probe_device and _release_device calls. I've moved synchronize_rcu out of the domain attach path to _release_device, LMK if that would be ok for now. I'll have a second another to rework other patches to avoid storing dev pointers at all. > > +static struct iommu_domain *riscv_iommu_alloc_paging_domain(struct device *dev) > > +{ > > + struct riscv_iommu_domain *domain; > > + struct riscv_iommu_device *iommu; > > + > > + iommu = dev ? dev_to_iommu(dev) : NULL; > > + domain = kzalloc(sizeof(*domain), GFP_KERNEL); > > + if (!domain) > > + return ERR_PTR(-ENOMEM); > > + > > + INIT_LIST_HEAD_RCU(&domain->bonds); > > + spin_lock_init(&domain->lock); > > + domain->numa_node = NUMA_NO_NODE; > > + > > + /* > > + * Follow system address translation mode. > > + * RISC-V IOMMU ATP mode values match RISC-V CPU SATP mode values. > > + */ > > + domain->pgd_mode = satp_mode >> SATP_MODE_SHIFT; > > This seems really strange, the iommu paging domains should be > unrelated to what the CPU is doing. There is no connection between > these two concepts. > > Just pick a size that the iommu supports. > > The number of radix levels is a tunable alot of iommus have that we > haven't really exposed to anything else yet. > Makes sense. I've left an option to pick mode from MMU for cases where dev/iommu is not known at allocation time (with iommu_domain_alloc()). I'd guess it's reasonable to assume IOMMU supported page modes will match MMU. > > + /* > > + * Note: RISC-V Privilege spec mandates that virtual addresses > > + * need to be sign-extended, so if (VA_BITS - 1) is set, all > > + * bits >= VA_BITS need to also be set or else we'll get a > > + * page fault. However the code that creates the mappings > > + * above us (e.g. iommu_dma_alloc_iova()) won't do that for us > > + * for now, so we'll end up with invalid virtual addresses > > + * to map. As a workaround until we get this sorted out > > + * limit the available virtual addresses to VA_BITS - 1. > > + */ > > + domain->domain.geometry.aperture_start = 0; > > + domain->domain.geometry.aperture_end = DMA_BIT_MASK(VA_BITS - 1); > > + domain->domain.geometry.force_aperture = true; > > Yikes.. This is probably the best solution long term anyhow, unless > you need to use the last page in VFIO for some reason. > > > static int riscv_iommu_device_domain_type(struct device *dev) > > { > > - return IOMMU_DOMAIN_IDENTITY; > > + struct riscv_iommu_device *iommu = dev_to_iommu(dev); > > + > > + if (iommu->ddt_mode == RISCV_IOMMU_DDTP_MODE_BARE) > > + return IOMMU_DOMAIN_IDENTITY; > > + > > Is there even a point of binding an iommu driver if the HW can't > support a DDT table? Just return -ENODEV from probe_device? > > Logically a iommu block that can't decode the RID has no association > at all with a Linux struct device :) > Done. Good point ;) Thanks for review, - Tomasz > Jason