Hi Robin,

On Mon, Jan 27, 2020 at 07:01:02PM +0000, Robin Murphy wrote:
> > > > +static void sun50i_iommu_domain_free(struct iommu_domain *domain)
> > > > +{
> > > > +	struct sun50i_iommu_domain *sun50i_domain = to_sun50i_domain(domain);
> > > > +	struct sun50i_iommu *iommu = sun50i_domain->iommu;
> > > > +	unsigned long flags;
> > > > +	int i;
> > > > +
> > > > +	spin_lock_irqsave(&sun50i_domain->dt_lock, flags);
> > > > +
> > > > +	for (i = 0; i < NUM_DT_ENTRIES; i++) {
> > > > +		phys_addr_t pt_phys;
> > > > +		u32 *page_table;
> > > > +		u32 *dte_addr;
> > > > +		u32 dte;
> > > > +
> > > > +		dte_addr = &sun50i_domain->dt[i];
> > > > +		dte = *dte_addr;
> > > > +		if (!sun50i_dte_is_pt_valid(dte))
> > > > +			continue;
> > > > +
> > > > +		memset(dte_addr, 0, sizeof(*dte_addr));
> > > > +		sun50i_table_flush(sun50i_domain, virt_to_phys(dte_addr), 1);
> > >
> > > This shouldn't be necessary - freeing a domain while it's still live
> > > is an incredibly wrong thing to do, so the hardware should have
> > > already been programmed to no longer walk this table by this point.
> >
> > We never "garbage collect" and remove the dte for the page table we
> > don't use anymore elsewhere though, so couldn't we end up in a
> > situation where we don't have a page table (because it has been freed)
> > at the other end of our dte, but the IOMMU doesn't know about it since
> > we never flushed?
>
> Let me reiterate: at the point of freeing, the assumption is that this
> domain should be long dissociated from the hardware. Any actual
> invalidation should have already happened at the point the TTB was
> disabled or pointed to some other domain, therefore fiddling with
> pagetable pages just before you free them back to the kernel is just
> pointless busywork.
>
> If the TTB *was* still live here, then as soon as you call free_pages()
> below the DT memory could get reallocated by someone else and filled
> with data that happens to look like valid pagetables, so even if you
> invalidate the TLBs the hardware could just immediately go walk that
> data and refill them with nonsense, thus any pretence at invalidation
> is in vain anyway.

Thanks, that makes a lot of sense.

> The fly in the soup, however, is that default domains appear to be
> lacking proper detach notifications (I hadn't consciously noticed that
> before), so if you've been looking at the iommu_group_release() path it
> might have given the wrong impression. So what might be justifiable
> here is to check if the domain being freed is the one currently active
> in hardware, and if so perform a detach (i.e. disable the TTB and
> invalidate everything) first, then free everything as normal. Or just
> handwave that we essentially never free a default domain anyway, so
> it's OK to assume that we're not freeing anything live.

I'm still a bit unsure of what to do exactly here. I haven't found a
hook that would be called when a given domain no longer has any devices
attached to it. Or did you mean that I should just create a separate
function, not part of the IOMMU ops?
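If the latter, I guess the free path would end up looking something
like this? Just a sketch to check my understanding, with
sun50i_iommu_detach_domain() being a hypothetical internal helper (not
an IOMMU op) that disables the TTB and invalidates everything:

static void sun50i_iommu_domain_free(struct iommu_domain *domain)
{
	struct sun50i_iommu_domain *sun50i_domain = to_sun50i_domain(domain);
	struct sun50i_iommu *iommu = sun50i_domain->iommu;

	/*
	 * If the hardware is still walking this domain's tables, detach
	 * first: disable the TTB and invalidate everything before any
	 * page is handed back to the kernel.
	 */
	if (iommu && iommu->domain == domain)
		sun50i_iommu_detach_domain(iommu, sun50i_domain);

	/* ... then free the page tables and directory table as before. */
}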
> > > > +
> > > > +	if (iommu->domain == domain)
> > > > +		return 0;
> > > > +
> > > > +	if (iommu->domain)
> > > > +		sun50i_iommu_detach_device(iommu->domain, dev);
> > > > +
> > > > +	iommu->domain = domain;
> > > > +	sun50i_domain->iommu = iommu;
> > > > +
> > > > +	return pm_runtime_get_sync(iommu->dev);
> > >
> > > Deferring all the actual hardware programming to the suspend/resume
> > > hooks is a fiendishly clever idea that took me more than a moment to
> > > make sense of, but how well does it work when RPM is compiled out or
> > > runtime-inhibited?
> >
> > We have a bunch of other controllers that require runtime_pm already,
> > so it's going to be enabled. But that should be expressed in Kconfig.
>
> But it can still be inhibited from sysfs, right? We don't want driver
> behaviour to be *unnecessarily* fragile to user actions, however silly
> they may be.

That's a good point :)

> > > Furthermore, basing RPM on having a domain attached means that
> > > you'll effectively never turn the IOMMU off, even when all the
> > > clients are idle. It would make more sense to use device links like
> > > most other drivers do to properly model the producer/consumer
> > > relationship.
> >
> > I'm not familiar with device links for runtime_pm, I thought this was
> > only useful for system-wide resume and suspend?
>
> See DL_FLAG_PM_RUNTIME - we already have several IOMMU drivers taking
> full advantage of this.

I'll look into it, thanks!
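From a quick look at exynos_iommu, I guess it boils down to something
like this when a master gets added, with iommu->dev being our
instance's struct device? Sketch only:

	/*
	 * Tie the master's runtime PM state to the IOMMU's: with
	 * DL_FLAG_PM_RUNTIME the IOMMU is resumed before any of its
	 * masters and only allowed to suspend once they are all idle.
	 */
	struct device_link *link;

	link = device_link_add(dev, iommu->dev,
			       DL_FLAG_STATELESS | DL_FLAG_PM_RUNTIME);
	if (!link)
		dev_warn(iommu->dev,
			 "Unable to create device link with %s\n",
			 dev_name(dev));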
> > > > +static int __maybe_unused sun50i_iommu_resume(struct device *dev)
> > > > +{
> > > > +	struct sun50i_iommu_domain *sun50i_domain;
> > > > +	struct sun50i_iommu *iommu;
> > > > +	unsigned long flags;
> > > > +	dma_addr_t dt_dma;
> > > > +	int ret;
> > > > +
> > > > +	iommu = dev_get_drvdata(dev);
> > > > +	if (!iommu->domain)
> > > > +		return 0;
> > > > +
> > > > +	sun50i_domain = to_sun50i_domain(iommu->domain);
> > > > +	dt_dma = dma_map_single(dev, sun50i_domain->dt, DT_SIZE, DMA_TO_DEVICE);
> > >
> > > As above. The power state of the IOMMU should be entirely irrelevant
> > > to the contents of RAM.
> >
> > Sorry, I should have put a comment here.
> >
> > I'm not quite sure what the difference between a group and domain in
> > the IOMMU framework is, but since this IOMMU can only deal with a
> > single address space, my understanding was that we'd need to allocate
> > a single domain and group, and that the domain was the abstraction
> > tied to an address space (since it's what is passed as an argument to
> > map).
>
> That's correct, a domain is effectively an address space, while groups
> represent sets of devices that the IOMMU can isolate from each other.
> IOMMUs like this one (and the MediaTek M4U in mtk_iommu.c) are a little
> hard-done-by in that they do actually have some finer-grained isolation
> on a basic allow/deny level, but the API really assumes that isolation
> happens at the address space level, so it's easier to ignore it and
> just use the single-group model anyway.
>
> The really neat advantage of having a guaranteed single group, though,
> is that you then no longer need to care about address spaces: since the
> group can only ever be attached to one domain at a time, you can have
> as many domains as you like, and handle it by having the first
> attach_dev call on a given domain context-switch that pagetable into
> the hardware. That's more or less what you've done already, which is
> good, it would just benefit from that context-switching being done in a
> more robust and obvious manner :)

Got it, thanks :)

> > So, given this, what made sense was to allocate the directory table
> > buffer at domain_alloc time and map it. But then, domain_alloc seems
> > to not have any pointer back to the iommu we registered for some
> > reason (I guess that a domain could be shared across multiple
> > IOMMUs?), and so we don't have access to our IOMMU's struct device.
>
> I'll spare you the unproductive "Robin complains bitterly about the
> iommu_domain_alloc() interface being terrible, episode #27"...
>
> You'll see two main ways that existing drivers work around that - if
> you're happy to assume that you'll only ever have one IOMMU instance,
> or that all instances will always be functionally equal, then you can
> simply keep track of any old IOMMU device handle for DMA purposes (e.g.
> exynos_iommu); otherwise, if you might need to cope with multiple IOMMU
> instances having different DMA capabilities, then deferring
> instance-specific setup to the first device attach is the de-facto
> standard (e.g. arm-smmu).
I don't have any idea of how it's going to evolve, and the latter seems
cleaner, so I'll work on that.
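Just to check my understanding of both points (the context switch on
first attach, and deferring the instance-specific setup), I'd end up
with roughly the shape below. All of sun50i_iommu_from_dev(),
sun50i_iommu_detach_domain(), sun50i_iommu_init_domain() and
sun50i_iommu_program_ttb() are hypothetical helpers here:

static int sun50i_iommu_attach_device(struct iommu_domain *domain,
				      struct device *dev)
{
	struct sun50i_iommu_domain *sun50i_domain = to_sun50i_domain(domain);
	struct sun50i_iommu *iommu = sun50i_iommu_from_dev(dev);
	int ret;

	if (!iommu)
		return -ENODEV;

	/* This domain is already live in the hardware, nothing to do. */
	if (iommu->domain == domain)
		return 0;

	/* Context-switch away from whichever domain was live before. */
	if (iommu->domain)
		sun50i_iommu_detach_domain(iommu,
					   to_sun50i_domain(iommu->domain));

	/*
	 * First attach of this domain: only now do we know which struct
	 * device to do the DMA setup against, so allocate and map the
	 * DT here rather than in domain_alloc.
	 */
	if (!sun50i_domain->iommu) {
		ret = sun50i_iommu_init_domain(iommu, sun50i_domain);
		if (ret)
			return ret;
		sun50i_domain->iommu = iommu;
	}

	iommu->domain = domain;
	sun50i_iommu_program_ttb(iommu, sun50i_domain);

	return 0;
}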
> > Also, a more general question. One of the cleanups I wanted to do was
> > to remove the kmem_cache in favour of a dma_pool, which triggered that
> > test. It looks like with a dma_pool, the physical address and DMA
> > address are not the same, even though the IOMMU is directly connected
> > to the RAM, so there should be no intermediate mapping. Do you know
> > why?
>
> DMA pools are backed by dma_alloc_coherent, so (at least on arm64) the
> virtual address you get will be a non-cacheable remap (assuming a
> non-coherent device), and thus calling virt_to_phys() on it is bogus
> and will give you nonsense.
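Ah, I see. So the rule is simply to carry around the dma_addr_t that
dma_pool_alloc() hands back, instead of deriving it from the CPU
pointer. Something like this, with iommu->pt_pool being a hypothetical
dma_pool for the page tables:

static u32 *sun50i_iommu_alloc_pt(struct sun50i_iommu *iommu,
				  dma_addr_t *pt_dma)
{
	/*
	 * dma_pool_alloc() returns both a CPU virtual address and the
	 * matching DMA address. On a non-coherent system the CPU
	 * address is a non-cacheable remap, so virt_to_phys() on it
	 * would return garbage - only *pt_dma may be written into the
	 * DTE.
	 */
	return dma_pool_alloc(iommu->pt_pool, GFP_ATOMIC, pt_dma);
}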
Going further off-topic, why do we need a remap instead of a regular
physical address?

Thanks!
Maxime