On Mon, Jul 19, 2021, at 20:15, Robin Murphy wrote: > On 2021-07-15 17:41, Sven Peter via iommu wrote: > [...] > >>> + u64 sw_bypass_cpu_start; > >>> + u64 sw_bypass_dma_start; > >>> + u64 sw_bypass_len; > >>> + > >>> + struct list_head streams; > >> > >> I'm staring to think this could just be a bitmap, in a u16 even. > > > > The problem is that these streams may come from two different > > DART instances. That is required for e.g. the dwc3 controller which > > has a weird quirk where DMA transactions go through two separate > > DARTs with no clear pattern (e.g. some xhci control structures use the > > first dart while other structures use the second one). > > Ah right, I do remember discussing that situation, but I think I > misinterpreted dart_domain->dart representing "the DART instance" here > to mean we weren't trying to accommodate that just yet. > > > Both of them need to point to the same pagetable. > > In the device tree the node will have an entry like this: > > > > dwc3_0: usb@382280000{ > > ... > > iommus = <&dwc3_0_dart_0 0>, <&dwc3_0_dart_1 1>; > > }; > > > > There's no need for a linked list though once I do this properly with > > groups. I can just use an array allocated when the first device is > > attached, which just contains apple_dart* and streamid values. > > > > > >> > >>> + > >>> + spinlock_t lock; > >>> + > >>> + struct iommu_domain domain; > >>> +}; > >>> + > >>> +/* > >>> + * This structure is attached to devices with dev_iommu_priv_set() on of_xlate > >>> + * and contains a list of streams bound to this device as defined in the > >>> + * device tree. Multiple DART instances can be attached to a single device > >>> + * and each stream is identified by its stream id. > >>> + * It's usually reference by a pointer called *cfg. > >>> + * > >>> + * A dynamic array instead of a linked list is used here since in almost > >>> + * all cases a device will just be attached to a single stream and streams > >>> + * are never removed after they have been added. > >>> + * > >>> + * @num_streams: number of streams attached > >>> + * @streams: array of structs to identify attached streams and the device link > >>> + * to the iommu > >>> + */ > >>> +struct apple_dart_master_cfg { > >>> + int num_streams; > >>> + struct { > >>> + struct apple_dart *dart; > >>> + u32 sid; > >> > >> Can't you use the fwspec for this? > > > > > > I'd be happy to use the fwspec code if that's somehow possible. > > I'm not sure how though since I need to store both the reference to the DART > > _and_ to the stream id. As far as I can tell the fwspec code would only allow > > to store the stream ids. > > (see also the previous comment regarding the dwc3 node which requires stream > > ids from two separate DART instances) > > Hmm, yes, as above I was overlooking that, although there are still > various ideas that come to mind; the question becomes whether they're > actually worthwhile or just too-clever-for-their-own-good hacks. The > exact format of fwspec->ids is not fixed (other than the ACPI IORT code > having a common understanding with the Arm SMMU drivers) so in principle > you could munge some sort of DART instance index or indeed anything, but > if it remains cleaner to manage your own data internally then by all > means keep doing that. Yeah, I can think of some hacks as well (like storing a global id->apple_dart* map or stuffing the 64bit pointer into two ints) and I've tried a few of them in the past days but didn't like either of them. I do like the idea to just put two (struct apple_dart *dart, u16 sidmap) in there though which will be plenty for all current configurations. > > >>> + struct device_link *link; > >> > >> Is it necessary to use stateless links, or could you use > >> DL_FLAG_AUTOREMOVE_SUPPLIER and not have to keep track of them manually? > > > > I'll just use DL_FLAG_AUTOREMOVE_SUPPLIER. No idea why I went for stateless links. > > > >> > > [...] > >>> + /* restore stream identity map */ > >>> + writel(0x03020100, dart->regs + DART_STREAM_REMAP); > >>> + writel(0x07060504, dart->regs + DART_STREAM_REMAP + 4); > >>> + writel(0x0b0a0908, dart->regs + DART_STREAM_REMAP + 8); > >>> + writel(0x0f0e0d0c, dart->regs + DART_STREAM_REMAP + 12); > >> > >> Any hint of what the magic numbers mean? > > > > Yes, it's just 0,1,2,3...,0xe,0xf but I can't do 8bit writes to the bus > > and 32 bit writes then require these slightly awkward "swapped" numbers. > > I'll add a comment since it's not obvious at first glance. > > Sure, I guessed that much from "identity map" - it was more a question > of why that means 0x03020100... rather than, say, 0x0c0d0e0f... or > 0x76543210..., and perhaps the reason for "restoring" it in the first place. So what this feature does is to allow the DART to take an incoming DMA stream tagged with id i and pretend that it's actually been tagged with readb(dart->regs + 0x80 + i) instead. That's as much as I can figure out by poking the hardware. More details are probably only available to Apple. Now the reason I thought I needed this was that I assumed we are handed these DARTs in an unclean state because Apple makes use of this internally: In their device tree they have a sid-remap property which I believe is a hack to make their driver simpler. The dwc3 controller requires stream 0 of dartA and stream 1 of dartB to be configured the same way. They configure dartB to remap stream 1 to stream 0 and then just mirror all MMIO writes from dartA to dartB and pretend that dwc3 only needs a single DART. As it actually turns out though, iBoot doesn't use the USB DARTs and we already get them in the sane state. I can just drop this code. (And if we actually need it for other DARTs I can also just restore those in our bootloader or add it in a follow up). > > [...] > >>> + /* > >>> + * we can't mix and match DARTs that support bypass mode with those who don't > >>> + * because the iova space in fake bypass mode generally has an offset > >>> + */ > >> > >> Erm, something doesn't sound right there... IOMMU_DOMAIN_IDENTITY should > >> be exactly what it says, regardless of how it's implemented. If you > >> can't provide a true identity mapping then you're probably better off > >> not pretending to support them in the first place. > > > > Some background: the PCIe DART only supports a 32bit VA space but RAM > > on these machines starts at 0x8_0000_0000. I have something like > > dma-ranges = <0x42000000 0 0 0x8 0 0 0xffff0000>; > > in the pcie nodes to add that offset to dma addresses. > > > > What I want to do here then is to setup an identity mapping with respect > > to the DMA layer understanding of addresses encoded in bus_dma_region. > > Now this will always just be a constant offset of 0x8_0000_0000 for > > all M1s but I didn't want to hardcode that. > > The code here is just there to guard against a situation where someone > > somehow manages to attach two devices with different offsets to the same > > domain. > > Urgh, *now* I think I get it - the addressing limitation WRT the > physical memory map layout had also slipped my mind. So you describe the > RC *as if* it had a physical bus offset, rely on iommu-dma ignoring it > when active (which is more by luck than design - we don't expect to ever > see a device with a real hard-wired offset upstream of an IOMMU, > although I did initially try to support it back in the very early days), > and otherwise statically program a translation such that anyone else who > *does* respect bus_dma_regions finds things work as expected. Yes, exactly. It's not very nice but it works... > > That actually seems like an even stronger argument for having the > fake-bypass table belong to the DART rather than the domain, and at that > point you shouldn't even need the mismatch restriction, since as long as > you haven't described the fake offset for any devices who *can* achieve > real bypass, then "attach to an identity domain" simply comes down to > doing the appropriate thing for each individual stream, regardless of > whether it's the same nominal identity domain that another device is > using or a distinct one (it's highly unlikely that two groups would ever > get attached to one identity domain rather than simply having their own > anyway, but it is technically possible). > Agreed. That sounds a lot nicer actually. > > If that's not how the abstraction is supposed to work and/or too big of a hack > > I'll just remove the software bypass mode altogether. > > PCIe won't work on 4k kernels then but the only people using this so far > > build their own kernels with patches either way and won't complain. > > And by the time Linux will actually be useful for "normal" setups > > the dma-iommu layer can hopefully just handle a larger page granularity. > > It's certainly... "creative", and TBH I don't hate it (in a "play the > hand you've been given" kind of way), but the one significant downside > is that if the DART driver isn't loaded for any reason, PCI DMA will > look like it should be usable but then just silently (or not so > silently) fail. Good point! > > FWIW if you do want to keep the option open, I'd be inclined to have the > DT just give an "honest" description of just the 32-bit limitation, then > have the DART driver's .probe_device sneakily modify the bus_dma_region > to match the relevant fake-bypass table as appropriate. It's possible > other folks might hate that even more though :D I've given that one a try and I kinda like it so far :D I'll keep it for v5 and just drop it in case someone complains. > > >>> + if (WARN_ON(domain->type == IOMMU_DOMAIN_IDENTITY && > >>> + (domain->dart->supports_bypass != dart->supports_bypass))) > >>> + return -EINVAL; > >>> + > >>> + list_for_each_entry(stream, &domain->streams, stream_head) { > >>> + if (stream->dart == dart && stream->sid == sid) { > >>> + stream->num_devices++; > >>> + return 0; > >>> + } > >>> + } > >>> + > >>> + spin_lock_irqsave(&dart->lock, flags); > >>> + > >>> + if (WARN_ON(dart->used_sids & BIT(sid))) { > >>> + ret = -EINVAL; > >>> + goto error; > >>> + } > >>> + > >>> + stream = kzalloc(sizeof(*stream), GFP_ATOMIC); > >>> + if (!stream) { > >>> + ret = -ENOMEM; > >>> + goto error; > >>> + } > >> > >> Couldn't you do this outside the lock? (If, calling back to other > >> comments, it can't get refactored out of existence anyway) > > > > Probably, but I'll first see if I can just refactor it away. > > Actually I missed that we're already under dart_domain->lock at this > point anyway, so it's not going to make much difference, but it does > mean that the spin_lock_irqsave() above could just be spin_lock(), > unless it's possible to relax the domain locking a bit such that we > don't have to do the whole domain init with IRQs masked. I can relax the locking quite a bit. Right now, I only need a spinlock around the TLB flush MMIO writes and a mutex to protect domain initialization. > > [...] > >>> +static struct iommu_domain *apple_dart_domain_alloc(unsigned int type) > >>> +{ > >>> + struct apple_dart_domain *dart_domain; > >>> + > >>> + if (type != IOMMU_DOMAIN_DMA && type != IOMMU_DOMAIN_UNMANAGED && > >>> + type != IOMMU_DOMAIN_IDENTITY && type != IOMMU_DOMAIN_BLOCKED) > >>> + return NULL; > >> > >> I want to say there's not much point in that, but then I realise I've > >> spent the last couple of days writing patches to add a new domain type :) > > > > Hah! Just because I'm curious: What is that new domain type going to be? :) > > Splitting IOMMU_DOMAIN_DMA into two to replace iommu_dma_strict being an > orthogonal thing. > > [...] > >>> +static int apple_dart_of_xlate(struct device *dev, struct of_phandle_args *args) > >>> +{ > >>> + struct platform_device *iommu_pdev = of_find_device_by_node(args->np); > >>> + struct apple_dart_master_cfg *cfg = dev_iommu_priv_get(dev); > >>> + unsigned int num_streams = cfg ? cfg->num_streams : 0; > >>> + struct apple_dart_master_cfg *cfg_new; > >>> + struct apple_dart *dart = platform_get_drvdata(iommu_pdev); > >>> + > >>> + if (args->args_count != 1) > >>> + return -EINVAL; > >>> + > >>> + cfg_new = krealloc(cfg, struct_size(cfg, streams, num_streams + 1), > >>> + GFP_KERNEL); > >>> + if (!cfg_new) > >>> + return -ENOMEM; > >>> + > >>> + cfg = cfg_new; > >>> + dev_iommu_priv_set(dev, cfg); > >>> + > >>> + cfg->num_streams = num_streams; > >>> + cfg->streams[cfg->num_streams].dart = dart; > >>> + cfg->streams[cfg->num_streams].sid = args->args[0]; > >>> + cfg->num_streams++; > >> > >> Yeah, this is way too reminiscent of the fwspec code for comfort. Even > >> if you can't use autoremove links for some reason, an array of 16 > >> device_link pointers hung off apple_dart still wins over these little > >> pointer-heavy structures if you need more than a few of them. > > > > I can get rid of the links, but I'll still need some way to store > > both the apple_dart and the sid here. Like mentioned above, I'll > > be happy to reuse the fwspec code but I don't see how yet. > > As before, if you can fit in some kind of DART instance identifier which > isn't impractical to unpack than it makes sense to use the fwspec since > it's already there. However if you still need to allocate something > per-device rather than just stashing an existing pointer in iommu_priv, > then you may as well keep everything together there. If the worst known > case could still fit in just two DART pointers and two 64-bit bitmaps, > I'd be inclined to just have that as a fixed structure and save all the > extra bother - you're not cross-architecture like the fwspec code, and > arm64's minimum kmalloc granularity has just gone back up to 128 bytes > (but even at 64 bytes you'd have had plenty of room). That's a very good point, I somehow tried to make this part as general as possible and didn't realize that this only has to work on essentially one SoC for now. I also don't expect Apple to require more than two DARTs for a single master in the future. I've tried the fixed structure now and I really like it so far. > > [...] > >>> +static int apple_dart_remove(struct platform_device *pdev) > >>> +{ > >>> + struct apple_dart *dart = platform_get_drvdata(pdev); > >>> + > >>> + devm_free_irq(dart->dev, dart->irq, dart); > >>> + > >>> + iommu_device_unregister(&dart->iommu); > >>> + iommu_device_sysfs_remove(&dart->iommu); > >>> + > >>> + clk_bulk_disable(dart->num_clks, dart->clks); > >>> + clk_bulk_unprepare(dart->num_clks, dart->clks); > >> > >> Ditto. > >> > >> And again the bus ops are still installed - that'll get really fun if > >> this is a module unload... > > > > Ugh, yeah. I'll fix that as well. I'll have to see how to make this work > > correctly with multiple DART instances. I guess I should only remove the > > bus ops once the last one is removed. Now that I think about it, this > > could also get tricky in the cleanup paths of apple_dart_probe. > > > > Maybe just add a module_init that sets up the bus ops when it finds at > > least one DART node and module_exit to tear them down again? > > Actually by this point it was late and I wasn't thinking as clearly as I > could have been, apologies ;) > > I believe a module unload is in fact the *only* time you should expect > to see .remove called - you want to set .suppress_bind_attrs in your > driver data because there's basically no way to prevent manual unbinding > from blowing up - so it should be fine to unconditionally clear the ops > at this point (being removed means you must have successfully probed, so > any ops must be yours). Makes sense, thanks! I'll let my current version simmer for a bit and wait until it's been tested by a few people and will send it in a week or so then! Best, Sven