On Mon, Apr 4, 2022, at 16:58, Rob Herring wrote:
> On Sat, Apr 02, 2022 at 09:07:17PM +0200, Arnd Bergmann wrote:
>> On Sat, Apr 2, 2022 at 2:38 PM Sven Peter <sven@xxxxxxxxxxxxx> wrote:
>> > On Mon, Mar 21, 2022, at 18:07, Arnd Bergmann wrote:
>> > > On Mon, Mar 21, 2022 at 5:50 PM Sven Peter <sven@xxxxxxxxxxxxx> wrote:
>> > >> The NVMe co-processor on the Apple M1 uses a DMA address filter called
>> > >> SART for some DMA transactions. This adds a simple driver used to
>> > >> configure the memory regions from which DMA transactions are allowed.
>> > >>
>> > >> Co-developed-by: Hector Martin <marcan@xxxxxxxxx>
>> > >> Signed-off-by: Hector Martin <marcan@xxxxxxxxx>
>> > >> Signed-off-by: Sven Peter <sven@xxxxxxxxxxxxx>
>> > >
>> > > Can you add some explanation about why this uses a custom interface
>> > > instead of hooking into the dma_map_ops?
>> >
>> > Sure.
>> > In a perfect world this would just be an IOMMU implementation, but since
>> > SART can't create any real IOVA space using pagetables it doesn't fit
>> > inside that subsystem.
>> >
>> > In a slightly less perfect world I could just implement dma_map_ops here,
>> > but that won't work either because not all DMA buffers of the NVMe
>> > device have to go through SART, and those allocations happen
>> > inside the same device and would use the same dma_map_ops.
>> >
>> > The NVMe controller has two separate DMA filters:
>> >
>> > - NVMMU, which must be set up for any command that uses PRPs and
>> >   ensures that the DMA transactions only touch the pages listed
>> >   inside the PRP structure. NVMMU itself is tightly coupled
>> >   to the NVMe controller: the list of allowed pages is configured
>> >   based on the command's tag id, and even commands that require no DMA
>> >   transactions must be listed inside NVMMU before they are started.
>> > - SART, which must be set up for some shared memory buffers (e.g.
>> >   log messages from the NVMe firmware) and for some NVMe debug
>> >   commands that don't use PRPs.
>> >   SART is only loosely coupled to the NVMe controller and could
>> >   also be used together with other devices. It's also the only
>> >   thing that changed between the M1 and M1 Pro/Max/Ultra, which is
>> >   why I decided to separate it from the NVMe driver.
>> >
>> > I'll add this explanation to the commit message.
>>
>> Ok, thanks.
>>
>> > >> +static void sart2_get_entry(struct apple_sart *sart, int index, u8 *flags,
>> > >> +                            phys_addr_t *paddr, size_t *size)
>> > >> +{
>> > >> +       u32 cfg = readl_relaxed(sart->regs + APPLE_SART2_CONFIG(index));
>> > >> +       u32 paddr_ = readl_relaxed(sart->regs + APPLE_SART2_PADDR(index));
>> > >
>> > > Why do you use the _relaxed() accessors here and elsewhere in the driver?
>> >
>> > This device itself doesn't do any DMA transactions, so it needs no memory
>> > synchronization barriers. Only the consumers (i.e. rtkit and nvme) read/write
>> > from/to these buffers (multiple times), and they have the required barriers
>> > in place whenever they are used.
>> >
>> > These buffers so far are only allocated at probe time though, so even using
>> > the normal writel/readl here won't hurt performance at all. I can just use
>> > those if you prefer, or alternatively add a comment why _relaxed is fine here.
>> >
>> > This is a bit similar to the discussion for the pinctrl series last year [1].
>>
>> I think it's better to only use the _relaxed version where it actually helps,
>> with a comment about it, and use the normal version elsewhere, in
>> particular in functions that you have copied from the normal nvme driver.
>> I had tried to compare some of your code with the other version and
>> was rather confused by that.
>
> Oh good, I tell folks the opposite (and others do too). We don't accept
> random explicit barriers without explanation, but implicit ones are
> okay? The resulting code on arm32 is also pretty horrible with the L2x0
> and OMAP sync hooks, not that that matters here.
>
> I don't really care too much which way we go, but we should document one
> rule and follow that.

I don't have a strong opinion either. Arnd's approach is currently documented
in Documentation/driver-api/device-io.rst, fwiw:

    On architectures that require an expensive barrier for serializing
    against DMA, these "relaxed" versions of the MMIO accessors only
    serialize against each other, but contain a less expensive barrier
    operation. A device driver might use these in a particularly
    performance sensitive fast path, with a comment that explains why
    the usage in a specific location is safe without the extra barriers.

Sven