On Sat, Jan 21, 2023, at 15:37, Christoph Hellwig wrote:
> On Fri, Jan 20, 2023 at 06:04:37PM +0100, Arnd Bergmann wrote:
>> Having looked at this some more, I see that the powerpc
>> version is a bit problematic here as well: this one
>> flushes the partial cache lines before and after the
>> DMA transfer, while only invalidating the full
>> cache lines.
>
> That feels really odd, and might be worth a bug report to the
> PPC maintainers.

Right, my first step would be to change all of the current outliers
to use the same set of operations where possible.

>> Aside from the question for how to handle flush vs invalidate
>> on DMA_FROM_DEVICE, I'm still trying to figure out how to
>> best handle highmem with architecture specific cache management
>> operations. The easy approach would be to leave that up
>> to the architecture, passing only a physical address to
>> the flush function.
>
> I suspect that is a good enough first step. Especially as I remember
> that some architectures have physical address based cache management
> anyway (unless we removed them in the meantime).

ok

>> A nicer interface might be to move the
>> loop over highmem pages out into common code, flush
>> lowmem pages by virtual address, and have a separate
>> callback for highmem pages that takes a page pointer,
>> like
>
> I'd rather avoid multiple callbacks if we can. But maybe solve
> the simple problem first and just pass the paddr and then
> iterate from there.

Ok, fair enough. This means we can't easily put the kmap_atomic()
into common code for highmem, though the per-page loop would still
work, as in the sketch at the end of this mail.

>> struct dma_cache_ops {
>>         void (*dma_cache_wback_inv)(void *start, unsigned long sz);
>>         void (*dma_cache_inv)(void *start, unsigned long sz);
>>         void (*dma_cache_wback)(void *start, unsigned long sz);
>> #ifdef CONFIG_HIGHMEM
>>         void (*dma_cache_wback_inv_high_page)(struct page *, size_t start, unsigned long sz);
>>         void (*dma_cache_inv_high_page)(struct page *, size_t start, unsigned long sz);
>>         void (*dma_cache_wback_high_page)(struct page *, size_t start, unsigned long sz);
>
> Btw, I really don't think these should be indirect calls.
> For sane architectures there should be exactly one way to call them,
> and the ones that have different implementations really should be
> using alternatives instead of expensive indirect calls.

I was thinking of using STATIC_CALL() as an optimization here, which
I find easier to read and understand than alternatives; a rough
sketch of how that could be wired up is also at the end of this mail.

One advantage here is that this allows the actual cache operations to
be declared locally in the architecture without letting drivers call
them, while still letting the common code run without indirect
branches. The main downside is that this is currently only optimized
on powerpc and x86, both of which don't actually need CPU specific
callbacks. ARC, ARM, and MIPS on the other hand already have indirect
function pointers. RISC-V would likely benefit the most from either
alternatives or static_call, as it already uses alternatives and has
one implementation that is clearly preferred over the others.

     Arnd
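
To make the "just pass the paddr and iterate from there" idea a bit
more concrete, here is a rough, untested sketch of what the common
per-page loop could look like. arch_dma_cache_wback_inv() is a
made-up name standing in for whatever the architecture would export;
it is handed at most one page at a time, so a highmem implementation
can do its own kmap_local_page() internally.

#include <linux/mm.h>
#include <linux/minmax.h>
#include <linux/types.h>

/* provided by the architecture, operating on physical addresses only */
void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size);

static void dma_cache_maint(phys_addr_t paddr, size_t size)
{
	while (size) {
		/* never cross a page so the arch can map highmem pages */
		size_t len = min_t(size_t, size,
				   PAGE_SIZE - offset_in_page(paddr));

		arch_dma_cache_wback_inv(paddr, len);
		paddr += len;
		size -= len;
	}
}

The same loop would obviously be needed for the invalidate and
writeback-only variants as well, or it could take the sync direction
as an extra argument.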
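
And a similarly hypothetical sketch of the STATIC_CALL() version,
again with invented symbol names: common code defines one static call
per cache operation with a generic default, and an architecture that
needs a CPU specific implementation retargets it once at boot. On
architectures with HAVE_STATIC_CALL the fast path then contains no
indirect branch; elsewhere it degrades to a normal function pointer,
which is the powerpc/x86-only caveat mentioned above.

#include <linux/init.h>
#include <linux/static_call.h>
#include <linux/types.h>

static void dma_cache_wback_inv_generic(phys_addr_t paddr, size_t size)
{
	/* default implementation, e.g. flush by virtual address */
}

DEFINE_STATIC_CALL(dma_cache_wback_inv, dma_cache_wback_inv_generic);

/* example fast path in the dma-mapping code */
static void dma_cache_maint_for_device(phys_addr_t paddr, size_t size)
{
	static_call(dma_cache_wback_inv)(paddr, size);
}

/* CPU specific variant, installed once during early boot */
static void dma_cache_wback_inv_outer(phys_addr_t paddr, size_t size)
{
	/* e.g. an implementation that also maintains an outer cache */
}

static int __init dma_cache_init(void)
{
	bool has_outer_cache = false;	/* made-up detection result */

	if (has_outer_cache)
		static_call_update(dma_cache_wback_inv,
				   &dma_cache_wback_inv_outer);
	return 0;
}
early_initcall(dma_cache_init);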