On Fri, Mar 31, 2023, at 17:12, Robin Murphy wrote: > On 31/03/2023 3:00 pm, Arnd Bergmann wrote: >> On Mon, Mar 27, 2023, at 14:48, Robin Murphy wrote: >> >> To be on the safe side, I'd have to pass a flag into >> arch_dma_mark_clean() about coherency, to let the arm >> implementation still require the extra dcache flush >> for coherent DMA, while ia64 can ignore that flag. > > Coherent DMA on Arm is assumed to be inner-shareable, so a coherent DMA > write should be pretty much equivalent to a coherent write by another > CPU (or indeed the local CPU itself) - nothing says that it *couldn't* > dirty a line in a data cache above the level of unification, so in > general the assumption must be that, yes, if coherent DMA is writing > data intended to be executable, then it's going to want a Dcache clean > to PoU and an Icache invalidate to PoU before trying to execute it. By > comparison, a non-coherent DMA transfer will inherently have to > invalidate the Dcache all the way to PoC in its dma_unmap, thus cannot > leave dirty data above the PoU, so only the Icache maintenance is > required in the executable case. Ok, makes sense. I've already started reworking my patch for it. > (FWIW I believe the Armv8 IDC/DIC features can safely be considered > irrelevant to 32-bit kernels) > > I don't know a great deal about IA-64, but it appears to be using its > PG_arch_1 flag in a subtly different manner to Arm, namely to optimise > out the *Icache* maintenance. So if anything, it seems IA-64 is the > weirdo here (who'd have guessed?) where DMA manages to be *more* > coherent than the CPUs themselves :) I checked this in the ia64 manual, and as far as I can tell, it originally only had one cacheflush instruction that flushes the dcache and invalidates the icache at the same time. So flush_icache_range() actually does both and flush_dcache_page() instead just marks the page as dirty to ensure flush_icache_range() does not get skipped after a writing a page from the kernel. On later Itaniums, there is apparently a separate icache flush instruction that gets used in flush_icache_range(), but that still works for the DMA case that is allowed to skip the flush. > This is all now making me think we need some careful consideration of > whether the benefits of consolidating code outweigh the confusion of > conflating multiple different meanings of "clean" together... The difference in usage of PG_dcache_clean/PG_dcache_dirty/PG_arch_1 across architectures is certainly big enough that we can't just define a a common arch_dma_mark_clean() across architectures, but I think the idea of having a common entry point for arch_dma_mark_clean() to be called from the dma-mapping code to do something architecture specific after a DMA is clean still makes sense, Arnd