On Fri, Mar 31, 2023, at 17:12, Robin Murphy wrote:
On 31/03/2023 3:00 pm, Arnd Bergmann wrote:
On Mon, Mar 27, 2023, at 14:48, Robin Murphy wrote:
To be on the safe side, I'd have to pass a flag into
arch_dma_mark_clean() about coherency, to let the arm
implementation still require the extra dcache flush
for coherent DMA, while ia64 can ignore that flag.
Coherent DMA on Arm is assumed to be inner-shareable, so a coherent DMA
write should be pretty much equivalent to a coherent write by another
CPU (or indeed the local CPU itself) - nothing says that it *couldn't*
dirty a line in a data cache above the level of unification, so in
general the assumption must be that, yes, if coherent DMA is writing
data intended to be executable, then it's going to want a Dcache clean
to PoU and an Icache invalidate to PoU before trying to execute it. By
comparison, a non-coherent DMA transfer will inherently have to
invalidate the Dcache all the way to PoC in its dma_unmap, thus cannot
leave dirty data above the PoU, so only the Icache maintenance is
required in the executable case.
Ok, makes sense. I've already started reworking my patch for it.
(FWIW I believe the Armv8 IDC/DIC features can safely be considered
irrelevant to 32-bit kernels)
I don't know a great deal about IA-64, but it appears to be using its
PG_arch_1 flag in a subtly different manner to Arm, namely to optimise
out the *Icache* maintenance. So if anything, it seems IA-64 is the
weirdo here (who'd have guessed?) where DMA manages to be *more*
coherent than the CPUs themselves :)
I checked this in the ia64 manual, and as far as I can tell, it originally
only had one cacheflush instruction that flushes the dcache and invalidates
the icache at the same time. So flush_icache_range() actually does
both and flush_dcache_page() instead just marks the page as dirty to
ensure flush_icache_range() does not get skipped after a writing a
page from the kernel.
On later Itaniums, there is apparently a separate icache flush
instruction that gets used in flush_icache_range(), but that
still works for the DMA case that is allowed to skip the flush.
This is all now making me think we need some careful consideration of
whether the benefits of consolidating code outweigh the confusion of
conflating multiple different meanings of "clean" together...
The difference in usage of PG_dcache_clean/PG_dcache_dirty/PG_arch_1
across architectures is certainly big enough that we can't just
define a a common arch_dma_mark_clean() across architectures, but
I think the idea of having a common entry point for
arch_dma_mark_clean() to be called from the dma-mapping code
to do something architecture specific after a DMA is clean still
makes sense,
Arnd