On Fri, May 5, 2023 at 9:19 PM Arnd Bergmann <arnd@xxxxxxxx> wrote:
>
> On Fri, May 5, 2023, at 07:47, Guo Ren wrote:
> > On Mon, Mar 27, 2023 at 8:15 PM Arnd Bergmann <arnd@xxxxxxxxxx> wrote:
> > >>
> > >> riscv also invalidates the caches before the transfer, which does
> > >> not appear to serve any purpose.
> > Yes, we can't guarantee the CPU pre-load cache lines randomly during
> > dma working.
> >
> > But I've two purposes to keep invalidates before dma transfer:
> >  - We clearly tell the CPU these cache lines are invalid. The caching
> >    algorithm would use these invalid slots first instead of replacing
> >    valid ones.
> >  - Invalidating is very cheap. Actually, flush and clean have the same
> >    performance in our machine.
>
> The main purpose of the series was to get consistent behavior on
> all machines, so I really don't want a custom optimization on
> one architecture. You make a good point about cacheline reuse
> after invalidation, but if we do that, I'd suggest doing this
> across all architectures.
Yes, invalidating DMA_FROM_DEVICE buffers in arch_sync_dma_for_device()
is a proposal for all architectures.

>
> > So, how about:
> >
> > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> > index d919efab6eba..2c52fbc15064 100644
> > --- a/arch/riscv/mm/dma-noncoherent.c
> > +++ b/arch/riscv/mm/dma-noncoherent.c
> > @@ -22,8 +22,6 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
> >                  ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> >                  break;
> >          case DMA_FROM_DEVICE:
> > -                ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
> > -                break;
> >          case DMA_BIDIRECTIONAL:
> >                  ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> >                  break;
>
> This is something we can consider. Unfortunately, this is something
> that no architecture (except pa-risc, which has other problems)
> does at the moment, so we'd probably need to have a proper debate
> about this.
>
> We already have two conflicting ways to handle DMA_FROM_DEVICE,
> either invalidate/invalidate, or clean/invalidate. I can see
I vote for invalidate/invalidate.

My key point is that arch_sync_dma_for_device() should invalidate for
DMA_FROM_DEVICE, and DMA_BIDIRECTIONAL already covers the
DMA_FROM_DEVICE case. So I also agree with:

@@ -22,8 +22,8 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
                 ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
                 break;
         case DMA_FROM_DEVICE:
-                ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
+                ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
                 break;
         case DMA_BIDIRECTIONAL:
                 ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
                 break;

> that flush/invalidate may be a sensible option as well, but I'd
> want to have that discussion after the series is complete, so
> we can come to a generic solution that has the same documented
> behavior across all architectures.
Yes, I agree we should unify them into a generic solution first; my
proposal can be a separate topic later.

For that purpose, I give:

Acked-by: Guo Ren <guoren@xxxxxxxxxx>

>
> In particular, if we end up moving arm64 and riscv back to the
> traditional invalidate/invalidate for DMA_FROM_DEVICE and
> document that drivers must not rely on buffers getting cleaned
After invalidation, the cache lines are also clean, right? So why do
we need to document it additionally?

> before a partial DMA_FROM_DEVICE, the question between clean
> or flush becomes moot as well.
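To make the end state concrete, arch_sync_dma_for_device() in
arch/riscv/mm/dma-noncoherent.c would then look roughly like this
(an untested sketch of the proposal above, not a formal patch):

void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
                              enum dma_data_direction dir)
{
        void *vaddr = phys_to_virt(paddr);

        switch (dir) {
        case DMA_TO_DEVICE:
                /* CPU stores must reach memory before the device reads */
                ALT_CMO_OP(clean, vaddr, size, riscv_cbom_block_size);
                break;
        case DMA_FROM_DEVICE:
                /*
                 * Proposed: drop the lines instead of cleaning them, so
                 * no writeback can race with the device's writes and the
                 * invalid slots are preferred by cache replacement.
                 */
                ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
                break;
        case DMA_BIDIRECTIONAL:
                /* write back CPU data and discard stale lines */
                ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
                break;
        default:
                break;
        }
}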
>
> > @@ -42,7 +40,7 @@ void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
> >                  break;
> >          case DMA_FROM_DEVICE:
> >          case DMA_BIDIRECTIONAL:
> >          /* I'm not sure all drivers have guaranteed cacheline
> >             alignment. If not, this inval would cause problems */
> > -                ALT_CMO_OP(flush, vaddr, size, riscv_cbom_block_size);
> > +                ALT_CMO_OP(inval, vaddr, size, riscv_cbom_block_size);
> >                  break;
>
> This is my original patch, and I would not mix it with the other
> change. The problem with non-aligned DMA_BIDIRECTIONAL buffers
> is that both flush and inval would be wrong if you get simultaneous
> writes from device and cpu to the same cache line, so there is
> no way to win this. Using inval instead of flush would at least
> work if the CPU data in the cacheline is read-only from the CPU,
> so that seems better than something that is always wrong.
If the CPU data in the cacheline is read-only, the cacheline would
never be dirty. Yes, it's always safe.

Okay, I agree we must keep buffers cacheline-aligned. I added the
comment above only because I worry that some sloppy drivers couldn't
work with the invalidate approach: their CPU data in the cacheline
gets corrupted, and the device data in that cacheline is useless
anyway.

>
> The documented API is that sharing the cache line is not allowed
> at all, so anything that would observe a difference between the
> two is also a bug. One idea that we have considered already is
> that we could overwrite the unused bits of the cacheline with
> poison values and/or mark them as invalid using KASAN for debugging
> purposes, to find drivers that already violate this.
>
>       Arnd

--
Best Regards
 Guo Ren
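(Postscript, to illustrate the cacheline-sharing hazard discussed
above with a made-up example; the struct below is hypothetical and
not taken from any real driver.)

/*
 * Hypothetical broken layout: a CPU-owned field and a device-written
 * DMA buffer land in the same cacheline because the buffer is not
 * aligned to riscv_cbom_block_size.
 */
struct bad_rx_desc {
        unsigned int  cpu_flags;        /* updated by the CPU */
        unsigned char rx_buf[60];       /* written by the device via DMA */
};

/*
 * If the CPU dirties cpu_flags while the device is writing rx_buf,
 * arch_sync_dma_for_cpu() cannot win:
 *  - inval discards the whole line, losing the CPU's cpu_flags update;
 *  - flush writes the dirty line back, overwriting rx_buf bytes the
 *    device has already placed in memory.
 * This is why the DMA API requires a streaming buffer to own its
 * cachelines exclusively (kmalloc()'s minimum alignment, e.g. via
 * ARCH_DMA_MINALIGN, exists for exactly this reason).
 */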