On Wed, Aug 9, 2023, at 03:09, Li Chen wrote:
> On Tue, 08 Aug 2023 15:44:44 +0800,
> Arnd Bergmann wrote:
>> On Tue, Aug 8, 2023, at 09:03, mani wrote:
>> > On Mon, Aug 07, 2023 at 08:28:30PM +0800, Li Chen wrote:
>
>> If the cache is coherent with the device, then reading from the cache
>> is clearly the right thing to do,
>
> I guess that even SoCs with CCI support might not handle cache for RC
> access if specific bus interfaces are not connected.

Correct: each device in the system can be cache-coherent or
noncoherent, independent of the others, and needs to be marked
correctly in the DT. The dma_alloc_coherent() call will allocate
either cacheable or noncacheable memory, based on what the kernel
thinks is required for the particular device.

>> but the mentioned "stall" problem may
>> be related to the store buffers, where a dma_wmb() after the
>> WRITE_ONCE() is missing. Similarly, a dma_rmb() might be missing before
>> a READ_ONCE() to prevent prefetching during out-of-order execution.
>>
>> With readl()/writel(), you already get very heavy barriers, so it may
>> end up working by accident, but these barriers are at the other side
>> of the access (before writel and after readl) and may be the wrong
>> type of barrier depending on the CPU.
>
> For systems that aren't cache-coherent, is it accurate to say that the
> store buffer might still be utilized, and that there might still be a
> need for dma_wmb and dma_rmb?

Yes, the ordering is really independent of the cache, so these
will be needed for portable code either way, the same way you
need smp_wmb()/smp_rmb() between CPUs accessing shared memory
locally.

     Arnd