Hi, Catalin On 2021/12/17 19:59, Catalin Marinas wrote: > On Fri, Dec 17, 2021 at 04:56:11PM +0800, Xiongfeng Wang wrote: >> For memory accesses with Normal-Non Cacheable attributes, the CPU may do > > You may want to mention "arm64 Normal Non-Cacheable" as other > architectures have a different meaning of NC. > >> write combining. But in some situation, this is bad for the performance >> because the prior access may wait too long just to be merged. >> >> We introduce io_stop_wc() to prevent the Normal-NC memory accesses before >> this instruction to be merged with memory accesses after this >> instruction. >> >> We add implementation for ARM64 using DGH instruction and provide NOP >> implementation for other architectures. >> >> Signed-off-by: Xiongfeng Wang <wangxiongfeng2@xxxxxxxxxx> >> --- >> Documentation/memory-barriers.txt | 9 +++++++++ >> arch/arm64/include/asm/barrier.h | 9 +++++++++ >> include/asm-generic/barrier.h | 11 +++++++++++ >> 3 files changed, 29 insertions(+) >> >> diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt >> index 7367ada13208..b868b51b1801 100644 >> --- a/Documentation/memory-barriers.txt >> +++ b/Documentation/memory-barriers.txt >> @@ -1950,6 +1950,15 @@ There are some more advanced barrier functions: >> For load from persistent memory, existing read memory barriers are sufficient >> to ensure read ordering. >> >> + (*) io_stop_wc(); >> + >> + For memory accesses with Normal-Non Cacheable attributes (e.g. those >> + returned by ioremap_wc()), the CPU may do write combining. But in some >> + situation, this is bad for the performance because the prior access may >> + wait too long just to be merged. io_stop_wc() can be used to prevent >> + merging memory accesses with Normal-Non Cacheable attributes before this >> + instruction with any memory accesses appearing after this instruction. > > I'm fine with the patch in general but the comment here and in > asm-generic/barrier.h should avoid Normal Non-Cacheable as that's an > arm-specific term. Looking at Documentation/driver-api/device-io.rst, we > could simply say "write-combining". Something like: > > For memory accesses with write-combining attributes (e.g. those > returned by ioremap_wc()), the CPU may wait for prior accesses to > be merged with subsequent ones. io_stop_wc() can be used to prevent > the merging of write-combining memory accesses before this macro > with those after it when such wait has performance implications. > > (feel free to rephrase it but avoid Normal NC here) Thanks for your advice! I will send another version soon. Thanks, Xiongfeng > >> + >> =============================== >> IMPLICIT KERNEL MEMORY BARRIERS >> =============================== >> diff --git a/arch/arm64/include/asm/barrier.h b/arch/arm64/include/asm/barrier.h >> index 1c5a00598458..62217be36217 100644 >> --- a/arch/arm64/include/asm/barrier.h >> +++ b/arch/arm64/include/asm/barrier.h >> @@ -26,6 +26,14 @@ >> #define __tsb_csync() asm volatile("hint #18" : : : "memory") >> #define csdb() asm volatile("hint #20" : : : "memory") >> >> +/* >> + * Data Gathering Hint: >> + * This instruction prevents merging memory accesses with Normal-NC or >> + * Device-GRE attributes before the hint instruction with any memory accesses >> + * appearing after the hint instruction. >> + */ >> +#define dgh() asm volatile("hint #6" : : : "memory") > > This is fine, arm-specific code. > >> + >> #ifdef CONFIG_ARM64_PSEUDO_NMI >> #define pmr_sync() \ >> do { \ >> @@ -46,6 +54,7 @@ >> #define dma_rmb() dmb(oshld) >> #define dma_wmb() dmb(oshst) >> >> +#define io_stop_wc() dgh() >> >> #define tsb_csync() \ >> do { \ >> diff --git a/include/asm-generic/barrier.h b/include/asm-generic/barrier.h >> index 640f09479bdf..083be6d34cb9 100644 >> --- a/include/asm-generic/barrier.h >> +++ b/include/asm-generic/barrier.h >> @@ -251,5 +251,16 @@ do { \ >> #define pmem_wmb() wmb() >> #endif >> >> +/* >> + * ioremap_wc() maps I/O memory as memory with Normal-Non Cacheable attributes. >> + * The CPU may do write combining for this kind of memory access. io_stop_wc() >> + * prevents merging memory accesses with Normal-Non Cacheable attributes >> + * before this instruction with any memory accesses appearing after this >> + * instruction. > > Please change this as well along the lines of my comment above. > >> + */ >> +#ifndef io_stop_wc >> +#define io_stop_wc do { } while (0) >> +#endif >> + >> #endif /* !__ASSEMBLY__ */ >> #endif /* __ASM_GENERIC_BARRIER_H */ > > Thanks. >