Hi Huacai, Copying DMA mapping maintainers for any input they may have. On Wed, Sep 05, 2018 at 05:33:02PM +0800, Huacai Chen wrote: > static inline void local_r4k___flush_cache_all(void * args) > { > switch (current_cpu_type()) { > case CPU_LOONGSON2: > - case CPU_LOONGSON3: > case CPU_R4000SC: > case CPU_R4000MC: > case CPU_R4400SC: > @@ -480,6 +497,11 @@ static inline void local_r4k___flush_cache_all(void * args) > r4k_blast_scache(); > break; > > + case CPU_LOONGSON3: > + /* Use get_ebase_cpunum() for both NUMA=y/n */ > + r4k_blast_scache_node(get_ebase_cpunum() >> 2); > + break; > + I wonder if we could instead just include the node ID bits in INDEX_BASE? Then we could continue using r4k_blast_scache() here as usual. > case CPU_BMIPS5000: > r4k_blast_scache(); > __sync(); > @@ -840,10 +862,14 @@ static void r4k_dma_cache_wback_inv(unsigned long addr, unsigned long size) > > preempt_disable(); > if (cpu_has_inclusive_pcaches) { > - if (size >= scache_size) > - r4k_blast_scache(); > - else > + if (size >= scache_size) { > + if (current_cpu_type() != CPU_LOONGSON3) > + r4k_blast_scache(); > + else > + r4k_blast_scache_node(pa_to_nid(addr)); > + } else { > blast_scache_range(addr, addr + size); > + } > preempt_enable(); > __sync(); > return; Hmm, so if I understand correctly this will writeback+invalidate the L2 for one node only? ie. you just changed which node that is. I'm presuming L2 ops performed in one node aren't broadcast to other nodes, otherwise this patch is pointless? Thus presumably L2 caches in other nodes may contain stale data, right? Or even worse, dirty data which may get written back at any moment? I'm not sure this is safe - do you need to operate on all L2 caches in the system here? I also wonder whether it would be cleaner for Loongson3 to provide a custom struct dma_map_ops to implement this, rather than adding the condition to the generic implementation. Thanks, Paul