On Thu, Sep 27, 2018 at 5:47 AM Paul Burton <paul.burton@xxxxxxxx> wrote: > > Hi Huacai, > > Copying DMA mapping maintainers for any input they may have. > > On Wed, Sep 05, 2018 at 05:33:02PM +0800, Huacai Chen wrote: > > static inline void local_r4k___flush_cache_all(void * args) > > { > > switch (current_cpu_type()) { > > case CPU_LOONGSON2: > > - case CPU_LOONGSON3: > > case CPU_R4000SC: > > case CPU_R4000MC: > > case CPU_R4400SC: > > @@ -480,6 +497,11 @@ static inline void local_r4k___flush_cache_all(void * args) > > r4k_blast_scache(); > > break; > > > > + case CPU_LOONGSON3: > > + /* Use get_ebase_cpunum() for both NUMA=y/n */ > > + r4k_blast_scache_node(get_ebase_cpunum() >> 2); > > + break; > > + > > I wonder if we could instead just include the node ID bits in > INDEX_BASE? Then we could continue using r4k_blast_scache() here as > usual. Yes, but it has no advantages. > > > case CPU_BMIPS5000: > > r4k_blast_scache(); > > __sync(); > > @@ -840,10 +862,14 @@ static void r4k_dma_cache_wback_inv(unsigned long addr, unsigned long size) > > > > preempt_disable(); > > if (cpu_has_inclusive_pcaches) { > > - if (size >= scache_size) > > - r4k_blast_scache(); > > - else > > + if (size >= scache_size) { > > + if (current_cpu_type() != CPU_LOONGSON3) > > + r4k_blast_scache(); > > + else > > + r4k_blast_scache_node(pa_to_nid(addr)); > > + } else { > > blast_scache_range(addr, addr + size); > > + } > > preempt_enable(); > > __sync(); > > return; > > Hmm, so if I understand correctly this will writeback+invalidate the L2 > for one node only? ie. you just changed which node that is. > > I'm presuming L2 ops performed in one node aren't broadcast to other > nodes, otherwise this patch is pointless? > > Thus presumably L2 caches in other nodes may contain stale data, right? > Or even worse, dirty data which may get written back at any moment? > > I'm not sure this is safe - do you need to operate on all L2 caches in > the system here? > > I also wonder whether it would be cleaner for Loongson3 to provide a > custom struct dma_map_ops to implement this, rather than adding the > condition to the generic implementation. In Loongson-3, L2 cache is shared by all nodes, that means a memory address has only one copy in L2 (node-0's memory only cached in node-0's L2, node-1's memory only cached in node-1's memory. If node-0 want to access node-1's memory, it may hit in node-1's L2). Huacai > > Thanks, > Paul