Re: [PATCH 02/10] MIPS: c-r4k: Add r4k_blast_scache_node for Loongson-3

Huacai Chen <chenhc@xxxxxxxxxx> · Thu, 27 Sep 2018 15:25:10 +0800



On Thu, Sep 27, 2018 at 5:47 AM Paul Burton <paul.burton@xxxxxxxx> wrote:
>
> Hi Huacai,
>
> Copying DMA mapping maintainers for any input they may have.
>
> On Wed, Sep 05, 2018 at 05:33:02PM +0800, Huacai Chen wrote:
> >  static inline void local_r4k___flush_cache_all(void * args)
> >  {
> >       switch (current_cpu_type()) {
> >       case CPU_LOONGSON2:
> > -     case CPU_LOONGSON3:
> >       case CPU_R4000SC:
> >       case CPU_R4000MC:
> >       case CPU_R4400SC:
> > @@ -480,6 +497,11 @@ static inline void local_r4k___flush_cache_all(void * args)
> >               r4k_blast_scache();
> >               break;
> >
> > +     case CPU_LOONGSON3:
> > +             /* Use get_ebase_cpunum() for both NUMA=y/n */
> > +             r4k_blast_scache_node(get_ebase_cpunum() >> 2);
> > +             break;
> > +
>
> I wonder if we could instead just include the node ID bits in
> INDEX_BASE? Then we could continue using r4k_blast_scache() here as
> usual.
Yes, we could, but I don't see any advantage in doing so.

>
> >       case CPU_BMIPS5000:
> >               r4k_blast_scache();
> >               __sync();
> > @@ -840,10 +862,14 @@ static void r4k_dma_cache_wback_inv(unsigned long addr, unsigned long size)
> >
> >       preempt_disable();
> >       if (cpu_has_inclusive_pcaches) {
> > -             if (size >= scache_size)
> > -                     r4k_blast_scache();
> > -             else
> > +             if (size >= scache_size) {
> > +                     if (current_cpu_type() != CPU_LOONGSON3)
> > +                             r4k_blast_scache();
> > +                     else
> > +                             r4k_blast_scache_node(pa_to_nid(addr));
> > +             } else {
> >                       blast_scache_range(addr, addr + size);
> > +             }
> >               preempt_enable();
> >               __sync();
> >               return;
>
> Hmm, so if I understand correctly this will writeback+invalidate the L2
> for one node only? ie. you just changed which node that is.
>
> I'm presuming L2 ops performed in one node aren't broadcast to other
> nodes, otherwise this patch is pointless?
>
> Thus presumably L2 caches in other nodes may contain stale data, right?
> Or even worse, dirty data which may get written back at any moment?
>
> I'm not sure this is safe - do you need to operate on all L2 caches in
> the system here?
>
> I also wonder whether it would be cleaner for Loongson3 to provide a
> custom struct dma_map_ops to implement this, rather than adding the
> condition to the generic implementation.
On Loongson-3 the L2 cache is shared by all nodes, which means a memory
address has only one copy in L2: node-0's memory is cached only in
node-0's L2, and node-1's memory is cached only in node-1's L2. If node-0
accesses node-1's memory, it may hit in node-1's L2. So operating on the
L2 of the node that owns the address is enough.
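
To make the mapping concrete, here is a rough user-space sketch (not the
kernel code). The sketch_* names and constants are only for illustration:
it assumes the node ID sits in physical address bits 47:44, and that each
node has 4 cores, which is what the ">> 2" in the patch relies on.

```c
#include <stdint.h>

/* Assumption: node ID is encoded in physical address bits 47:44. */
#define SKETCH_NODE_SHIFT	44
#define SKETCH_NODE_MASK	0xfULL

/* Assumption: 4 cores per node, so cpunum >> 2 gives the node. */
#define SKETCH_CORE_SHIFT	2

/* Node whose L2 is the only one that can hold this physical address. */
static inline unsigned int sketch_pa_to_nid(uint64_t pa)
{
	return (unsigned int)((pa >> SKETCH_NODE_SHIFT) & SKETCH_NODE_MASK);
}

/* Node that the given CPU number belongs to. */
static inline unsigned int sketch_cpu_to_nid(unsigned int cpunum)
{
	return cpunum >> SKETCH_CORE_SHIFT;
}
```

Under these assumptions, a buffer at physical address 0x100000000000 maps
to node 1 no matter which CPU issues the DMA flush, so only node-1's L2
needs the writeback+invalidate.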

Huacai

>
> Thanks,
>     Paul