Hi all,

On Wed, Jun 19, 2019 at 01:51:43AM +0000, Gary Guo wrote:
> I don't think it is sensible that any hardware would actually cache invalid entries.

I need some clarification here. What does "invalid entries" mean here? The inconsistency between the TLB and the page table?

> From my research https://arxiv.org/pdf/1905.06825 (to appear in CARRV 2019), existing Linux code already emits too many SFENCE.VMAs (94% of all flushes) that are completely unnecessary on hardware that does not cache invalid entries. Adding a couple more would definitely not be good, considering that a TLB flush can be expensive on some microarchitectures.
>
> I think the spec should be fixed to recommend against caching invalid entries; then we can do what other architectures do, i.e. use flush_tlb_fix_spurious_fault instead.
>
> > -----Original Message-----
> > From: linux-riscv <linux-riscv-bounces@xxxxxxxxxxxxxxxxxxx> On Behalf Of Paul Walmsley
> > Sent: Monday, June 17, 2019 11:33
> > To: ShihPo Hung <shihpo.hung@xxxxxxxxxx>
> > Cc: linux-riscv@xxxxxxxxxxxxxxxxxxx; Palmer Dabbelt <palmer@xxxxxxxxxx>; Albert Ou <aou@xxxxxxxxxxxxxxxxx>; stable@xxxxxxxxxxxxxxx
> > Subject: Re: [PATCH v2] riscv: mm: synchronize MMU after pte change
> >
> > Hi ShihPo,
> >
> > On Mon, 17 Jun 2019, ShihPo Hung wrote:
> >
> > > Because RISC-V compliant implementations can cache invalid entries
> > > in the TLB, an SFENCE.VMA is necessary after changes to the page table.
> > > This patch adds an SFENCE.VMA for the vmalloc_fault path.
> > >
> > > Signed-off-by: ShihPo Hung <shihpo.hung@xxxxxxxxxx>
> > > Cc: Palmer Dabbelt <palmer@xxxxxxxxxx>
> > > Cc: Albert Ou <aou@xxxxxxxxxxxxxxxxx>
> > > Cc: Paul Walmsley <paul.walmsley@xxxxxxxxxx>
> > > Cc: linux-riscv@xxxxxxxxxxxxxxxxxxx
> > > Cc: stable@xxxxxxxxxxxxxxx
> >
> > It looks like something in your patch flow converted tabs to whitespace, so
> > the patch doesn't apply. Then, with the tabs fixed, the comment text
> > exceeds 80 columns.
> >
> > I've fixed those issues by hand for this patch (revised patch below, as
> > queued for v5.2-rc), but could you please fix these issues in future
> > patches? Running checkpatch.pl --strict should help identify them
> > before posting.
> >
> >
> > thanks,
> >
> > - Paul
> >
> >
> > From: ShihPo Hung <shihpo.hung@xxxxxxxxxx>
> > Date: Mon, 17 Jun 2019 12:26:17 +0800
> > Subject: [PATCH] [PATCH v2] riscv: mm: synchronize MMU after pte change
> >
> > Because RISC-V compliant implementations can cache invalid entries
> > in the TLB, an SFENCE.VMA is necessary after changes to the page table.
> > This patch adds an SFENCE.VMA for the vmalloc_fault path.
> >
> > Signed-off-by: ShihPo Hung <shihpo.hung@xxxxxxxxxx>
> > [paul.walmsley@xxxxxxxxxx: reversed tab->whitespace conversion,
> >  wrapped comment lines]
> > Signed-off-by: Paul Walmsley <paul.walmsley@xxxxxxxxxx>
> > Cc: Palmer Dabbelt <palmer@xxxxxxxxxx>
> > Cc: Albert Ou <aou@xxxxxxxxxxxxxxxxx>
> > Cc: Paul Walmsley <paul.walmsley@xxxxxxxxxx>
> > Cc: linux-riscv@xxxxxxxxxxxxxxxxxxx
> > Cc: stable@xxxxxxxxxxxxxxx

I guess this patch was inspired by a discussion on the sw-dev forum:
https://groups.google.com/a/groups.riscv.org/forum/#!msg/sw-dev/-M-eRDmGuEc/m1__-sfqAAAJ

Some processors in our AndeStar V5 family do suffer from the issue this patch addresses. Along the vmalloc_fault path in do_page_fault, the pud and pmd (level-3/2 PTEs) are written into the data cache, but the handler returns without an SFENCE.VMA. The hardware page-table walker does not snoop the data cache, so the PTE update has no effect and the hart loops on the same fault. This is just what the spec says: "The SFENCE.VMA is used to flush any local hardware caches related to address translation."
> > ---
> >  arch/riscv/mm/fault.c | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> >
> > diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c
> > index cec8be9e2d6a..5b72e60c5a6b 100644
> > --- a/arch/riscv/mm/fault.c
> > +++ b/arch/riscv/mm/fault.c
> > @@ -29,6 +29,7 @@
> >
> >  #include <asm/pgalloc.h>
> >  #include <asm/ptrace.h>
> > +#include <asm/tlbflush.h>
> >
> >  /*
> >   * This routine handles page faults. It determines the address and the
> > @@ -278,6 +279,18 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
> >  		pte_k = pte_offset_kernel(pmd_k, addr);
> >  		if (!pte_present(*pte_k))
> >  			goto no_context;
> > +
> > +		/*
> > +		 * The kernel assumes that TLBs don't cache invalid
> > +		 * entries, but in RISC-V, SFENCE.VMA specifies an
> > +		 * ordering constraint, not a cache flush; it is
> > +		 * necessary even after writing invalid entries.
> > +		 * Relying on flush_tlb_fix_spurious_fault would
> > +		 * suffice, but the extra traps reduce
> > +		 * performance. So, eagerly SFENCE.VMA.
> > +		 */
> > +		local_flush_tlb_page(addr);
> > +
> >  		return;
> >  	}
> > }
> > --
> > 2.20.1
> >
> >
> > _______________________________________________
> > linux-riscv mailing list
> > linux-riscv@xxxxxxxxxxxxxxxxxxx
> > http://lists.infradead.org/mailman/listinfo/linux-riscv