On Fri, Jul 21, 2023 at 4:01 PM Alexandre Ghiti <alex@xxxxxxxx> wrote: > > > On 21/07/2023 18:08, Guo Ren wrote: > > On Fri, Jul 21, 2023 at 11:19 PM Alexandre Ghiti <alex@xxxxxxxx> wrote: > >> > >> On 21/07/2023 16:51, guoren@xxxxxxxxxx wrote: > >>> From: Guo Ren <guoren@xxxxxxxxxxxxxxxxx> > >>> > >>> RISC-V specification permits the caching of PTEs whose V (Valid) > >>> bit is clear. Operating systems must be written to cope with this > >>> possibility, but implementers are reminded that eagerly caching > >>> invalid PTEs will reduce performance by causing additional page > >>> faults. > >>> > >>> So we must keep vmalloc_fault for the spurious page faults of kernel > >>> virtual address from an OoO machine. > >>> > >>> Signed-off-by: Guo Ren <guoren@xxxxxxxxxxxxxxxxx> > >>> Signed-off-by: Guo Ren <guoren@xxxxxxxxxx> > >>> --- > >>> arch/riscv/mm/fault.c | 3 +-- > >>> 1 file changed, 1 insertion(+), 2 deletions(-) > >>> > >>> diff --git a/arch/riscv/mm/fault.c b/arch/riscv/mm/fault.c > >>> index 85165fe438d8..f662c9eae7d4 100644 > >>> --- a/arch/riscv/mm/fault.c > >>> +++ b/arch/riscv/mm/fault.c > >>> @@ -258,8 +258,7 @@ void handle_page_fault(struct pt_regs *regs) > >>> * only copy the information from the master page table, > >>> * nothing more. > >>> */ > >>> - if ((!IS_ENABLED(CONFIG_MMU) || !IS_ENABLED(CONFIG_64BIT)) && > >>> - unlikely(addr >= VMALLOC_START && addr < VMALLOC_END)) { > >>> + if (unlikely(addr >= TASK_SIZE)) { > >>> vmalloc_fault(regs, code, addr); > >>> return; > >>> } > >> > >> Can you share what you are trying to fix here? > > We met a spurious page fault panic on an OoO machine. > > > > 1. The processor speculative execution brings the V=0 entries into the > > TLB in the kernel virtual address. > > 2. Linux kernel installs the kernel virtual address with the page, and V=1 > > 3. When kernel code access the kernel virtual address, it would raise > > a page fault as the V=0 entry in the tlb. > > 4. No vmalloc_fault, then panic. > > > >> I have a fix (that's currently running our CI) for commit 7d3332be011e > >> ("riscv: mm: Pre-allocate PGD entries for vmalloc/modules area") that > >> implements flush_cache_vmap() since we lost the vmalloc_fault. > > Could you share that patch? > > > Here we go: > > > Author: Alexandre Ghiti <alexghiti@xxxxxxxxxxxx> > Date: Fri Jul 21 08:43:44 2023 +0000 > > riscv: Implement flush_cache_vmap() > > The RISC-V kernel needs a sfence.vma after a page table > modification: we > used to rely on the vmalloc fault handling to emit an sfence.vma, but > commit 7d3332be011e ("riscv: mm: Pre-allocate PGD entries for > vmalloc/modules area") got rid of this path for 64-bit kernels, so > now we > need to explicitly emit a sfence.vma in flush_cache_vmap(). > > Note that we don't need to implement flush_cache_vunmap() as the > generic > code should emit a flush tlb after unmapping a vmalloc region. > > Fixes: 7d3332be011e ("riscv: mm: Pre-allocate PGD entries for > vmalloc/modules area") > Signed-off-by: Alexandre Ghiti <alexghiti@xxxxxxxxxxxx> > > diff --git a/arch/riscv/include/asm/cacheflush.h > b/arch/riscv/include/asm/cacheflush.h > index 8091b8bf4883..b93ffddf8a61 100644 > --- a/arch/riscv/include/asm/cacheflush.h > +++ b/arch/riscv/include/asm/cacheflush.h > @@ -37,6 +37,10 @@ static inline void flush_dcache_page(struct page *page) > #define flush_icache_user_page(vma, pg, addr, len) \ > flush_icache_mm(vma->vm_mm, 0) > > +#ifdef CONFIG_64BIT > +#define flush_cache_vmap(start, end) flush_tlb_kernel_range(start, end) > +#endif I don't want that, and flush_tlb_kernel_range is flush_tlb_all. In addition, it would call IPI, which is a performance killer. What's the problem of spurious fault replay? It only costs a local_tlb_flush with vaddr. > + > #ifndef CONFIG_SMP > > #define flush_icache_all() local_flush_icache_all() > > > Let me know if that works for you! > > > > > > -- Best Regards Guo Ren