On Tue, Sep 29, 2020 at 3:38 PM Marco Elver <elver@xxxxxxxxxx> wrote: > Add architecture specific implementation details for KFENCE and enable > KFENCE for the x86 architecture. In particular, this implements the > required interface in <asm/kfence.h> for setting up the pool and > providing helper functions for protecting and unprotecting pages. > > For x86, we need to ensure that the pool uses 4K pages, which is done > using the set_memory_4k() helper function. [...] > diff --git a/arch/x86/include/asm/kfence.h b/arch/x86/include/asm/kfence.h [...] > +/* Protect the given page and flush TLBs. */ > +static inline bool kfence_protect_page(unsigned long addr, bool protect) > +{ > + unsigned int level; > + pte_t *pte = lookup_address(addr, &level); > + > + if (!pte || level != PG_LEVEL_4K) Do we actually expect this to happen, or is this just a "robustness" check? If we don't expect this to happen, there should be a WARN_ON() around the condition. > + return false; > + > + if (protect) > + set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT)); > + else > + set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT)); Hmm... do we have this helper (instead of using the existing helpers for modifying memory permissions) to work around the allocation out of the data section? > + flush_tlb_one_kernel(addr); > + return true; > +} > + > +#endif /* _ASM_X86_KFENCE_H */ > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c [...] > @@ -701,6 +702,9 @@ no_context(struct pt_regs *regs, unsigned long error_code, > } > #endif > > + if (kfence_handle_page_fault(address)) > + return; > + > /* > * 32-bit: > * The standard 5 lines of diff context don't really make it obvious what's going on here. Here's a diff with more context: /* * Stack overflow? During boot, we can fault near the initial * stack in the direct map, but that's not an overflow -- check * that we're in vmalloc space to avoid this. */ if (is_vmalloc_addr((void *)address) && (((unsigned long)tsk->stack - 1 - address < PAGE_SIZE) || address - ((unsigned long)tsk->stack + THREAD_SIZE) < PAGE_SIZE)) { unsigned long stack = __this_cpu_ist_top_va(DF) - sizeof(void *); /* * We're likely to be running with very little stack space * left. It's plausible that we'd hit this condition but * double-fault even before we get this far, in which case * we're fine: the double-fault handler will deal with it. * * We don't want to make it all the way into the oops code * and then double-fault, though, because we're likely to * break the console driver and lose most of the stack dump. */ asm volatile ("movq %[stack], %%rsp\n\t" "call handle_stack_overflow\n\t" "1: jmp 1b" : ASM_CALL_CONSTRAINT : "D" ("kernel stack overflow (page fault)"), "S" (regs), "d" (address), [stack] "rm" (stack)); unreachable(); } #endif + if (kfence_handle_page_fault(address)) + return; + /* * 32-bit: * * Valid to do another page fault here, because if this fault * had been triggered by is_prefetch fixup_exception would have * handled it. * * 64-bit: * * Hall of shame of CPU/BIOS bugs. */ if (is_prefetch(regs, error_code, address)) return; if (is_errata93(regs, address)) return; /* * Buggy firmware could access regions which might page fault, try to * recover from such faults. */ if (IS_ENABLED(CONFIG_EFI)) efi_recover_from_page_fault(address); oops: /* * Oops. The kernel tried to access some bad page. We'll have to * terminate things with extreme prejudice: */ flags = oops_begin(); Shouldn't kfence_handle_page_fault() happen after prefetch handling, at least? Maybe directly above the "oops" label?