On Thu, Oct 29, 2020 at 2:17 PM Marco Elver <elver@xxxxxxxxxx> wrote: > Add architecture specific implementation details for KFENCE and enable > KFENCE for the x86 architecture. In particular, this implements the > required interface in <asm/kfence.h> for setting up the pool and > providing helper functions for protecting and unprotecting pages. > > For x86, we need to ensure that the pool uses 4K pages, which is done > using the set_memory_4k() helper function. > > Reviewed-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx> > Co-developed-by: Marco Elver <elver@xxxxxxxxxx> > Signed-off-by: Marco Elver <elver@xxxxxxxxxx> > Signed-off-by: Alexander Potapenko <glider@xxxxxxxxxx> [...] > diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c [...] > @@ -725,6 +726,9 @@ no_context(struct pt_regs *regs, unsigned long error_code, > if (IS_ENABLED(CONFIG_EFI)) > efi_recover_from_page_fault(address); > > + if (kfence_handle_page_fault(address)) > + return; We can also get to this point due to an attempt to execute a data page. That's very unlikely (given that the same thing would also crash if you tried to do it with normal heap memory, and KFENCE allocations are extremely rare); but we might want to try to avoid handling such faults as KFENCE faults, since KFENCE will assume that it has resolved the fault and retry execution of the faulting instruction. Once kernel protection keys are introduced, those might cause the same kind of trouble. So we might want to gate this on a check like "if ((error_code & X86_PF_PROT) == 0)" (meaning "only handle the fault if the fault was caused by no page being present", see enum x86_pf_error_code). Unrelated sidenote: Since we're hooking after exception fixup handling, the debug-only KFENCE_STRESS_TEST_FAULTS can probably still cause some behavioral differences through spurious faults in places like copy_user_enhanced_fast_string (where the exception table entries are used even if the *kernel* pointer, not the user pointer, causes a fault). But since KFENCE_STRESS_TEST_FAULTS is exclusively for KFENCE development, the difference might not matter. And ordering them the other way around definitely isn't possible, because the kernel relies on being able to fixup OOB reads. So there probably isn't really anything we can do better here; it's just something to keep in mind. Maybe you can add a little warning to the help text for that Kconfig entry that warns people about this? > + > oops: > /* > * Oops. The kernel tried to access some bad page. We'll have to > -- > 2.29.1.341.ge80a0c044ae-goog >