On Wed, Oct 7, 2020 at 3:09 PM Marco Elver <elver@xxxxxxxxxx> wrote:
> On Fri, 2 Oct 2020 at 07:45, Jann Horn <jannh@xxxxxxxxxx> wrote:
> > On Tue, Sep 29, 2020 at 3:38 PM Marco Elver <elver@xxxxxxxxxx> wrote:
> > > Add architecture specific implementation details for KFENCE and enable
> > > KFENCE for the x86 architecture. In particular, this implements the
> > > required interface in <asm/kfence.h> for setting up the pool and
> > > providing helper functions for protecting and unprotecting pages.
> > >
> > > For x86, we need to ensure that the pool uses 4K pages, which is done
> > > using the set_memory_4k() helper function.
> > [...]
> > > diff --git a/arch/x86/include/asm/kfence.h b/arch/x86/include/asm/kfence.h
> > [...]
> > > +/* Protect the given page and flush TLBs. */
> > > +static inline bool kfence_protect_page(unsigned long addr, bool protect)
> > > +{
> > > +       unsigned int level;
> > > +       pte_t *pte = lookup_address(addr, &level);
> > > +
> > > +       if (!pte || level != PG_LEVEL_4K)
> >
> > Do we actually expect this to happen, or is this just a "robustness"
> > check? If we don't expect this to happen, there should be a WARN_ON()
> > around the condition.
>
> It's not obvious here, but we already have this covered with a WARN:
> the core.c code has a KFENCE_WARN_ON, which disables KFENCE on a
> warning.

So for this specific branch: Can it ever happen? If not, please either
remove it or add WARN_ON(). That serves two functions: It ensures that
if something unexpected happens, we see a warning, and it hints to
people reading the code "this isn't actually expected to happen, you
don't have to wrack your brain trying to figure out for which scenario
this branch is intended".

> > > +               return false;
> > > +
> > > +       if (protect)
> > > +               set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
> > > +       else
> > > +               set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
> >
> > Hmm... do we have this helper (instead of using the existing helpers
> > for modifying memory permissions) to work around the allocation out of
> > the data section?
>
> I just played around with using the set_memory.c functions, to remind
> myself why this didn't work. I experimented with using
> set_memory_{np,p}() functions; set_memory_p() isn't implemented, but
> is easily added (which I did for below experiment). However, this
> didn't quite work:
[...]
> For one, smp_call_function_many_cond() doesn't want to be called with
> interrupts disabled, and we may very well get a KFENCE allocation or
> page fault with interrupts disabled / within interrupts.
>
> Therefore, to be safe, we should avoid IPIs.

set_direct_map_invalid_noflush() does that, too, I think? And that's
already implemented for both arm64 and x86.

> It follows that setting
> the page attribute is best-effort, and we can tolerate some
> inaccuracy. Lazy fault handling should take care of faults after we
> set the page as PRESENT.
[...]
> > Shouldn't kfence_handle_page_fault() happen after prefetch handling,
> > at least? Maybe directly above the "oops" label?
>
> Good question. AFAIK it doesn't matter, as is_kfence_address() should
> never apply for any of those that follow, right? In any case, it
> shouldn't hurt to move it down.

is_prefetch() ignores any #PF not caused by instruction fetch if it
comes from kernel mode and the faulting instruction is one of the
PREFETCH* instructions. (Which is not supposed to happen - the
processor should just be ignoring the fault for PREFETCH instead of
generating an exception AFAIK. But the comments say that this is about
CPU bugs and stuff.)

While this is probably not a big deal anymore partly because the kernel
doesn't use software prefetching in many places anymore, it seems to me
like, in principle, this could also cause page faults that should be
ignored in KFENCE regions if someone tries to do PREFETCH on an
out-of-bounds array element or a dangling pointer or something.
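
To make the WARN_ON() suggestion for the "!pte || level != PG_LEVEL_4K"
branch concrete, here is a minimal user-space sketch of the idea. It is
not kernel code and not the patch's implementation: struct model_mapping,
model_warn_on(), model_protect_page() and the MODEL_* constants are all
hypothetical stand-ins for the PTE, WARN_ON(), kfence_protect_page() and
the kernel's page-table constants, and no TLB handling is modelled. It
only shows the unexpected case being reported instead of silently
returning false, plus the present-bit toggling from the quoted hunk.

```c
/* User-space model only: every name below is hypothetical, not kernel API. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MODEL_PAGE_PRESENT 0x1ULL /* stand-in for _PAGE_PRESENT */
#define MODEL_LEVEL_4K     1      /* stand-in for PG_LEVEL_4K */
#define MODEL_LEVEL_2M     2      /* stand-in for a huge-page mapping */

struct model_mapping {
	uint64_t pte; /* fake page table entry */
	int level;    /* mapping size of that entry */
};

static unsigned int model_warn_count;

/* Report an unexpected condition and pass its value through, like WARN_ON(). */
static bool model_warn_on(bool cond, const char *what)
{
	if (cond) {
		model_warn_count++;
		fprintf(stderr, "WARNING: %s\n", what);
	}
	return cond;
}

/*
 * Toggle the present bit of one 4K mapping. A missing or non-4K mapping is
 * treated as a bug (warning) rather than as a silently handled case.
 */
static bool model_protect_page(struct model_mapping *m, bool protect)
{
	if (model_warn_on(!m || m->level != MODEL_LEVEL_4K,
			  "pool page is not mapped with a 4K PTE"))
		return false;

	if (protect)
		m->pte &= ~MODEL_PAGE_PRESENT;
	else
		m->pte |= MODEL_PAGE_PRESENT;
	return true;
}

/* Small self-check exercising both the normal and the unexpected path. */
static void model_selfcheck(void)
{
	struct model_mapping ok = { .pte = 0x1000ULL | MODEL_PAGE_PRESENT,
				    .level = MODEL_LEVEL_4K };
	struct model_mapping huge = { .pte = MODEL_PAGE_PRESENT,
				      .level = MODEL_LEVEL_2M };
	unsigned int warns_before = model_warn_count;
	bool done;

	done = model_protect_page(&ok, true);
	assert(done && !(ok.pte & MODEL_PAGE_PRESENT));

	done = model_protect_page(&ok, false);
	assert(done && (ok.pte & MODEL_PAGE_PRESENT));

	/* Unexpected mapping size: warns once and leaves the entry alone. */
	done = model_protect_page(&huge, true);
	assert(!done && (huge.pte & MODEL_PAGE_PRESENT));
	assert(model_warn_count == warns_before + 1);
}
```

Calling model_selfcheck() from a main() runs both paths. The point of the
model is only that the failure branch becomes visible to whoever hits it
and reads as "not expected" to whoever reads it.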