On Fri, Jun 21, 2019 at 10:57 AM Alexander Potapenko <glider@xxxxxxxxxx> wrote: > > On Fri, Jun 21, 2019 at 9:09 AM Michal Hocko <mhocko@xxxxxxxxxx> wrote: > > > > On Mon 17-06-19 17:10:49, Alexander Potapenko wrote: > > > The new options are needed to prevent possible information leaks and > > > make control-flow bugs that depend on uninitialized values more > > > deterministic. > > > > > > init_on_alloc=1 makes the kernel initialize newly allocated pages and heap > > > objects with zeroes. Initialization is done at allocation time at the > > > places where checks for __GFP_ZERO are performed. > > > > > > init_on_free=1 makes the kernel initialize freed pages and heap objects > > > with zeroes upon their deletion. This helps to ensure sensitive data > > > doesn't leak via use-after-free accesses. > > > > > > Both init_on_alloc=1 and init_on_free=1 guarantee that the allocator > > > returns zeroed memory. The two exceptions are slab caches with > > > constructors and SLAB_TYPESAFE_BY_RCU flag. Those are never > > > zero-initialized to preserve their semantics. > > > > > > Both init_on_alloc and init_on_free default to zero, but those defaults > > > can be overridden with CONFIG_INIT_ON_ALLOC_DEFAULT_ON and > > > CONFIG_INIT_ON_FREE_DEFAULT_ON. > > > > > > Slowdown for the new features compared to init_on_free=0, > > > init_on_alloc=0: > > > > > > hackbench, init_on_free=1: +7.62% sys time (st.err 0.74%) > > > hackbench, init_on_alloc=1: +7.75% sys time (st.err 2.14%) > > > > > > Linux build with -j12, init_on_free=1: +8.38% wall time (st.err 0.39%) > > > Linux build with -j12, init_on_free=1: +24.42% sys time (st.err 0.52%) > > > Linux build with -j12, init_on_alloc=1: -0.13% wall time (st.err 0.42%) > > > Linux build with -j12, init_on_alloc=1: +0.57% sys time (st.err 0.40%) > > > > > > The slowdown for init_on_free=0, init_on_alloc=0 compared to the > > > baseline is within the standard error. > > > > > > The new features are also going to pave the way for hardware memory > > > tagging (e.g. arm64's MTE), which will require both on_alloc and on_free > > > hooks to set the tags for heap objects. With MTE, tagging will have the > > > same cost as memory initialization. > > > > > > Although init_on_free is rather costly, there are paranoid use-cases where > > > in-memory data lifetime is desired to be minimized. There are various > > > arguments for/against the realism of the associated threat models, but > > > given that we'll need the infrastructre for MTE anyway, and there are > > > people who want wipe-on-free behavior no matter what the performance cost, > > > it seems reasonable to include it in this series. > > > > Thanks for reworking the original implemenation. This looks much better! > > > > > Signed-off-by: Alexander Potapenko <glider@xxxxxxxxxx> > > > Acked-by: Kees Cook <keescook@xxxxxxxxxxxx> > > > To: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> > > > To: Christoph Lameter <cl@xxxxxxxxx> > > > To: Kees Cook <keescook@xxxxxxxxxxxx> > > > Cc: Masahiro Yamada <yamada.masahiro@xxxxxxxxxxxxx> > > > Cc: Michal Hocko <mhocko@xxxxxxxxxx> > > > Cc: James Morris <jmorris@xxxxxxxxx> > > > Cc: "Serge E. Hallyn" <serge@xxxxxxxxxx> > > > Cc: Nick Desaulniers <ndesaulniers@xxxxxxxxxx> > > > Cc: Kostya Serebryany <kcc@xxxxxxxxxx> > > > Cc: Dmitry Vyukov <dvyukov@xxxxxxxxxx> > > > Cc: Sandeep Patil <sspatil@xxxxxxxxxxx> > > > Cc: Laura Abbott <labbott@xxxxxxxxxx> > > > Cc: Randy Dunlap <rdunlap@xxxxxxxxxxxxx> > > > Cc: Jann Horn <jannh@xxxxxxxxxx> > > > Cc: Mark Rutland <mark.rutland@xxxxxxx> > > > Cc: Marco Elver <elver@xxxxxxxxxx> > > > Cc: linux-mm@xxxxxxxxx > > > Cc: linux-security-module@xxxxxxxxxxxxxxx > > > Cc: kernel-hardening@xxxxxxxxxxxxxxxxxx > > > > Acked-by: Michal Hocko <mhocko@xxxxxxx> # page allocator parts. > > > > kmalloc based parts look good to me as well but I am not sure I fill > > qualified to give my ack there without much more digging and I do not > > have much time for that now. > > > > [...] > > > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c > > > index fd5c95ff9251..2f75dd0d0d81 100644 > > > --- a/kernel/kexec_core.c > > > +++ b/kernel/kexec_core.c > > > @@ -315,7 +315,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order) > > > arch_kexec_post_alloc_pages(page_address(pages), count, > > > gfp_mask); > > > > > > - if (gfp_mask & __GFP_ZERO) > > > + if (want_init_on_alloc(gfp_mask)) > > > for (i = 0; i < count; i++) > > > clear_highpage(pages + i); > > > } > > > > I am not really sure I follow here. Why do we want to handle > > want_init_on_alloc here? The allocated memory comes from the page > > allocator and so it will get zeroed there. arch_kexec_post_alloc_pages > > might touch the content there but is there any actual risk of any kind > > of leak? > You're right, we don't want to initialize this memory if init_on_alloc is on. > We need something along the lines of: > if (!static_branch_unlikely(&init_on_alloc)) > if (gfp_mask & __GFP_ZERO) > // clear the pages > > Another option would be to disable initialization in alloc_pages() using a flag. > > > > > diff --git a/mm/dmapool.c b/mm/dmapool.c > > > index 8c94c89a6f7e..e164012d3491 100644 > > > --- a/mm/dmapool.c > > > +++ b/mm/dmapool.c > > > @@ -378,7 +378,7 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags, > > > #endif > > > spin_unlock_irqrestore(&pool->lock, flags); > > > > > > - if (mem_flags & __GFP_ZERO) > > > + if (want_init_on_alloc(mem_flags)) > > > memset(retval, 0, pool->size); > > > > > > return retval; > > > > Don't you miss dma_pool_free and want_init_on_free? > Agreed. > I'll fix this and add tests for DMA pools as well. This doesn't seem to be easy though. One needs a real DMA-capable device to allocate using DMA pools. On the other hand, what happens to a DMA pool when it's destroyed, isn't it wiped by pagealloc? I'm inclined towards not touching mm/dmapool.c in this patch series, as it is probably orthogonal to the idea of hardening the heap/pagealloc. > > -- > > Michal Hocko > > SUSE Labs > > > > -- > Alexander Potapenko > Software Engineer > > Google Germany GmbH > Erika-Mann-Straße, 33 > 80636 München > > Geschäftsführer: Paul Manicle, Halimah DeLaine Prado > Registergericht und -nummer: Hamburg, HRB 86891 > Sitz der Gesellschaft: Hamburg -- Alexander Potapenko Software Engineer Google Germany GmbH Erika-Mann-Straße, 33 80636 München Geschäftsführer: Paul Manicle, Halimah DeLaine Prado Registergericht und -nummer: Hamburg, HRB 86891 Sitz der Gesellschaft: Hamburg