On Wed, Jan 19, 2022 at 3:17 AM Mark Rutland <mark.rutland@xxxxxxx> wrote:
>
> Hi,
>
> I replied to the original RFC before spotting this; duplicating those comments
> here because I think they apply regardless of the mechanism used to work around
> this.
>
> On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote:
> > vmap() takes struct page *pages as one of its arguments, and the user may
> > provide an invalid pointer which would lead to a DABT at address
> > translation later.
> >
> > Currently, the kernel checks the pages against NULL. In my case, however,
> > the address was not NULL, and was big enough that the hardware generated
> > an Address Size Abort on arm64.
>
> Can you give an example of when this might happen? It sounds like you're
> actually hitting this, so a backtrace would be nice.
>
> I'm a bit confused as to why we'd try to vmap() pages that we
> didn't have a legitimate struct page for -- where did these addresses
> come from?
>
> It sounds like this is going wrong at a higher level, and we're passing
> entirely bogus struct page pointers around. This seems like the sort of
> thing DEBUG_VIRTUAL or similar should check when we initially generate
> the struct page pointer.

Hi Mark,

This is out-of-tree code that does:

	vaddr1 = dma_alloc_coherent()
	page = virt_to_page()	// Wrong here
	...
	vaddr2 = vmap(page)
	memset(vaddr2)		// Fault here

virt_to_page() returns a wrong pointer if vaddr1 is not a linear kernel
address. The problem is that vmap() successfully populates the PTE with
a bad pfn, which makes the bug much harder to debug at memory-access
time.

> > Interestingly, this abort happens even if copy_from_kernel_nofault() is
> > used, which is quite inconvenient for debugging purposes.
>
> I can go take a look at this, but TBH we never expect to take an address size
> fault to begin with, so this is arguably correct -- it's an internal
> consistency problem.
>
> > This patch adds a pfn_valid() check into the vmap() path, so that an
> > invalid mapping will not be created.
> >
> > RFC: https://lkml.org/lkml/2022/1/18/815
> > v1: use pfn_valid() instead of adding an arch-specific
> >     arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint.
> >
> > Signed-off-by: Yury Norov <yury.norov@xxxxxxxxx>
> > ---
> >  mm/vmalloc.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> > index d2a00ad4e1dd..a4134ee56b10 100644
> > --- a/mm/vmalloc.c
> > +++ b/mm/vmalloc.c
> > @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
> >  			return -EBUSY;
> >  		if (WARN_ON(!page))
> >  			return -ENOMEM;
> > +		if (WARN_ON(!pfn_valid(page_to_pfn(page))))
> > +			return -EINVAL;
>
> My fear here is that for this to fire, we've already passed a bogus struct page
> pointer around the intermediate infrastructure, and any of that might try to
> use it in unsafe ways (in future even if we don't use it today).
>
> I think the fundamental issue here is that we generate a bogus struct page
> pointer at all, and knowing where that came from would help to fix that.

You're right. That's why WARN_ON() is used for the page == NULL case in
the code above, I believe: to let client code know that something has
gone wrong, and that this is not a regular -ENOMEM situation.

Thanks,
Yury

> Thanks,
> Mark.

> >  		set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
> >  		(*nr)++;
> >  	} while (pte++, addr += PAGE_SIZE, addr != end);
> > --
> > 2.30.2
> >