Hi,

I replied to the original RFC before spotting this; duplicating those comments
here because I think they apply regardless of the mechanism used to work around
this.

On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote:
> vmap() takes struct page *pages as one of arguments, and user may provide
> an invalid pointer which would lead to DABT at address translation later.
>
> Currently, kernel checks the pages against NULL. In my case, however, the
> address was not NULL, and was big enough so that the hardware generated
> Address Size Abort on arm64.

Can you give an example of when this might happen? It sounds like you're
actually hitting this, so a backtrace would be nice.

I'm a bit confused as to why we'd try to vmap() pages that we didn't have a
legitimate struct page for -- where did these addresses come from?

It sounds like this is going wrong at a higher level, and we're passing
entirely bogus struct page pointers around. This seems like the sort of thing
DEBUG_VIRTUAL or similar should check when we initially generate the struct
page pointer.

> Interestingly, this abort happens even if copy_from_kernel_nofault() is
> used, which is quite inconvenient for debugging purposes.

I can go take a look at this, but TBH we never expect to take an address size
fault to begin with, so this is arguably correct -- it's an internal
consistency problem.

> This patch adds a pfn_valid() check into vmap() path, so that invalid
> mapping will not be created.
>
> RFC: https://lkml.org/lkml/2022/1/18/815
> v1: use pfn_valid() instead of adding an arch-specific
>     arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint.
>
> Signed-off-by: Yury Norov <yury.norov@xxxxxxxxx>
> ---
>  mm/vmalloc.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index d2a00ad4e1dd..a4134ee56b10 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr,
>  			return -EBUSY;
>  		if (WARN_ON(!page))
>  			return -ENOMEM;
> +		if (WARN_ON(!pfn_valid(page_to_pfn(page))))
> +			return -EINVAL;

My fear here is that for this to fire, we've already passed a bogus struct page
pointer around the intermediate infrastructure, and any of that might try to
use it in unsafe ways (in future even if we don't use it today).

I think the fundamental issue here is that we generate a bogus struct page
pointer at all, and knowing where that came from would help to fix that.

Thanks,
Mark.

>  		set_pte_at(&init_mm, addr, pte, mk_pte(page, prot));
>  		(*nr)++;
>  	} while (pte++, addr += PAGE_SIZE, addr != end);
> --
> 2.30.2
>