On Tue, Jan 18, 2022 at 10:17 PM Anshuman Khandual <anshuman.khandual@xxxxxxx> wrote: > > > > On 1/19/22 6:21 AM, Matthew Wilcox wrote: > > On Tue, Jan 18, 2022 at 03:52:44PM -0800, Yury Norov wrote: > >> vmap() takes struct page *pages as one of arguments, and user may provide > >> an invalid pointer which would lead to DABT at address translation later. > > > > Could we spell out 'DABT'? Presumably that's an ARM-specific thing. > > Just like we don't say #PF for Intel page faults, I think this is > > probably a 'data abort'? > > Right, it is data abort. > > > > >> Currently, kernel checks the pages against NULL. In my case, however, the > >> address was not NULL, and was big enough so that the hardware generated > >> Address Size Abort on arm64. > > Could you please provide the actual abort stack here with details. [ 665.484101] Unhandled fault at 0xffff8000252cd000 [ 665.488807] Mem abort info: [ 665.491617] ESR = 0x96000043 [ 665.494675] EC = 0x25: DABT (current EL), IL = 32 bits [ 665.499985] SET = 0, FnV = 0 [ 665.503039] EA = 0, S1PTW = 0 [ 665.506167] Data abort info: [ 665.509047] ISV = 0, ISS = 0x00000043 [ 665.512882] CM = 0, WnR = 1 [ 665.515851] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000818cb000 [ 665.522550] [ffff8000252cd000] pgd=000000affcfff003, pud=000000affcffe003, pmd=0000008fad8c3003, pte=00688000a5217713 [ 665.533160] Internal error: level 3 address size fault: 96000043 [#1] SMP [ 665.539936] Modules linked in: [...] [ 665.616212] CPU: 178 PID: 13199 Comm: test Tainted: P OE 5.4.0-84-generic #94~18.04.1-Ubuntu [ 665.626806] Hardware name: HPE Apollo 70 /C01_APACHE_MB , BIOS L50_5.13_1.0.6 07/10/2018 [ 665.636618] pstate: 80400009 (Nzcv daif +PAN -UAO) [ 665.641407] pc : __memset+0x38/0x188 [ 665.645146] lr : test+0xcc/0x3f8 [ 665.650184] sp : ffff8000359bb840 [ 665.653486] x29: ffff8000359bb840 x28: 0000000000000000 [ 665.658785] x27: 0000000000000000 x26: 0000000000231000 [ 665.664083] x25: ffff00ae660f6110 x24: ffff00ae668cb800 [ 665.669382] x23: 0000000000000001 x22: ffff00af533e5000 [ 665.674680] x21: 0000000000001000 x20: 0000000000000000 [ 665.679978] x19: ffff00ae66950000 x18: ffffffffffffffff [ 665.685276] x17: 00000000588636a5 x16: 0000000000000013 [ 665.690574] x15: ffffffffffffffff x14: 000000000007ffff [ 665.695872] x13: 0000000080000000 x12: 0140000000000000 [ 665.701170] x11: 0000000000000041 x10: ffff8000652cd000 [ 665.706468] x9 : ffff8000252cf000 x8 : ffff8000252cd000 [ 665.711767] x7 : 0303030303030303 x6 : 0000000000001000 [ 665.717065] x5 : ffff8000252cd000 x4 : 0000000000000000 [ 665.722363] x3 : ffff8000252cdfff x2 : 0000000000000001 [ 665.727661] x1 : 0000000000000003 x0 : ffff8000252cd000 [ 665.732960] Call trace: [ 665.735395] __memset+0x38/0x188 [...] TCR_EL1 is 34b5503510, and so TCR_EL1.IPS is 0b100. The pfn that caused address size abort has bit#47 set, which is far above 16TB that is allowed by ips == 100. > >> > >> Interestingly, this abort happens even if copy_from_kernel_nofault() is > >> used, which is quite inconvenient for debugging purposes. > >> > >> This patch adds a pfn_valid() check into vmap() path, so that invalid > >> mapping will not be created. > >> > >> RFC: https://lkml.org/lkml/2022/1/18/815 > >> v1: use pfn_valid() instead of adding an arch-specific > >> arch_vmap_page_valid(). Thanks to Matthew Wilcox for the hint. > > This should be after the '---' below. > > >> > >> Signed-off-by: Yury Norov <yury.norov@xxxxxxxxx> > > > > Suggested-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx> > > > >> --- > >> mm/vmalloc.c | 2 ++ > >> 1 file changed, 2 insertions(+) > >> > >> diff --git a/mm/vmalloc.c b/mm/vmalloc.c > >> index d2a00ad4e1dd..a4134ee56b10 100644 > >> --- a/mm/vmalloc.c > >> +++ b/mm/vmalloc.c > >> @@ -477,6 +477,8 @@ static int vmap_pages_pte_range(pmd_t *pmd, unsigned long addr, > >> return -EBUSY; > >> if (WARN_ON(!page)) > >> return -ENOMEM; > >> + if (WARN_ON(!pfn_valid(page_to_pfn(page)))) > >> + return -EINVAL; > >> set_pte_at(&init_mm, addr, pte, mk_pte(page, prot)); > >> (*nr)++; > >> } while (pte++, addr += PAGE_SIZE, addr != end); > >> -- > >> 2.30.2 > >> > > Why should not this just scan over the entire user provided struct page > array and make sure that all pages there in are valid via above method, > but in vmap() itself before calling vmap_pages_range(). Because seems > like a single invalid page detected in vmap_pages_pte_range() will > anyways abort the entire vmap(). This will also enable us to drop the > existing NULL check above. I can do this, but why is it any better than the current approach? Thanks, Yury