Hi Lina, [+Ard, Mark and Ryan], On Thu, Jul 18, 2024 at 06:39:10PM +0900, Asahi Lina wrote: > I ran into this with the Asahi Linux downstream kernel, based on v6.9.9, > but I believe the problem is also still upstream. The issue seems to be > an interaction between folding one page table level at compile time and > another one at runtime. Thanks for reporting this! > With this config, we have: > > CONFIG_PGTABLE_LEVELS=4 > PAGE_SHIFT=14 > PMD_SHIFT=25 > PUD_SHIFT=36 > PGDIR_SHIFT=47 > pgtable_l5_enabled() == false (compile time) > pgtable_l4_enabled() == false (runtime, due to no LPA2) I think this is 'defconfig' w/ 16k pages, although I wasn't able to trigger the issue quickly under QEMU with that. Your analysis looks correct, however. > With p4d folded at compile-time, and pud folded at runtime when LPA2 is > not supported. > > With this setup, pgd_offset() is broken since the pgd is actually > supposed to become a pud but the shift is wrong, as it is set at compile > time: > > #define pgd_index(a) (((a) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)) > > static inline pgd_t *pgd_offset_pgd(pgd_t *pgd, unsigned long address) > { > return (pgd + pgd_index(address)); > }; > > Then we follow the gup logic (abbreviated): > > gup_pgd_range: > pgdp = pgd_offset(current->mm, addr); > pgd_t pgd = READ_ONCE(*pgdp); > > At this point, pgd is just the 0th entry of the top level page table > (since those extra address bits will always be 0 for valid 47-bit user > addresses). > > p4d then gets folded via pgtable-nop4d.h: > > gup_p4d_range: > p4dp = p4d_offset_lockless(pgdp, pgd, addr); > = p4d_offset(&(pgd), address) > = &pgd > p4d_t p4d = READ_ONCE(*p4dp); > > Now we have p4dp = stack address of pgd, and p4d = pgd. > > gup_pud_range: > pudp = pud_offset_lockless(p4dp, p4d, addr); > -> if (!pgtable_l4_enabled()) > = p4d_to_folded_pud(p4dp, addr); > = (pud_t *)PTR_ALIGN_DOWN(p4dp, PAGE_SIZE) + pud_index(addr); > pud_t pud = READ_ONCE(*pudp); > > Which is bad pointer math because it only works if p4dp points to a real > page table entry inside a page table, not a single u64 stack address. Cheers for the explanation; I agree that 6.10 looks like it's affected in the same way, even though I couldn't reproduce the crash. I think the root of the problem is that p4d_offset_lockless() returns a stack address when the p4d level is folded. I wondered about changing the dummy pXd_offset_lockless() definitions in linux/pgtable.h to pass the real pointer through instead of the address of the local, but then I suppose _most_ pXd_offset() implementations are going to dereference that and it would break the whole point of having _lockless routines to start with. What if we provided our own implementation of p4d_offset_lockless() for the folding case, which could just propagate the page-table pointer? Diff below. > This causes random oopses in internal_get_user_pages_fast and related > codepaths. Do you have a reliable way to trigger those? I tried doing some GUPpy things like strace (access_process_vm()) but it all seemed fine. Thanks, Will --->8 diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index f8efbc128446..3afe624a39e1 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -1065,6 +1065,13 @@ static inline bool pgtable_l5_enabled(void) { return false; } #define p4d_offset_kimg(dir,addr) ((p4d_t *)dir) +static inline +p4d_t *p4d_offset_lockless(pgd_t *pgdp, pgd_t pgd, unsigned long addr) +{ + return p4d_offset(pgdp, addr); +} +#define p4d_offset_lockless p4d_offset_lockless + #endif /* CONFIG_PGTABLE_LEVELS > 4 */ #define pgd_ERROR(e) \