Re: LPA2 on non-LPA2 hardware broken with 16K pages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 7/18/24 18:44, Will Deacon wrote:
Hi Lina, [+Ard, Mark and Ryan],

On Thu, Jul 18, 2024 at 06:39:10PM +0900, Asahi Lina wrote:
I ran into this with the Asahi Linux downstream kernel, based on v6.9.9,
but I believe the problem is also still upstream. The issue seems to be
an interaction between folding one page table level at compile time and
another one at runtime.
Thanks for reporting this!

With this config, we have:

CONFIG_PGTABLE_LEVELS=4
PAGE_SHIFT=14
PMD_SHIFT=25
PUD_SHIFT=36
PGDIR_SHIFT=47
pgtable_l5_enabled() == false (compile time)
pgtable_l4_enabled() == false (runtime, due to no LPA2)
I think this is 'defconfig' w/ 16k pages, although I wasn't able to
trigger the issue quickly under QEMU with that. Your analysis looks
correct, however.

Hi Will,

I was also trying to debug this; indeed this is 16K defconfig, and
pgtable_l4_enabled() is returning false on non-LPA2 hardware. Is this
the intended behaviour? Don't we require 4-level pagetable to resolve
virtual addresses on 16K?


With p4d folded at compile-time, and pud folded at runtime when LPA2 is
not supported.

With this setup, pgd_offset() is broken since the pgd is actually
supposed to become a pud but the shift is wrong, as it is set at compile
time:

#define pgd_index(a)  (((a) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1))

static inline pgd_t *pgd_offset_pgd(pgd_t *pgd, unsigned long address)
{
         return (pgd + pgd_index(address));
};

Then we follow the gup logic (abbreviated):

gup_pgd_range:
     pgdp = pgd_offset(current->mm, addr);
     pgd_t pgd = READ_ONCE(*pgdp);

At this point, pgd is just the 0th entry of the top level page table
(since those extra address bits will always be 0 for valid 47-bit user
addresses).

p4d then gets folded via pgtable-nop4d.h:

gup_p4d_range:
     p4dp = p4d_offset_lockless(pgdp, pgd, addr);
          = p4d_offset(&(pgd), address)
          = &pgd
     p4d_t p4d = READ_ONCE(*p4dp);

Now we have p4dp = stack address of pgd, and p4d = pgd.

gup_pud_range:
     pudp = pud_offset_lockless(p4dp, p4d, addr);
          -> if (!pgtable_l4_enabled())
            = p4d_to_folded_pud(p4dp, addr);
            = (pud_t *)PTR_ALIGN_DOWN(p4dp, PAGE_SIZE) + pud_index(addr);
     pud_t pud = READ_ONCE(*pudp);

Which is bad pointer math because it only works if p4dp points to a real
page table entry inside a page table, not a single u64 stack address.
Cheers for the explanation; I agree that 6.10 looks like it's affected
in the same way, even though I couldn't reproduce the crash. I think the
root of the problem is that p4d_offset_lockless() returns a stack
address when the p4d level is folded. I wondered about changing the
dummy pXd_offset_lockless() definitions in linux/pgtable.h to pass the
real pointer through instead of the address of the local, but then I
suppose _most_ pXd_offset() implementations are going to dereference
that and it would break the whole point of having _lockless routines
to start with.

What if we provided our own implementation of p4d_offset_lockless()
for the folding case, which could just propagate the page-table pointer?
Diff below.

This causes random oopses in internal_get_user_pages_fast and related
codepaths.
Do you have a reliable way to trigger those? I tried doing some GUPpy
things like strace (access_process_vm()) but it all seemed fine.

Thanks,

Will

--->8

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index f8efbc128446..3afe624a39e1 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1065,6 +1065,13 @@ static inline bool pgtable_l5_enabled(void) { return false; }
#define p4d_offset_kimg(dir,addr) ((p4d_t *)dir) +static inline
+p4d_t *p4d_offset_lockless(pgd_t *pgdp, pgd_t pgd, unsigned long addr)
+{
+       return p4d_offset(pgdp, addr);
+}
+#define p4d_offset_lockless p4d_offset_lockless
+
  #endif  /* CONFIG_PGTABLE_LEVELS > 4 */
#define pgd_ERROR(e) \






[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux