On 1 May 2024, at 7:38, Ryan Roberts wrote: > Pulling in David, who may be able to advise... > > > On 01/05/2024 12:35, Ryan Roberts wrote: >> Zi Yan, I'm hoping you might have some input on the below... >> >> >> On 30/04/2024 14:31, Ryan Roberts wrote: >>> __split_huge_pmd_locked() can be called for a present THP, devmap or >>> (non-present) migration entry. It calls pmdp_invalidate() >>> unconditionally on the pmdp and only determines if it is present or not >>> based on the returned old pmd. >>> >>> But arm64's pmd_mkinvalid(), called by pmdp_invalidate(), >>> unconditionally sets the PMD_PRESENT_INVALID flag, which causes future >>> pmd_present() calls to return true - even for a swap pmd. Therefore any >>> lockless pgtable walker could see the migration entry pmd in this state >>> and start interpretting the fields (e.g. pmd_pfn()) as if it were >>> present, leading to BadThings (TM). GUP-fast appears to be one such >>> lockless pgtable walker. >>> >>> While the obvious fix is for core-mm to avoid such calls for non-present >>> pmds (pmdp_invalidate() will also issue TLBI which is not necessary for >>> this case either), all other arches that implement pmd_mkinvalid() do it >>> in such a way that it is robust to being called with a non-present pmd. >> >> OK the plot thickens; The tests I wrote to check that pmd_mkinvalid() is safe for swap entries fails on x86_64. See below... >> >>> So it is simpler and safer to make arm64 robust too. This approach means >>> we can even add tests to debug_vm_pgtable.c to validate the required >>> behaviour. >>> >>> This is a theoretical bug found during code review. I don't have any >>> test case to trigger it in practice. >>> >>> Cc: stable@xxxxxxxxxxxxxxx >>> Fixes: 53fa117bb33c ("arm64/mm: Enable THP migration") >>> Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx> >>> --- >>> >>> Hi all, >>> >>> v1 of this fix [1] took the approach of fixing core-mm to never call >>> pmdp_invalidate() on a non-present pmd. But Zi Yan highlighted that only arm64 >>> suffers this problem; all other arches are robust. So his suggestion was to >>> instead make arm64 robust in the same way and add tests to validate it. Despite >>> my stated reservations in the context of the v1 discussion, having thought on it >>> for a bit, I now agree with Zi Yan. Hence this post. >>> >>> Andrew has v1 in mm-unstable at the moment, so probably the best thing to do is >>> remove it from there and have this go in through the arm64 tree? Assuming there >>> is agreement that this approach is right one. >>> >>> This applies on top of v6.9-rc5. Passes all the mm selftests on arm64. >>> >>> [1] https://lore.kernel.org/linux-mm/20240425170704.3379492-1-ryan.roberts@xxxxxxx/ >>> >>> Thanks, >>> Ryan >>> >>> >>> arch/arm64/include/asm/pgtable.h | 12 +++++-- >>> mm/debug_vm_pgtable.c | 61 ++++++++++++++++++++++++++++++++ >>> 2 files changed, 71 insertions(+), 2 deletions(-) >>> >>> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h >>> index afdd56d26ad7..7d580271a46d 100644 >>> --- a/arch/arm64/include/asm/pgtable.h >>> +++ b/arch/arm64/include/asm/pgtable.h >>> @@ -511,8 +511,16 @@ static inline int pmd_trans_huge(pmd_t pmd) >>> >>> static inline pmd_t pmd_mkinvalid(pmd_t pmd) >>> { >>> - pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID)); >>> - pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID)); >>> + /* >>> + * If not valid then either we are already present-invalid or we are >>> + * not-present (i.e. none or swap entry). We must not convert >>> + * not-present to present-invalid. Unbelievably, the core-mm may call >>> + * pmd_mkinvalid() for a swap entry and all other arches can handle it. >>> + */ >>> + if (pmd_valid(pmd)) { >>> + pmd = set_pmd_bit(pmd, __pgprot(PMD_PRESENT_INVALID)); >>> + pmd = clear_pmd_bit(pmd, __pgprot(PMD_SECT_VALID)); >>> + } >>> >>> return pmd; >>> } >>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c >>> index 65c19025da3d..7e9c387d06b0 100644 >>> --- a/mm/debug_vm_pgtable.c >>> +++ b/mm/debug_vm_pgtable.c >>> @@ -956,6 +956,65 @@ static void __init hugetlb_basic_tests(struct pgtable_debug_args *args) { } >>> #endif /* CONFIG_HUGETLB_PAGE */ >>> >>> #ifdef CONFIG_TRANSPARENT_HUGEPAGE >>> +#if !defined(__HAVE_ARCH_PMDP_INVALIDATE) && defined(CONFIG_ARCH_ENABLE_THP_MIGRATION) >>> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args) >>> +{ >> >> Printing various values at different locations in this function for debug: >> >>> + unsigned long max_swap_offset; >>> + swp_entry_t swp_set, swp_clear, swp_convert; >>> + pmd_t pmd_set, pmd_clear; >>> + >>> + /* >>> + * See generic_max_swapfile_size(): probe the maximum offset, then >>> + * create swap entry will all possible bits set and a swap entry will >>> + * all bits clear. >>> + */ >>> + max_swap_offset = swp_offset(pmd_to_swp_entry(swp_entry_to_pmd(swp_entry(0, ~0UL)))); >>> + swp_set = swp_entry((1 << MAX_SWAPFILES_SHIFT) - 1, max_swap_offset); >>> + swp_clear = swp_entry(0, 0); >>> + >>> + /* Convert to pmd. */ >>> + pmd_set = swp_entry_to_pmd(swp_set); >>> + pmd_clear = swp_entry_to_pmd(swp_clear); >> >> [ 0.702163] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: valid: pmd_set=f800000000000000, pmd_clear=7fffffffffffe00 >> >>> + >>> + /* >>> + * Sanity check that the pmds are not-present, not-huge and swap entry >>> + * is recoverable without corruption. >>> + */ >>> + WARN_ON(pmd_present(pmd_set)); >>> + WARN_ON(pmd_trans_huge(pmd_set)); >>> + swp_convert = pmd_to_swp_entry(pmd_set); >>> + WARN_ON(swp_type(swp_set) != swp_type(swp_convert)); >>> + WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert)); >>> + WARN_ON(pmd_present(pmd_clear)); >>> + WARN_ON(pmd_trans_huge(pmd_clear)); >>> + swp_convert = pmd_to_swp_entry(pmd_clear); >>> + WARN_ON(swp_type(swp_clear) != swp_type(swp_convert)); >>> + WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert)); >>> + >>> + /* Now invalidate the pmd. */ >>> + pmd_set = pmd_mkinvalid(pmd_set); >>> + pmd_clear = pmd_mkinvalid(pmd_clear); >> >> [ 0.704452] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: invalid: pmd_set=f800000000000000, pmd_clear=7ffffffffe00e00 >> >>> + >>> + /* >>> + * Since its a swap pmd, invalidation should effectively be a noop and >>> + * the checks we already did should give the same answer. Check the >>> + * invalidation didn't corrupt any fields. >>> + */ >>> + WARN_ON(pmd_present(pmd_set)); >>> + WARN_ON(pmd_trans_huge(pmd_set)); >>> + swp_convert = pmd_to_swp_entry(pmd_set); >> >> [ 0.706461] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: set: swp=7c03ffffffffffff (1f, 3ffffffffffff), convert=7c03ffffffffffff (1f, 3ffffffffffff) >> >>> + WARN_ON(swp_type(swp_set) != swp_type(swp_convert)); >>> + WARN_ON(swp_offset(swp_set) != swp_offset(swp_convert)); >>> + WARN_ON(pmd_present(pmd_clear)); >>> + WARN_ON(pmd_trans_huge(pmd_clear)); >>> + swp_convert = pmd_to_swp_entry(pmd_clear); >> >> [ 0.708841] debug_vm_pgtable: [swp_pmd_mkinvalid_tests ]: clear: swp=0 (0, 0), convert=ff8 (0, ff8) >> >>> + WARN_ON(swp_type(swp_clear) != swp_type(swp_convert)); >>> + WARN_ON(swp_offset(swp_clear) != swp_offset(swp_convert)); >> >> This line fails on x86_64. >> >> The logs show that the offset is indeed being corrupted by pmd_mkinvalid(); 0 -> 0xff8. >> >> I think this is due to x86's pmd_mkinvalid() assuming the pmd is present; pmd_flags() and pmd_pfn() do all sorts of weird and wonderful things. >> >> So does this take us full circle? Are we now back to modifying the core-mm to never call pmd_mkinvalid() on a non-present entry? If so, then I guess we should remove the arm64 fix from for-next/fixes. If x86_64's pmd_mkinvalid() also corrupts swap entries, yes, your original fix is better. I will dig into the x86 code more to figure out what goes wrong. Last time, I only checked PAGE_* bits in these pmd|pte_* operations. Sorry for the misinformation. >> >>> +} >>> +#else >>> +static void __init swp_pmd_mkinvalid_tests(struct pgtable_debug_args *args) { } >>> +#endif /* !__HAVE_ARCH_PMDP_INVALIDATE && CONFIG_ARCH_ENABLE_THP_MIGRATION */ >>> + >>> static void __init pmd_thp_tests(struct pgtable_debug_args *args) >>> { >>> pmd_t pmd; >>> @@ -982,6 +1041,8 @@ static void __init pmd_thp_tests(struct pgtable_debug_args *args) >>> WARN_ON(!pmd_trans_huge(pmd_mkinvalid(pmd_mkhuge(pmd)))); >>> WARN_ON(!pmd_present(pmd_mkinvalid(pmd_mkhuge(pmd)))); >>> #endif /* __HAVE_ARCH_PMDP_INVALIDATE */ >>> + >>> + swp_pmd_mkinvalid_tests(args); >>> } >>> >>> #ifdef CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD >>> -- >>> 2.25.1 >>> >> -- Best Regards, Yan, Zi
Attachment:
signature.asc
Description: OpenPGP digital signature