There is a faint risk that __pte_offset_map(), on a 32-bit architecture with a 64-bit pmd_t e.g. x86-32 with CONFIG_X86_PAE=y, would succeed on a pmdval assembled from a pmd_low and a pmd_high which never belonged together: their combination not pointing to a page table at all, perhaps not even a valid pfn. pmdp_get_lockless() is not enough to prevent that. Guard against that (on such configs) by local_irq_save() blocking TLB flush between present updates, as linux/pgtable.h suggests. It's only needed around the pmdp_get_lockless() in __pte_offset_map(): a race when __pte_offset_map_lock() repeats the pmdp_get_lockless() after getting the lock, would just send it back to __pte_offset_map() again. CONFIG_GUP_GET_PXX_LOW_HIGH is enabled when required by mips, sh and x86. It is not enabled by arm-32 CONFIG_ARM_LPAE: my understanding is that Will Deacon's 2020 enhancements to READ_ONCE() are sufficient for arm. It is not enabled by arc, but its pmd_t is 32-bit even when pte_t 64-bit. Limit the IRQ disablement to CONFIG_HIGHPTE? Perhaps, but would need a little more work, to retry if pmd_low good for page table, but pmd_high non-zero from THP (and that might be making x86-specific assumptions). Signed-off-by: Hugh Dickins <hughd@xxxxxxxxxx> --- mm/pgtable-generic.c | 20 ++++++++++++++++++++ 1 file changed, 20 insertions(+) diff --git a/mm/pgtable-generic.c b/mm/pgtable-generic.c index 674671835631..d28b63386cef 100644 --- a/mm/pgtable-generic.c +++ b/mm/pgtable-generic.c @@ -232,12 +232,32 @@ pmd_t pmdp_collapse_flush(struct vm_area_struct *vma, unsigned long address, #endif #endif /* CONFIG_TRANSPARENT_HUGEPAGE */ +#if defined(CONFIG_GUP_GET_PXX_LOW_HIGH) && \ + (defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RCU)) +/* + * See the comment above ptep_get_lockless() in include/linux/pgtable.h: + * the barriers in pmdp_get_lockless() cannot guarantee that the value in + * pmd_high actually belongs with the value in pmd_low; but holding interrupts + * off blocks the TLB flush between present updates, which guarantees that a + * successful __pte_offset_map() points to a page from matched halves. + */ +#define config_might_irq_save(flags) local_irq_save(flags) +#define config_might_irq_restore(flags) local_irq_restore(flags) +#else +#define config_might_irq_save(flags) +#define config_might_irq_restore(flags) +#endif + pte_t *__pte_offset_map(pmd_t *pmd, unsigned long addr, pmd_t *pmdvalp) { + unsigned long __maybe_unused flags; pmd_t pmdval; rcu_read_lock(); + config_might_irq_save(flags); pmdval = pmdp_get_lockless(pmd); + config_might_irq_restore(flags); + if (pmdvalp) *pmdvalp = pmdval; if (unlikely(pmd_none(pmdval) || is_pmd_migration_entry(pmdval))) -- 2.35.3