I didn't answer your questions further down, sorry, resuming...

On Mon, 31 Aug 2020, Jann Horn wrote:
> On Mon, Aug 31, 2020 at 8:07 AM Hugh Dickins <hughd@xxxxxxxxxx> wrote:
...
> > but the "pmd .. physical page 0" issue is explained better in its parent
> > 18e77600f7a1 ("khugepaged: retract_page_tables() remember to test exit")
...
> Just to clarify: This is an issue only between GUP's software page

Not just GUP's software page table walks: any of our software page
table walks that could occur concurrently (notably, unmapping when
exiting).

> table walks when running without mmap_lock and concurrent page table
> modifications from hugepage code, correct?

Correct.

> Hardware page table walks

Have no problem: the necessary TLB flush is already done.

> and get_user_pages_fast() are fine because they properly load PTEs
> atomically and are written to assume that the page tables can change
> arbitrarily under them, and the only guarantee is that disabling
> interrupts ensures that pages referenced by PTEs can't be freed,
> right?

mm/gup.c has changed a lot since I was familiar with it, and I'm
out of touch with the history of architectural variants.  I think
internal_get_user_pages_fast() is now the place to look, and I see

		local_irq_save(flags);
		gup_pgd_range(addr, end, fast_flags, pages, &nr_pinned);
		local_irq_restore(flags);

reassuringly there, which is how x86 always used to do it, and the
dependence of x86 TLB flush on IPIs made it all safe.

Looking at gup_pmd_range(), its operations on pmd (= READ_ONCE(*pmdp))
look correct to me, and where I said "any of our software page table
walks" above, there should be an exception for GUP_fast.

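For illustration only, here is a userspace sketch of the snapshot-once
pattern that makes a lockless walker tolerant of entries changing under
it.  This is not the mm/gup.c code: fake_pmd_t, FAKE_PMD_PRESENT,
fake_pmd_present() and walk_fast() are made-up names, and a C11 relaxed
atomic load stands in for READ_ONCE().

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the kernel's pmd_t or pmd_present(). */
typedef uint64_t fake_pmd_t;
#define FAKE_PMD_PRESENT 0x1ULL

static bool fake_pmd_present(fake_pmd_t pmd)
{
	return (pmd & FAKE_PMD_PRESENT) != 0;
}

/*
 * Read the shared entry exactly once into a local, then make every
 * decision on that local copy.  A concurrent writer may change *pmdp
 * at any moment, but the value tested and the value returned are
 * guaranteed to be the same one.
 */
static bool walk_fast(_Atomic(fake_pmd_t) *pmdp, fake_pmd_t *out)
{
	fake_pmd_t pmd = atomic_load_explicit(pmdp, memory_order_relaxed);

	if (!fake_pmd_present(pmd))
		return false;	/* give up; caller falls back to a slow path */

	*out = pmd;		/* use the snapshot, never re-read *pmdp */
	return true;
}
```

A caller that gets false from walk_fast() is assumed to have a slower,
locked path to fall back on; the sketch says nothing about keeping the
referenced pages from being freed, which is a separate guarantee.
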
But the other software page table walks are more loosely coded, and
less able to fall back - if gup_pmd_range() catches sight of a fleeting
*pmdp 0, it rightly just gives up immediately on !pmd_present(pmd);
whereas tearing down a userspace mapping needs to wait or retry on
seeing a transient state (but mmap_lock happens to give protection
against that particular transient state).

I assume that all the architectures which support GUP_fast have now
been gathered into the same mechanism (perhaps by an otherwise
superfluous IPI on TLB flush?) and are equally safe.

Hugh