ptep_get_lockless() does the following under CONFIG_GUP_GET_PTE_LOW_HIGH:

	pte_t pte;

	do {
		pte.pte_low = ptep->pte_low;
		smp_rmb();
		pte.pte_high = ptep->pte_high;
		smp_rmb();
	} while (unlikely(pte.pte_low != ptep->pte_low));

It has a comment above it that argues that this is correct because:

1. A present PTE can't become non-present and then become a present PTE
   pointing to another page without a TLB flush in between.
2. TLB flushes involve IPIs.

As far as I can tell, in particular on x86, _both_ of those assumptions are
false; perhaps on mips and sh only one of them is?

Number 2 is straightforward: x86 can run under hypervisors, and when it runs
under hypervisors, the MMU paravirtualization code (including the KVM version)
can implement remote TLB flushes without IPIs.

Number 1 is gnarlier, because breaking that assumption implies that there can
be a situation where different threads see different memory at the same
virtual address because their TLBs are incoherent. But as far as I know, it
can happen when MADV_DONTNEED races with an anonymous page fault, because
zap_pte_range() does not always flush stale TLB entries before dropping the
page table lock. I think that's probably fine, since it's a "garbage in,
garbage out" kind of situation - but if a concurrent GUP-fast can then
theoretically end up returning a completely unrelated page, that's bad.

Sadly, mips and sh don't define arch_cmpxchg_double(), so we can't just change
ptep_get_lockless() to use arch_cmpxchg_double() and be done with it...
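
To make the worry about the low/high/low loop concrete, here is a minimal user-space sketch of the interleaving I have in mind. It is not kernel code: struct pte, read_low_high_low(), racing_writer() and the PTE values are all made up for illustration, the barriers are omitted because the model is single-threaded, and the "writer" is simply a hook invoked between the reader's loads. It assumes the PTE is zapped and repopulated twice while the reader is mid-read, which is exactly what the two assumptions in the comment are supposed to rule out.

```c
#include <assert.h>
#include <stdint.h>

/* User-space model of a 64-bit PTE split into two 32-bit halves. */
struct pte {
	uint32_t pte_low;
	uint32_t pte_high;
};

/* Hypothetical hook: lets a modeled writer run after a given reader load. */
typedef void (*step_hook)(struct pte *ptep, int step);

/* Made-up PTE values; A and C share the same low half but not the high half. */
static const struct pte PTE_A = { 0x00001067u, 0x1u };
static const struct pte PTE_B = { 0x00002067u, 0x2u };
static const struct pte PTE_C = { 0x00001067u, 0x3u };

static int pte_same(struct pte a, struct pte b)
{
	return a.pte_low == b.pte_low && a.pte_high == b.pte_high;
}

/*
 * Same low/high/low sequence as the quoted loop, with the hook called
 * after each load so that a writer can be interleaved deterministically.
 */
static struct pte read_low_high_low(struct pte *ptep, step_hook hook)
{
	struct pte pte;

	do {
		pte.pte_low = ptep->pte_low;
		hook(ptep, 1);
		pte.pte_high = ptep->pte_high;
		hook(ptep, 2);
	} while (pte.pte_low != ptep->pte_low);

	return pte;
}

/* Modeled writer: zap the PTE, then install B (after load 1) or C (after load 2). */
static void racing_writer(struct pte *ptep, int step)
{
	static const struct pte none = { 0, 0 };

	*ptep = none;		/* present -> non-present */
	if (step == 1)
		*ptep = PTE_B;	/* non-present -> present, other page */
	else
		*ptep = PTE_C;	/* low half matches A again */
}

/* Runs the interleaving and returns what the reader ends up with. */
static struct pte torn_read_demo(void)
{
	struct pte slot = PTE_A;
	struct pte seen = read_low_high_low(&slot, racing_writer);

	/* The recheck passed, yet the value was never installed in the slot. */
	assert(!pte_same(seen, PTE_A));
	assert(!pte_same(seen, PTE_B));
	assert(!pte_same(seen, PTE_C));
	return seen;
}
```

In this model the reader returns A's low half combined with B's high half, and the final low-half recheck does not notice because C happens to have the same low half as A. Whether real hardware and the real zap/fault paths can line up like this is the open question above; the sketch only shows that the loop itself does not prevent it once both assumptions are gone.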