pte_unmap_same() assumes that the page tables are still around because
the mmap_sem is held. This is no longer the case when running a
speculative page fault, so an additional check must be made to ensure
that the final page tables are still there.

This is now done by calling pte_spinlock(), which checks the VMA's
consistency while taking the page table lock. This requires passing a
vm_fault structure to pte_unmap_same(), which contains all the needed
parameters.

As pte_spinlock() may fail during a speculative page fault, when the
VMA has been changed behind our back, pte_unmap_same() now has three
possible return values:
	1. the PTEs are the same (0)
	2. the PTEs are different (VM_FAULT_PTNOTSAME)
	3. a VMA change has been detected (VM_FAULT_RETRY)

Case 2 is handled by the introduction of a new VM_FAULT flag named
VM_FAULT_PTNOTSAME, which is then trapped in do_swap_page(). If
VM_FAULT_RETRY is returned, it is passed up to the callers, which
retry the page fault while holding the mmap_sem.

Signed-off-by: Laurent Dufour <ldufour@xxxxxxxxxxxxxxxxxx>
---
 include/linux/mm.h |  1 +
 mm/memory.c        | 29 +++++++++++++++++++----------
 2 files changed, 20 insertions(+), 10 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 2f3e98edc94a..b6432a261e63 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1199,6 +1199,7 @@ static inline void clear_page_pfmemalloc(struct page *page)
 #define VM_FAULT_NEEDDSYNC	0x2000	/* ->fault did not modify page tables
					 * and needs fsync() to complete (for
					 * synchronous page faults in DAX) */
+#define VM_FAULT_PTNOTSAME	0x4000	/* Page table entries have changed */
 
 #define VM_FAULT_ERROR	(VM_FAULT_OOM | VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV | \
			 VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE | \
diff --git a/mm/memory.c b/mm/memory.c
index 21b1212a0892..4bc7b0bdcb40 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2309,21 +2309,29 @@ static bool pte_map_lock(struct vm_fault *vmf)
  * parts, do_swap_page must check under lock before unmapping the pte and
  * proceeding (but do_wp_page is only called after already making such a check;
  * and do_anonymous_page can safely check later on).
+ *
+ * pte_unmap_same() returns:
+ *	0			if the PTEs are the same
+ *	VM_FAULT_PTNOTSAME	if the PTEs are different
+ *	VM_FAULT_RETRY		if the VMA has changed behind our back during
+ *				speculative page fault handling.
  */
-static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
-				pte_t *page_table, pte_t orig_pte)
+static inline int pte_unmap_same(struct vm_fault *vmf)
 {
-	int same = 1;
+	int ret = 0;
+
 #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT)
 	if (sizeof(pte_t) > sizeof(unsigned long)) {
-		spinlock_t *ptl = pte_lockptr(mm, pmd);
-		spin_lock(ptl);
-		same = pte_same(*page_table, orig_pte);
-		spin_unlock(ptl);
+		if (pte_spinlock(vmf)) {
+			if (!pte_same(*vmf->pte, vmf->orig_pte))
+				ret = VM_FAULT_PTNOTSAME;
+			spin_unlock(vmf->ptl);
+		} else
+			ret = VM_FAULT_RETRY;
 	}
 #endif
-	pte_unmap(page_table);
-	return same;
+	pte_unmap(vmf->pte);
+	return ret;
 }
 
 static inline void cow_user_page(struct page *dst, struct page *src, unsigned long va, struct vm_area_struct *vma)
@@ -2913,7 +2921,8 @@ int do_swap_page(struct vm_fault *vmf)
 	int exclusive = 0;
 	int ret = 0;
 
-	if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte))
+	ret = pte_unmap_same(vmf);
+	if (ret)
 		goto out;
 
 	entry = pte_to_swp_entry(vmf->orig_pte);
-- 
2.7.4
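
For illustration, here is a minimal caller-side sketch of how the three
pte_unmap_same() return values are meant to be consumed. Only the
do_swap_page() hunk above is part of this patch; the handle_swap_fault()
wrapper below is a hypothetical name used purely for the example.

/*
 * Illustrative sketch only, not part of the patch: a hypothetical
 * caller showing how the three pte_unmap_same() return values are
 * handled.
 */
static int handle_swap_fault(struct vm_fault *vmf)
{
	int ret;

	ret = pte_unmap_same(vmf);
	if (ret) {
		/*
		 * VM_FAULT_PTNOTSAME: the PTE changed under us, the
		 * fault has already been handled concurrently, so bail
		 * out.
		 * VM_FAULT_RETRY: the VMA changed during a speculative
		 * fault; the value is propagated so the upper layers
		 * retry the fault while holding the mmap_sem.
		 */
		return ret;
	}

	/* ret == 0: the PTE is unchanged, proceed with the swap-in. */
	return 0;
}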