From: "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx> Subject: thp: fix MADV_DONTNEED vs. MADV_FREE race Both MADV_DONTNEED and MADV_FREE handled with down_read(mmap_sem). It's critical to not clear pmd intermittently while handling MADV_FREE to avoid race with MADV_DONTNEED: CPU0: CPU1: madvise_free_huge_pmd() pmdp_huge_get_and_clear_full() madvise_dontneed() zap_pmd_range() pmd_trans_huge(*pmd) == 0 (without ptl) // skip the pmd set_pmd_at(); // pmd is re-established It results in MADV_DONTNEED skipping the pmd, leaving it not cleared. It violates MADV_DONTNEED interface and can result is userspace misbehaviour. Basically it's the same race as with numa balancing in change_huge_pmd(), but a bit simpler to mitigate: we don't need to preserve dirty/young flags here due to MADV_FREE functionality. [kirill.shutemov@xxxxxxxxxxxxxxx: Urgh... Power is special again] Link: http://lkml.kernel.org/r/20170303102636.bhd2zhtpds4mt62a@xxxxxxxxxxxxxxxxxx Link: http://lkml.kernel.org/r/20170302151034.27829-4-kirill.shutemov@xxxxxxxxxxxxxxx Signed-off-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx> Acked-by: Minchan Kim <minchan@xxxxxxxxxx> Cc: Minchan Kim <minchan@xxxxxxxxxx> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx> Cc: Hillf Danton <hillf.zj@xxxxxxxxxxxxxxx> Cc: <stable@xxxxxxxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- mm/huge_memory.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff -puN mm/huge_memory.c~thp-fix-madv_dontneed-vs-madv_free-race mm/huge_memory.c --- a/mm/huge_memory.c~thp-fix-madv_dontneed-vs-madv_free-race +++ a/mm/huge_memory.c @@ -1568,8 +1568,7 @@ bool madvise_free_huge_pmd(struct mmu_ga deactivate_page(page); if (pmd_young(orig_pmd) || pmd_dirty(orig_pmd)) { - orig_pmd = pmdp_huge_get_and_clear_full(tlb->mm, addr, pmd, - tlb->fullmm); + pmdp_invalidate(vma, addr, pmd); orig_pmd = pmd_mkold(orig_pmd); orig_pmd = pmd_mkclean(orig_pmd); _