This is the backport of Michal Hocko's fix for EMBARGOED CVE-2016-5195. It is
different for RHEL6 because it does not use pte_dirty() in
can_follow_write_pte(). This is because we don't have abf09bed3cce in rhel-6,
and without it relying on pte_dirty() is a problem for s390:

    commit abf09bed3cceadd809f0356065c2ada6cee90d4a
    Author: Martin Schwidefsky <schwidefsky@xxxxxxxxxx>
    Date:   Wed Nov 7 13:17:37 2012 +0100

        s390/mm: implement software dirty bits

Brew: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=11921391
BZ: 1385117
Testing: by me. (An illustrative userspace sketch of the CoW guarantee at
stake is appended after the patch.)

Subject: [PATCH] mm, gup: close FOLL MAP_PRIVATE race

Commit 37619aae4d25088880dd3a49fcde0d8c0c7000a3
Author: Michal Hocko <mhocko@xxxxxxxx>
Date:   Sun, 16 Oct 2016 11:55:00 +0200

    mm, gup: close FOLL MAP_PRIVATE race without pte_dirty()

faultin_page drops FOLL_WRITE after the page fault handler did the CoW, and
then we retry follow_page_mask to get our CoWed page. This is racy, however,
because the page might have been unmapped by that time, and so we would have
to do a page fault again, this time without CoW. This would cause page cache
corruption for FOLL_FORCE on MAP_PRIVATE read-only mappings, with obvious
consequences.

This is an ancient bug that was actually already fixed once by Linus eleven
years ago in commit 4ceb5db9757a ("Fix get_user_pages() race for write
access") but that was then undone due to problems on s390 by commit
f33ea7f404e5 ("fix get_user_pages bug") because s390 didn't have proper
dirty pte tracking until abf09bed3cce ("s390/mm: implement software dirty
bits"). This wasn't a problem at the time, as pointed out by Hugh Dickins,
because madvise relied on mmap_sem for write up until 0a27a14a6292 ("mm:
madvise avoid exclusive mmap_sem"), but since then we can race with madvise,
which can unmap the fresh COWed page, or with KSM, and corrupt the content
of the shared page.

This patch is based on Linus' approach of not clearing FOLL_WRITE after the
CoW page fault (aka VM_FAULT_WRITE) but instead introducing FOLL_COW to note
this fact. The flag is then rechecked during follow_pfn_pte to enforce the
page fault again if we do not see the CoWed page.

Linus was suggesting to check pte_dirty again, as s390 is OK now. But that
would make backporting to some old kernels harder. So instead let's just
make sure that vm_normal_page sees a pure anonymous page. This would
guarantee we are seeing a real CoW page. Introduce can_follow_write_pte,
which checks pte_write and falls back to PageAnon on forced write faults
which have already passed CoW. Thanks to Hugh for pointing out that special
care has to be taken for KSM pages, because our COWed page might have been
merged with a KSM one and kept its PageAnon flag.
Fixes: 0a27a14a6292 ("mm: madvise avoid exclusive mmap_sem")
Cc: stable@xxxxxxxxxxxxxxx # 2.6.22+
Reported-by: Phil "not Paul" Oester <kernel@xxxxxxxxxxxx>
Disclosed-by: Andy Lutomirski <luto@xxxxxxxxxx>
Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
---
 include/linux/mm.h |  1 +
 mm/memory.c        | 22 ++++++++++++++++++++--
 2 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f48db81..2759108 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1420,6 +1420,7 @@ struct page *follow_page(struct vm_area_struct *, unsigned long address,
 #define FOLL_HWPOISON	0x100	/* check page is hwpoisoned */
 #define FOLL_NUMA	0x200	/* force NUMA hinting page fault */
 #define FOLL_MIGRATION	0x400	/* wait for page to replace migration entry */
+#define FOLL_COW	0x800	/* internal GUP flag */
 
 typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
 			void *data);
diff --git a/mm/memory.c b/mm/memory.c
index 47e7a00..13a36a5 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1177,6 +1177,24 @@ int zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
 }
 EXPORT_SYMBOL_GPL(zap_vma_ptes);
 
+static inline bool can_follow_write_pte(pte_t pte, struct page *page,
+					unsigned int flags)
+{
+	if (pte_write(pte))
+		return true;
+
+	/*
+	 * Make sure that we are really following CoWed page. We do not really
+	 * have to care about exclusiveness of the page because we only want
+	 * to ensure that once COWed page hasn't disappeared in the meantime
+	 * or it hasn't been merged to a KSM page.
+	 */
+	if ((flags & FOLL_FORCE) && (flags & FOLL_COW))
+		return page && PageAnon(page) && !PageKsm(page);
+
+	return false;
+}
+
 /*
  * Do a quick page-table lookup for a single page.
  */
@@ -1266,7 +1284,7 @@ split_fallthrough:
 		migration_entry_wait(mm, pmd, address);
 		goto split_fallthrough;
 	}
-	if ((flags & FOLL_WRITE) && !pte_write(pte))
+	if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, page, flags))
 		goto unlock;
 
 	page = vm_normal_page(vma, address, pte);
@@ -1499,7 +1517,7 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
 				 */
 				if ((ret & VM_FAULT_WRITE) &&
 				    !(vma->vm_flags & VM_WRITE))
-					foll_flags &= ~FOLL_WRITE;
+					foll_flags |= FOLL_COW;
 
 				cond_resched();
 			}
-- 
1.7.1
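
For reference, below is a minimal userspace sketch (not part of the patch;
error handling is omitted and "testfile" is a placeholder path) of the CoW
guarantee that FOLL_FORCE writes have to preserve on MAP_PRIVATE read-only
mappings. It assumes a kernel where /proc/<pid>/mem is writable; where it is
not, ptrace(PTRACE_POKEDATA) exercises the same FOLL_FORCE path. With the
race closed, the private mapping sees the new byte while the file on disk
keeps its original contents; the bug could instead let such a write reach
the shared page cache page.

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("testfile", O_RDONLY);	/* placeholder path */
	char *map = mmap(NULL, 4096, PROT_READ, MAP_PRIVATE, fd, 0);

	/* Forced write: the kernel must CoW rather than touch the file. */
	int mem = open("/proc/self/mem", O_RDWR);
	pwrite(mem, "X", 1, (uintptr_t)map);

	char ondisk;
	pread(fd, &ondisk, 1, 0);

	/* Expected: mapping shows 'X', the file still shows its old byte. */
	printf("mapping: %c  file: %c\n", map[0], ondisk);
	return 0;
}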