Re: Patch "mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp" has been added to the 4.4-stable tree

On Mon 10-04-17 17:09:17, Greg KH wrote:
[...]
> From 303681d5d538d81b5e23754515202b5b9febd2e9 Mon Sep 17 00:00:00 2001
> From: Keno Fischer <keno@xxxxxxxxxxxxxxxxxx>
> Date: Tue, 24 Jan 2017 15:17:48 -0800
> Subject: mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp
> 
> From: Keno Fischer <keno@xxxxxxxxxxxxxxxxxx>
> 
> commit 8310d48b125d19fcd9521d83b8293e63eb1646aa upstream.

This backport is wrong. See
http://lkml.kernel.org/r/20170328131154.GH18241@xxxxxxxxxxxxxx

> 
> In commit 19be0eaffa3a ("mm: remove gup_flags FOLL_WRITE games from
> __get_user_pages()"), the mm code was changed from unsetting FOLL_WRITE
> after a COW was resolved to setting the (newly introduced) FOLL_COW
> instead.  Simultaneously, the check in gup.c was updated to still allow
> writes with FOLL_FORCE set if FOLL_COW had also been set.
> 
> However, a similar check in huge_memory.c was forgotten.  As a result,
> remote memory writes to ro regions of memory backed by transparent huge
> pages cause an infinite loop in the kernel (handle_mm_fault sets
> FOLL_COW and returns 0 causing a retry, but follow_trans_huge_pmd bails
> out immediately because `(flags & FOLL_WRITE) && !pmd_write(*pmd)` is
> true).
> 
> While in this state the process is still SIGKILLable, but little else
> works (e.g.  no ptrace attach, no other signals).  This is easily
> reproduced with the following code (assuming thp are set to always):
> 
>     #include <assert.h>
>     #include <fcntl.h>
>     #include <stdint.h>
>     #include <stdio.h>
>     #include <string.h>
>     #include <sys/mman.h>
>     #include <sys/stat.h>
>     #include <sys/types.h>
>     #include <sys/wait.h>
>     #include <unistd.h>
> 
>     #define TEST_SIZE (5 * 1024 * 1024)
> 
>     int main(void) {
>       int status;
>       pid_t child;
>       int fd = open("/proc/self/mem", O_RDWR);
>       void *addr = mmap(NULL, TEST_SIZE, PROT_READ,
>                         MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
>       assert(addr != MAP_FAILED);
>       pid_t parent_pid = getpid();
>       if ((child = fork()) == 0) {
>         void *addr2 = mmap(NULL, TEST_SIZE, PROT_READ | PROT_WRITE,
>                            MAP_ANONYMOUS | MAP_PRIVATE, 0, 0);
>         assert(addr2 != MAP_FAILED);
>         memset(addr2, 'a', TEST_SIZE);
>         pwrite(fd, addr2, TEST_SIZE, (uintptr_t)addr);
>         return 0;
>       }
>       assert(child == waitpid(child, &status, 0));
>       assert(WIFEXITED(status) && WEXITSTATUS(status) == 0);
>       return 0;
>     }
> 
> Fix this by updating follow_trans_huge_pmd in huge_memory.c analogously
> to the update in gup.c in the original commit.  The same pattern exists
> in follow_devmap_pmd.  However, we should not be able to reach that
> check with FOLL_COW set, so add WARN_ONCE to make sure we notice if we
> ever do.
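
The retry behaviour described in the quoted commit message can be sketched in
plain userspace C. This is only a toy model under stated assumptions, not
kernel code: the flag values are made up, old_check_allows(),
new_check_allows() and gup_retries() are hypothetical names, and the third
condition of the new check (pmd_dirty() upstream, a page-flag test in the
quoted backport) is collapsed into a single boolean, cow_done.

```c
#include <stdbool.h>

/* Hypothetical flag values for illustration; the kernel's real values differ. */
#define FOLL_WRITE 0x01u
#define FOLL_FORCE 0x10u
#define FOLL_COW   0x4000u

/* Old check: any write request against a read-only pmd is refused. */
static bool old_check_allows(bool pmd_writable, unsigned int flags)
{
	return !((flags & FOLL_WRITE) && !pmd_writable);
}

/*
 * New check, modelled on can_follow_write_pmd(): a read-only pmd may still
 * be followed for write if FOLL_FORCE and FOLL_COW are both set and the
 * COW cycle has completed (cow_done stands in for the dirty/page-flag test).
 */
static bool new_check_allows(bool pmd_writable, bool cow_done,
			     unsigned int flags)
{
	if (!(flags & FOLL_WRITE))
		return true;
	return pmd_writable ||
		((flags & FOLL_FORCE) && (flags & FOLL_COW) && cow_done);
}

/*
 * Simplified retry loop for a forced write to a read-only mapping.
 * Returns the iteration on which the follow succeeds, or -1 if it
 * never succeeds within max_iters (the "infinite loop" case).
 */
static int gup_retries(bool use_new_check, int max_iters)
{
	unsigned int flags = FOLL_WRITE | FOLL_FORCE;
	bool pmd_writable = false;	/* mapping is PROT_READ */
	bool cow_done = false;

	for (int i = 1; i <= max_iters; i++) {
		bool ok = use_new_check ?
			new_check_allows(pmd_writable, cow_done, flags) :
			old_check_allows(pmd_writable, flags);
		if (ok)
			return i;
		/* Model of the fault path: COW is resolved and FOLL_COW is
		 * set, but the pmd itself stays read-only. */
		flags |= FOLL_COW;
		cow_done = true;
	}
	return -1;
}
```

In this model gup_retries(false, 1000) returns -1, because the old check never
passes, while gup_retries(true, 1000) returns 2: one fault to resolve the COW,
then a successful follow.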
> 
> [akpm@xxxxxxxxxxxxxxxxxxxx: coding-style fixes]
> Link: http://lkml.kernel.org/r/20170106015025.GA38411@xxxxxxxxxxxxxxxxxx
> Signed-off-by: Keno Fischer <keno@xxxxxxxxxxxxxxxxxx>
> Acked-by: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> Cc: Greg Thelen <gthelen@xxxxxxxxxx>
> Cc: Nicholas Piggin <npiggin@xxxxxxxxx>
> Cc: Willy Tarreau <w@xxxxxx>
> Cc: Oleg Nesterov <oleg@xxxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
> Cc: Michal Hocko <mhocko@xxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> [bwh: Backported to 3.16:
>  - Drop change to follow_devmap_pmd()
>  - pmd_dirty() is not available; check the page flags as in older
>    backports of can_follow_write_pte()
>  - Adjust context]
> Signed-off-by: Ben Hutchings <ben@xxxxxxxxxxxxxxx>
> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> 
> ---
>  mm/huge_memory.c |   19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1269,6 +1269,18 @@ out_unlock:
>  	return ret;
>  }
>  
> +/*
> + * FOLL_FORCE can write to even unwritable pmd's, but only
> + * after we've gone through a COW cycle and they are dirty.
> + */
> +static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page,
> +					unsigned int flags)
> +{
> +	return pmd_write(pmd) ||
> +		((flags & FOLL_FORCE) && (flags & FOLL_COW) &&
> +		 page && PageAnon(page));
> +}
> +
>  struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
>  				   unsigned long addr,
>  				   pmd_t *pmd,
> @@ -1279,9 +1291,6 @@ struct page *follow_trans_huge_pmd(struc
>  
>  	assert_spin_locked(pmd_lockptr(mm, pmd));
>  
> -	if (flags & FOLL_WRITE && !pmd_write(*pmd))
> -		goto out;
> -
>  	/* Avoid dumping huge zero page */
>  	if ((flags & FOLL_DUMP) && is_huge_zero_pmd(*pmd))
>  		return ERR_PTR(-EFAULT);
> @@ -1292,6 +1301,10 @@ struct page *follow_trans_huge_pmd(struc
>  
>  	page = pmd_page(*pmd);
>  	VM_BUG_ON_PAGE(!PageHead(page), page);
> +
> +	if (flags & FOLL_WRITE && !can_follow_write_pmd(*pmd, page, flags))
> +		goto out;
> +
>  	if (flags & FOLL_TOUCH) {
>  		pmd_t _pmd;
>  		/*
> 
> 
> Patches currently in stable-queue which might be from keno@xxxxxxxxxxxxxxxxxx are
> 
> queue-4.4/mm-huge_memory.c-respect-foll_force-foll_cow-for-thp.patch

-- 
Michal Hocko
SUSE Labs


