Re: [PATCH v2] mm/mprotect: Fix soft-dirty check in can_change_pte_writable()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 21.07.22 00:03, Peter Xu wrote:
> The check wanted to make sure when soft-dirty tracking is enabled we won't
> grant write bit by accident, as a page fault is needed for dirty tracking.
> The intention is correct but we didn't check it right because VM_SOFTDIRTY
> set actually means soft-dirty tracking disabled.  Fix it.

Thanks for digging into this and writing the reproducer. The softdirty
logic was rather confusing for me.

> 
> It wasn't a bug for a long time because we used to only optimize the write
> bit settings in change_pte_range() for page caches, and since we've got a
> higher level check in vma_wants_writenotify(), we will never set the bit
> MM_CP_TRY_CHANGE_WRITABLE for soft-dirty enabled page caches, hence even if
> we checked with the wrong value of VM_SOFTDIRTY in change_pte_range() it'll
> just be an no-op.  Functionally it was still correct, even if cpu cycles
> wasted.

I don't quite follow that explanation and most probably I am missing
something.

Modifying your test to map page from a file MAP_SHARED gives me under
5.18.11-100.fc35.x86_64:


53,54d52
<       FILE *file = fopen("tmpfile", "w+");
<       int file_fd;
56d53
<       assert(file);
59,61d55
<
<       file_fd = fileno(file);
<       ftruncate(file_fd, psize);
63c57
<                   MAP_SHARED, file_fd, 0);
---
>                   MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);


t480s: ~  $ sudo ./tmp
ERROR: Wrote page again, soft-dirty=0 (expect: 1



IMHO, while the check in vma_wants_writenotify() is correct and makes
sure that the pages are kept R/O by the generic machinery. We get
vma_wants_writenotify(), so we activate MM_CP_TRY_CHANGE_WRITABLE. The
wrong logic in can_change_pte_writable(), however, maps the page
writable again without caring about softdirty.

At least that would be my explanation for the failure. But maybe I
messes up something else :)



> 
> However after the recent work of anonymous page optimization on exclusive
> pages we'll start to make it wrong because anonymous page does not require
> the check in vma_wants_writenotify() hence it'll suffer from the wrong
> check here in can_change_pte_writable().
> 
> We can easily verify this with any exclusive anonymous page, like program
> below:
> 
> =======8<======
>   #include <stdio.h>
>   #include <unistd.h>
>   #include <stdlib.h>
>   #include <assert.h>
>   #include <inttypes.h>
>   #include <stdint.h>
>   #include <sys/types.h>
>   #include <sys/mman.h>
>   #include <sys/types.h>
>   #include <sys/stat.h>
>   #include <unistd.h>
>   #include <fcntl.h>
>   #include <stdbool.h>
> 
>   #define BIT_ULL(nr)                   (1ULL << (nr))
>   #define PM_SOFT_DIRTY                 BIT_ULL(55)
> 
>   unsigned int psize;
>   char *page;
> 
>   uint64_t pagemap_read_vaddr(int fd, void *vaddr)
>   {
>       uint64_t value;
>       int ret;
> 
>       ret = pread(fd, &value, sizeof(uint64_t),
>                   ((uint64_t)vaddr >> 12) * sizeof(uint64_t));
>       assert(ret == sizeof(uint64_t));
> 
>       return value;
>   }
> 
>   void clear_refs_write(void)
>   {
>       int fd = open("/proc/self/clear_refs", O_RDWR);
> 
>       assert(fd >= 0);
>       write(fd, "4", 2);
>       close(fd);
>   }
> 
>   #define  check_soft_dirty(str, expect)  do {                            \
>           bool dirty = pagemap_read_vaddr(fd, page) & PM_SOFT_DIRTY;      \
>           if (dirty != expect) {                                          \
>               printf("ERROR: %s, soft-dirty=%d (expect: %d)\n", str, dirty, expect); \
>               exit(-1);                                                   \
>           }                                                               \
>   } while (0)
> 
>   int main(void)
>   {
>       int fd = open("/proc/self/pagemap", O_RDONLY);
> 
>       assert(fd >= 0);
>       psize = getpagesize();
>       page = mmap(NULL, psize, PROT_READ|PROT_WRITE,
>                   MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
>       assert(page != MAP_FAILED);
> 
>       *page = 1;
>       check_soft_dirty("Just faulted in page", 1);
>       clear_refs_write();
>       check_soft_dirty("Clear_refs written", 0);
>       mprotect(page, psize, PROT_READ);
>       check_soft_dirty("Marked RO", 0);
>       mprotect(page, psize, PROT_READ|PROT_WRITE);
>       check_soft_dirty("Marked RW", 0);
>       *page = 2;
>       check_soft_dirty("Wrote page again", 1);
> 
>       munmap(page, psize);
>       close(fd);
>       printf("Test passed.\n");
> 
>       return 0;
>   }

Can we turn that into a vm selftest in
tools/testing/selftests/vm/soft-dirty.c, and also extend it by
MAP_SHARED froma  file as above?

> =======8<======
> 
> So even if commit 64fe24a3e05e kept the old behavior and didn't attempt to
> change the behavior here, the bug will only be able to be triggered after
> commit 64fe24a3e05e because only anonymous page will suffer from it.
> 
> Fixes: 64fe24a3e05e ("mm/mprotect: try avoiding write faults for exclusive anonymous pages when changing protection")
> Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
> ---
>  mm/mprotect.c | 7 +++++--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 0420c3ed936c..804807ab14e6 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -48,8 +48,11 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma,
>  	if (pte_protnone(pte) || !pte_dirty(pte))
>  		return false;
>  
> -	/* Do we need write faults for softdirty tracking? */
> -	if ((vma->vm_flags & VM_SOFTDIRTY) && !pte_soft_dirty(pte))
> +	/*
> +	 * Do we need write faults for softdirty tracking?  Note,
> +	 * soft-dirty is enabled when !VM_SOFTDIRTY.
> +	 */
> +	if (!(vma->vm_flags & VM_SOFTDIRTY) && !pte_soft_dirty(pte))
>  		return false;

I wonder if we now want, just as in vma_wants_writenotify(), an early
check for IS_ENABLED(CONFIG_MEM_SOFT_DIRTY), to optimize this out
completely.

>  
>  	/* Do we need write faults for uffd-wp tracking? */


-- 
Thanks,

David / dhildenb





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux