On 21.07.22 00:03, Peter Xu wrote: > The check wanted to make sure when soft-dirty tracking is enabled we won't > grant write bit by accident, as a page fault is needed for dirty tracking. > The intention is correct but we didn't check it right because VM_SOFTDIRTY > set actually means soft-dirty tracking disabled. Fix it. Thanks for digging into this and writing the reproducer. The softdirty logic was rather confusing for me. > > It wasn't a bug for a long time because we used to only optimize the write > bit settings in change_pte_range() for page caches, and since we've got a > higher level check in vma_wants_writenotify(), we will never set the bit > MM_CP_TRY_CHANGE_WRITABLE for soft-dirty enabled page caches, hence even if > we checked with the wrong value of VM_SOFTDIRTY in change_pte_range() it'll > just be an no-op. Functionally it was still correct, even if cpu cycles > wasted. I don't quite follow that explanation and most probably I am missing something. Modifying your test to map page from a file MAP_SHARED gives me under 5.18.11-100.fc35.x86_64: 53,54d52 < FILE *file = fopen("tmpfile", "w+"); < int file_fd; 56d53 < assert(file); 59,61d55 < < file_fd = fileno(file); < ftruncate(file_fd, psize); 63c57 < MAP_SHARED, file_fd, 0); --- > MAP_ANONYMOUS|MAP_PRIVATE, -1, 0); t480s: ~ $ sudo ./tmp ERROR: Wrote page again, soft-dirty=0 (expect: 1 IMHO, while the check in vma_wants_writenotify() is correct and makes sure that the pages are kept R/O by the generic machinery. We get vma_wants_writenotify(), so we activate MM_CP_TRY_CHANGE_WRITABLE. The wrong logic in can_change_pte_writable(), however, maps the page writable again without caring about softdirty. At least that would be my explanation for the failure. But maybe I messes up something else :) > > However after the recent work of anonymous page optimization on exclusive > pages we'll start to make it wrong because anonymous page does not require > the check in vma_wants_writenotify() hence it'll suffer from the wrong > check here in can_change_pte_writable(). > > We can easily verify this with any exclusive anonymous page, like program > below: > > =======8<====== > #include <stdio.h> > #include <unistd.h> > #include <stdlib.h> > #include <assert.h> > #include <inttypes.h> > #include <stdint.h> > #include <sys/types.h> > #include <sys/mman.h> > #include <sys/types.h> > #include <sys/stat.h> > #include <unistd.h> > #include <fcntl.h> > #include <stdbool.h> > > #define BIT_ULL(nr) (1ULL << (nr)) > #define PM_SOFT_DIRTY BIT_ULL(55) > > unsigned int psize; > char *page; > > uint64_t pagemap_read_vaddr(int fd, void *vaddr) > { > uint64_t value; > int ret; > > ret = pread(fd, &value, sizeof(uint64_t), > ((uint64_t)vaddr >> 12) * sizeof(uint64_t)); > assert(ret == sizeof(uint64_t)); > > return value; > } > > void clear_refs_write(void) > { > int fd = open("/proc/self/clear_refs", O_RDWR); > > assert(fd >= 0); > write(fd, "4", 2); > close(fd); > } > > #define check_soft_dirty(str, expect) do { \ > bool dirty = pagemap_read_vaddr(fd, page) & PM_SOFT_DIRTY; \ > if (dirty != expect) { \ > printf("ERROR: %s, soft-dirty=%d (expect: %d)\n", str, dirty, expect); \ > exit(-1); \ > } \ > } while (0) > > int main(void) > { > int fd = open("/proc/self/pagemap", O_RDONLY); > > assert(fd >= 0); > psize = getpagesize(); > page = mmap(NULL, psize, PROT_READ|PROT_WRITE, > MAP_ANONYMOUS|MAP_PRIVATE, -1, 0); > assert(page != MAP_FAILED); > > *page = 1; > check_soft_dirty("Just faulted in page", 1); > clear_refs_write(); > check_soft_dirty("Clear_refs written", 0); > mprotect(page, psize, PROT_READ); > check_soft_dirty("Marked RO", 0); > mprotect(page, psize, PROT_READ|PROT_WRITE); > check_soft_dirty("Marked RW", 0); > *page = 2; > check_soft_dirty("Wrote page again", 1); > > munmap(page, psize); > close(fd); > printf("Test passed.\n"); > > return 0; > } Can we turn that into a vm selftest in tools/testing/selftests/vm/soft-dirty.c, and also extend it by MAP_SHARED froma file as above? > =======8<====== > > So even if commit 64fe24a3e05e kept the old behavior and didn't attempt to > change the behavior here, the bug will only be able to be triggered after > commit 64fe24a3e05e because only anonymous page will suffer from it. > > Fixes: 64fe24a3e05e ("mm/mprotect: try avoiding write faults for exclusive anonymous pages when changing protection") > Signed-off-by: Peter Xu <peterx@xxxxxxxxxx> > --- > mm/mprotect.c | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/mm/mprotect.c b/mm/mprotect.c > index 0420c3ed936c..804807ab14e6 100644 > --- a/mm/mprotect.c > +++ b/mm/mprotect.c > @@ -48,8 +48,11 @@ static inline bool can_change_pte_writable(struct vm_area_struct *vma, > if (pte_protnone(pte) || !pte_dirty(pte)) > return false; > > - /* Do we need write faults for softdirty tracking? */ > - if ((vma->vm_flags & VM_SOFTDIRTY) && !pte_soft_dirty(pte)) > + /* > + * Do we need write faults for softdirty tracking? Note, > + * soft-dirty is enabled when !VM_SOFTDIRTY. > + */ > + if (!(vma->vm_flags & VM_SOFTDIRTY) && !pte_soft_dirty(pte)) > return false; I wonder if we now want, just as in vma_wants_writenotify(), an early check for IS_ENABLED(CONFIG_MEM_SOFT_DIRTY), to optimize this out completely. > > /* Do we need write faults for uffd-wp tracking? */ -- Thanks, David / dhildenb