> > I'll see if I can confirm that this is indeed possible and send a > > repro if it is. > > I think your analysis above is correct. The key being the failure to unshare > in the non-PUD_SIZE vma after the split. I do indeed hit the WARN_ON_ONCE (repro attached), and the MADV wasn't even needed (the UFFDIO_REGISTER does the VMA split before "unsharing all PMDs"). With the fix, we avoid the WARN_ON_ONCE, but the behavior is still incorrect: I expect the address range to be write-protected, but it isn't. The reason why is that hugetlb_change_protection uses huge_pte_offset, even if it's being called for a UFFDIO_WRITEPROTECT with UFFDIO_WRITEPROTECT_MODE_WP. In that particular case, I'm pretty sure we should be using huge_pte_alloc, but even so, it's not trivial to get an allocation failure back up to userspace. The non-hugetlb implementation of UFFDIO_WRITEPROTECT seems to also have this problem. Peter, what do you think? > > To me, the fact it was somewhat difficult to come up with this scenario is an > argument what we should just unshare at split time as you propose. Who > knows what other issues may exist. > > > 60dfaad65a ("mm/hugetlb: allow uffd wr-protect none ptes") is the > > commit that introduced the WARN_ON_ONCE; perhaps it's a good choice > > for a Fixes: tag (if above is indeed true). > > If the key issue in your above scenario is indeed the failure of > hugetlb_unshare_all_pmds in the non-PUD_SIZE vma, then perhaps we tag? > > 6dfeaff93be1 ("hugetlb/userfaultfd: unshare all pmds for hugetlbfs when > register wp") SGTM. Thanks Mike.
#define _GNU_SOURCE #include <unistd.h> #include <fcntl.h> #include <linux/memfd.h> #include <linux/mman.h> #include <sys/mman.h> #include <linux/userfaultfd.h> #include <linux/errno.h> #include <sys/types.h> #include <sys/wait.h> #include <stdio.h> #include <sys/syscall.h> #include <sys/ioctl.h> #define PAGES_PER_GIG 512 #define HUGE_PAGE_SIZE (2UL << 20) #define GIG_MASK ~(PAGES_PER_GIG * HUGE_PAGE_SIZE - 1) void fault_in_write(char *mapping, size_t len) { volatile char *mapping_v = mapping; for (size_t i = 0; i < len; i += 4096) mapping_v[i] = 1; } int main() { int fd = memfd_create("test", MFD_HUGETLB); size_t len = 2 * PAGES_PER_GIG * HUGE_PAGE_SIZE; char *mapping; char *mapping_to_use; int uffd; int pid; struct uffdio_api uffdio_api; struct uffdio_register uffdio_register; struct uffdio_writeprotect uffdio_writeprotect; if (ftruncate(fd, len)) { perror("ftruncate failed"); return -1; } mapping = mmap(NULL, len, PROT_WRITE | PROT_READ, MAP_SHARED, fd, 0); if (mapping == MAP_FAILED) { perror("mmap failed"); return -1; } mapping_to_use = mapping; if ((unsigned long)mapping & ~GIG_MASK) { mapping_to_use = (char *)((unsigned long)mapping & GIG_MASK) + (len/2); } pid = fork(); if (pid < 0) { perror("fork failed"); return -1; } fault_in_write(mapping, len); if (pid > 0) { int status; waitpid(pid, &status, 0); return WEXITSTATUS(status); } uffd = syscall(SYS_userfaultfd, O_CLOEXEC); uffdio_api.api = UFFD_API; uffdio_api.features = UFFD_FEATURE_SIGBUS; uffdio_api.ioctls = 0; if (ioctl(uffd, UFFDIO_API, &uffdio_api)) { perror("UFFDIO_API failed"); return -1; } uffdio_register.range.start = (unsigned long)mapping_to_use; uffdio_register.range.len = len/4; uffdio_register.mode = UFFDIO_REGISTER_MODE_WP; if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register)) { perror("UFFDIO_REGISTER failed"); return -1; } uffdio_writeprotect.range.start = (unsigned long)mapping_to_use; uffdio_writeprotect.range.len = len/4; uffdio_writeprotect.mode = UFFDIO_WRITEPROTECT_MODE_WP; if (ioctl(uffd, UFFDIO_WRITEPROTECT, &uffdio_writeprotect)) { perror("UFFDIO_WRITEPROTECT failed"); return -1; } /* If correct: should SIGBUS */ fault_in_write(mapping_to_use, 1); printf("BUG: no sigbus\n"); return 0; }