On 07.03.20 21:33, David Hildenbrand wrote: > On 20.02.20 16:53, Peter Xu wrote: >> [Resend v6] >> >> This is v6 of the series. It is majorly a rebase to 5.6-rc2, nothing >> else to be expected (plus some tests after the rebase). Instead of >> rewrite the cover letter I decided to use what we have for v5. >> >> Adding extra CCs for both Bobby Powers <bobbypowers@xxxxxxxxx> and >> Brian Geffon <bgeffon@xxxxxxxxxx>. >> >> Online repo: https://github.com/xzpeter/linux/tree/mm-pf-signal-retry >> >> Any review comment is appreciated. Thanks, > > If I am not completely missing something (and all my testing today was > wrong) there is a very simple reason why I *LOVE* this series and it made > my weekend. It makes userfaultfd with concurrent discarding (e.g., > MADV_DONTNEED) of pages actually usable. > > The issue in current code is that between placing a page and waking > up a waiter, somebody can zap the new placed page and trigger > re-fault, triggering a SIGBUS and crashing an application where all > memory is supposed to be accessible. And there is no real way to protect > from that, because when the fault handler will be woken up and retry > is not deterministic (e.g., making madvise(MADV_DONTNEED) and > UFFDIO_ZEROPAGE mutually exclusive does not help). > > Find a simple reproducer at the end of this mail. See below for another test case. It's not immediately clear why we can get SIGBUS in this example, UFFD_FEATURE_EVENT_REMOVE was supposed to protect us from this - I thought. I think because the actual discard is delayed after the UFFD_EVENT_REMOVE has been processed until the next UFFDIO_ZEROPAGE takes place. Then we have the same race as in the other test case. It also showcases why the use of UFFD_FEATURE_EVENT_REMOVE in combination with placing of pages in a handler can easily deadlock. Before this series: SIGBUS - FAULT_FLAG_ALLOW_RETRY missing 70 After this series: Keeps on running But the general behavior of UFFD_FEATURE_EVENT_REMOVE is not really useful in practice due to the possible deadlock that I work around in this patch. You can drop the read() etc. from the loop to observe the deadlock. --- #include <string.h> #include <stdbool.h> #include <stdint.h> #include <sys/types.h> #include <stdio.h> #include <pthread.h> #include <errno.h> #include <unistd.h> #include <stdlib.h> #include <fcntl.h> #include <poll.h> #include <linux/userfaultfd.h> #include <sys/mman.h> #include <sys/syscall.h> #include <sys/ioctl.h> static int page_size; static void *fault_handler_thread(void *arg) { const long uffd = (long) arg; struct pollfd pollfd = { .fd = uffd, .events = POLLIN, }; int ret; while (true) { struct uffdio_zeropage zeropage = {}; struct uffd_msg msg; ssize_t nread; if (poll(&pollfd, 1, -1) == -1) { fprintf(stderr, "POLL failed: %s\n", strerror(errno)); exit(-1); } if (read(uffd, &msg, sizeof(msg)) != sizeof(msg)) { fprintf(stderr, "READ failed\n"); exit(-1); } if (msg.event == UFFD_EVENT_REMOVE) { continue; } if (msg.event != UFFD_EVENT_PAGEFAULT) { fprintf(stderr, "Not UFFD_EVENT_PAGEFAULT\n"); exit(-1); } zeropage.range.start = msg.arg.pagefault.address; zeropage.range.len = page_size; /* * The following loop would deadlock in case we get a MADV_DONTNEED * at the wrong time. UFFDIO_ZEROPAGE won't be able to make progress * until we processed the UFFD_EVENT_REMOVE. So we manually have * to read and process the UFFD_EVENT_REMOVE. This is nasty, because * once we could get multiple UFFD_EVENT_PAGEFAULT, this would no * longer be usable - we would get another pagefault request, which * we cannot process and so on ... */ do { ret = ioctl(uffd, UFFDIO_ZEROPAGE, &zeropage); if (ret && errno != EAGAIN) { fprintf(stderr, "UFFDIO_ZEROPAGE failed:%s\n", strerror(errno)); exit(-1); } /* * We're expecting a UFFD_EVENT_REMOVE event Soemtimes we * get none ... ? */ if (read(uffd, &msg, sizeof(msg)) != sizeof(msg)) { continue; } if (msg.event != UFFD_EVENT_REMOVE) { fprintf(stderr, "Not a remove event\n"); continue; } } while (ret); } } static void *discard_thread(void *arg) { while (true) { if (madvise(arg, page_size, MADV_DONTNEED)) { fprintf(stderr, "MADV_DONTNEED failed:%s\n", strerror(errno)); exit(-1); } printf("Discard!\n"); usleep(1000); } } int main(void) { struct uffdio_register reg; struct uffdio_api api = { .api = UFFD_API, .features = UFFD_FEATURE_EVENT_REMOVE, }; pthread_t fault, discard; long uffd; char *area; page_size = sysconf(_SC_PAGE_SIZE); uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK); if (uffd == -1) { fprintf(stderr, "Could not create uffd: %s\n", strerror(errno)); exit(-1); } if (ioctl(uffd, UFFDIO_API, &api) == -1) { fprintf(stderr, "UFFDIO_API failed: %s\n", strerror(errno)); exit(-1); } area = mmap(NULL, page_size, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); if (area == MAP_FAILED) { fprintf(stderr, "Could not allocate memory"); exit(-1); } reg.range.start = (uint64_t) area; reg.range.len = page_size, reg.mode = UFFDIO_REGISTER_MODE_MISSING; if (ioctl(uffd, UFFDIO_REGISTER, ®) == -1) { fprintf(stderr, "UFFDIO_REGISTER failed: %s\n", strerror(errno)); exit(-1); } /* thread to provide zeropages */ if (pthread_create(&fault, NULL, fault_handler_thread, (void *) uffd)) { fprintf(stderr, "Could not create fault handing thread"); exit(-1); } /* thread to discard the page */ if (pthread_create(&discard, NULL, discard_thread, (void *) area)) { fprintf(stderr, "Could not create discard thread"); exit(-1); } /* keep reading/writing the page */ while (true) { area[7] = area[1]; usleep(10000); printf("Progress!\n"); } return 0; } -- Thanks, David / dhildenb