Hello Mike,

On Wed, Sep 30, 2015 at 02:56:19PM -0700, Mike Kravetz wrote:
> On 09/08/2015 01:43 PM, Andrea Arcangeli wrote:
> > Here are some pending updates for userfaultfd mostly to the self test,
> > the rest are cleanups.
>
> I have a potential use case for userfaultfd. So, I started experimenting

Glad to hear you may have one more use case.

On a side note, there's also a patch posted to CRIU to lazily page in
anonymous memory during restore using userfaultfd; that's yet another
recent user.

> with the self test code. I replaced the posix_memalign() calls to allocate
> area_src and area_dst with mmap(). mmap(MAP_PRIVATE | MAP_ANONYMOUS) works
> as expected. However, mmap(MAP_SHARED | MAP_ANONYMOUS) causes the test to
> fail without any errors from the userfaultfd APIs.
>
> --------------------
> running userfaultfd
> --------------------
> nr_pages: 32768, nr_pages_per_cpu: 8192
> bounces: 31, mode: rnd racing ver poll, page_nr 31523 wrong count 0 1
>
> I would expect some type of error from the ioctl() that registers the
> range, or perhaps the poll/copy code? Just curious about the expected
> behavior.

That should return an error during UFFDIO_REGISTER and the testcase
shouldn't start; I'm not sure what went wrong. Can you send the
modification to the testcase?

UFFDIO_REGISTER is the point where userfaultfd is first told which kind
of memory you want to manage with userfaults. It was planned to fail
there (and it cannot fail any earlier).

This check has to fail and return -EINVAL in the
ioctl(UFFDIO_REGISTER):

		/* check not compatible vmas */
		ret = -EINVAL;
		if (cur->vm_ops)
			goto out_unlock;

In the testcase you should get an exit 1 and the fprintf printed:

	if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register)) {
		fprintf(stderr, "register failure\n");
		return 1;
	}

Could you double check these two paths to find what's wrong?

> FYI - My use case is for hugetlbfs. I would like a mechanism to catch all
> new huge page allocations as a result of page faults. I have some very
> rough code to extend userfaultfd and add the required functionality to
> hugetlbfs. Still working on it.

Adding support for hugetlbfs sounds great to me.

Only anonymous memory has null vm_ops, so once you extend the code to
track hugetlbfs (tracking at least tmpfs and not just anonymous memory
is needed for volatile pages, which also work on tmpfs) you should
relax the above check to accept &hugetlb_vm_ops.

You then need to report, in the uffdio_register->ioctls field, which
ioctls the current kernel supports for the kind of memory that was
registered:

	/*
	 * Now that we scanned all vmas we can already tell
	 * userland which ioctls methods are guaranteed to
	 * succeed on this range.
	 */
	if (put_user(UFFD_API_RANGE_IOCTLS,
		     &user_uffdio_register->ioctls))
		ret = -EFAULT;

#define UFFD_API_RANGE_IOCTLS			\
	((__u64)1 << _UFFDIO_WAKE |		\
	 (__u64)1 << _UFFDIO_COPY |		\
	 (__u64)1 << _UFFDIO_ZEROPAGE)

hugetlbfs doesn't seem to support the zeropage. So if vma->vm_ops ==
&hugetlb_vm_ops, it should return only WAKE|COPY in
uffdio_register->ioctls.

hugetlbfs is non-standard: there's no sysconf(_SC_PAGE_SIZE) equivalent
to learn the minimum granularity supported by UFFDIO_COPY|WAKE on
hugetlbfs. This is a generic issue with hugetlbfs, not really related
to userfaultfd. The same hugetlbfs constraints on minimum granularity
and alignment apply to all other memory management syscalls too.

So the app using hugetlbfs will have to know by other means (i.e. by
parsing sysfs) that the minimum granularity supported by UFFDIO_COPY
is 2MB (or 1GB). That is again because it registered userfaultfd on
hugetlbfs, and hugetlbfs has non-standard constraints. In turn,
UFFDIO_COPY on hugetlbfs has to fail if len is not a multiple of the
huge page size, e.g. 2MB (never the case for all other kinds of memory
that userfaultfd could ever manage).
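
To make the two userland-side checks above concrete, here is a rough
sketch of what an app could do after a successful UFFDIO_REGISTER. It
is only an illustration under stated assumptions: the helper names
uffd_range_supports() and uffd_copy_args_aligned() are hypothetical
(they exist in neither the kernel nor the selftest), hugetlbfs support
in userfaultfd is the extension being discussed and not existing
behavior, and huge_page_size is assumed to have been obtained by the
app through other means.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: true if the ioctls bitmask returned in
 * uffdio_register.ioctls has the bit for ioctl number nr set.
 * Callers would pass _UFFDIO_WAKE, _UFFDIO_COPY or _UFFDIO_ZEROPAGE
 * from <linux/userfaultfd.h> as nr.
 */
static bool uffd_range_supports(uint64_t ioctls, unsigned int nr)
{
	return nr < 64 && (ioctls & ((uint64_t)1 << nr)) != 0;
}

/*
 * Hypothetical helper: true if a UFFDIO_COPY destination and length
 * respect the hugetlbfs granularity. huge_page_size is supplied by
 * the caller (e.g. 2MB or 1GB) and must be a power of two.
 */
static bool uffd_copy_args_aligned(uint64_t dst, uint64_t len,
				   uint64_t huge_page_size)
{
	uint64_t mask;

	/* Precondition: non-zero power of two. */
	assert(huge_page_size != 0 &&
	       (huge_page_size & (huge_page_size - 1)) == 0);

	mask = huge_page_size - 1;
	/* A zero length, or any misalignment, is rejected. */
	return len != 0 && (dst & mask) == 0 && (len & mask) == 0;
}
```

With something like this, an app registered on hugetlbfs would skip
UFFDIO_ZEROPAGE when uffd_range_supports(uffdio_register.ioctls,
_UFFDIO_ZEROPAGE) is false, and would only issue UFFDIO_COPY for
ranges where uffd_copy_args_aligned() is true, instead of relying on
the ioctl to fail.
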

There's flexibility in the userfaultfd API to gradually expand the
coverage to a variety of types of virtual memory, while at the same
time not risking random behavior from a new app if it is run on an old
kernel. The new app will be able to reliably tell the user to upgrade
the kernel (or it can fall back to a non-userfaultfd mode with just a
warning to the user).

We need to handle the write protection faults too as soon as possible
(VM_UFFD_WP/UFFD_FEATURE_PAGEFAULT_FLAG_WP). The uffdio_api->features
are already prepared to report to userland the availability of
UFFD_FEATURE_PAGEFAULT_FLAG_WP. Then the app can set
UFFDIO_REGISTER_MODE_WP in uffdio_register.mode.

I mention this because while there's flexibility to expand the
coverage gradually, it'd be great if all kinds of memory supporting
UFFDIO_REGISTER_MODE_MISSING would also support
UFFDIO_REGISTER_MODE_WP once that becomes available, as it'd keep
userfaultfd_register() a bit simpler to maintain.

Thanks,
Andrea