On Fri, Aug 19, 2016 at 04:52:51PM +0300, Pavel Emelyanov wrote:
> Hm... Are you talking about some in-kernel test, or just any? We have
> tests in CRIU tree for UFFD (not sure we've wired up the non-cooperative
> part though).

Nice. I wasn't aware you had uffd-specific tests in CRIU, I'll check.
I was referring to tools/testing/selftests/vm/userfault*, but I
suppose it's fine in CRIU as well. A self-contained test suitable for
testing/selftests would be nice too, as not everyone will run the CRIU
tests to test the kernel.

Currently what's tested is anon missing, tmpfs missing and hugetlbfs
missing, and they all work (I just fixed two tmpfs bugs yesterday
thanks to the tmpfs test, which crashed my workstation when I first
tried it; now it passes fine :).

> And my main worry about this is COW-sharing. If we have two tasks that
> fork()-ed from each other and we try to lazily restore a page that
> is still COW-ed between them, the uffd API doesn't give us anything to
> do it. So we effectively break COW on lazy restore. Do you have any
> ideas what can be done about it?

Building a shared page is tricky; not even khugepaged was doing that
for anon. Kirill extended khugepaged to do it, along with the
THP-on-tmpfs support, as it's more important for tmpfs (I haven't yet
checked whether it landed upstream with the rest of tmpfs in 4.8-rc
though).

The main API problem is that the uffd is different between parent and
child: fork with your non-cooperative patches gives you a new uffd
that represents the child mm. To create a page shared between two
"mm", the API should be able to specify the two "mm" and the two
"addresses" atomically in the same ioctl. And the uffd _is_ the "mm"
with the current API. So what it takes is to add a UFFDIO_COPY_COW
that takes as parameters an address for the current "uffd" and a list
of "int uffd, unsigned long address" pairs.

Even with plain UFFDIO_COPY things should still work solidly; it'll
just take more memory and it'll break COW during restore.
The important thing is that "break" here means "allocate more memory",
not "crash" :).

> We have ... readiness to do it :) since once CRIU hits this we'll have to.

OK, great. I also thought about it a bit, and I think it's just a
matter of specifying which uffd should get the notification first.
The manager will then take the notification first and call a
UFFDIO_FAULT_PASS to cascade to the second uffd registered in the
region if the page was missing in the source container, without
waking up the task blocked in handle_userfault. To find out whether
the page is missing in the source container you could use pagemap.

Thanks,
Andrea
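
PS: as a rough illustration of the pagemap check mentioned above, here
is a minimal sketch (not CRIU code, and not a proposed interface). It
assumes the documented /proc/<pid>/pagemap layout: one 64-bit entry
per virtual page, bit 63 set if the page is present in RAM, bit 62 set
if it is in swap. The helper names page_is_populated() and
demo_self_check() are made up for this example, and the "missing"
interpretation only really applies to anonymous memory; for tmpfs the
page may still sit in the page cache.

```c
#define _GNU_SOURCE
#define _FILE_OFFSET_BITS 64
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Bits of a 64-bit /proc/<pid>/pagemap entry. */
#define PM_PRESENT (1ULL << 63) /* page is present in RAM */
#define PM_SWAPPED (1ULL << 62) /* page is in swap */

/*
 * Hypothetical helper: returns 1 if the page containing vaddr is
 * populated (present or swapped) in the address space of pid,
 * 0 if it is missing, -1 on error.
 */
static int page_is_populated(pid_t pid, unsigned long vaddr)
{
	char path[64];
	uint64_t entry;
	long page_size = sysconf(_SC_PAGESIZE);
	ssize_t n;
	int fd;

	snprintf(path, sizeof(path), "/proc/%d/pagemap", (int)pid);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/* One 8-byte entry per virtual page, indexed by page number. */
	n = pread(fd, &entry, sizeof(entry),
		  (off_t)(vaddr / page_size) * sizeof(entry));
	close(fd);
	if (n != (ssize_t)sizeof(entry))
		return -1;

	return (entry & (PM_PRESENT | PM_SWAPPED)) ? 1 : 0;
}

/*
 * Self-check on our own address space: a fresh anonymous page is
 * missing until it is written. Returns 0 if both observations hold.
 */
static int demo_self_check(void)
{
	long page_size = sysconf(_SC_PAGESIZE);
	int before, after;
	char *p = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return -1;

	before = page_is_populated(getpid(), (unsigned long)p);
	*(volatile char *)p = 1; /* first write faults the page in */
	after = page_is_populated(getpid(), (unsigned long)p);

	munmap(p, page_size);
	return (before == 0 && after == 1) ? 0 : -1;
}
```

A manager would call page_is_populated(source_pid, fault_address) when
it receives the fault and only pass the fault on if the result is 0.
Reading another task's pagemap needs the usual ptrace-style
permissions on that task.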