On Tue 30-05-17 16:04:56, Andrea Arcangeli wrote: > On Tue, May 30, 2017 at 12:39:30PM +0200, Michal Hocko wrote: > > On Tue 30-05-17 13:19:22, Mike Rapoport wrote: > > > On Tue, May 30, 2017 at 09:44:08AM +0200, Michal Hocko wrote: > > > > On Wed 24-05-17 17:27:36, Mike Rapoport wrote: > > > > > On Wed, May 24, 2017 at 01:18:00PM +0200, Michal Hocko wrote: > > > > [...] > > > > > > Why cannot khugepaged simply skip over all VMAs which have userfault > > > > > > regions registered? This would sound like a less error prone approach to > > > > > > me. > > > > > > > > > > khugepaged does skip over VMAs which have userfault. We could register the > > > > > regions with userfault before populating them to avoid collapses in the > > > > > transition period. > > > > > > > > Why cannot you register only post-copy regions and "manually" copy the > > > > pre-copy parts? > > > > > > We can register only post-copy regions, but this will cause VMA > > > fragmentation. Now we register the entire VMA with userfaultfd, no matter > > > how many pages were dirtied there since the pre-dump. If we register only > > > post-copy regions, we will split out the VMAs for those regions. > > > > Is this really a problem, though? > > It would eventually get -ENOMEM or at best create lots of unnecessary > vmas (at least UFFDIO_COPY would never risk to trigger -ENOMEM). I sysctl for the mapcount can be increased, right? I also assume that those vmas will get merged after the post copy is done. > The only attractive alternative is to use UFFDIO_COPY for precopy too > after pre-registering the whole range in uffd (which would happen > later anyway to start postcopy). > > > It would be good to measure that though. You are proposing a new user > > API and the THP api is quite convoluted already so there better be a > > very good reason to add a new API. So far I can only see that it would > > be more convinient to add another madvise command and that is rather > > insufficient justification IMHO. Also do you expect somebody else would > > use new madvise? What would be the usecase? > > UFFDIO_COPY while not being a major slowdown for sure, it's likely > measurable at the microbenchmark level because it would add a > enter/exit kernel to every 4k memcpy. It's not hard to imagine that as > measurable. How that impacts the total precopy time I don't know, it > would need to be benchmarked to be sure. Yes, please! > The main benefit of this > madvise is precisely to skip those enter/exit kernel that UFFDIO_COPY > would add. Even if the impact on the total precopy time wouldn't be > measurable (i.e. if it's network bound load), the madvise that allows > using memcpy after setting VM_NOHUGEPAGE, would free up some CPU > cycles in the destination that could be used by other processes. I understand that part but it sounds awfully one purpose thing to me. Are we going to add other MADVISE_RESET_$FOO to clear other flags just because we can race in this specific use case? > About the proposed madvise, it just clear bits, but it doesn't change > at all how those bits are computed in THP code. So I don't see it as > convoluted. But we already have MADV_HUGEPAGE, MADV_NOHUGEPAGE and prctl to enable/disable thp. Doesn't that sound little bit too much for a single feature to you? -- Michal Hocko SUSE Labs -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html