On Thu, Feb 25, 2016 at 08:01:44PM +0100, Andrea Arcangeli wrote:
> On Thu, Feb 25, 2016 at 05:12:19PM +0000, Mel Gorman wrote:
> > some cases, this will reduce THP usage but the benefit of THP is hard to
> > measure and not a universal win whereas a stall to reclaim/compaction is
>
> It depends on the workload: with virtual machines THP is essential
> from the start without having to wait half a khugepaged cycle on
> average, especially on large systems.

Which is a specialised case that does not apply to all users. Remember
that the data showed that a basic streaming write of an anon mapping on
a freshly booted NUMA system was enough to stall the process for long
periods of time (a rough reproducer is appended at the end of this
mail). Even in the specialised case, a single VM reaching its peak
performance may rely on getting THP, but if that's at the cost of
reclaiming other pages that may be hot to a second VM then it's an
overall loss. Finally, for the specialised case, if it really is that
critical then pages could be freed preemptively from userspace before
the VM starts. For example, allocate and free X hugetlbfs pages before
the migration.

Right now, there are numerous tuning guides out there that suggest
disabling THP entirely due to the stalls. On my own desktop, I
occasionally see a new process stall the system for a few seconds, and
it was possible to see that THP allocations were happening at the time.

> We see this effect for example
> in postcopy live migration where --postcopy-after-precopy is essential
> to reach peak performance during database workloads in guest,
> immediately after postcopy completes. With --postcopy-after-precopy
> only those pages that may be triggering userfaults will need to be
> collapsed with khugepaged and all the rest that was previously passed
> over with precopy has a high probability to be immediately THP backed
> also thanks to defrag/direct-compaction. Failing at starting
> the destination node largely THP backed is very visible in benchmarks
> (even if a full precopy pass is done first). Later on the performance
> increases again as khugepaged fixes things, but it takes some time.

If it's critical that the performance is identical then I would suggest
a pre-migration step that allocates and frees hugetlbfs pages to force
the defragmentation; a sketch of what that could look like is at the
end of this mail. Alternatively, trigger compaction from proc and, if
necessary, use memhog to allocate/free the required memory followed by
a second proc-triggered compaction (also sketched below). It's a little
less tidy but it solves the corner case while leaving the common case
free of stalls.

> So unless we've a very good kcompactd or a workqueue doing the job of
> providing enough THP for page faults, I'm skeptical of this.

Unfortunately, it'll never be perfect. We went through a cycle of
having really high success rates for allocations in the 3.0 days and
the cost in reclaim and disruption was way too high.

> Another problem is that khugepaged isn't able to collapse shared
> readonly anon pages, mostly because of the rmap complexities. I agree
> with Kirill we should be looking into how to make this work, although I
> doubt the simpler refcounting is going to help much in this regard as
> the problem is in dealing with rmap, not so much with refcounts.

I think that's important, but I'm not seeing right now how it's related
to preventing processes stalling for long periods of time in direct
reclaim and compaction.
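For completeness, the streaming anon write mentioned above needs
nothing exotic. This is a minimal sketch, not the exact test case; the
4GB size is an arbitrary assumption and should be picked to exceed free
memory so that reclaim/compaction is actually entered:

#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 4UL << 30;	/* assumed: 4GB, pick > free memory */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	/* Each fault in a previously untouched 2MB-aligned region is a
	 * THP allocation attempt that can enter direct
	 * reclaim/compaction under the current defaults. */
	for (size_t off = 0; off < len; off += 4096)
		p[off] = 1;
	munmap(p, len);
	return 0;
}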
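The pre-migration alloc/free step could be as simple as growing and
then shrinking the hugetlb pool via /proc/sys/vm/nr_hugepages. Growing
the pool forces the kernel to find contiguous 2MB blocks now,
reclaiming/compacting as necessary; shrinking it back releases them to
the allocator already defragmented. The extra 512 pages (1GB of 2MB
pages) is only an illustrative assumption, and it must run as root:

#include <stdio.h>

#define NR_HUGE 512	/* assumed: 1GB worth of 2MB pages */

static long read_nr_hugepages(void)
{
	long val = -1;
	FILE *f = fopen("/proc/sys/vm/nr_hugepages", "r");

	if (!f || fscanf(f, "%ld", &val) != 1)
		val = -1;
	if (f)
		fclose(f);
	return val;
}

static int write_nr_hugepages(long val)
{
	FILE *f = fopen("/proc/sys/vm/nr_hugepages", "w");

	if (!f)
		return -1;
	fprintf(f, "%ld\n", val);
	return fclose(f);
}

int main(void)
{
	long old = read_nr_hugepages();

	if (old < 0) {
		fprintf(stderr, "cannot read nr_hugepages\n");
		return 1;
	}
	/* Grow the pool: pay the compaction cost before the VM starts. */
	if (write_nr_hugepages(old + NR_HUGE)) {
		perror("grow pool");
		return 1;
	}
	/* Shrink back: the freed blocks are contiguous and THP-ready. */
	if (write_nr_hugepages(old)) {
		perror("shrink pool");
		return 1;
	}
	return 0;
}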
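And the proc-triggered compaction alternative, again only a sketch
(requires root; the memhog step in between would be something along the
lines of "memhog 1g" from the numactl tools, sized to taste):

#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/proc/sys/vm/compact_memory", "w");

	if (!f) {
		perror("/proc/sys/vm/compact_memory");
		return 1;
	}
	/* Writing 1 here compacts all zones on all nodes. */
	fputs("1\n", f);
	return fclose(f) ? 1 : 0;
}

-- 
Mel Gorman
SUSE Labs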