On Fri, Feb 26, 2016 at 12:02:19AM +0100, Andrea Arcangeli wrote:
> On Thu, Feb 25, 2016 at 07:56:13PM +0000, Mel Gorman wrote:
> > Which is a specialised case that does not apply to all users. Remember
> > that the data showed that a basic streaming write of an anon mapping on
> > a freshly booted NUMA system was enough to stall the process for long
> > periods of time.
> >
> > Even in the specialised case, a single VM reaching its peak performance
> > may rely on getting THP but if that's at the cost of reclaiming other
> > pages that may be hot to a second VM then it's an overall loss.
>
> You're mixing the concern of that THP will use more memory with the
> cost of defragmentation.

There are three cases:

1. THP was allocated when the application only required 4K, so it
   consumes more memory. This has always been the case but is not the
   concern here.
2. Memory is fragmented but there are enough free pages. In this case,
   only compaction is required and the memory footprint is the same.
3. Memory is fragmented and pages have to be freed before compaction.

It's case 3 I was referring to, even though all the cases are important.

> If you've memory issues and you are ok to
> sacrifice performance for swapping less you should disable THP, set it
> to never, and that's it.
>

I want to get to the half-way point where THP is used if easily available
without worrying that there will be stalls at some point in the future or
requiring application modification for madvise. That's better than the
all-or-nothing approach that users are currently faced with. I wince every
time I see a tuning guide suggesting THP be disabled and have handled too
many bugs where disabling THP was a workaround.

That said, you made a number of important points. I'm not going to respond
to them individually because I believe I understand your concerns and now
agree with them. I've prototyped a patch that modifies the defrag tunable
as follows:

1. "madvise" is the default and will direct reclaim/compact only for
   applications that specifically requested that behaviour. This will
   avoid breaking MADV_HUGEPAGE, which you mentioned in a few places.
2. "never" will never reclaim anything. It was the default behaviour of
   version 1 but will not be the default in version 2.
3. "defer" will wake kswapd, which will reclaim or wake kcompactd,
   whichever is necessary. This is new but avoids stalls while helping
   khugepaged do its work quickly in the near future.
4. "always" will direct reclaim/compact just like today's behaviour.

I'm testing it at the moment to make sure each of the options behaves
correctly.

-- 
Mel Gorman
SUSE Labs
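
For illustration only, the four defrag modes listed above can be inspected from userspace roughly as follows. This is a minimal sketch, not part of the prototype patch: it assumes the tunable is exposed at the usual sysfs path /sys/kernel/mm/transparent_hugepage/defrag and uses the usual sysfs convention of marking the active option with square brackets. The helper names (parse_defrag, current_defrag_mode) and the DEFRAG_MODES table are hypothetical, and the descriptions merely paraphrase the list above.

```python
import re
from pathlib import Path

# Usual sysfs location of the THP defrag tunable (assumed present on the system).
DEFRAG_PATH = Path("/sys/kernel/mm/transparent_hugepage/defrag")

# Hypothetical lookup table: one-line paraphrase of each mode described above.
DEFRAG_MODES = {
    "madvise": "direct reclaim/compact only for MADV_HUGEPAGE regions",
    "never": "never reclaim anything for a THP allocation",
    "defer": "wake kswapd/kcompactd in the background, no stall",
    "always": "direct reclaim/compact on every THP allocation",
}


def parse_defrag(text):
    """Parse text such as 'always defer [madvise] never'.

    Returns (selected, options); selected is None if nothing is bracketed.
    """
    options = text.replace("[", "").replace("]", "").split()
    match = re.search(r"\[([^\]]+)\]", text)
    selected = match.group(1) if match else None
    return selected, options


def current_defrag_mode(path=DEFRAG_PATH):
    """Return the active defrag mode, or None if the file cannot be read."""
    try:
        return parse_defrag(path.read_text())[0]
    except OSError:
        return None


if __name__ == "__main__":
    mode = current_defrag_mode()
    print(mode, "->", DEFRAG_MODES.get(mode, "unknown or THP not available"))
```

Parsing is kept separate from the file read so the logic can be exercised on a plain string without a THP-enabled kernel.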