On Thu 05-01-17 10:53:59, Vlastimil Babka wrote:
> [CC Joonsoo and Johannes]
> 
> On 12/30/2016 03:06 PM, Mel Gorman wrote:
> > On Fri, Dec 30, 2016 at 02:14:12PM +0100, Michal Hocko wrote:
> >> Hi,
> >> I didn't originally want to send this proposal because Vlastimil is
> >> planning to do some work in this area so I've expected him to send
> >> something similar. But the recent discussion about the THP defrag
> >> options pushed me to send out my thoughts.
> 
> No problem.
> 
> >> So what is the problem? The demand for high order pages is growing and
> >> that seems to be the general trend. The problem is that while they can
> >> bring performance benefit they can get be really expensive to allocate
> >> especially when we enter the direct compaction. So we really want to
> >> prevent from expensive path and defer as much as possible to the
> >> background. A huge step forward was kcompactd introduced by Vlastimil.
> >> We are still not there yet though, because it might be already quite
> >> late when we wakeup_kcompactd(). The memory might be already fragmented
> >> when we hit there.
> 
> Right.
> 
> >> Moreover we do not have any way to actually tell
> >> which orders we do care about.
> 
> Who is "we" here? The system admin?

yes

> >> Therefore I believe we need a watermark based pro-active compaction
> >> which would keep the background compaction busy as long as we have
> >> less pages of the configured order.
> 
> Again, configured by what, admin? I would rather try to avoid tunables
> here, if possible. While THP is quite well known example with stable
> order, the pressure for other orders is rather implementation specific
> (drivers, SLAB/SLUB) and may change with kernel versions (e.g. virtually
> mapped stacks, although that example is about non-costly order). Would
> the admin be expected to study the implementation to know which orders
> are needed, or react to page allocation failure reports? Neither sounds
> nice.

That is a good question, but I expect that there are more users than THP
which use stable orders. E.g. the networking stack tends to depend on the
packet size. A tracepoint with some histogram output would tell us what
the distribution of requested orders is.

> >> kcompactd should wake up
> >> periodically, I think, and check for the status so that we can catch
> >> the fragmentation before we get low on memory.
> >> The interface could look something like:
> >> /proc/sys/vm/compact_wmark
> >> time_period order count
> 
> IMHO it would be better if the system could auto-tune this, e.g. by
> counting high-order alloc failures/needs for direct compaction per order
> between wakeups, and trying to bring them to zero.

Auto-tuning is usually preferable. I am just wondering how the admin can
tell what system load price he is still willing to pay. I suspect we
will see a growing number of opportunistic high-order requests over time,
and auto-tuning shouldn't try to accommodate them without any bounds.
There is still some cost/benefit to be evaluated from the system-level
point of view, which I am afraid is hard to achieve from the kcompactd
POV.
-- 
Michal Hocko
SUSE Labs
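
For illustration only, the watermark idea discussed in this thread can be
sketched in userspace. This is a rough sketch under stated assumptions,
not an implementation of anything in the kernel: /proc/sys/vm/compact_wmark
is only a proposal here, and the reading of "time_period order count" as
(check period in seconds, page order, minimum number of free blocks of
that order) is an assumption. The free-block counts follow the per-order
column layout of /proc/buddyinfo; all function names are hypothetical.

```python
from typing import List, NamedTuple


class Watermark(NamedTuple):
    period_s: int  # assumed: how often the check would run, in seconds
    order: int     # page order to watch
    count: int     # assumed: minimum number of free blocks of that order


def parse_watermarks(text: str) -> List[Watermark]:
    """Parse hypothetical 'time_period order count' lines, one per order."""
    marks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        period, order, count = (int(field) for field in line.split())
        marks.append(Watermark(period, order, count))
    return marks


def parse_buddyinfo_line(line: str) -> List[int]:
    """Return the per-order free block counts from one /proc/buddyinfo line.

    Layout: 'Node 0, zone   Normal   c0 c1 c2 ...', where cN is the number
    of free blocks of order N.
    """
    return [int(field) for field in line.split()[4:]]


def available_blocks(free: List[int], order: int) -> int:
    """Blocks of `order` obtainable without compaction.

    A free block of a higher order j can be split into 2**(j - order)
    blocks of the requested order, so those are counted as well.
    """
    return sum(n << (j - order) for j, n in enumerate(free) if j >= order)


def orders_below_watermark(marks: List[Watermark], free: List[int]) -> List[int]:
    """Orders for which background compaction would be kept busy."""
    return [m.order for m in marks if available_blocks(free, m.order) < m.count]


# Made-up sample data, not taken from a real system.
sample_zone = "Node 0, zone   Normal   100 50 20 4 1 0 0 0 0 0 0"
sample_wmark = "60 3 4\n60 9 8\n"

free = parse_buddyinfo_line(sample_zone)
marks = parse_watermarks(sample_wmark)
print(orders_below_watermark(marks, free))
```

With this made-up sample, order 3 is satisfied (four order-3 blocks plus one
order-4 block that splits into two), while order 9 has no free blocks and
would keep compaction running. How the kernel would weigh that against the
system load cost is exactly the open question above.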