On Wed, 29 Jun 2016, Vlastimil Babka wrote:

> On 06/29/2016 03:39 AM, David Rientjes wrote:
> > It's possible that the freeing scanner can be consistently expensive if
> > memory is well compacted toward the end of the zone with few free pages
> > available in that area.
> >
> > If all zone memory is synchronously compacted, say with
> > /proc/sys/vm/compact_memory, and thp is faulted, it is possible to
> > iterate a massive amount of memory even with the per-zone cached free
> > position.
> >
> > For example, after compacting all memory and faulting thp for heap, it
> > was observed that compact_free_scanned increased as much as 892518911 4KB
> > pages while compact_stall only increased by 171. The freeing scanner
> > iterated ~20GB of memory for each compaction stall.
> >
> > To address this, if too much memory is spanned on the freeing scanner's
> > freelist when releasing back to the system, return the low pfn rather than
> > the high pfn. It's declared that the freeing scanner will become too
> > expensive if the high pfn is used, so use the low pfn instead.
> >
> > The amount of memory declared as too expensive to iterate is subjectively
> > chosen at COMPACT_CLUSTER_MAX << PAGE_SHIFT, which is 512MB with 4KB
> > pages.
> >
> > Signed-off-by: David Rientjes <rientjes@xxxxxxxxxx>
>
> Hmm, I don't know. Seems it only works around one corner case of a larger
> issue. The cost for the scanning was already paid, the patch prevents it from
> being paid again, but only until the scanners are reset.
>

The only point of the per-zone cached pfn positions is to avoid doing the
same work again unnecessarily. Having the last 16GB of memory at the end
of a zone being completely unfree is the same as a single page in the
last pageblock free. The number of PageBuddy pages in that amount of
memory can be irrelevant up to COMPACT_CLUSTER_MAX. We simply can't afford
to scan 16GB of memory looking for free pages.
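
For reference, the figures quoted in the changelog can be checked with a
little arithmetic. The sketch below is userspace C, not kernel code. It
assumes 4KB pages (PAGE_SHIFT of 12) and COMPACT_CLUSTER_MAX of 32, and it
reads "COMPACT_CLUSTER_MAX << PAGE_SHIFT" as a count of pages, which is the
reading that yields 512MB. The function names are made up for illustration.

```c
#include <assert.h>

/* Assumed values: 4KB pages, and COMPACT_CLUSTER_MAX taken to be 32. */
#define PAGE_SHIFT		12UL
#define COMPACT_CLUSTER_MAX	32UL

/* Convert a page count to bytes under the assumed page size. */
static unsigned long long pages_to_bytes(unsigned long long pages)
{
	return pages << PAGE_SHIFT;
}

/*
 * Average number of bytes the free scanner walked per compaction stall,
 * given the deltas of compact_free_scanned and compact_stall.
 */
static unsigned long long avg_bytes_scanned_per_stall(
		unsigned long long pages_scanned, unsigned long long stalls)
{
	assert(stalls != 0);
	return pages_to_bytes(pages_scanned / stalls);
}

/* The "too expensive" span from the changelog, expressed in pages. */
static unsigned long long expensive_span_pages(void)
{
	return (unsigned long long)COMPACT_CLUSTER_MAX << PAGE_SHIFT;
}
```

With the numbers from the changelog, avg_bytes_scanned_per_stall(892518911,
171) comes to 21378695168 bytes, a little under 20GiB per stall, and
expensive_span_pages() is 131072 pages, or 536870912 bytes (512MiB).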
> Note also that THP's no longer do direct compaction by default in recent
> kernels.
>
> To fully solve the freepage scanning issue, we should probably pick and finish
> one of the proposed reworks from Joonsoo or myself, or the approach that
> replaces free scanner with direct freelist allocations.
>

Feel free to post the patches, but I believe this simple change makes
release_freepages() exceedingly better and can better target memory for
the freeing scanner.
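
To make the idea concrete, here is a simplified userspace model of the
decision described in the changelog. It is not the patch itself: the
function name, the array-based freelist and the strict ">" comparison are
all assumptions made for illustration, as are the 4KB page size and
COMPACT_CLUSTER_MAX of 32. It tracks the lowest and highest pfn among the
pages being released and returns the low one when the span between them
exceeds the threshold.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values: 4KB pages, and COMPACT_CLUSTER_MAX taken to be 32. */
#define PAGE_SHIFT		12UL
#define COMPACT_CLUSTER_MAX	32UL
/* Span, in pages, above which rescanning is declared too expensive. */
#define MAX_SPAN_PAGES		(COMPACT_CLUSTER_MAX << PAGE_SHIFT)

/*
 * Hypothetical model: pfns[] holds the page frame numbers of the isolated
 * free pages being released. Returns the pfn to hand back for the free
 * scanner's cached position: normally the highest pfn seen, but the
 * lowest one if the pages span more than MAX_SPAN_PAGES.
 */
static unsigned long pick_release_pfn(const unsigned long *pfns, size_t n)
{
	unsigned long low, high;
	size_t i;

	assert(n > 0);
	low = high = pfns[0];
	for (i = 1; i < n; i++) {
		if (pfns[i] < low)
			low = pfns[i];
		if (pfns[i] > high)
			high = pfns[i];
	}
	return (high - low > MAX_SPAN_PAGES) ? low : high;
}
```

In this model, pages clustered within the threshold still hand back the
high pfn, as before, while a freelist that spans more than 512MB hands
back the low pfn so the scanner does not walk the same range again.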