2015-12-14 19:29 GMT+09:00 Vlastimil Babka <vbabka@xxxxxxx>:
> On 12/14/2015 06:02 AM, Joonsoo Kim wrote:
>>
>> There is a report of a performance drop due to hugepage allocation,
>> where half of the cpu time is spent in pageblock_pfn_to_page() during
>> compaction [1]. In that workload, compaction is triggered to make
>> hugepages, but most pageblocks are unavailable for compaction due to
>> their pageblock type and skip bits, so compaction usually fails. The
>> most costly operation in this case is finding a valid pageblock while
>> scanning the whole zone range. To check whether a pageblock is valid
>> to compact, a valid pfn within the pageblock is required, and we can
>> obtain it by calling pageblock_pfn_to_page(). This function checks
>> whether the pageblock is in a single zone and returns a valid pfn if
>> possible. The problem is that we need to check this every time before
>> scanning a pageblock, even if we re-visit it, and that turns out to
>> be very expensive in this workload.
>
> Hm, I wonder if this is safe wrt memory hotplug? Shouldn't there be a
> recheck plugged into the appropriate hotplug add/remove callbacks?
> That would make the whole thing generic too; the zone->contiguous
> information doesn't have to be limited to compaction. And it would
> remove the rather ugly part where cached pfn info is used as an
> indication of zone->contiguous being already set...

Will check it. A rough sketch of the hotplug hook I have in mind is
below, after the quoted description.

>> Although we have no way to skip this pageblock check on a system
>> where holes exist at arbitrary positions, we can cache the zone's
>> continuity and just do pfn_to_page() on a system where no hole
>> exists. This optimization considerably speeds up the above workload.
>>
>> Before vs After
>> Max: 1096 MB/s vs 1325 MB/s
>> Min:  635 MB/s vs 1015 MB/s
>> Avg:  899 MB/s vs 1194 MB/s
>>
>> Avg is improved by roughly 30% [2].
>
> Unless I'm mistaken, these results also include my RFC series (Aaron,
> can you clarify?). These patches had better be tested standalone on
> top of base; being simpler, they will probably be included sooner (the
> RFC series needs reviews at the very least :) - although the memory
> hotplug concerns might make the "sooner" here relative too.

AFAIK, these patches were tested standalone on top of base. When I sent
them, I asked Aaron to test them on top of base.

Btw, I missed adding the Reported-by/Tested-by tags for Aaron. I will
add them in the next spin.

> Anyway, it's interesting that this patch improved "Min", and variance
> in general (on top of my RFC), so much. I would expect the overhead of
> pageblock_pfn_to_page() to be quite stable, hmm.

Perhaps the overhead of pageblock_pfn_to_page() itself is stable. The
combination of slow scanning and kswapd's skip-bit flushing would
explain the unstable results.

Thanks.

>> So as not to disturb systems where compaction isn't triggered, the
>> check is done at the first compaction invocation.
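On the hotplug question above: the rough direction I'd try is sketched
here. This is untested, and the helper name and call sites are
hypothetical. The idea is that any add/remove of a pfn range invalidates
the cached flag, so the next compaction attempt redoes the pageblock
walk. Note that with the current patch, clearing zone->contiguous alone
would not be enough: the cached pfns must also be reset, or
check_zone_contiguous() bails out early -- which is exactly the
fragility you point out.

/*
 * Hypothetical hook, called when a pfn range in @zone is about to be
 * onlined/offlined (e.g. from the online_pages()/offline_pages()
 * paths).
 */
static void clear_zone_contiguous(struct zone *zone)
{
	zone->contiguous = false;

	/* Make check_zone_contiguous() run again on next compaction */
	zone->compact_cached_migrate_pfn[0] = 0;
	zone->compact_cached_migrate_pfn[1] = 0;
	zone->compact_cached_free_pfn = 0;
}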
>> [1]: http://www.spinics.net/lists/linux-mm/msg97378.html
>> [2]: https://lkml.org/lkml/2015/12/9/23
>>
>> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
>> ---
>>  include/linux/mmzone.h |  1 +
>>  mm/compaction.c        | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 49 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 68cc063..cd3736e 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -521,6 +521,7 @@ struct zone {
>>  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
>>         /* Set to true when the PG_migrate_skip bits should be cleared */
>>         bool compact_blockskip_flush;
>> +       bool contiguous;
>>  #endif
>>
>>         ZONE_PADDING(_pad3_)
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 56fa321..ce60b38 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -88,7 +88,7 @@ static inline bool migrate_async_suitable(int migratetype)
>>   * the first and last page of a pageblock and avoid checking each individual
>>   * page in a pageblock.
>>   */
>> -static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>> +static struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
>>                                 unsigned long end_pfn, struct zone *zone)
>>  {
>>         struct page *start_page;
>> @@ -114,6 +114,51 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>>         return start_page;
>>  }
>>
>> +static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>> +                       unsigned long end_pfn, struct zone *zone)
>> +{
>> +       if (zone->contiguous)
>> +               return pfn_to_page(start_pfn);
>> +
>> +       return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
>> +}
>> +
>> +static void check_zone_contiguous(struct zone *zone)
>> +{
>> +       unsigned long block_start_pfn = zone->zone_start_pfn;
>> +       unsigned long block_end_pfn;
>> +       unsigned long pfn;
>> +
>> +       /* Already initialized if cached pfn is non-zero */
>> +       if (zone->compact_cached_migrate_pfn[0] ||
>> +               zone->compact_cached_free_pfn)
>> +               return;
>> +
>> +       /* Mark that checking is in progress */
>> +       zone->compact_cached_free_pfn = ULONG_MAX;
>> +
>> +       block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
>> +       for (; block_start_pfn < zone_end_pfn(zone);
>> +               block_start_pfn = block_end_pfn,
>> +               block_end_pfn += pageblock_nr_pages) {
>> +
>> +               block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
>> +
>> +               if (!__pageblock_pfn_to_page(block_start_pfn,
>> +                                       block_end_pfn, zone))
>> +                       return;
>> +
>> +               /* Check validity of pfn within pageblock */
>> +               for (pfn = block_start_pfn; pfn < block_end_pfn; pfn++) {
>> +                       if (!pfn_valid_within(pfn))
>> +                               return;
>> +               }
>> +       }
>> +
>> +       /* We confirm that there is no hole */
>> +       zone->contiguous = true;
>> +}
>> +
>>  #ifdef CONFIG_COMPACTION
>>
>>  /* Do not skip compaction more than 64 times */
>> @@ -1357,6 +1402,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
>>                 ;
>>         }
>>
>> +       check_zone_contiguous(zone);
>> +
>>         /*
>>          * Clear pageblock skip if there were failures recently and compaction
>>          * is about to be retried after being deferred. kswapd does not do
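As for the "rather ugly part" you mention, where the cached pfn info
doubles as the init indicator: one possible cleanup is an explicit state
field in struct zone instead of overloading compact_cached_free_pfn.
Again just an untested sketch; the field and enum names are made up:

/* Hypothetical replacement for the cached-pfn trick */
enum zone_contig_state {
	ZONE_CONTIG_UNKNOWN,	/* not checked yet (zone is zeroed at init) */
	ZONE_CONTIG_HOLES,	/* hole found or check aborted: slow path */
	ZONE_CONTIG_YES,	/* no hole: plain pfn_to_page() is safe */
};

static void check_zone_contiguous(struct zone *zone)
{
	if (zone->contig_state != ZONE_CONTIG_UNKNOWN)
		return;

	/* Pessimistic default; also keeps nested attempts from re-entering */
	zone->contig_state = ZONE_CONTIG_HOLES;

	/* ... same pageblock walk as in the patch, returning on any hole ... */

	zone->contig_state = ZONE_CONTIG_YES;
}

A hotplug hook would then only need to set the state back to
ZONE_CONTIG_UNKNOWN, without touching the compaction scanner cache.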