On 2016/2/6 0:11, Joonsoo Kim wrote: > 2016-02-05 9:49 GMT+09:00 Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>: >> On Thu, 4 Feb 2016 15:19:35 +0900 Joonsoo Kim <js1304@xxxxxxxxx> wrote: >> >>> There is a performance drop report due to hugepage allocation and in there >>> half of cpu time are spent on pageblock_pfn_to_page() in compaction [1]. >>> In that workload, compaction is triggered to make hugepage but most of >>> pageblocks are un-available for compaction due to pageblock type and >>> skip bit so compaction usually fails. Most costly operations in this case >>> is to find valid pageblock while scanning whole zone range. To check >>> if pageblock is valid to compact, valid pfn within pageblock is required >>> and we can obtain it by calling pageblock_pfn_to_page(). This function >>> checks whether pageblock is in a single zone and return valid pfn >>> if possible. Problem is that we need to check it every time before >>> scanning pageblock even if we re-visit it and this turns out to >>> be very expensive in this workload. >>> >>> Although we have no way to skip this pageblock check in the system >>> where hole exists at arbitrary position, we can use cached value for >>> zone continuity and just do pfn_to_page() in the system where hole doesn't >>> exist. This optimization considerably speeds up in above workload. >>> >>> Before vs After >>> Max: 1096 MB/s vs 1325 MB/s >>> Min: 635 MB/s 1015 MB/s >>> Avg: 899 MB/s 1194 MB/s >>> >>> Avg is improved by roughly 30% [2]. >>> >>> [1]: http://www.spinics.net/lists/linux-mm/msg97378.html >>> [2]: https://lkml.org/lkml/2015/12/9/23 >>> >>> ... >>> >>> --- a/include/linux/memory_hotplug.h >>> +++ b/include/linux/memory_hotplug.h >>> @@ -196,6 +196,9 @@ void put_online_mems(void); >>> void mem_hotplug_begin(void); >>> void mem_hotplug_done(void); >>> >>> +extern void set_zone_contiguous(struct zone *zone); >>> +extern void clear_zone_contiguous(struct zone *zone); >>> + >>> #else /* ! CONFIG_MEMORY_HOTPLUG */ >>> /* >>> * Stub functions for when hotplug is off >> >> Was it really intended that these declarations only exist if >> CONFIG_MEMORY_HOTPLUG? Seems unrelated. > > These are called for caching memory layout whether it is contiguous > or not. So, they are always called in memory initialization. Then, > hotplug could change memory layout so they should be called > there, too. So, they are defined in page_alloc.c and exported only > if CONFIG_MEMORY_HOTPLUG. > >> The i386 allnocofnig build fails in preditable ways so I fixed that up >> as below, but it seems wrong. > > Yeah, it seems wrong to me. :) > Here goes fix. > > ----------->8------------ >>From ed6add18bc361e00a7ac6746de6eeb62109e6416 Mon Sep 17 00:00:00 2001 > From: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> > Date: Thu, 10 Dec 2015 17:03:54 +0900 > Subject: [PATCH] mm/compaction: speed up pageblock_pfn_to_page() when zone is > contiguous > > There is a performance drop report due to hugepage allocation and in there > half of cpu time are spent on pageblock_pfn_to_page() in compaction [1]. > In that workload, compaction is triggered to make hugepage but most of > pageblocks are un-available for compaction due to pageblock type and > skip bit so compaction usually fails. Most costly operations in this case > is to find valid pageblock while scanning whole zone range. To check > if pageblock is valid to compact, valid pfn within pageblock is required > and we can obtain it by calling pageblock_pfn_to_page(). This function > checks whether pageblock is in a single zone and return valid pfn > if possible. Problem is that we need to check it every time before > scanning pageblock even if we re-visit it and this turns out to > be very expensive in this workload. > > Although we have no way to skip this pageblock check in the system > where hole exists at arbitrary position, we can use cached value for > zone continuity and just do pfn_to_page() in the system where hole doesn't > exist. This optimization considerably speeds up in above workload. > > Before vs After > Max: 1096 MB/s vs 1325 MB/s > Min: 635 MB/s 1015 MB/s > Avg: 899 MB/s 1194 MB/s > > Avg is improved by roughly 30% [2]. > > [1]: http://www.spinics.net/lists/linux-mm/msg97378.html > [2]: https://lkml.org/lkml/2015/12/9/23 > > v3 > o remove pfn_valid_within() check for all pages in the pageblock > because pageblock_pfn_to_page() is only called with pageblock aligned pfn. I have a question about the zone continuity. because hole exists at arbitrary position in a page block. Therefore, only pageblock_pf_to_page() is insufficiency, whether pageblock aligned pfn or not , the pfn_valid_within() is necessary. eh: 120M-122M is a range of page block, but the 120.5M-121.5M is holes, only by pageblock_pfn_to_page() to conclude in the result is inaccurate Thanks zhongjiang > v2 > o checking zone continuity after initialization > o handle memory-hotplug case > > Reported and Tested-by: Aaron Lu <aaron.lu@xxxxxxxxx> > Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx> > --- > include/linux/gfp.h | 6 ---- > include/linux/memory_hotplug.h | 3 ++ > include/linux/mmzone.h | 2 ++ > mm/compaction.c | 43 ----------------------- > mm/internal.h | 12 +++++++ > mm/memory_hotplug.c | 9 +++++ > mm/page_alloc.c | 78 +++++++++++++++++++++++++++++++++++++++++- > 7 files changed, 103 insertions(+), 50 deletions(-) > > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > index 28ad5f6..bd7fccc 100644 > --- a/include/linux/gfp.h > +++ b/include/linux/gfp.h > @@ -515,13 +515,7 @@ void drain_zone_pages(struct zone *zone, struct > per_cpu_pages *pcp); > void drain_all_pages(struct zone *zone); > void drain_local_pages(struct zone *zone); > > -#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT > void page_alloc_init_late(void); > -#else > -static inline void page_alloc_init_late(void) > -{ > -} > -#endif > > /* > * gfp_allowed_mask is set to GFP_BOOT_MASK during early boot to restrict what > diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h > index 4340599..e960b78 100644 > --- a/include/linux/memory_hotplug.h > +++ b/include/linux/memory_hotplug.h > @@ -196,6 +196,9 @@ void put_online_mems(void); > void mem_hotplug_begin(void); > void mem_hotplug_done(void); > > +extern void set_zone_contiguous(struct zone *zone); > +extern void clear_zone_contiguous(struct zone *zone); > + > #else /* ! CONFIG_MEMORY_HOTPLUG */ > /* > * Stub functions for when hotplug is off > diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h > index 7b6c2cf..f12b950 100644 > --- a/include/linux/mmzone.h > +++ b/include/linux/mmzone.h > @@ -520,6 +520,8 @@ struct zone { > bool compact_blockskip_flush; > #endif > > + bool contiguous; > + > ZONE_PADDING(_pad3_) > /* Zone statistics */ > atomic_long_t vm_stat[NR_VM_ZONE_STAT_ITEMS]; > diff --git a/mm/compaction.c b/mm/compaction.c > index 8ce36eb..93f71d9 100644 > --- a/mm/compaction.c > +++ b/mm/compaction.c > @@ -71,49 +71,6 @@ static inline bool migrate_async_suitable(int migratetype) > return is_migrate_cma(migratetype) || migratetype == MIGRATE_MOVABLE; > } > > -/* > - * Check that the whole (or subset of) a pageblock given by the interval of > - * [start_pfn, end_pfn) is valid and within the same zone, before scanning it > - * with the migration of free compaction scanner. The scanners then need to > - * use only pfn_valid_within() check for arches that allow holes within > - * pageblocks. > - * > - * Return struct page pointer of start_pfn, or NULL if checks were not passed. > - * > - * It's possible on some configurations to have a setup like node0 node1 node0 > - * i.e. it's possible that all pages within a zones range of pages do not > - * belong to a single zone. We assume that a border between node0 and node1 > - * can occur within a single pageblock, but not a node0 node1 node0 > - * interleaving within a single pageblock. It is therefore sufficient to check > - * the first and last page of a pageblock and avoid checking each individual > - * page in a pageblock. > - */ > -static struct page *pageblock_pfn_to_page(unsigned long start_pfn, > - unsigned long end_pfn, struct zone *zone) > -{ > - struct page *start_page; > - struct page *end_page; > - > - /* end_pfn is one past the range we are checking */ > - end_pfn--; > - > - if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn)) > - return NULL; > - > - start_page = pfn_to_page(start_pfn); > - > - if (page_zone(start_page) != zone) > - return NULL; > - > - end_page = pfn_to_page(end_pfn); > - > - /* This gives a shorter code than deriving page_zone(end_page) */ > - if (page_zone_id(start_page) != page_zone_id(end_page)) > - return NULL; > - > - return start_page; > -} > - > #ifdef CONFIG_COMPACTION > > /* Do not skip compaction more than 64 times */ > diff --git a/mm/internal.h b/mm/internal.h > index 9006ce1..9609755 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -140,6 +140,18 @@ __find_buddy_index(unsigned long page_idx, > unsigned int order) > return page_idx ^ (1 << order); > } > > +extern struct page *__pageblock_pfn_to_page(unsigned long start_pfn, > + unsigned long end_pfn, struct zone *zone); > + > +static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn, > + unsigned long end_pfn, struct zone *zone) > +{ > + if (zone->contiguous) > + return pfn_to_page(start_pfn); > + > + return __pageblock_pfn_to_page(start_pfn, end_pfn, zone); > +} > + > extern int __isolate_free_page(struct page *page, unsigned int order); > extern void __free_pages_bootmem(struct page *page, unsigned long pfn, > unsigned int order); > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c > index 4af58a3..f06916c 100644 > --- a/mm/memory_hotplug.c > +++ b/mm/memory_hotplug.c > @@ -509,6 +509,8 @@ int __ref __add_pages(int nid, struct zone *zone, > unsigned long phys_start_pfn, > int start_sec, end_sec; > struct vmem_altmap *altmap; > > + clear_zone_contiguous(zone); > + > /* during initialize mem_map, align hot-added range to section */ > start_sec = pfn_to_section_nr(phys_start_pfn); > end_sec = pfn_to_section_nr(phys_start_pfn + nr_pages - 1); > @@ -540,6 +542,8 @@ int __ref __add_pages(int nid, struct zone *zone, > unsigned long phys_start_pfn, > } > vmemmap_populate_print_last(); > > + set_zone_contiguous(zone); > + > return err; > } > EXPORT_SYMBOL_GPL(__add_pages); > @@ -811,6 +815,8 @@ int __remove_pages(struct zone *zone, unsigned > long phys_start_pfn, > } > } > > + clear_zone_contiguous(zone); > + > /* > * We can only remove entire sections > */ > @@ -826,6 +832,9 @@ int __remove_pages(struct zone *zone, unsigned > long phys_start_pfn, > if (ret) > break; > } > + > + set_zone_contiguous(zone); > + > return ret; > } > EXPORT_SYMBOL_GPL(__remove_pages); > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index d60c860..059f9c0 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -1105,6 +1105,75 @@ void __init __free_pages_bootmem(struct page > *page, unsigned long pfn, > return __free_pages_boot_core(page, pfn, order); > } > > +/* > + * Check that the whole (or subset of) a pageblock given by the interval of > + * [start_pfn, end_pfn) is valid and within the same zone, before scanning it > + * with the migration of free compaction scanner. The scanners then need to > + * use only pfn_valid_within() check for arches that allow holes within > + * pageblocks. > + * > + * Return struct page pointer of start_pfn, or NULL if checks were not passed. > + * > + * It's possible on some configurations to have a setup like node0 node1 node0 > + * i.e. it's possible that all pages within a zones range of pages do not > + * belong to a single zone. We assume that a border between node0 and node1 > + * can occur within a single pageblock, but not a node0 node1 node0 > + * interleaving within a single pageblock. It is therefore sufficient to check > + * the first and last page of a pageblock and avoid checking each individual > + * page in a pageblock. > + */ > +struct page *__pageblock_pfn_to_page(unsigned long start_pfn, > + unsigned long end_pfn, struct zone *zone) > +{ > + struct page *start_page; > + struct page *end_page; > + > + /* end_pfn is one past the range we are checking */ > + end_pfn--; > + > + if (!pfn_valid(start_pfn) || !pfn_valid(end_pfn)) > + return NULL; > + > + start_page = pfn_to_page(start_pfn); > + > + if (page_zone(start_page) != zone) > + return NULL; > + > + end_page = pfn_to_page(end_pfn); > + > + /* This gives a shorter code than deriving page_zone(end_page) */ > + if (page_zone_id(start_page) != page_zone_id(end_page)) > + return NULL; > + > + return start_page; > +} > + > +void set_zone_contiguous(struct zone *zone) > +{ > + unsigned long block_start_pfn = zone->zone_start_pfn; > + unsigned long block_end_pfn; > + > + block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages); > + for (; block_start_pfn < zone_end_pfn(zone); > + block_start_pfn = block_end_pfn, > + block_end_pfn += pageblock_nr_pages) { > + > + block_end_pfn = min(block_end_pfn, zone_end_pfn(zone)); > + > + if (!__pageblock_pfn_to_page(block_start_pfn, > + block_end_pfn, zone)) > + return; > + } > + > + /* We confirm that there is no hole */ > + zone->contiguous = true; > +} > + > +void clear_zone_contiguous(struct zone *zone) > +{ > + zone->contiguous = false; > +} > + > #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT > static void __init deferred_free_range(struct page *page, > unsigned long pfn, int nr_pages) > @@ -1255,9 +1324,13 @@ free_range: > pgdat_init_report_one_done(); > return 0; > } > +#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ > > void __init page_alloc_init_late(void) > { > + struct zone *zone; > + > +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT > int nid; > > /* There will be num_node_state(N_MEMORY) threads */ > @@ -1271,8 +1344,11 @@ void __init page_alloc_init_late(void) > > /* Reinit limits that are based on free pages after the kernel is up */ > files_maxfiles_init(); > +#endif > + > + for_each_populated_zone(zone) > + set_zone_contiguous(zone); > } > -#endif /* CONFIG_DEFERRED_STRUCT_PAGE_INIT */ > > #ifdef CONFIG_CMA > /* Free whole pageblock and set its migration type to MIGRATE_CMA. */ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>