2015-12-14 19:29 GMT+09:00 Vlastimil Babka <vbabka@xxxxxxx>:
> On 12/14/2015 06:02 AM, Joonsoo Kim wrote:
>>
>> There is a report of a performance drop due to hugepage allocation,
>> where half of the cpu time is spent in pageblock_pfn_to_page() during
>> compaction [1]. In that workload, compaction is triggered to make
>> hugepages, but most pageblocks are unavailable for compaction due to
>> their pageblock type and skip bits, so compaction usually fails. The
>> most costly operation in this case is finding a valid pageblock while
>> scanning the whole zone range. To check whether a pageblock is valid
>> to compact, a valid pfn within the pageblock is required, and we can
>> obtain it by calling pageblock_pfn_to_page(). This function checks
>> whether the pageblock is in a single zone and returns a valid pfn if
>> possible. The problem is that we need to check this every time before
>> scanning a pageblock, even if we re-visit it, and that turns out to
>> be very expensive in this workload.
>
> Hm, I wonder if this is safe wrt memory hotplug? Shouldn't there be a
> recheck plugged into the appropriate hotplug add/remove callbacks?
> That would make the whole thing generic too; the zone->contiguous
> information doesn't have to be limited to compaction. And it would
> remove the rather ugly part where cached pfn info is used as an
> indication of zone->contiguous being already set...

Will check it. A rough sketch of the hotplug hook I have in mind is
below, after the quoted description.

>> Although we have no way to skip this pageblock check on a system
>> where holes exist at arbitrary positions, we can cache the zone's
>> continuity and just do pfn_to_page() on a system where no hole
>> exists. This optimization considerably speeds up the above workload.
>>
>> Before vs After
>> Max: 1096 MB/s vs 1325 MB/s
>> Min:  635 MB/s vs 1015 MB/s
>> Avg:  899 MB/s vs 1194 MB/s
>>
>> Avg is improved by roughly 30% [2].
>
> Unless I'm mistaken, these results also include my RFC series (Aaron,
> can you clarify?). These patches had better be tested standalone on
> top of base; being simpler, they will probably be included sooner (the
> RFC series needs reviews at the very least :) - although the memory
> hotplug concerns might make the "sooner" here relative too.

AFAIK, these patches were tested standalone on top of base. When I sent
them, I asked Aaron to test them on top of base.

Btw, I missed adding the Reported-by/Tested-by tags for Aaron. I will
add them in the next spin.

> Anyway, it's interesting that this patch improved "Min", and variance
> in general (on top of my RFC), so much. I would expect the overhead of
> pageblock_pfn_to_page() to be quite stable, hmm.

Perhaps the overhead of pageblock_pfn_to_page() itself is stable. The
combination of slow scanning and kswapd's skip-bit flushing would
explain the unstable results.

Thanks.

>> So as not to disturb systems where compaction isn't triggered, the
>> check is done at the first compaction invocation.
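On the hotplug question above: the rough direction I'd try is sketched
here. This is untested, and the helper name and call sites are
hypothetical. The idea is that any add/remove of a pfn range invalidates
the cached flag, so the next compaction attempt redoes the pageblock
walk. Note that with the current patch, clearing zone->contiguous alone
would not be enough: the cached pfns must also be reset, or
check_zone_contiguous() bails out early -- which is exactly the
fragility you point out.

/*
 * Hypothetical hook, called when a pfn range in @zone is about to be
 * onlined/offlined (e.g. from the online_pages()/offline_pages()
 * paths).
 */
static void clear_zone_contiguous(struct zone *zone)
{
	zone->contiguous = false;

	/* Make check_zone_contiguous() run again on next compaction */
	zone->compact_cached_migrate_pfn[0] = 0;
	zone->compact_cached_migrate_pfn[1] = 0;
	zone->compact_cached_free_pfn = 0;
}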
>> [1]: http://www.spinics.net/lists/linux-mm/msg97378.html
>> [2]: https://lkml.org/lkml/2015/12/9/23
>>
>> Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@xxxxxxx>
>> ---
>>  include/linux/mmzone.h |  1 +
>>  mm/compaction.c        | 49 ++++++++++++++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 49 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 68cc063..cd3736e 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -521,6 +521,7 @@ struct zone {
>>  #if defined CONFIG_COMPACTION || defined CONFIG_CMA
>>         /* Set to true when the PG_migrate_skip bits should be cleared */
>>         bool compact_blockskip_flush;
>> +       bool contiguous;
>>  #endif
>>
>>         ZONE_PADDING(_pad3_)
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 56fa321..ce60b38 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -88,7 +88,7 @@ static inline bool migrate_async_suitable(int migratetype)
>>   * the first and last page of a pageblock and avoid checking each individual
>>   * page in a pageblock.
>>   */
>> -static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>> +static struct page *__pageblock_pfn_to_page(unsigned long start_pfn,
>>                                 unsigned long end_pfn, struct zone *zone)
>>  {
>>         struct page *start_page;
>> @@ -114,6 +114,51 @@ static struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>>         return start_page;
>>  }
>>
>> +static inline struct page *pageblock_pfn_to_page(unsigned long start_pfn,
>> +                       unsigned long end_pfn, struct zone *zone)
>> +{
>> +       if (zone->contiguous)
>> +               return pfn_to_page(start_pfn);
>> +
>> +       return __pageblock_pfn_to_page(start_pfn, end_pfn, zone);
>> +}
>> +
>> +static void check_zone_contiguous(struct zone *zone)
>> +{
>> +       unsigned long block_start_pfn = zone->zone_start_pfn;
>> +       unsigned long block_end_pfn;
>> +       unsigned long pfn;
>> +
>> +       /* Already initialized if cached pfn is non-zero */
>> +       if (zone->compact_cached_migrate_pfn[0] ||
>> +               zone->compact_cached_free_pfn)
>> +               return;
>> +
>> +       /* Mark that checking is in progress */
>> +       zone->compact_cached_free_pfn = ULONG_MAX;
>> +
>> +       block_end_pfn = ALIGN(block_start_pfn + 1, pageblock_nr_pages);
>> +       for (; block_start_pfn < zone_end_pfn(zone);
>> +               block_start_pfn = block_end_pfn,
>> +               block_end_pfn += pageblock_nr_pages) {
>> +
>> +               block_end_pfn = min(block_end_pfn, zone_end_pfn(zone));
>> +
>> +               if (!__pageblock_pfn_to_page(block_start_pfn,
>> +                                       block_end_pfn, zone))
>> +                       return;
>> +
>> +               /* Check validity of pfn within pageblock */
>> +               for (pfn = block_start_pfn; pfn < block_end_pfn; pfn++) {
>> +                       if (!pfn_valid_within(pfn))
>> +                               return;
>> +               }
>> +       }
>> +
>> +       /* We confirm that there is no hole */
>> +       zone->contiguous = true;
>> +}
>> +
>>  #ifdef CONFIG_COMPACTION
>>
>>  /* Do not skip compaction more than 64 times */
>> @@ -1357,6 +1402,8 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
>>                 ;
>>         }
>>
>> +       check_zone_contiguous(zone);
>> +
>>         /*
>>          * Clear pageblock skip if there were failures recently and compaction
>>          * is about to be retried after being deferred. kswapd does not do
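As for the "rather ugly part" you mention, where the cached pfn info
doubles as the init indicator: one possible cleanup is an explicit state
field in struct zone instead of overloading compact_cached_free_pfn.
Again just an untested sketch; the field and enum names are made up:

/* Hypothetical replacement for the cached-pfn trick */
enum zone_contig_state {
	ZONE_CONTIG_UNKNOWN,	/* not checked yet (zone is zeroed at init) */
	ZONE_CONTIG_HOLES,	/* hole found or check aborted: slow path */
	ZONE_CONTIG_YES,	/* no hole: plain pfn_to_page() is safe */
};

static void check_zone_contiguous(struct zone *zone)
{
	if (zone->contig_state != ZONE_CONTIG_UNKNOWN)
		return;

	/* Pessimistic default; also keeps nested attempts from re-entering */
	zone->contig_state = ZONE_CONTIG_HOLES;

	/* ... same pageblock walk as in the patch, returning on any hole ... */

	zone->contig_state = ZONE_CONTIG_YES;
}

A hotplug hook would then only need to set the state back to
ZONE_CONTIG_UNKNOWN, without touching the compaction scanner cache.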