On Wed, Oct 23, 2013 at 05:01:32PM -0400, kosaki.motohiro@xxxxxxxxx wrote:
> From: KOSAKI Motohiro <kosaki.motohiro@xxxxxxxxxxxxxx>
>
> Yasuaki Ithimatsu reported memory hot-add spent more than 5 _hours_
> on 9TB memory machine and we found out setup_zone_migrate_reserve
> spnet >90% time.
>
> The problem is, setup_zone_migrate_reserve scan all pageblock
> unconditionally, but it is only necessary number of reserved block
> was reduced (i.e. memory hot remove).
> Moreover, maximum MIGRATE_RESERVE per zone are currently 2. It mean,
> number of reserved pageblock are almost always unchanged.
>
> This patch adds zone->nr_migrate_reserve_block to maintain number
> of MIGRATE_RESERVE pageblock and it reduce an overhead of
> setup_zone_migrate_reserve dramatically.
>

It seems regrettable to expand the size of struct zone just for this.
You are right that the number of blocks does not exceed 2 because of a
check made in setup_zone_migrate_reserve so it should be possible to
special case this. I didn't test this or think about it particularly
carefully and no doubt there is a nicer way but for illustration
purposes see the patch below.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index dd886fa..1aedddd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3897,6 +3897,8 @@ static int pageblock_is_reserved(unsigned long start_pfn, unsigned long end_pfn)
 	return 0;
 }
 
+#define MAX_MIGRATE_RESERVE_BLOCKS 2
+
 /*
  * Mark a number of pageblocks as MIGRATE_RESERVE. The number
  * of blocks reserved is based on min_wmark_pages(zone). The memory within
@@ -3910,6 +3912,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 	struct page *page;
 	unsigned long block_migratetype;
 	int reserve;
+	int found = 0;
 
 	/*
 	 * Get the start pfn, end pfn and the number of blocks to reserve
@@ -3926,11 +3929,11 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 	/*
 	 * Reserve blocks are generally in place to help high-order atomic
 	 * allocations that are short-lived. A min_free_kbytes value that
-	 * would result in more than 2 reserve blocks for atomic allocations
-	 * is assumed to be in place to help anti-fragmentation for the
-	 * future allocation of hugepages at runtime.
+	 * would result in more than MAX_MIGRATE_RESERVE_BLOCKS reserve blocks
+	 * for atomic allocations is assumed to be in place to help
+	 * anti-fragmentation for the future allocation of hugepages at runtime.
 	 */
-	reserve = min(2, reserve);
+	reserve = min(MAX_MIGRATE_RESERVE_BLOCKS, reserve);
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn += pageblock_nr_pages) {
 		if (!pfn_valid(pfn))
@@ -3956,6 +3959,7 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 			/* If this block is reserved, account for it */
 			if (block_migratetype == MIGRATE_RESERVE) {
 				reserve--;
+				found++;
 				continue;
 			}
 
@@ -3970,6 +3974,10 @@ static void setup_zone_migrate_reserve(struct zone *zone)
 			}
 		}
 
+		/* If all possible reserve blocks have been found, we're done */
+		if (found >= MAX_MIGRATE_RESERVE_BLOCKS)
+			break;
+
 		/*
 		 * If the reserve is met and this is a previous reserved block,
 		 * take it back
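
For anyone who wants to look at the early-exit idea in isolation, here is a
minimal userspace sketch. It is not kernel code and is as untested against a
real zone as the patch: enum block_type, MAX_RESERVE_BLOCKS and
scan_reserve_blocks() are made-up names for illustration, and it only models
counting reserve blocks during the scan, not changing any migratetype.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's pageblock migrate types. */
enum block_type { BLOCK_MOVABLE, BLOCK_RESERVE, BLOCK_OTHER };

/* Assumed cap, mirroring the limit of 2 reserve blocks per zone. */
#define MAX_RESERVE_BLOCKS 2

/*
 * Walk the blocks in order and count the reserve blocks seen.
 * Stop as soon as the cap is reached, since no further reserve
 * block can exist past that point.
 *
 * Returns the number of blocks inspected; *found receives the
 * number of reserve blocks encountered.
 */
static size_t scan_reserve_blocks(const enum block_type *blocks,
				  size_t nr_blocks, int *found)
{
	size_t visited = 0;
	size_t i;

	*found = 0;
	for (i = 0; i < nr_blocks; i++) {
		visited++;

		if (blocks[i] == BLOCK_RESERVE)
			(*found)++;

		/* All possible reserve blocks have been found, we're done */
		if (*found >= MAX_RESERVE_BLOCKS)
			break;
	}

	return visited;
}
```

With reserve blocks at indexes 1 and 3 of an 8-block array, the scan inspects
4 blocks and stops. On a zone with millions of pageblocks the saving depends
entirely on how early the reserve blocks sit in the pfn range.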