Re: [PATCH][RFC] mm: compaction: handle incorrect Unmovable type pageblocks

On Mon, Apr 23, 2012 at 12:02:55PM +0200, Bartlomiej Zolnierkiewicz wrote:
> From: Bartlomiej Zolnierkiewicz <b.zolnierkie@xxxxxxxxxxx>
> Subject: [PATCH][RFC] mm: compaction: handle incorrect Unmovable type pageblocks
> 
> When Unmovable pages are freed from Unmovable type pageblock
> (and some Movable type pages are left in it) the type of
> the pageblock remains unchanged and therefore the pageblock
> cannot be used as a migration target during compaction.
> 

It does not remain unchanged forever. It can get reset at allocation time,
although that depends on detecting that much of the pageblock is free,
which in turn depends on high-order pages being freed. Your adverse
workload avoids exactly that.
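
For reference, the reset I am talking about happens in the allocator's
fallback path. From memory it looks roughly like this (abbreviated sketch,
not verbatim; see __rmqueue_fallback() in mm/page_alloc.c for the real
code):

	/*
	 * Sketch of the existing fallback logic: when a large enough
	 * free page is stolen from another migratetype, try to take
	 * the rest of the pageblock with it and, if over half of the
	 * block is free, claim the whole block.
	 */
	if (current_order >= (pageblock_order >> 1) ||
	    start_migratetype == MIGRATE_RECLAIMABLE ||
	    page_group_by_mobility_disabled) {
		int pages = move_freepages_block(zone, page, start_migratetype);

		/* Claim the whole block if over half of it is free */
		if (pages >= (1 << (pageblock_order - 1)) ||
		    page_group_by_mobility_disabled)
			set_pageblock_migratetype(page, start_migratetype);
	}

The point is that this only triggers when a high-order free page turns up
in the block, which is exactly what your workload never produces.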

> Fix it by recording Unmovable type pages in a separate bitmap
> (which consumes 128 bytes per 4MiB of memory) and actively
> trying to fix the whole pageblock type during compaction
> (so the previously unsuitable pageblocks can now be used as
> a migration targets).
> 
> [ I also tried using counter for Unmovable pages per pageblock
>   but this approach turned out to be insufficient as we don't
>   always have an information about type of the page that we are
>   freeing. ]
> 

I have not read the patch yet but it seems very heavy-handed to add a
whole new bitmap for this. Based on your estimate it is 1 bit per page,
which means that every allocation or free is likely to be updating this
bitmap. On machines with many cores that is potentially a lot of dirty
cache line bouncing and may incur significant overhead. I'll know for sure
when I see the patch, but my initial feeling is that this is a big problem.

> My particular test case (on a ARM EXYNOS4 device with 512 MiB,
> which means 131072 standard 4KiB pages in 'Normal' zone) is to:
> - allocate 120000 pages for kernel's usage
> - free every second page (60000 pages) of memory just allocated
> - allocate and use 60000 pages from user space
> - free remaining 60000 pages of kernel memory
> (now we have fragmented memory occupied mostly by user space pages)
> - try to allocate 100 order-9 (2048 KiB) pages for kernel's usage
> 

Ok, that is indeed an adverse workload that the current system will not
deal with properly. I think you are right to try fixing this, but you may
need a different approach that takes the cost out of the allocation/free
path and moves it into the compaction path.

> The results:
> - with compaction disabled I get 11 successful allocations
> - with compaction enabled - 14 successful allocations
> - with this patch I'm able to get all 100 successful allocations
> 
> Cc: Mel Gorman <mgorman@xxxxxxx>
> Cc: Marek Szyprowski <m.szyprowski@xxxxxxxxxxx>
> Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@xxxxxxxxxxx>
> Signed-off-by: Kyungmin Park <kyungmin.park@xxxxxxxxxxx>
> ---
> This patch replaces http://marc.info/?l=linux-mm&m=133364363709346&w=2
> 
>  include/linux/mmzone.h |   10 ++
>  mm/compaction.c        |    3 
>  mm/internal.h          |    1 
>  mm/page_alloc.c        |  128 +++++++++++++++++++++++++++++
>  mm/sparse.c            |  216 +++++++++++++++++++++++++++++++++++++++++++++++--
>  5 files changed, 353 insertions(+), 5 deletions(-)
> 
> Index: b/include/linux/mmzone.h
> ===================================================================
> --- a/include/linux/mmzone.h	2012-04-20 16:35:16.894872193 +0200
> +++ b/include/linux/mmzone.h	2012-04-23 09:55:01.845549009 +0200
> @@ -379,6 +379,10 @@
>  	 * In SPARSEMEM, this map is stored in struct mem_section
>  	 */
>  	unsigned long		*pageblock_flags;
> +
> +#ifdef CONFIG_COMPACTION
> +	unsigned long		*unmovable_map;
> +#endif
>  #endif /* CONFIG_SPARSEMEM */
>  
>  #ifdef CONFIG_COMPACTION
> @@ -1033,6 +1037,12 @@
>  
>  	/* See declaration of similar field in struct zone */
>  	unsigned long *pageblock_flags;
> +
> +#ifdef CONFIG_COMPACTION
> +	unsigned long *unmovable_map;
> +	unsigned long pad0; /* Why this is needed? */
> +#endif
> +

You tell us, you added the padding :)

If I had to guess, you are trying to avoid sharing a cache line between
unmovable_map and the adjacent fields, but I doubt it is necessary.
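
(If false sharing of that line ever did turn out to matter, the usual
idiom would be something like

	unsigned long *unmovable_map ____cacheline_aligned_in_smp;

rather than a hand-rolled pad, but I doubt it is needed here.)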

>  #ifdef CONFIG_CGROUP_MEM_RES_CTLR
>  	/*
>  	 * If !SPARSEMEM, pgdat doesn't have page_cgroup pointer. We use
> Index: b/mm/compaction.c
> ===================================================================
> --- a/mm/compaction.c	2012-04-20 16:35:16.910872188 +0200
> +++ b/mm/compaction.c	2012-04-23 09:33:54.525527592 +0200
> @@ -376,6 +376,9 @@
>  	if (migrate_async_suitable(migratetype))
>  		return true;
>  
> +	if (migratetype == MIGRATE_UNMOVABLE && set_unmovable_movable(page))
> +		return true;
> +

Ok, I have two suggested changes to this

1. compaction currently has sync and async compaction. I suggest you
   make it three states called async_partial, async_full and sync.
   async_partial would be the current behaviour. async_full and sync
   would both scan within MIGRATE_UNMOVABLE blocks to see if they
   need to be changed. This adds a new, slower path but the common
   path stays as it is today.

2. You maintain a bitmap of unmovable pages. Get rid of it. Instead have
   set_unmovable_movable scan the pageblock and build a free count based
   on finding PageBuddy pages, pages with page_count(page) == 0 or
   PageLRU pages. If all pages within the block are in one of those
   three sets, call set_pageblock_migratetype(page, MIGRATE_MOVABLE)
   followed by move_freepages_block(); a rough sketch of what I mean is
   below. I also suggest finding a better name than set_unmovable_movable,
   although I do not have a better suggestion myself right now.
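
To illustrate point 2, something along these lines (completely untested,
the function name is just a placeholder, and locking and
pfn_valid_within() handling are glossed over):

	/*
	 * Illustrative sketch only: decide whether every page in the
	 * pageblock is free (PageBuddy), unused (page_count == 0) or on
	 * the LRU and, if so, convert the whole block to MIGRATE_MOVABLE.
	 */
	static bool try_make_pageblock_movable(struct zone *zone,
					       struct page *page)
	{
		unsigned long pfn = page_to_pfn(page) & ~(pageblock_nr_pages - 1);
		unsigned long end_pfn = pfn + pageblock_nr_pages;

		for (; pfn < end_pfn; pfn++) {
			struct page *p = pfn_to_page(pfn);

			/* Free pages: buddy heads or their tail pages */
			if (PageBuddy(p) || page_count(p) == 0)
				continue;

			/* In use but migratable by compaction */
			if (PageLRU(p))
				continue;

			/* Anything else pins the block as unmovable */
			return false;
		}

		set_pageblock_migratetype(page, MIGRATE_MOVABLE);
		move_freepages_block(zone, page, MIGRATE_MOVABLE);
		return true;
	}

The exact set of checks probably needs more care (compound pages, per-cpu
pages and so on) but the point is that compaction pays the cost of the
scan, not the allocator.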

>  	/* Otherwise skip the block */
>  	return false;
>  }
> Index: b/mm/internal.h
> ===================================================================
> --- a/mm/internal.h	2012-04-20 16:35:16.898872189 +0200
> +++ b/mm/internal.h	2012-04-20 16:36:45.566872179 +0200
> @@ -95,6 +95,7 @@
>   * in mm/page_alloc.c
>   */
>  extern void __free_pages_bootmem(struct page *page, unsigned int order);
> +extern bool set_unmovable_movable(struct page *page);
>  extern void prep_compound_page(struct page *page, unsigned long order);
>  #ifdef CONFIG_MEMORY_FAILURE
>  extern bool is_free_buddy_page(struct page *page);
> Index: b/mm/page_alloc.c
> ===================================================================
> --- a/mm/page_alloc.c	2012-04-20 16:36:44.054872175 +0200
> +++ b/mm/page_alloc.c	2012-04-23 09:53:27.861547420 +0200
> @@ -257,6 +257,95 @@
>  					PB_migrate, PB_migrate_end);
>  }
>  
> +#ifdef CONFIG_COMPACTION
> +static inline unsigned long *get_unmovable_bitmap(struct zone *zone,
> +						  unsigned long pfn)
> +{
> +#ifdef CONFIG_SPARSEMEM
> +	return __pfn_to_section(pfn)->unmovable_map;
> +#else
> +	return zone->unmovable_map;
> +#endif /* CONFIG_SPARSEMEM */
> +}
> +
> +static inline int pfn_to_idx(struct zone *zone, unsigned long pfn)
> +{
> +#ifdef CONFIG_SPARSEMEM
> +	pfn &= (PAGES_PER_SECTION-1);
> +	return pfn;
> +#else
> +	pfn = pfn - zone->zone_start_pfn;
> +	return pfn;
> +#endif /* CONFIG_SPARSEMEM */
> +}
> +
> +static void set_unmovable_bitmap(struct page *page, int order)
> +{
> +	struct zone *zone;
> +	unsigned long *map;
> +	unsigned long pfn;
> +	int idx, i;
> +
> +	zone = page_zone(page);
> +	pfn = page_to_pfn(page);
> +	map = get_unmovable_bitmap(zone, pfn);
> +	idx = pfn_to_idx(zone, pfn);
> +
> +	for (i = 0; i < (1 << order); i++)
> +		__set_bit(idx + i, map);
> +}
> +
> +static void clear_unmovable_bitmap(struct page *page, int order)
> +{
> +	struct zone *zone;
> +	unsigned long *map;
> +	unsigned long pfn;
> +	int idx, i;
> +
> +	zone = page_zone(page);
> +	pfn = page_to_pfn(page);
> +	map = get_unmovable_bitmap(zone, pfn);
> +	idx = pfn_to_idx(zone, pfn);
> +
> +	for (i = 0; i < (1 << order); i++)
> +		__clear_bit(idx + i, map);
> +
> +}
> +

This stuff is called from the page allocator fast paths, which means it
will cause a system-wide slowdown. That should be avoided.

> +static int move_freepages_block(struct zone *, struct page *, int);
> +
> +bool set_unmovable_movable(struct page *page)
> +{
> +	struct zone *zone;
> +	unsigned long *map;
> +	unsigned long pfn, start_pfn, end_pfn, t = 0;
> +	int idx, i, step = sizeof(unsigned long) * 8;
> +
> +	zone = page_zone(page);
> +	pfn = page_to_pfn(page);
> +	map = get_unmovable_bitmap(zone, pfn);
> +	idx = pfn_to_idx(zone, pfn);
> +
> +	start_pfn = idx & ~(pageblock_nr_pages - 1);
> +	end_pfn = start_pfn + pageblock_nr_pages - 1;
> +
> +	/* check if pageblock is free of unmovable pages */
> +	for (i = start_pfn; i <= end_pfn; i += step)
> +		t |= map[i / step];
> +
> +	if (!t) {
> +		set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> +		move_freepages_block(zone, page, MIGRATE_MOVABLE);
> +		return true;
> +	}
> +
> +	return false;
> +}

And it can be avoided if this function scans the pageblock looking for
PageBuddy, page_count == 0 and PageLRU pages and only changes the
pageblock type if all the pages are of those types (as in the sketch
above). Compaction may be slower as a result but that's better than
hurting the page allocator fast paths and it will also remove an awful
lot of complexity.

I did not read much of the rest of the patch because large parts of it
deal with the unmovable bitmap, which I do not think is necessary at
this point.

Thanks.

-- 
Mel Gorman
SUSE Labs
