On Fri, Feb 28, 2014 at 03:15:00PM +0100, Vlastimil Babka wrote:
> In order to prevent race with set_pageblock_migratetype, most of calls to
> get_pageblock_migratetype have been moved under zone->lock. For the remaining
> call sites, the extra locking is undesirable, notably in free_hot_cold_page().
>
> This patch introduces a _nolock version to be used on these call sites, where
> a wrong value does not affect correctness. The function makes sure that the
> value does not exceed valid migratetype numbers. Such too-high values are
> assumed to be a result of race and caller-supplied fallback value is returned
> instead.
>
> Signed-off-by: Vlastimil Babka <vbabka@xxxxxxx>
> ---
>  include/linux/mmzone.h | 24 ++++++++++++++++++++++++
>  mm/compaction.c        | 14 +++++++++++---
>  mm/memory-failure.c    |  3 ++-
>  mm/page_alloc.c        | 22 +++++++++++++++++-----
>  mm/vmstat.c            |  2 +-
>  5 files changed, 55 insertions(+), 10 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index fac5509..7c3f678 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -75,6 +75,30 @@ enum {
>
>  extern int page_group_by_mobility_disabled;
>
> +/*
> + * When called without zone->lock held, a race with set_pageblock_migratetype
> + * may result in bogus values. Use this variant only when this does not affect
> + * correctness, and taking zone->lock would be costly. Values >= MIGRATE_TYPES
> + * are considered to be a result of this race and the value of race_fallback
> + * argument is returned instead.
> + */
> +static inline int get_pageblock_migratetype_nolock(struct page *page,
> +					int race_fallback)
> +{
> +	int ret = get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
> +
> +	if (unlikely(ret >= MIGRATE_TYPES))
> +		ret = race_fallback;
> +
> +	return ret;
> +}

Hello, Vlastimil.

First of all, thanks for the nice work!

I have a different opinion about this implementation. I may be wrong, so
if I am, please let me know.

Although this implementation would close the race that triggers the NULL
dereference, I don't think it is enough if you plan to add more
{start,undo}_isolate_page_range() call sites.

Consider a system without CMA where lots of
{start,undo}_isolate_page_range() calls occur. The bit representation of
the relevant migratetypes is as follows:

MIGRATE_MOVABLE = 010
MIGRATE_ISOLATE = 100

If the race occurs, we could read any of the following values as the
migratetype of a page in a movable pageblock:

start_isolate_page_range() case: 010 -> 100
	010, 000, 100

undo_isolate_page_range() case: 100 -> 010
	100, 110, 010

The above implementation prevents us from getting 110, but it can't
prevent us from getting 000, that is, MIGRATE_UNMOVABLE. If this race
occurs in free_hot_cold_page(), the page would go onto the unmovable pcp
list and then be allocated for that migratetype. This results in more
fragmented memory.

Now consider a system with CONFIG_CMA enabled:

MIGRATE_MOVABLE = 010
MIGRATE_ISOLATE = 101

start_isolate_page_range() case: 010 -> 101
	010, 011, 001, 101

undo_isolate_page_range() case: 101 -> 010
	101, 100, 110, 010

This can result in totally different values and causes the same problem
mentioned above. And although this doesn't cause any problem for CMA
right now, if another migratetype is introduced or an existing one is
removed, it could cause a CMA-typed page to end up on another
migratetype's free list and make that CMA region permanently fail.
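To see the window concretely, here is a minimal userspace sketch (my own
illustration, not kernel code; the thread setup and names are mine). One
thread rewrites a 3-bit field bit by bit, the way
set_pageblock_flags_group() updates the pageblock bitmap under
zone->lock, while the other thread reads it without any lock:

/*
 * Userspace sketch of the race described above.
 * Build with: gcc -O2 -pthread sketch.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define MOVABLE	0x2	/* 010 */
#define ISOLATE	0x4	/* 100 */

static atomic_int bits = MOVABLE;
static atomic_int done;
static int seen[8];

static void *writer(void *arg)
{
	int i, bit;

	/* Flip between 010 and 100 one bit at a time, like set_bit()/clear_bit(). */
	for (i = 0; i < 10000000; i++) {
		int target = (i & 1) ? MOVABLE : ISOLATE;

		for (bit = 0; bit < 3; bit++) {
			if (target & (1 << bit))
				atomic_fetch_or(&bits, 1 << bit);
			else
				atomic_fetch_and(&bits, ~(1 << bit));
		}
	}
	atomic_store(&done, 1);
	return NULL;
}

static void *reader(void *arg)
{
	/* Unlocked read, as in get_pageblock_migratetype_nolock(). */
	while (!atomic_load(&done))
		seen[atomic_load(&bits) & 7] = 1;
	return NULL;
}

int main(void)
{
	pthread_t w, r;
	int i;

	pthread_create(&r, NULL, reader, NULL);
	pthread_create(&w, NULL, writer, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);

	for (i = 0; i < 8; i++)
		if (seen[i])
			printf("observed %d%d%d\n",
			       (i >> 2) & 1, (i >> 1) & 1, i & 1);
	return 0;
}

The reader ends up observing both transient values, 110 and 000: the
proposed _nolock check filters out 110 (it is >= MIGRATE_TYPES), but 000
passes the check, which is exactly the MIGRATE_UNMOVABLE case above.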
To close this kind of race regardless of how many pageblock isolations
occur, I recommend using separate pageblock bits for MIGRATE_CMA and
MIGRATE_ISOLATE, and using an accessor function whenever we need to
check the migratetype. IMHO, it would not impose much overhead. How
about it?

Thanks.
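Roughly, the shape I have in mind is something like the following
(PB_migrate_isolate and both helpers are hypothetical names, just to
illustrate the idea, not a tested implementation):

/*
 * Sketch only: MIGRATE_ISOLATE lives in its own dedicated pageblock bit
 * instead of being encoded into the 3-bit migratetype field.
 * MIGRATE_CMA could get its own bit in the same way.
 */
static inline bool pageblock_isolated(struct page *page)
{
	/* A single bit reads atomically; no half-updated value is possible. */
	return get_pageblock_flags_group(page, PB_migrate_isolate,
					 PB_migrate_isolate);
}

static inline int get_pageblock_migratetype_safe(struct page *page)
{
	if (pageblock_isolated(page))
		return MIGRATE_ISOLATE;

	/* The underlying type stays intact while the block is isolated. */
	return get_pageblock_flags_group(page, PB_migrate, PB_migrate_end);
}

With this scheme a lockless reader sees the isolate bit either set or
clear, never a partial combination, and the underlying migratetype bits
are untouched while the block is isolated. The cost is one extra bit per
pageblock in the bitmap, which is why I think the overhead would be
small.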