Re: [PATCH] mm/hwpoison: fix race between soft_offline_page and unpoison_memory

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8/13/15 4:53 PM, Naoya Horiguchi wrote:
> On Thu, Aug 13, 2015 at 03:09:07PM +0800, Wanpeng Li wrote:
>> [   61.572584] BUG: Bad page state in process bash  pfn:97000
>> [   61.578106] page:ffffea00025c0000 count:0 mapcount:1 mapping:          (null) index:0x7f4fdbe00
>> [   61.586803] flags: 0x1fffff80080048(uptodate|active|swapbacked)
>> [   61.592809] page dumped because: PAGE_FLAGS_CHECK_AT_FREE flag(s) set
>> [   61.599250] bad because of flags:
>> [   61.602567] flags: 0x40(active)
>> [   61.605746] Modules linked in: snd_hda_codec_hdmi i915 rpcsec_gss_krb5 nfsv4 dns_resolver bnep rfcomm nfsd bluetooth auth_rpcgss nfs_acl nfs rfkill lockd grace sunrpc i2c_algo_bit drm_kms_helper snd_hda_codec_realtek snd_hda_codec_generic drm snd_hda_intel fscache snd_hda_codec x86_pkg_temp_thermal coretemp kvm_intel snd_hda_core snd_hwdep kvm snd_pcm snd_seq_dummy snd_seq_oss crct10dif_pclmul snd_seq_midi crc32_pclmul snd_seq_midi_event ghash_clmulni_intel snd_rawmidi aesni_intel lrw gf128mul snd_seq glue_helper ablk_helper snd_seq_device cryptd fuse snd_timer dcdbas serio_raw mei_me parport_pc snd mei ppdev i2c_core video lp soundcore parport lpc_ich shpchp mfd_core ext4 mbcache jbd2 sd_mod e1000e ahci ptp libahci crc32c_intel libata pps_core
>> [   61.605827] CPU: 3 PID: 2211 Comm: bash Not tainted 4.2.0-rc5-mm1+ #45
>> [   61.605829] Hardware name: Dell Inc. OptiPlex 7020/0F5C5X, BIOS A03 01/08/2015
>> [   61.605832]  ffffffff818b3be8 ffff8800da373ad8 ffffffff8165ceb4 0000000001313ce1
>> [   61.605837]  ffffea00025c0000 ffff8800da373b08 ffffffff8117bdd6 ffff88021edd4b00
>> [   61.605842]  0000000000000001 001fffff80080048 0000000000000000 ffff8800da373b88
>> [   61.605847] Call Trace:
>> [   61.605858]  [<ffffffff8165ceb4>] dump_stack+0x48/0x5c
>> [   61.605865]  [<ffffffff8117bdd6>] bad_page+0xe6/0x140
>> [   61.605870]  [<ffffffff8117dfc9>] free_pages_prepare+0x2f9/0x320
>> [   61.605876]  [<ffffffff811e817d>] ? uncharge_list+0xdd/0x100
>> [   61.605882]  [<ffffffff8117ff20>] free_hot_cold_page+0x40/0x170
>> [   61.605888]  [<ffffffff81185dd0>] __put_single_page+0x20/0x30
>> [   61.605892]  [<ffffffff81186675>] put_page+0x25/0x40
>> [   61.605897]  [<ffffffff811dc276>] unmap_and_move+0x1a6/0x1f0
>> [   61.605908]  [<ffffffff811dc3c0>] migrate_pages+0x100/0x1d0
>> [   61.605914]  [<ffffffff811eb710>] ? kill_procs+0x100/0x100
>> [   61.605918]  [<ffffffff811764af>] ? unlock_page+0x6f/0x90
>> [   61.605923]  [<ffffffff811ecf37>] __soft_offline_page+0x127/0x2a0
>> [   61.605928]  [<ffffffff811ed156>] soft_offline_page+0xa6/0x200
>>
>> There is a race window between soft_offline_page() and unpoison_memory():
>>
>> 		CPU0 					CPU1
>>
>> soft_offline_page
>> __soft_offline_page
>> TestSetPageHWPoison   
>> 					unpoison_memory
>> 					PageHWPoison check (true)
>> 					TestClearPageHWPoison
>> 					put_page    -> release refcount held by get_hwpoison_page in unpoison_memory
>> 					put_page    -> release refcount held by isolate_lru_page in __soft_offline_page
>> migrate_pages
>>
>> The second put_page() releases refcount held by isolate_lru_page() which 
>> will lead to unmap_and_move() releases the last refcount of page and w/ 
>> mapcount still 1 since try_to_unmap() is not called if there is only 
>> one user map the page. Anyway, the page refcount and mapcount will 
>> still mess if the page is mapped by multiple users. Commit (4491f712606: 
>> mm/memory-failure: set PageHWPoison before migrate_pages()) is introduced 
>> to avoid to reuse just successful migrated page, however, it also incurs 
>> this race window.
>>
>> Fix it by continue to use migratetype to guarantee the source page which 
>> is successful migration does not reused before PG_hwpoison is set.
>>
>> Signed-off-by: Wanpeng Li <wanpeng.li@xxxxxxxxxxx>
> Thank you for the report, Wanpeng.
>
> I think that unpoison is used only in testing so this race never affects
> our end-users/customers, so going back to this migratetype change stuff
> looks unworthy to me.

Migratetype stuff is just removed by you two months ago, then this bug
occurs recently since the more and more patches which you fix some races.

>
> If I read correctly, the old migratetype approach has a few problems:
>   1) it doesn't fix the problem completely, because
>      set_migratetype_isolate() can fail to set MIGRATE_ISOLATE to the
>      target page if the pageblock of the page contains one or more
>      unmovable pages (i.e. has_unmovable_pages() returns true).
>   2) the original code changes migratetype to MIGRATE_ISOLATE forcibly,
>      and sets it to MIGRATE_MOVABLE forcibly after soft offline, regardless
>      of the original migratetype state, which could impact other subsystems
>      like memory hotplug or compaction.

Maybe we can add a "FIXME" comment on the Migratetype stuff, since the
current linus tree calltrace and it should be fixed immediately, and I
don't see obvious bugs appear on migratetype stuffs at least currently,
so "FIXME" is enough. :-)

>
> So in my opinion, the best option is to fix this within unpoison code,
> but it might be hard because we don't know whether the target page is

Exactly.

> under migration. The second option is to add a race check in the if (reason
> == MR_MEMORY_FAILURE) branch in unmap_and_move(), although this looks ad-hoc

You have already add MR_MEMORY_FAILURE, however, that is not enough to
fix unpoison race.

> and need testing. The third option is to leave it with some "FIXME" comment.

I *prefer* add "FIXME" to migratetype stuffs.

Regards,
Wanpeng Li

>
> Thanks,
> Naoya Horiguchi
>
>> ---
>>  include/linux/page-isolation.h |    5 +++++
>>  mm/memory-failure.c            |   16 ++++++++++++----
>>  mm/migrate.c                   |    3 +--
>>  mm/page_isolation.c            |    4 ++--
>>  4 files changed, 20 insertions(+), 8 deletions(-)
>>
>> diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
>> index 047d647..ff5751e 100644
>> --- a/include/linux/page-isolation.h
>> +++ b/include/linux/page-isolation.h
>> @@ -65,6 +65,11 @@ undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
>>  int test_pages_isolated(unsigned long start_pfn, unsigned long end_pfn,
>>  			bool skip_hwpoisoned_pages);
>>  
>> +/*
>> + *  Internal functions. Changes pageblock's migrate type.
>> + */
>> +int set_migratetype_isolate(struct page *page, bool skip_hwpoisoned_pages);
>> +void unset_migratetype_isolate(struct page *page, unsigned migratetype);
>>  struct page *alloc_migrate_target(struct page *page, unsigned long private,
>>  				int **resultp);
>>  
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index eca613e..0ed3814 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -1647,8 +1647,6 @@ static int __soft_offline_page(struct page *page, int flags)
>>  		inc_zone_page_state(page, NR_ISOLATED_ANON +
>>  					page_is_file_cache(page));
>>  		list_add(&page->lru, &pagelist);
>> -		if (!TestSetPageHWPoison(page))
>> -			atomic_long_inc(&num_poisoned_pages);
>>  		ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
>>  					MIGRATE_SYNC, MR_MEMORY_FAILURE);
>>  		if (ret) {
>> @@ -1663,8 +1661,9 @@ static int __soft_offline_page(struct page *page, int flags)
>>  				pfn, ret, page->flags);
>>  			if (ret > 0)
>>  				ret = -EIO;
>> -			if (TestClearPageHWPoison(page))
>> -				atomic_long_dec(&num_poisoned_pages);
>> +		} else {
>> +			if (!TestSetPageHWPoison(page))
>> +				atomic_long_inc(&num_poisoned_pages);
>>  		}
>>  	} else {
>>  		pr_info("soft offline: %#lx: isolation failed: %d, page count %d, type %lx\n",
>> @@ -1715,6 +1714,14 @@ int soft_offline_page(struct page *page, int flags)
>>  
>>  	get_online_mems();
>>  
>> +	/*
>> +	 * Isolate the page, so that it doesn't get reallocated if it
>> +	 * was free. This flag should be kept set until the source page
>> +	 * is freed and PG_hwpoison on it is set.
>> +	 */
>> +	if (get_pageblock_migratetype(page) != MIGRATE_ISOLATE)
>> +		set_migratetype_isolate(page, false);
>> +
>>  	ret = get_any_page(page, pfn, flags);
>>  	put_online_mems();
>>  	if (ret > 0) { /* for in-use pages */
>> @@ -1733,5 +1740,6 @@ int soft_offline_page(struct page *page, int flags)
>>  				atomic_long_inc(&num_poisoned_pages);
>>  		}
>>  	}
>> +	unset_migratetype_isolate(page, MIGRATE_MOVABLE);
>>  	return ret;
>>  }
>> diff --git a/mm/migrate.c b/mm/migrate.c
>> index 1f8369d..472baf5 100644
>> --- a/mm/migrate.c
>> +++ b/mm/migrate.c
>> @@ -880,8 +880,7 @@ static int __unmap_and_move(struct page *page, struct page *newpage,
>>  	/* Establish migration ptes or remove ptes */
>>  	if (page_mapped(page)) {
>>  		try_to_unmap(page,
>> -			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS|
>> -			TTU_IGNORE_HWPOISON);
>> +			TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
>>  		page_was_mapped = 1;
>>  	}
>>  
>> diff --git a/mm/page_isolation.c b/mm/page_isolation.c
>> index 4568fd5..654662a 100644
>> --- a/mm/page_isolation.c
>> +++ b/mm/page_isolation.c
>> @@ -9,7 +9,7 @@
>>  #include <linux/hugetlb.h>
>>  #include "internal.h"
>>  
>> -static int set_migratetype_isolate(struct page *page,
>> +int set_migratetype_isolate(struct page *page,
>>  				bool skip_hwpoisoned_pages)
>>  {
>>  	struct zone *zone;
>> @@ -73,7 +73,7 @@ out:
>>  	return ret;
>>  }
>>  
>> -static void unset_migratetype_isolate(struct page *page, unsigned migratetype)
>> +void unset_migratetype_isolate(struct page *page, unsigned migratetype)
>>  {
>>  	struct zone *zone;
>>  	unsigned long flags, nr_pages;
>> -- 
>> 1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]