Re: [PATCH] hwpoison: Fix race with changing page during offlining

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, 26 Jun 2014 15:50:36 -0400 Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx> wrote:

> > index 90002ea..e277726a 100644
> > --- a/mm/memory-failure.c
> > +++ b/mm/memory-failure.c
> > @@ -1143,6 +1143,22 @@ int memory_failure(unsigned long pfn, int trapno, int flags)
> >  	lock_page(hpage);
> >  
> >  	/*
> > +	 * The page could have turned into a non LRU page or
> > +	 * changed compound pages during the locking.
> > +	 * If this happens just bail out.
> > +	 */
> > +	if (compound_head(p) != hpage) {
> > +		action_result(pfn, "different compound page after locking", IGNORED);
> > +		res = -EBUSY;
> > +		goto out;
> > +	}
> 
> This is a useful check.
> 
> > +	if (!PageLRU(hpage)) {
> > +		action_result(pfn, "non LRU after locking", IGNORED);
> > +		res = -EBUSY;
> > +		goto out;
> > +	}
> 
> I think this makes sense in v3.14, but maybe redundant if the patch "hwpoison:
> fix the handling path of the victimized page frame that belong to non-LRU"
> from Chen Yucong is merged into mainline (now it's in linux-mmotm).

Andi, can you please check that and test?  If the patch is good I'll
bump it into 3.16 with an enhanced changelog..


From: Chen Yucong <slaoub@xxxxxxxxx>
Subject: hwpoison: fix the handling path of the victimized page frame that belong to non-LRU

Until now, the kernel has the same policy to handle victimized page frames
that belong to kernel-space(reserved/slab-subsystem) or non-LRU(unknown
page state).  In other word, the result of handling either of these
victimized page frames is (IGNORED | FAILED), and the return value of
memory_failure() is -EBUSY.

This patch is to avoid that memory_failure() returns very soon due to the
"true" value of (!PageLRU(p)), and it also ensures that action_result()
can report more precise information("reserved kernel", "kernel slab", and
"unknown page state") instead of "non LRU", especially for memory errors
which are detected by memory-scrubbing.

Signed-off-by: Chen Yucong <slaoub@xxxxxxxxx>
Acked-by: Naoya Horiguchi <n-horiguchi@xxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/memory-failure.c |    9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff -puN mm/memory-failure.c~hwpoison-fix-the-handling-path-of-the-victimized-page-frame-that-belong-to-non-lur mm/memory-failure.c
--- a/mm/memory-failure.c~hwpoison-fix-the-handling-path-of-the-victimized-page-frame-that-belong-to-non-lur
+++ a/mm/memory-failure.c
@@ -895,7 +895,7 @@ static int hwpoison_user_mappings(struct
 	struct page *hpage = *hpagep;
 	struct page *ppage;
 
-	if (PageReserved(p) || PageSlab(p))
+	if (PageReserved(p) || PageSlab(p) || !PageLRU(p))
 		return SWAP_SUCCESS;
 
 	/*
@@ -1159,9 +1159,6 @@ int memory_failure(unsigned long pfn, in
 					action_result(pfn, "free buddy, 2nd try", DELAYED);
 				return 0;
 			}
-			action_result(pfn, "non LRU", IGNORED);
-			put_page(p);
-			return -EBUSY;
 		}
 	}
 
@@ -1194,6 +1191,9 @@ int memory_failure(unsigned long pfn, in
 		return 0;
 	}
 
+	if (!PageHuge(p) && !PageTransTail(p) && !PageLRU(p))
+		goto identify_page_state;
+
 	/*
 	 * For error on the tail page, we should set PG_hwpoison
 	 * on the head page to show that the hugepage is hwpoisoned
@@ -1243,6 +1243,7 @@ int memory_failure(unsigned long pfn, in
 		goto out;
 	}
 
+identify_page_state:
 	res = -EBUSY;
 	/*
 	 * The first check uses the current page flags which may not have any
_

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]