Re: [PATCH] mm/memory-failure.c: fix memory leak by race between poison and unpoison

On Thu, May 15, 2014 at 08:23:10AM -0400, Naoya Horiguchi wrote:
> On Thu, May 15, 2014 at 11:34:26AM +0800, cyc wrote:
> > On Wed, 2014-05-14 at 11:21 -0400, Naoya Horiguchi wrote:
> > > When a memory error happens on an in-use page or (free and in-use) hugepage,
> > > the victim page is isolated with its refcount set to one. When you try to
> > > unpoison it later, unpoison_memory() calls put_page() for it twice in order to
> > > bring the page back to the free page pool (buddy or free hugepage list).
> > > However, if another memory error occurs on the page which we are unpoisoning,
> > > memory_failure() returns without releasing the refcount it incremented earlier
> > > in the same call, which results in a memory leak and inconsistent
> > > num_poisoned_pages statistics. This patch fixes it.
> > 
> > Suppose that a new memory error occurs on the hugepage that we are
> > unpoisoning.
> > 
> >           A   unpoisoned  B    poisoned    C          
> > hugepage: |---------------+++++++++++++++++|
> > 
> > There are two cases, as shown above:
> >   1. If the victim page lies in A-B, memory_failure() will be blocked
> > by lock_page() until unlock_page() is invoked by unpoison_memory().
> 
> No. memory_failure() sets PageHWPoison first, before taking the page lock.
> This is a design choice based on the idea that we need to detect errors ASAP.

I may have misunderstood you, sorry. With this patch, we can properly cancel
the poisoning operation when it races with unpoisoning, so, as you said, there
is no effect in either case.

Thanks,
Naoya
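To make the leak concrete, here is a minimal single-threaded C model of the CPU 0 / CPU 1 interleaving quoted below, replayed in a fixed order. It is a sketch under stated assumptions, not the kernel code or the patch itself: `struct race_page`, `struct race_result`, `replay_race()` and the `model_*` helpers are hypothetical; only the reference and counter increment taken by CPU 0's memory_failure() are tracked (unpoison_memory()'s own accounting for the earlier poisoning is left out); and the victim's PG_hwpoison bit is assumed to be clear when the new error arrives, as in case 1. With `with_fix` false the recheck bails out while still holding what it took; with `with_fix` true it undoes both, which is the cancellation described above.

```c
/*
 * Single-threaded replay of the poison/unpoison race
 * (CPU 0 = memory_failure(), CPU 1 = unpoison_memory()).
 * All names are hypothetical; this is a model, not kernel code.
 * Plain variables suffice because the steps run in a fixed order;
 * the kernel itself uses atomic page-flag operations.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct race_page {
	int refcount;   /* references taken by CPU 0's poisoning path */
	bool hwpoison;  /* stands in for PG_hwpoison */
};

struct race_result {
	int refcount;       /* references left behind after the race */
	long num_poisoned;  /* CPU 0's contribution to the counter */
};

/* Set the flag and return its old value (models TestSetPageHWPoison). */
static bool model_test_set_hwpoison(struct race_page *p)
{
	bool old = p->hwpoison;
	p->hwpoison = true;
	return old;
}

/* Clear the flag and return its old value (models TestClearPageHWPoison). */
static bool model_test_clear_hwpoison(struct race_page *p)
{
	bool old = p->hwpoison;
	p->hwpoison = false;
	return old;
}

static struct race_result replay_race(bool with_fix)
{
	/* Case 1: the victim's flag is clear when the new error arrives. */
	struct race_page page = { .refcount = 0, .hwpoison = false };
	long num_poisoned = 0;

	/* CPU 1: lock_page(); not modelled, it only orders the steps. */

	/* CPU 0: the test-and-set succeeds, so the page is counted
	 * and a reference is taken. */
	if (model_test_set_hwpoison(&page))
		return (struct race_result){ page.refcount, num_poisoned };
	num_poisoned++;
	page.refcount++;

	/* CPU 1: test-and-clear wipes the bit CPU 0 has just set. */
	(void)model_test_clear_hwpoison(&page);

	/* CPU 0: lock_page() waits, CPU 1 unlocks, then CPU 0 rechecks. */
	if (!page.hwpoison) {
		printf("just unpoisoned\n");
		if (with_fix) {
			/* Cancel: undo exactly what CPU 0 took above. */
			num_poisoned--;
			page.refcount--;
		}
	}

	assert(page.refcount >= 0);
	return (struct race_result){ page.refcount, num_poisoned };
}
```

In this model, `replay_race(false)` leaves one reference and one counted page behind, while `replay_race(true)` brings both back to zero.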


> What happens in this race looks like this:
> 
>     CPU 0 (poison)                 CPU 1 (unpoison)
>                                    lock_page
>     TestSetPageHWPoison
>                                    TestClearPageHWPoison
>     lock_page (wait)
>                                    unlock_page
>     check PageHWPoison
>       printk("just unpoisoned")
> 
> >   2. If the victim page lies in B-C, memory_failure() will return
> > early, at the very beginning of the function.
> 
> Right.
> 
> Thanks,
> Naoya Horiguchi
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>
> 
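Case 2 needs no extra handling because the first thing memory_failure() does is a test-and-set on the poison bit. The sketch below models that entry check with C11 `atomic_fetch_or`, which returns the previous flags value. `struct entry_page`, `MODEL_PG_HWPOISON`, `model_num_poisoned` and `model_memory_failure_entry()` are hypothetical stand-ins for the page, PG_hwpoison, num_poisoned_pages and the start of memory_failure(); this is an illustration of the idea, not the kernel's implementation. A caller that finds the bit already set returns before taking a reference or touching the counter.

```c
/*
 * Sketch of the test-and-set entry check (hypothetical names).
 * atomic_fetch_or stands in for TestSetPageHWPoison(): it sets the
 * bit and reports whether it was already set.
 */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_PG_HWPOISON (1UL << 0)  /* hypothetical flag bit */

struct entry_page {
	atomic_ulong flags;  /* page flags word */
	int refcount;        /* references pinned by the error handler */
};

static long model_num_poisoned;  /* plain counter for brevity */

/* Returns true if this call took over handling of the error. */
static bool model_memory_failure_entry(struct entry_page *p)
{
	unsigned long old = atomic_fetch_or(&p->flags, MODEL_PG_HWPOISON);

	assert(p->refcount >= 0);
	if (old & MODEL_PG_HWPOISON) {
		/* Case 2: already poisoned, return before touching anything. */
		return false;
	}
	/* First error on this page: account for it and pin the page. */
	model_num_poisoned++;
	p->refcount++;
	return true;
}
```

Calling it twice on the same freshly initialised page returns true and then false, leaving the refcount and the counter at 1.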