On Thu, May 15, 2014 at 11:34:26AM +0800, cyc wrote:
> On Wed, 2014-05-14 at 11:21 -0400, Naoya Horiguchi wrote:
> > When a memory error happens on an in-use page or (free and in-use) hugepage,
> > the victim page is isolated with its refcount set to one. When you try to
> > unpoison it later, unpoison_memory() calls put_page() for it twice in order to
> > bring the page back to the free page pool (buddy or free hugepage list).
> > However, if another memory error occurs on the page which we are unpoisoning,
> > memory_failure() returns without releasing the refcount which was incremented
> > earlier in the same call, which results in a memory leak and inconsistent
> > num_poisoned_pages statistics. This patch fixes it.
>
> We assume that a new memory error occurs on the hugepage which we are
> unpoisoning.
>
>           A  unpoisoned   B    poisoned    C
> hugepage: |---------------+++++++++++++++++|
>
> There are two cases, as shown.
> 1. the victim page belongs to A-B, the memory_failure() will be blocked
>    by lock_page() until unlock_page() is invoked by unpoison_memory().

No. memory_failure() sets PageHWPoison first, before taking the page lock.
This is a design choice based on the idea that we need to detect errors ASAP.
What happens in this race is like below:

  CPU 0 (poison)                  CPU 1 (unpoison)

                                  lock_page
  TestSetPageHWPoison
                                  TestClearPageHWPoison
  lock_page (wait)
                                  unlock_page
  check PageHWPoison
  printk("just unpoisoned")

> 2. the victim page belongs to B-C, the memory_failure() will return
>    very soon at the beginning of this function.

Right.

Thanks,
Naoya Horiguchi
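
For illustration only, the interleaving in the timeline can be replayed in a small userspace C model. This is a sketch under stated assumptions, not kernel code: struct model_page, the model_* helpers, struct race_result and the drop_ref_on_race knob are all hypothetical names introduced here. The model collapses the hugepage to a single flag, starts with that flag clear (a victim page in the already-unpoisoned A-B range), and runs both "CPUs" on one thread in the fixed order shown in the timeline. It only shows why the poison side ends up on the "just unpoisoned" path still holding the reference it took, unless that path releases it; what the actual patch does is not shown in this message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy userspace stand-in for a page; NOT the kernel's struct page. */
struct model_page {
    atomic_bool hwpoison;   /* models the PG_hwpoison flag */
    bool locked;            /* models the page lock */
    int poison_side_refs;   /* references taken by the poison side, not yet released */
};

/* Outcome of one replay (hypothetical type, for this sketch only). */
struct race_result {
    bool saw_just_unpoisoned;  /* poison side found the flag cleared after locking */
    int poison_side_refs;      /* references the poison side still holds on return */
};

/* Both helpers return the previous flag value, in the style of test-and-set/clear. */
static bool model_test_set_hwpoison(struct model_page *p)
{
    return atomic_exchange(&p->hwpoison, true);
}

static bool model_test_clear_hwpoison(struct model_page *p)
{
    return atomic_exchange(&p->hwpoison, false);
}

/*
 * Replay the timeline in a fixed order on a single thread.
 * drop_ref_on_race is a hypothetical knob: false models returning on the
 * "just unpoisoned" path without releasing the reference, true models
 * releasing it before returning.
 */
struct race_result model_replay_race(bool drop_ref_on_race)
{
    struct model_page page;
    struct race_result r = { false, 0 };

    atomic_init(&page.hwpoison, false);
    page.locked = false;
    page.poison_side_refs = 0;

    page.locked = true;                              /* CPU 1: lock_page */

    bool was_set = model_test_set_hwpoison(&page);   /* CPU 0: flag set BEFORE locking */
    assert(!was_set);                                /* new error on a page whose flag was clear */
    page.poison_side_refs++;                         /* CPU 0: reference taken in the same call */

    assert(page.locked);                             /* CPU 1 clears the flag under the lock */
    bool was_poisoned = model_test_clear_hwpoison(&page); /* CPU 1: TestClearPageHWPoison */
    assert(was_poisoned);                            /* it clears the flag CPU 0 just set */

    /* CPU 0: lock_page would wait here while page.locked is true. */
    page.locked = false;                             /* CPU 1: unlock_page */
    page.locked = true;                              /* CPU 0: lock finally acquired */

    if (!atomic_load(&page.hwpoison)) {              /* CPU 0: check PageHWPoison */
        r.saw_just_unpoisoned = true;                /* the "just unpoisoned" path */
        if (drop_ref_on_race)
            page.poison_side_refs--;                 /* release before returning */
    }
    page.locked = false;                             /* CPU 0: unlock on the way out */

    r.poison_side_refs = page.poison_side_refs;
    return r;
}
```

Calling model_replay_race(false) leaves one reference held by the poison side, which corresponds to the leak described in the patch description; model_replay_race(true) leaves none. The model does not cover the early return of case 2 or the num_poisoned_pages accounting.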