On Thu 06-05-21 08:56:11, Aili Yao wrote:
> On Wed, 5 May 2021 15:27:39 +0200
> Michal Hocko <mhocko@xxxxxxxx> wrote:
>
> > On Wed 05-05-21 15:17:53, David Hildenbrand wrote:
> > > On 05.05.21 15:13, Michal Hocko wrote:
> > > > On Thu 29-04-21 14:25:15, David Hildenbrand wrote:
> > > > > Commit d3378e86d182 ("mm/gup: check page posion status for coredump.")
> > > > > introduced page_is_poisoned(), however, v5 [1] of the patch used
> > > > > "page_is_hwpoison()" and something went wrong while upstreaming. Rename the
> > > > > function and move it to page-flags.h, from where it can be used in other
> > > > > -- kcore -- context.
> > > > >
> > > > > Move the comment to the place where it belongs and simplify.
> > > > >
> > > > > [1] https://lkml.kernel.org/r/20210322193318.377c9ce9@alex-virtual-machine
> > > > >
> > > > > Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
> > > >
> > > > I do agree that being explicit about hwpoison is much better. Poisoned
> > > > page can be also an uninitialized one and I believe this is the reason why
> > > > you are bringing that up.
> > >
> > > I'm bringing it up because I want to reuse that function as stated above :)
> > >
> > > >
> > > > But you've made me look at d3378e86d182 and I am wondering whether this
> > > > is really a valid patch. First of all it can leak a reference count
> > > > AFAICS. Moreover it doesn't really fix anything because the page can be
> > > > marked hwpoison right after the check is done. I do not think the race
> > > > is feasible to be closed. So shouldn't we rather revert it?
> > >
> > > I am not sure if we really care about races that much here? I mean,
> > > essentially we are racing with HW breaking asynchronously. Just because we
> > > would be synchronizing with SetPageHWPoison() wouldn't mean we can stop HW
> > > from breaking.
> >
> > Right
> >
> > > Long story short, this should be good enough for the cases we actually can
> > > handle? What am I missing?
> >
> > I am not sure I follow. My point is that I fail to see any added value
> > of the check as it doesn't prevent the race (it fundamentally cannot as
> > the page can be poisoned at any time) but the failure path doesn't
> > put_page which is incorrect even for hwpoison pages.
>
> Sorry, I have something to say:
>
> I have noticed the ref count leak in the previous topic, but I don't think
> it really matters. For the memory recovery case for user pages, we will keep one
> reference to the poisoned page so the error page will not be freed to the buddy
> allocator, which can be checked in the memory_failure() function.

So what would happen if those pages are hwpoisoned from userspace rather
than by HW? And repeatedly so?
-- 
Michal Hocko
SUSE Labs
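
To make the reference count concern above concrete, here is a minimal
userspace sketch. It is not kernel code: struct model_page and the
model_*() and lookup_*() helpers are hypothetical stand-ins invented for
illustration, and the only thing modelled is a lookup that takes a
reference, checks the hwpoison flag, and returns NULL on failure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a page; not the kernel's struct page. */
struct model_page {
	int refcount;
	bool hwpoison;
};

static void model_get_page(struct model_page *p)
{
	p->refcount++;
}

static void model_put_page(struct model_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/*
 * Leaky variant: the reference taken at the top is never dropped
 * when the lookup fails because the page is marked hwpoison.
 */
static struct model_page *lookup_leaky(struct model_page *p)
{
	model_get_page(p);
	if (p->hwpoison)
		return NULL;
	return p;
}

/* Balanced variant: the failure path drops the reference it took. */
static struct model_page *lookup_balanced(struct model_page *p)
{
	model_get_page(p);
	if (p->hwpoison) {
		model_put_page(p);
		return NULL;
	}
	return p;
}
```

In this model, every failed call to lookup_leaky() on a poisoned page
leaves one extra reference behind, so repeated lookups keep growing the
count, while lookup_balanced() leaves it unchanged. Neither variant closes
the race: the flag can still be set right after the check returns.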