Re: [PATCH v1 3/7] mm: rename and move page_is_poisoned()

Michal Hocko <mhocko@xxxxxxxx> · Thu, 6 May 2021 09:06:14 +0200



On Thu 06-05-21 08:56:11, Aili Yao wrote:
> On Wed, 5 May 2021 15:27:39 +0200
> Michal Hocko <mhocko@xxxxxxxx> wrote:
> 
> > On Wed 05-05-21 15:17:53, David Hildenbrand wrote:
> > > On 05.05.21 15:13, Michal Hocko wrote:  
> > > > On Thu 29-04-21 14:25:15, David Hildenbrand wrote:  
> > > > > Commit d3378e86d182 ("mm/gup: check page posion status for coredump.")
> > > > > introduced page_is_poisoned(), however, v5 [1] of the patch used
> > > > > "page_is_hwpoison()" and something went wrong while upstreaming. Rename the
> > > > > function and move it to page-flags.h, from where it can be used in other
> > > > > -- kcore -- context.
> > > > > 
> > > > > Move the comment to the place where it belongs and simplify.
> > > > > 
> > > > > [1] https://lkml.kernel.org/r/20210322193318.377c9ce9@alex-virtual-machine
> > > > > 
> > > > > Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>  
> > > > 
> > > > I do agree that being explicit about hwpoison is much better. Poisoned
> > > > page can be also an unitialized one and I believe this is the reason why
> > > > you are bringing that up.  
> > > 
> > > I'm bringing it up because I want to reuse that function as state above :)
> > >   
> > > > 
> > > > But you've made me look at d3378e86d182 and I am wondering whether this
> > > > is really a valid patch. First of all it can leak a reference count
> > > > AFAICS. Moreover it doesn't really fix anything because the page can be
> > > > marked hwpoison right after the check is done. I do not think the race
> > > > is feasible to be closed. So shouldn't we rather revert it?  
> > > 
> > > I am not sure if we really care about races here that much here? I mean,
> > > essentially we are racing with HW breaking asynchronously. Just because we
> > > would be synchronizing with SetPageHWPoison() wouldn't mean we can stop HW
> > > from breaking.  
> > 
> > Right
> > 
> > > Long story short, this should be good enough for the cases we actually can
> > > handle? What am I missing?  
> > 
> > I am not sure I follow. My point is that I fail to see any added value
> > of the check as it doesn't prevent the race (it fundamentally cannot as
> > the page can be poisoned at any time) but the failure path doesn't
> > put_page which is incorrect even for hwpoison pages.
> 
> Sorry, I have something to say:
> 
> I have noticed the ref count leak in the previous topic ,but  I don't think
> it's a really matter. For memory recovery case for user pages, we will keep one
> reference to the poison page so the error page will not be freed to buddy allocator.
> which can be checked in memory_faulure() function.

So what would happen if those pages are hwpoisoned from userspace rather
than by HW. And repeatedly so?
-- 
Michal Hocko
SUSE Labs