Re: [PATCH] mm/hugetlb: simplify refs in memfd_alloc_folio

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Sep 04, 2024 at 09:02:59PM +0100, Matthew Wilcox wrote:
> On Wed, Sep 04, 2024 at 12:41:08PM -0700, Steve Sistare wrote:
> > The folio_try_get in memfd_alloc_folio is not necessary.  Delete it, and
> > delete the matching folio_put in memfd_pin_folios.  This also avoids
> > leaking a ref if the memfd_alloc_folio call to hugetlb_add_to_page_cache
> > fails, which would otherwise need an additional folio_put.  This is a
> > continuation of the fix
> >   "mm/hugetlb: fix memfd_pin_folios free_huge_pages leak"
> 
> I think you're right, but don't we also need to get rid of the
> folio_put() call in the 'if (err)' case after calling
> hugetlb_add_to_page_cache()?

After scratching my head about this a bit more, I was trying to preserve
the existing semantics of the code, but I think the code was always
buggy.

The correct code would be:

	folio = alloc_hugetlb_folio_nodemask(...);
	folio_put(folio);

The code as in tree today would trip an assertion:

        VM_BUG_ON_FOLIO(folio_ref_count(folio), folio);

as alloc_hugetlb_folio_nodemask() returns a folio with a refcount 1,
folio_try_get() would increment it to 2, folio_put() would decrement
it to 1, and so we'd call free_huge_folio() with a refcount of 1.

But after your patch, the code _is_ still wrong because we'll
start with a refcount of 1, fail to add to the page cache, call
folio_put() which will decrement the refcount to 0 _and call
free_huge_folio() itself_.  Then we'll call free_huge_folio()
on an already freed and possibly reallocated folio.

So every version suggested so far (current, yours, mine) is wrong,
and the right code looks like:

                folio = alloc_hugetlb_folio_nodemask(...);
		if (folio) {
			err = hugetlb_add_to_page_cache(...);
			if (err) {
				folio_put(folio);
				return ERR_PTR(err);
			}
...

Or have I got something wrong in that analysis?

> > Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
> > 
> > Suggested-by: Vivek Kasireddy <vivek.kasireddy@xxxxxxxxx>
> > Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx>
> > ---
> >  mm/gup.c   | 4 +---
> >  mm/memfd.c | 2 +-
> >  2 files changed, 2 insertions(+), 4 deletions(-)
> > 
> > diff --git a/mm/gup.c b/mm/gup.c
> > index bccabaa..947881ff 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -3618,7 +3618,7 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
> >  	pgoff_t start_idx, end_idx, next_idx;
> >  	struct folio *folio = NULL;
> >  	struct folio_batch fbatch;
> > -	struct hstate *h = NULL;
> > +	struct hstate *h;
> >  	long ret = -EINVAL;
> >  
> >  	if (start < 0 || start > end || !max_folios)
> > @@ -3662,8 +3662,6 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
> >  							     &fbatch);
> >  			if (folio) {
> >  				folio_put(folio);
> > -				if (h)
> > -					folio_put(folio);
> >  				folio = NULL;
> >  			}
> >  
> > diff --git a/mm/memfd.c b/mm/memfd.c
> > index bcb131d..f715301 100644
> > --- a/mm/memfd.c
> > +++ b/mm/memfd.c
> > @@ -89,7 +89,7 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
> >  						    numa_node_id(),
> >  						    NULL,
> >  						    gfp_mask);
> > -		if (folio && folio_try_get(folio)) {
> > +		if (folio) {
> >  			err = hugetlb_add_to_page_cache(folio,
> >  							memfd->f_mapping,
> >  							idx);
> > -- 
> > 1.8.3.1
> > 
> 




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux