On 9/4/2024 4:11 PM, Matthew Wilcox wrote:
On Wed, Sep 04, 2024 at 09:02:59PM +0100, Matthew Wilcox wrote:
On Wed, Sep 04, 2024 at 12:41:08PM -0700, Steve Sistare wrote:
The folio_try_get in memfd_alloc_folio is not necessary. Delete it, and
delete the matching folio_put in memfd_pin_folios. This also avoids
leaking a ref if the memfd_alloc_folio call to hugetlb_add_to_page_cache
fails, which would otherwise need an additional folio_put. This is a
continuation of the fix
"mm/hugetlb: fix memfd_pin_folios free_huge_pages leak"
I think you're right, but don't we also need to get rid of the
folio_put() call in the 'if (err)' case after calling
hugetlb_add_to_page_cache()?
After scratching my head about this a bit more, I was trying to preserve
the existing semantics of the code, but I think the code was always
buggy.
The correct code would be:
folio = alloc_hugetlb_folio_nodemask(...);
folio_put(folio);
The code as in tree today would trip an assertion:
VM_BUG_ON_FOLIO(folio_ref_count(folio), folio);
as alloc_hugetlb_folio_nodemask() returns a folio with a refcount 1,
folio_try_get() would increment it to 2, folio_put() would decrement
it to 1, and so we'd call free_huge_folio() with a refcount of 1.
But after your patch, the code _is_ still wrong because we'll
start with a refcount of 1, fail to add to the page cache, call
folio_put() which will decrement the refcount to 0 _and call
free_huge_folio() itself_. Then we'll call free_huge_folio()
on an already freed and possibly reallocated folio.
Indeed. The explicit call to free_huge_folio must be deleted, as you
coded below. I'll send a V2 of the patch.
- Steve
So every version suggested so far (current, yours, mine) is wrong,
and the right code looks like:
folio = alloc_hugetlb_folio_nodemask(...);
if (folio) {
err = hugetlb_add_to_page_cache(...);
if (err) {
folio_put(folio);
return ERR_PTR(err);
}
...
Or have I got something wrong in that analysis?
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Suggested-by: Vivek Kasireddy <vivek.kasireddy@xxxxxxxxx>
Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx>
---
mm/gup.c | 4 +---
mm/memfd.c | 2 +-
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index bccabaa..947881ff 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3618,7 +3618,7 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
pgoff_t start_idx, end_idx, next_idx;
struct folio *folio = NULL;
struct folio_batch fbatch;
- struct hstate *h = NULL;
+ struct hstate *h;
long ret = -EINVAL;
if (start < 0 || start > end || !max_folios)
@@ -3662,8 +3662,6 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
&fbatch);
if (folio) {
folio_put(folio);
- if (h)
- folio_put(folio);
folio = NULL;
}
diff --git a/mm/memfd.c b/mm/memfd.c
index bcb131d..f715301 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -89,7 +89,7 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
numa_node_id(),
NULL,
gfp_mask);
- if (folio && folio_try_get(folio)) {
+ if (folio) {
err = hugetlb_add_to_page_cache(folio,
memfd->f_mapping,
idx);
--
1.8.3.1