On 9/19/2024 4:11 AM, David Hildenbrand wrote:
On 18.09.24 16:51, Steven Sistare wrote:
On 9/17/2024 8:25 AM, David Hildenbrand wrote:
On 14.09.24 15:19, Steven Sistare wrote:
cc'ing linux-mm for review of this one patch of the series.
This proposes a new KAPI function repin_folio_unhugely(), for use in this
patch of the iommu_ioas_map_file series:
iommufd: IOMMU_IOAS_MAP_FILE implementation
https://lore.kernel.org/linux-iommu/1726319158-283074-7-git-send-email-steven.sistare@xxxxxxxxxx
- Steve
On 9/14/2024 9:05 AM, Steve Sistare wrote:
Export a function that repins a huge-page folio at small-page granularity.
This allows any range of small pages within the folio to be unpinned later.
For example, pages pinned via memfd_pin_folios and modified by
repin_folio_unhugely could be unpinned via unpin_user_page(s).
Suggested-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
Signed-off-by: Steve Sistare <steven.sistare@xxxxxxxxxx>
---
include/linux/mm.h | 1 +
mm/gup.c | 18 ++++++++++++++++++
2 files changed, 19 insertions(+)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1470736..ba8344f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2514,6 +2514,7 @@ long pin_user_pages_unlocked(unsigned long start, unsigned long nr_pages,
long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
struct folio **folios, unsigned int max_folios,
pgoff_t *offset);
+void repin_folio_unhugely(struct folio *folio, unsigned long npin);
int get_user_pages_fast(unsigned long start, int nr_pages,
unsigned int gup_flags, struct page **pages);
diff --git a/mm/gup.c b/mm/gup.c
index 947881ff..f8f3f2a 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3720,3 +3720,21 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
return ret;
}
EXPORT_SYMBOL_GPL(memfd_pin_folios);
+
+/**
+ * repin_folio_unhugely() - repin a folio at small page granularity
+ * @folio: the folio to repin
+ * @npin: the number of pages pinned in the folio
+ *
+ * Given a huge page folio that is already pinned, and the number of small
s/huge page folio/large folio/
+ * pages that are pinned in it, adjust the pincount to reflect small-page
+ * granularity. Each small page can later be unpinned individually.
+ */
+void repin_folio_unhugely(struct folio *folio, unsigned long npin)
+{
+ if (!folio_test_large(folio) || is_huge_zero_folio(folio) || npin == 1)
Why not the huge zero folio? That looks very odd here.
The zero page is treated specially here and elsewhere: it can never be deleted, so
reference fiddling is skipped.
Please point me in mm/gup.c at that handling.
IIRC is_zero_folio() does *not* include the huge zero page.
Yes, we should likely be special-casing the huge zeropage in mm/gup.c, but it's not that easy because PINs can outlive MMs ... so *not* grabbing a reference could currently be harmful.
But that has to be changed consistently, not by doing things here differently from other gup.c functions.
folios_put() -> folios_put_refs() -> is_huge_zero_folio()
I will run some tests with huge zero folios to verify the ref and pin
counts behave correctly.
+ return;
+ atomic_add(npin - 1, &folio->_refcount);
+ atomic_add(npin - 1, &folio->_pincount);
+}
+EXPORT_SYMBOL_GPL(repin_folio_unhugely);
Can we ... find a better name? For example, it's "large" folio not "huge"...
And repin is really misleading. We are simply adding more pins to an already pinned one ...
Jason suggests a better name in the other thread.
I would prefer something that simply adds more pins to an already pinned folio. Much easier to get.
How about folio_add_pins()?
- Steve
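
For readers following the accounting, here is a minimal user-space sketch of what the posted helper does to the two counters. It is not kernel code: struct toy_folio and the toy_* functions are hypothetical stand-ins invented for illustration, the huge zero folio check is left out, and it assumes the large-folio case, where each pin is one reference plus one _pincount increment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct folio: only the counters discussed in this thread. */
struct toy_folio {
	atomic_long refcount;
	atomic_long pincount;
	bool large;
};

/* Model of pinning a large folio once: one reference plus one pin. */
static void toy_pin_once(struct toy_folio *f)
{
	atomic_fetch_add(&f->refcount, 1);
	atomic_fetch_add(&f->pincount, 1);
}

/*
 * Model of the helper as posted: the folio already holds one pin, so add
 * npin - 1 more, giving each of the npin small pages its own pin.
 * Assumption: npin <= 1 needs no adjustment (the posted patch tests
 * npin == 1; the wider check here only guards against underflow).
 */
static void toy_repin_unhugely(struct toy_folio *f, unsigned long npin)
{
	if (!f->large || npin <= 1)
		return;
	atomic_fetch_add(&f->refcount, (long)(npin - 1));
	atomic_fetch_add(&f->pincount, (long)(npin - 1));
}

/* Model of unpinning one small page: drop one pin and one reference. */
static void toy_unpin_page(struct toy_folio *f)
{
	atomic_fetch_sub(&f->pincount, 1);
	atomic_fetch_sub(&f->refcount, 1);
}
```

Starting from a folio pinned once, toy_repin_unhugely(&f, 4) leaves four pins, and four toy_unpin_page() calls return both counters to their pre-pin values. This is also why "add pins" describes the operation more directly than "repin": nothing is dropped and re-taken, the counts only go up.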