RE: [PATCH v6 3/5] mm/gup: Introduce memfd_pin_user_pages() for pinning memfd pages (v6)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi David,

> On 05.12.23 06:35, Vivek Kasireddy wrote:
> > For drivers that would like to longterm-pin the pages associated
> > with a memfd, the pin_user_pages_fd() API provides an option to
> > not only pin the pages via FOLL_PIN but also to check and migrate
> > them if they reside in movable zone or CMA block. This API
> > currently works with memfds but it should work with any files
> > that belong to either shmemfs or hugetlbfs. Files belonging to
> > other filesystems are rejected for now.
> >
> > The pages need to be located first before pinning them via FOLL_PIN.
> > If they are found in the page cache, they can be immediately pinned.
> > Otherwise, they need to be allocated using the filesystem specific
> > APIs and then pinned.
> >
> > v2:
> > - Drop gup_flags and improve comments and commit message (David)
> > - Allocate a page if we cannot find in page cache for the hugetlbfs
> >    case as well (David)
> > - Don't unpin pages if there is a migration related failure (David)
> > - Drop the unnecessary nr_pages <= 0 check (Jason)
> > - Have the caller of the API pass in file * instead of fd (Jason)
> >
> > v3: (David)
> > - Enclose the huge page allocation code with #ifdef
> CONFIG_HUGETLB_PAGE
> >    (Build error reported by kernel test robot <lkp@xxxxxxxxx>)
> > - Don't forget memalloc_pin_restore() on non-migration related errors
> > - Improve the readability of the cleanup code associated with
> >    non-migration related errors
> > - Augment the comments by describing FOLL_LONGTERM like behavior
> > - Include the R-b tag from Jason
> >
> > v4:
> > - Remove the local variable "page" and instead use 3 return statements
> >    in alloc_file_page() (David)
> > - Add the R-b tag from David
> >
> > v5: (David)
> > - For hugetlb case, ensure that we only obtain head pages from the
> >    mapping by using __filemap_get_folio() instead of find_get_page_flags()
> > - Handle -EEXIST when two or more potential users try to simultaneously
> >    add a huge page to the mapping by forcing them to retry on failure
> >
> > v6: (Christoph)
> > - Rename this API to memfd_pin_user_pages() to make it clear that it
> >    is intended for memfds
> > - Move the memfd page allocation helper from gup.c to memfd.c
> > - Fix indentation errors in memfd_pin_user_pages()
> > - For contiguous ranges of folios, use a helper such as
> >    filemap_get_folios_contig() to lookup the page cache in batches
> >
> > Cc: David Hildenbrand <david@xxxxxxxxxx>
> > Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> > Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> > Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> > Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> > Cc: Peter Xu <peterx@xxxxxxxxxx>
> > Cc: Gerd Hoffmann <kraxel@xxxxxxxxxx>
> > Cc: Dongwon Kim <dongwon.kim@xxxxxxxxx>
> > Cc: Junxiao Chang <junxiao.chang@xxxxxxxxx>
> > Suggested-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Reviewed-by: Jason Gunthorpe <jgg@xxxxxxxxxx> (v2)
> > Reviewed-by: David Hildenbrand <david@xxxxxxxxxx> (v3)
> > Signed-off-by: Vivek Kasireddy <vivek.kasireddy@xxxxxxxxx>
> > ---
> >   include/linux/memfd.h |   5 +++
> >   include/linux/mm.h    |   2 +
> >   mm/gup.c              | 102 ++++++++++++++++++++++++++++++++++++++++++
> >   mm/memfd.c            |  34 ++++++++++++++
> >   4 files changed, 143 insertions(+)
> >
> > diff --git a/include/linux/memfd.h b/include/linux/memfd.h
> > index e7abf6fa4c52..6fc0d1282151 100644
> > --- a/include/linux/memfd.h
> > +++ b/include/linux/memfd.h
> > @@ -6,11 +6,16 @@
> >
> >   #ifdef CONFIG_MEMFD_CREATE
> >   extern long memfd_fcntl(struct file *file, unsigned int cmd, unsigned int
> arg);
> > +extern struct page *memfd_alloc_page(struct file *memfd, pgoff_t idx);
> >   #else
> >   static inline long memfd_fcntl(struct file *f, unsigned int c, unsigned int a)
> >   {
> >   	return -EINVAL;
> >   }
> > +static inline struct page *memfd_alloc_page(struct file *memfd, pgoff_t
> idx)
> > +{
> > +	return ERR_PTR(-EINVAL);
> > +}
> >   #endif
> >
> >   #endif /* __LINUX_MEMFD_H */
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 418d26608ece..ac69db45509f 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2472,6 +2472,8 @@ long get_user_pages_unlocked(unsigned long
> start, unsigned long nr_pages,
> >   		    struct page **pages, unsigned int gup_flags);
> >   long pin_user_pages_unlocked(unsigned long start, unsigned long
> nr_pages,
> >   		    struct page **pages, unsigned int gup_flags);
> > +long memfd_pin_user_pages(struct file *file, pgoff_t start,
> > +			  unsigned long nr_pages, struct page **pages);
> >
> >   int get_user_pages_fast(unsigned long start, int nr_pages,
> >   			unsigned int gup_flags, struct page **pages);
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 231711efa390..eb93d1ec9dc6 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -5,6 +5,7 @@
> >   #include <linux/spinlock.h>
> >
> >   #include <linux/mm.h>
> > +#include <linux/memfd.h>
> >   #include <linux/memremap.h>
> >   #include <linux/pagemap.h>
> >   #include <linux/rmap.h>
> > @@ -17,6 +18,7 @@
> >   #include <linux/hugetlb.h>
> >   #include <linux/migrate.h>
> >   #include <linux/mm_inline.h>
> > +#include <linux/pagevec.h>
> >   #include <linux/sched/mm.h>
> >   #include <linux/shmem_fs.h>
> >
> > @@ -3410,3 +3412,103 @@ long pin_user_pages_unlocked(unsigned long
> start, unsigned long nr_pages,
> >   				     &locked, gup_flags);
> >   }
> >   EXPORT_SYMBOL(pin_user_pages_unlocked);
> > +
> > +/**
> > + * memfd_pin_user_pages() - pin user pages associated with a memfd
> > + * @memfd:      the memfd whose pages are to be pinned
> > + * @start:      starting memfd offset
> > + * @nr_pages:   number of pages from start to pin
> > + * @pages:      array that receives pointers to the pages pinned.
> > + *              Should be at-least nr_pages long.
> > + *
> > + * Attempt to pin pages associated with a memfd; given that a memfd is
> either
> > + * backed by shmem or hugetlb, the pages can either be found in the page
> cache
> > + * or need to be allocated if necessary. Once the pages are located, they
> are
> > + * all pinned via FOLL_PIN. And, these pinned pages need to be released
> either
> > + * using unpin_user_pages() or unpin_user_page().
> > + *
> > + * It must be noted that the pages may be pinned for an indefinite amount
> > + * of time. And, in most cases, the duration of time they may stay pinned
> > + * would be controlled by the userspace. This behavior is effectively the
> > + * same as using FOLL_LONGTERM with other GUP APIs.
> > + *
> > + * Returns number of pages pinned. This would be equal to the number of
> > + * pages requested. If no pages were pinned, it returns -errno.
> > + */
> > +long memfd_pin_user_pages(struct file *memfd, pgoff_t start,
> > +			  unsigned long nr_pages, struct page **pages)
> > +{
> > +	pgoff_t start_idx, end_idx = start + nr_pages - 1;
> > +	unsigned int flags, nr_folios, i, j;
> > +	struct folio_batch fbatch;
> > +	struct page *page = NULL;
> > +	struct folio *folio;
> > +	long ret;
> > +
> > +	if (!nr_pages)
> > +		return -EINVAL;
> > +
> > +	if (!memfd)
> > +		return -EINVAL;
> > +
> > +	if (!shmem_file(memfd) && !is_file_hugepages(memfd))
> > +		return -EINVAL;
> > +
> > +	flags = memalloc_pin_save();
> > +	do {
> > +		folio_batch_init(&fbatch);
> > +		start_idx = start;
> > +		i = 0;
> > +
> > +		while (start_idx <= end_idx) {
> > +			/*
> > +			 * In most cases, we should be able to find the page
> > +			 * in the page cache. If we cannot find it for some
> > +			 * reason, we try to allocate one and add it to the
> > +			 * page cache.
> > +			 */
> > +			nr_folios = filemap_get_folios_contig(memfd-
> >f_mapping,
> > +							      &start_idx,
> > +							      end_idx,
> > +							      &fbatch);
> > +			if (page) {
> > +				put_page(page);
> > +				page = NULL;
> > +			}
> > +			for (j = 0; j < nr_folios; j++) {
> > +				folio = fbatch.folios[j];
> > +				ret = try_grab_page(&folio->page, FOLL_PIN);
> > +				if (unlikely(ret)) {
> > +					folio_batch_release(&fbatch);
> > +					goto err;
> > +				}
> > +
> > +				pages[i++] = &folio->page;
> > +			}
> 
> I might be wrong, but that interface is still inconsistent. I think your
> intention is to always return folios (head pages), but why are we
> returning pages from this interface then?
> 
> It would be more consistent regarding the other GUP interfaces to return
> the actual tail pages that fit the given "pgoff_t start". So if you
> punch in "nr_pages" you expect to get "nr_pages" pages, and not some
> other number of folios.
> 
> Otherwise, this interface is highly confusing.
> 
> If you always want to return folios, then better name it
> "memfd_pin_user_folios" (or just "memfd_pin_folios") and pass in a range
> (instead of a nr_pages parameter), and somehow indicate to the caller
> how many folio were in that range, and if that range was fully covered.
I think it makes sense to return folios from this interface; and considering my
use-case, I'd like have this API return an error if it cannot pin (or allocate)
the exact number of folios the caller requested. 

> 
> Or am I missing something?
I can make the udmabuf driver use folios instead of pages too but the function
check_and_migrate_movable_pages() in GUP still takes a list of pages. Do you
think it is ok to use a local variable to collect all the head pages for this?

Thanks,
Vivek

> 
> --
> Cheers,
> 
> David / dhildenb
> 





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux