RE: [PATCH v1 1/3] mm/gup: Introduce pin_user_pages_fd() for pinning shmem/hugetlbfs file pages

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi David,

> > For drivers that would like to longterm-pin the pages associated
> > with a file, the pin_user_pages_fd() API provides an option to
> > not only FOLL_PIN the pages but also to check and migrate them
> > if they reside in movable zone or CMA block. For now, this API
> > can only work with files belonging to shmem or hugetlbfs given
> > that the udmabuf driver is the only user.
> 
> Maybe add "Other files are rejected.". Wasn't clear to me before I
> looked into the code.
Ok, will add it in v2.

> 
> >
> > It must be noted that the pages associated with hugetlbfs files
> > are expected to be found in the page cache. An error is returned
> > if they are not found. However, shmem pages can be swapped in or
> > allocated if they are not present in the page cache.
> >
> > Cc: David Hildenbrand <david@xxxxxxxxxx>
> > Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> > Cc: Mike Kravetz <mike.kravetz@xxxxxxxxxx>
> > Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> > Cc: Peter Xu <peterx@xxxxxxxxxx>
> > Cc: Gerd Hoffmann <kraxel@xxxxxxxxxx>
> > Cc: Dongwon Kim <dongwon.kim@xxxxxxxxx>
> > Cc: Junxiao Chang <junxiao.chang@xxxxxxxxx>
> > Suggested-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Signed-off-by: Vivek Kasireddy <vivek.kasireddy@xxxxxxxxx>
> > ---
> >   include/linux/mm.h |  2 ++
> >   mm/gup.c           | 87
> ++++++++++++++++++++++++++++++++++++++++++++++
> >   2 files changed, 89 insertions(+)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index bf5d0b1b16f4..af2121fb8101 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2457,6 +2457,8 @@ long get_user_pages_unlocked(unsigned long
> start, unsigned long nr_pages,
> >   		    struct page **pages, unsigned int gup_flags);
> >   long pin_user_pages_unlocked(unsigned long start, unsigned long
> nr_pages,
> >   		    struct page **pages, unsigned int gup_flags);
> > +long pin_user_pages_fd(int fd, pgoff_t start, unsigned long nr_pages,
> > +		       unsigned int gup_flags, struct page **pages);
> >
> >   int get_user_pages_fast(unsigned long start, int nr_pages,
> >   			unsigned int gup_flags, struct page **pages);
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 2f8a2d89fde1..e34b77a15fa8 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -3400,3 +3400,90 @@ long pin_user_pages_unlocked(unsigned long
> start, unsigned long nr_pages,
> >   				     &locked, gup_flags);
> >   }
> >   EXPORT_SYMBOL(pin_user_pages_unlocked);
> > +
> 
> This does look quite neat, nice! Let's take a closer look ...
> 
> > +/**
> > + * pin_user_pages_fd() - pin user pages associated with a file
> > + * @fd:         the fd whose pages are to be pinned
> > + * @start:      starting file offset
> > + * @nr_pages:   number of pages from start to pin
> > + * @gup_flags:  flags modifying pin behaviour
> 
> ^ I assume we should drop that. At least for now the flags are
> completely unused. And most likely we would want a different set of
> flags later (GUPFD_ ...).
Right now, FOLL_LONGTERM is the only accepted value for gup_flags but
yes, as you suggest, this can be made implicit by dropping gup_flags.

> 
> > + * @pages:      array that receives pointers to the pages pinned.
> > + *              Should be at least nr_pages long.
> > + *
> > + * Attempt to pin (and migrate) pages associated with a file belonging to
> 
> I'd drop the "and migrate" part, it's more of an implementation detail.
> 
> > + * either shmem or hugetlbfs. An error is returned if pages associated with
> > + * hugetlbfs files are not present in the page cache. However, shmem
> pages
> > + * are swapped in or allocated if they are not present in the page cache.
> 
> Why don't we do the same for hugetlbfs? Would make the interface more
> streamlined.
I am going off of what Mike has stated previously:
"It may not matter to your users, but the semantics for hugetlb and shmem
pages is different.  hugetlb requires the pages exist in the page cache
while shmem will create/add pages to the cache if necessary."

However, if we were to allocate a hugepage (assuming one is not present in the
page cache at a given index), what needs to be done in addition to calling these APIs?
folio = alloc_hugetlb_folio_nodemask(h, NUMA_NO_NODE, NULL, GFP_USER)
hugetlb_add_to_page_cache(folio, mapping, idx)

> 
> Certainly add that pinned pages have to be released using
> unpin_user_pages().
Sure, will include that in v2.

> 
> > + *
> > + * Returns number of pages pinned. This would be equal to the number of
> > + * pages requested.
> > + * If nr_pages is 0 or negative, returns 0. If no pages were pinned, returns
> > + * -errno.
> > + */
> > +long pin_user_pages_fd(int fd, pgoff_t start, unsigned long nr_pages,
> > +		       unsigned int gup_flags, struct page **pages)
> > +{
> > +	struct page *page;
> > +	struct file *filep;
> > +	unsigned int flags, i;
> > +	long ret;
> > +
> > +	if (nr_pages <= 0)
> > +		return 0;
> 
> I think we should just forbid that and use a WARN_ON_ONCE() here /
> return -EINVAL. So we'll never end up returning 0.
I think I'll drop this check in v2 as Jason suggested.

> 
> > +	if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_PIN))
> > +		return 0;
> > +
> > +	if (start < 0)
> > +		return -EINVAL;
> > +
> > +	filep = fget(fd);
> > +	if (!filep)
> > +	    return -EINVAL;
> > +
> > +	if (!shmem_file(filep) && !is_file_hugepages(filep))
> > +	    return -EINVAL;
> > +
> > +	flags = memalloc_pin_save();
> > +	do {
> > +		for (i = 0; i < nr_pages; i++) {
> > +			if (shmem_mapping(filep->f_mapping)) {
> > +				page = shmem_read_mapping_page(filep-
> >f_mapping,
> > +							       start + i);
> > +				if (IS_ERR(page)) {
> > +					ret = PTR_ERR(page);
> > +					goto err;
> > +				}
> > +			} else {
> > +				page = find_get_page_flags(filep->f_mapping,
> > +							   start + i,
> > +							   FGP_ACCESSED);
> > +				if (!page) {
> > +					ret = -EINVAL;
> > +					goto err;
> > +				}
> > +			}
> > +			ret = try_grab_page(page, FOLL_PIN);
> > +			if (unlikely(ret))
> > +				goto err;
> > +
> > +			pages[i] = page;
> > +			put_page(pages[i]);
> > +		}
> > +
> > +		ret = check_and_migrate_movable_pages(nr_pages, pages);
> > +	} while (ret == -EAGAIN);
> > +
> > +err:
> > +	memalloc_pin_restore(flags);
> > +	fput(filep);
> > +	if (!ret)
> > +		return nr_pages;
> > +
> > +	while (i > 0 && pages[--i]) {
> > +		unpin_user_page(pages[i]);
> > +		pages[i] = NULL;
> 
> If migrate_longterm_unpinnable_pages() failed, say with -ENOMEM, the
> pages were already unpinned, but pages[i] has not been cleared, no?
You are right; the above while should not be executed in that case. I added
this chunk to cleanup after any errors thrown in the for loop above. I guess
I need to add a new error label to cleanup after errors thrown by
check_and_migrate_movable_pages().

Thanks,
Vivek
> 
> > +	}
> > +	return ret;
> > +}
> > +EXPORT_SYMBOL_GPL(pin_user_pages_fd);
> > +
> 
> --
> Cheers,
> 
> David / dhildenb





[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux