On Tue, Mar 19, 2024 at 12:36:20PM -0300, Jason Gunthorpe wrote:
> On Sat, Mar 09, 2024 at 05:14:18PM +0100, Christoph Hellwig wrote:
> > On Fri, Mar 08, 2024 at 04:23:42PM -0400, Jason Gunthorpe wrote:
> > > > The DMA API callers really need to know what is P2P or not for
> > > > various reasons. And they should generally have that information
> > > > available, either from pin_user_pages that needs to special case
> > > > it or from the in-kernel I/O submitter that build it from P2P and
> > > > normal memory.
> > >
> > > I think that is a BIO thing. RDMA just calls with FOLL_PCI_P2PDMA and
> > > shoves the resulting page list into in a scattertable. It never checks
> > > if any returned page is P2P - it has no reason to care. dma_map_sg()
> > > does all the work.
> >
> > Right now it does, but that's not really a good interface. If we have
> > a pin_user_pages variant that only pins until the next relevant P2P
> > boundary and tells you about we can significantly simplify the overall
> > interface.
>
> Sorry for the delay, I was away..

<...>

> Can we tweak what Leon has done to keep the hmm_range_fault support
> and non-uniformity for RDMA but add a uniformity optimized flow for
> BIO?

Something like this will do the trick.

From 45e739e7073fb04bc168624f77320130bb3f9267 Mon Sep 17 00:00:00 2001
Message-ID: <45e739e7073fb04bc168624f77320130bb3f9267.1710924764.git.leonro@xxxxxxxxxx>
From: Leon Romanovsky <leonro@xxxxxxxxxx>
Date: Mon, 18 Mar 2024 11:16:41 +0200
Subject: [PATCH] mm/gup: add strict interface to pin user pages according to
 FOLL flag

All pin_user_pages*() and get_user_pages*() calls allocate user pages
by partially taking into account their p2p vs. non-p2p properties. If the
user sets the FOLL_PCI_P2PDMA flag, the returned pages may include both
p2p and "regular" pages, while if the FOLL_PCI_P2PDMA flag is not provided,
only regular pages are returned.

In order to make sure that with the FOLL_PCI_P2PDMA flag only p2p pages
are returned, let's introduce a new internal FOLL_STRICT flag and provide
a special pin_user_pages_fast_strict() API call.

Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxx>
---
 include/linux/mm.h |  3 +++
 mm/gup.c           | 36 +++++++++++++++++++++++++++++++++++-
 mm/internal.h      |  4 +++-
 3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index f5a97dec5169..910b65dde24a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2491,6 +2491,9 @@ int pin_user_pages_fast(unsigned long start, int nr_pages,
 		    unsigned int gup_flags, struct page **pages);
 void folio_add_pin(struct folio *folio);
 
+int pin_user_pages_fast_strict(unsigned long start, int nr_pages,
+			       unsigned int gup_flags, struct page **pages);
+
 int account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc);
 int __account_locked_vm(struct mm_struct *mm, unsigned long pages, bool inc,
 			struct task_struct *task, bool bypass_rlim);
diff --git a/mm/gup.c b/mm/gup.c
index df83182ec72d..11b5c626a4ab 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -133,6 +133,10 @@ struct folio *try_grab_folio(struct page *page, int refs, unsigned int flags)
 	if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
 		return NULL;
 
+	if (flags & FOLL_STRICT)
+		if (flags & FOLL_PCI_P2PDMA && !is_pci_p2pdma_page(page))
+			return NULL;
+
 	if (flags & FOLL_GET)
 		return try_get_folio(page, refs);
 
@@ -232,6 +236,10 @@ int __must_check try_grab_page(struct page *page, unsigned int flags)
 	if (unlikely(!(flags & FOLL_PCI_P2PDMA) && is_pci_p2pdma_page(page)))
 		return -EREMOTEIO;
 
+	if (flags & FOLL_STRICT)
+		if (flags & FOLL_PCI_P2PDMA && !is_pci_p2pdma_page(page))
+			return -EREMOTEIO;
+
 	if (flags & FOLL_GET)
 		folio_ref_inc(folio);
 	else if (flags & FOLL_PIN) {
@@ -2243,6 +2251,8 @@ static bool is_valid_gup_args(struct page **pages, int *locked,
 	 * - FOLL_TOUCH/FOLL_PIN/FOLL_TRIED/FOLL_FAST_ONLY are internal only
 	 * - FOLL_REMOTE is internal only and used on follow_page()
 	 * - FOLL_UNLOCKABLE is internal only and used if locked is !NULL
+	 * - FOLL_STRICT is internal only and used to distinguish between p2p
+	 *   and "regular" pages.
 	 */
 	if (WARN_ON_ONCE(gup_flags & INTERNAL_GUP_FLAGS))
 		return false;
@@ -3187,7 +3197,8 @@ static int internal_get_user_pages_fast(unsigned long start,
 	if (WARN_ON_ONCE(gup_flags & ~(FOLL_WRITE | FOLL_LONGTERM |
 				       FOLL_FORCE | FOLL_PIN | FOLL_GET |
 				       FOLL_FAST_ONLY | FOLL_NOFAULT |
-				       FOLL_PCI_P2PDMA | FOLL_HONOR_NUMA_FAULT)))
+				       FOLL_PCI_P2PDMA | FOLL_HONOR_NUMA_FAULT |
+				       FOLL_STRICT)))
 		return -EINVAL;
 
 	if (gup_flags & FOLL_PIN)
@@ -3322,6 +3333,29 @@ int pin_user_pages_fast(unsigned long start, int nr_pages,
 }
 EXPORT_SYMBOL_GPL(pin_user_pages_fast);
 
+/**
+ * pin_user_pages_fast_strict() - this is pin_user_pages_fast() variant, which
+ * makes sure that only pages with same properties are pinned.
+ *
+ * @start:      starting user address
+ * @nr_pages:   number of pages from start to pin
+ * @gup_flags:  flags modifying pin behaviour
+ * @pages:      array that receives pointers to the pages pinned.
+ *              Should be at least nr_pages long.
+ *
+ * Nearly the same as pin_user_pages_fast(), except that FOLL_STRICT is set.
+ *
+ * FOLL_STRICT means that the pages are allocated with specific FOLL_* properties.
+ */
+int pin_user_pages_fast_strict(unsigned long start, int nr_pages,
+			       unsigned int gup_flags, struct page **pages)
+{
+	if (!is_valid_gup_args(pages, NULL, &gup_flags, FOLL_PIN | FOLL_STRICT))
+		return -EINVAL;
+	return internal_get_user_pages_fast(start, nr_pages, gup_flags, pages);
+}
+EXPORT_SYMBOL_GPL(pin_user_pages_fast_strict);
+
 /**
  * pin_user_pages_remote() - pin pages of a remote process
  *
diff --git a/mm/internal.h b/mm/internal.h
index f309a010d50f..7578837a0444 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -1031,10 +1031,12 @@ enum {
 	FOLL_FAST_ONLY = 1 << 20,
 	/* allow unlocking the mmap lock */
 	FOLL_UNLOCKABLE = 1 << 21,
+	/* don't mix pages with different properties, e.g. p2p with "regular" ones */
+	FOLL_STRICT = 1 << 22,
 };
 
 #define INTERNAL_GUP_FLAGS (FOLL_TOUCH | FOLL_TRIED | FOLL_REMOTE | FOLL_PIN | \
-			    FOLL_FAST_ONLY | FOLL_UNLOCKABLE)
+			    FOLL_FAST_ONLY | FOLL_UNLOCKABLE | FOLL_STRICT)
 
 /*
  * Indicates for which pages that are write-protected in the page table,
-- 
2.44.0
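
To make the resulting accept/reject rules easier to see, here is a small
user-space model of the two checks in try_grab_folio()/try_grab_page() after
this patch. It is only a sketch and not kernel code: the flag values are
stand-ins for illustration, the page is reduced to a single "is it p2p"
boolean, and grab_allowed() and accepted_prefix() are hypothetical helper
names that do not exist in mm/gup.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag values for this user-space model only (assumption). */
#define FOLL_PCI_P2PDMA	(1u << 10)
#define FOLL_STRICT	(1u << 22)

/*
 * Hypothetical helper modelling the two checks made before a page is
 * grabbed. page_is_p2p stands in for is_pci_p2pdma_page(page).
 */
static bool grab_allowed(unsigned int flags, bool page_is_p2p)
{
	/* Existing rule: p2p pages are rejected unless the caller opted in. */
	if (!(flags & FOLL_PCI_P2PDMA) && page_is_p2p)
		return false;

	/* New rule: FOLL_STRICT plus FOLL_PCI_P2PDMA rejects regular pages. */
	if ((flags & FOLL_STRICT) &&
	    (flags & FOLL_PCI_P2PDMA) && !page_is_p2p)
		return false;

	return true;
}

/*
 * Hypothetical helper: number of leading pages that pass the checks
 * before the first rejection. This only models the per-page decision,
 * not the real fast/slow GUP paths or their error handling.
 */
static size_t accepted_prefix(unsigned int flags, const bool *is_p2p, size_t n)
{
	size_t i;

	assert(is_p2p != NULL || n == 0);

	for (i = 0; i < n; i++)
		if (!grab_allowed(flags, is_p2p[i]))
			break;

	return i;
}
```

In this model, a range laid out as p2p, p2p, regular, p2p is accepted in
full with FOLL_PCI_P2PDMA alone, while FOLL_PCI_P2PDMA | FOLL_STRICT stops
at the first regular page. Note that FOLL_STRICT without FOLL_PCI_P2PDMA
changes nothing, since p2p pages were already rejected in that case.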