Re: [PATCH v1 2/4] mm/gup: Make follow_page() succeed again on PROT_NONE PTEs/PMDs

David Hildenbrand <david@xxxxxxxxxx> · Fri, 28 Jul 2023 12:12:57 +0200

On 28.07.23 11:08, David Hildenbrand wrote:
On 28.07.23 04:30, John Hubbard wrote:
On 7/27/23 14:28, David Hildenbrand wrote:
We accidentally enforced PROT_NONE PTE/PMD permission checks for
follow_page() like we do for get_user_pages() and friends. That was
undesired, because follow_page() is usually only used to look up a currently
mapped page, not to actually access it. Further, follow_page() does not
actually trigger fault handling, but instead simply fails.

I see that follow_page() is also completely undocumented. And that
reduces us to deducing how it should be used...these things that
change follow_page()'s behavior maybe should have a go at documenting
it too, perhaps.

I can certainly be motivated to do that. :)



Let's restore that behavior by conditionally setting FOLL_FORCE if
FOLL_WRITE is not set. This way, for example KSM and migration code will
no longer fail on PROT_NONE mapped PTEs/PMDs.

Handling this internally doesn't require us to add any new FOLL_FORCE
usage outside of GUP code.

While at it, refuse to accept FOLL_FORCE: we don't even perform VMA
permission checks like in check_vma_flags(), so especially
FOLL_FORCE|FOLL_WRITE would be dodgy.

This issue was identified by code inspection. We'll add some
documentation regarding FOLL_FORCE next.

Reported-by: Peter Xu <peterx@xxxxxxxxxx>
Fixes: 474098edac26 ("mm/gup: replace FOLL_NUMA by gup_can_follow_protnone()")
Cc: <stable@xxxxxxxxxxxxxxx>
Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
---
    mm/gup.c | 10 +++++++++-
    1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/gup.c b/mm/gup.c
index 2493ffa10f4b..da9a5cc096ac 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -841,9 +841,17 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
    	if (vma_is_secretmem(vma))
    		return NULL;
    
-	if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
+	if (WARN_ON_ONCE(foll_flags & (FOLL_PIN | FOLL_FORCE)))
    		return NULL;

This is not a super happy situation: follow_page() is now prohibited
(see above: we should document that interface) from passing in
FOLL_FORCE...

I guess you saw my patch #4.

If you take a look at the existing callers (that are fortunately very
limited), you'll see that nobody cares.

Most of the FOLL flags don't make any sense for follow_page(), and
limiting further (ab)use is at least to me very appealing.


    
+	/*
+	 * Traditionally, follow_page() succeeded on PROT_NONE-mapped pages
+	 * but failed follow_page(FOLL_WRITE) on R/O-mapped pages. Let's
+	 * keep these semantics by setting FOLL_FORCE if FOLL_WRITE is not set.
+	 */
+	if (!(foll_flags & FOLL_WRITE))
+		foll_flags |= FOLL_FORCE;
+

...but then we set it anyway, for special cases. It's awkward because
FOLL_FORCE is not an "internal to gup" flag (yet?).

I don't yet have suggestions, other than:

1) Yes, the FOLL_NUMA made things bad.

2) And they are still very confusing, especially the new use of
      FOLL_FORCE.

...I'll try to let this soak in and maybe recommend something
in a more productive way. :)
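
For reference, the combined effect of the two hunks above can be modelled
in userspace roughly as follows. This is only a sketch: the MODEL_* values
and model_follow_page_flags() are made up for illustration and are not the
kernel's definitions, and the real follow_page() returns NULL rather than
a bool when it refuses a flag.

```c
#include <stdbool.h>

/* Hypothetical flag bits for this model; NOT the kernel's FOLL_* values. */
#define MODEL_FOLL_WRITE 0x01u
#define MODEL_FOLL_FORCE 0x02u
#define MODEL_FOLL_PIN   0x04u

/*
 * Model of the flag handling in the patched follow_page():
 *  - callers may pass neither FOLL_PIN nor FOLL_FORCE (request refused),
 *  - a lookup without FOLL_WRITE gets FOLL_FORCE added internally, so it
 *    succeeds on PROT_NONE-mapped pages,
 *  - a FOLL_WRITE lookup is left alone, so it still fails on R/O mappings.
 *
 * Returns false if the request is refused; otherwise stores the flags
 * that would be handed to the page table walk in *effective.
 */
static bool model_follow_page_flags(unsigned int foll_flags,
				    unsigned int *effective)
{
	if (foll_flags & (MODEL_FOLL_PIN | MODEL_FOLL_FORCE))
		return false;

	if (!(foll_flags & MODEL_FOLL_WRITE))
		foll_flags |= MODEL_FOLL_FORCE;

	*effective = foll_flags;
	return true;
}
```

So FOLL_FORCE is rejected at the interface and yet ends up set for every
read-only lookup, which is the asymmetry being discussed here.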

What I can offer that might be very appealing is the following:

Get rid of the flags parameter for follow_page() *completely*. Yes, then
we can even rename FOLL_ to something reasonable in the context where it
is nowadays used ;)


Internally, we'll then set

FOLL_GET | FOLL_DUMP | FOLL_FORCE

and document exactly what this function does. Any user that needs
something different should just look into using get_user_pages() instead.

I can prototype that on top of this work easily.

The end result looks something like:

/**
 * follow_page - look up and reference a page descriptor from a user-virtual
 * 		 address
 * @vma: vm_area_struct mapping @address
 * @address: virtual address to look up
 *
 * follow_page() will look up the page mapped at the given address and
 * take a reference on the page. The returned page has to be released using
 * put_page().
 *
 * follow_page() will not return special (like zero) pages and does not check
 * PTE protection: the returned page might be mapped PROT_NONE, R/O or R/W.
 * Consequently, follow_page() will not trigger NUMA hinting faults.
 *
 * follow_page() does not trigger page faults. If no page is mapped, or
 * a special (like zero) page is mapped, it returns %NULL or an error pointer.
 *
 * Note: new users with different requirements are probably better off using
 *       one of the get_user_pages() variants or one of the walk_page_range()
 *       variants.
 *
 * Return: the mapped (struct page *), %NULL if no mapping exists, or
 * an error pointer if there is a mapping to something not represented
 * by a page descriptor (see also vm_normal_page()) or the zero page.
 */
struct page *follow_page(struct vm_area_struct *vma, unsigned long address)
{
	struct follow_page_context ctx = { NULL };
	unsigned long gup_flags;
	struct page *page;

	if (vma_is_secretmem(vma))
		return NULL;

	/*
	 * FOLL_GET: We always want a reference on the returned page.
	 * FOLL_DUMP: Ignore special (like zero) pages.
	 * FOLL_FORCE: Succeed on PROT_NONE-mapped pages.
	 */
	gup_flags = FOLL_GET | FOLL_DUMP | FOLL_FORCE;

	page = follow_page_mask(vma, address, gup_flags, &ctx);
	if (ctx.pgmap)
		put_dev_pagemap(ctx.pgmap);
	return page;
}
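
A caller then has to tell apart the three outcomes named in the Return:
section. The following is a userspace sketch of that check, assuming the
usual kernel convention that error pointers occupy the top 4095 addresses.
The model_* helpers only mimic ERR_PTR()/IS_ERR() for illustration; they
are not the kernel's helpers, and the enum is hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for the kernel's error pointer helpers. */
#define MODEL_MAX_ERRNO 4095

static inline void *model_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline bool model_is_err(const void *ptr)
{
	/* Error pointers live in the last MODEL_MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical classification of what a follow_page() caller can see. */
enum model_lookup {
	MODEL_NO_MAPPING,	/* NULL: nothing mapped at the address */
	MODEL_NOT_A_PAGE,	/* error pointer: no page descriptor, or zero page */
	MODEL_GOT_PAGE,		/* valid page: caller holds a reference to drop */
};

static enum model_lookup model_classify(const void *page)
{
	if (!page)
		return MODEL_NO_MAPPING;
	if (model_is_err(page))
		return MODEL_NOT_A_PAGE;
	return MODEL_GOT_PAGE;
}
```

Only in the last case does the caller own a reference that it has to
release with put_page().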

--
Cheers,

David / dhildenb