On 27.07.23 20:59, Peter Xu wrote:
On Thu, Jul 27, 2023 at 07:27:02PM +0200, David Hildenbrand wrote:
This was wrong from the very start. If we're not in GUP, we shouldn't call
GUP functions.
My understanding is !GET && !PIN is also called gup.. otherwise we don't
need GET and it can just be always implied.
That's not the point. The point is that _arbitrary_ code shouldn't call into
GUP internal helper functions, where they bypass, for example, any sanity
checks.
What's the sanity checks that you're referring to?
For example in follow_page()
if (vma_is_secretmem(vma))
return NULL;
if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
return NULL;
Maybe you can elaborate why you think we should *not* be using
vm_normal_page_pmd() and instead some arbitrary GUP internal helper? I
don't get it.
The other proof is try_grab_page() doesn't fail hard on !GET && !PIN. So I
don't know whether that's "wrong" to be used..
To me, that is arbitrary code using a GUP internal helper and, therefore,
wrong.
Back to the topic: I'd say either of the patches look good to solve the
problem. If p2pdma pages are mapped as PFNMAP/MIXEDMAP (?), I guess
vm_normal_page_pmd() proposed here will also work on it, so nothing I see
wrong on 2nd one yet.
It looks nicer indeed to not have FOLL_FORCE here, but it also makes me
just wonder whether we should document NUMA behavior for FOLL_* somewhere,
because we have an implication right now on !FOLL_FORCE over NUMA, which is
not obvious to me..
Yes, we probably should. For get_use_pages() and friends that behavior was
always like that and it makes sense: usually it represent application
behavior.
And to look more over that aspect, see follow_page(): previously we can
follow a page for protnone (as it never applies FOLL_NUMA) but now it won't
(it never applies FOLL_FORCE, either, so it seems "accidentally" implies
FOLL_NUMA now). Not sure whether it's intended, though..
That was certainly an oversight, thanks for spotting that. That patch was
not supposed to change semantics:
diff --git a/mm/gup.c b/mm/gup.c
index 76d222ccc3ff..ac926e19ff72 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -851,6 +851,13 @@ struct page *follow_page(struct vm_area_struct *vma,
unsigned long address,
if (WARN_ON_ONCE(foll_flags & FOLL_PIN))
return NULL;
+ /*
+ * In contrast to get_user_pages() and friends, we don't want to
+ * fail if the PTE is PROT_NONE: see gup_can_follow_protnone().
+ */
+ if (!(foll_flags & FOLL_WRITE))
+ foll_flags |= FOLL_FORCE;
+
page = follow_page_mask(vma, address, foll_flags, &ctx);
if (ctx.pgmap)
put_dev_pagemap(ctx.pgmap);
This seems to be slightly against your other solution though for smaps,
where we want to avoid abusing FOLL_FORCE.. isn't it..
This is GUP internal, not some arbitrary code, so to me a *completely*
different discussion.
Why read only? That'll always attach FOLL_FORCE to all follow page call
sites indeed for now, but just curious - logically "I want to fetch the
page even if protnone" is orthogonal to do with write permission here to
me.
Historical these were not the semantics, so I won't change them.
FOLL_FORCE | FOLL_WRITE always had a special taste to it (COW ...).
I still worry about further abuse of FOLL_FORCE, I believe you also worry
that so you proposed the other way for the smaps issue.
Do you think we can just revive FOLL_NUMA? That'll be very clear to me
from that aspect that we do still have valid use cases for it.
FOLL_NUMA naming was nowadays wrong to begin with (not to mention,
confusing a we learned). There are other reasons why we have PROT_NONE
-- mprotect(), for example.
We could have a flag that goes the other way around:
FOLL_IGNORE_PROTNONE ... which surprisingly then ends up being exactly
what FOLL_FORCE means without FOLL_WRITE, and what this patch does.
Does that make sense to you?
The very least is if with above we should really document FOLL_FORCE - we
should mention NUMA effects. But that's ... really confusing. Thinking
about that I personally prefer a revival of FOLL_NUMA, then smaps issue all
go away.
smaps needs to be changed in any case IMHO. And I'm absolutely not in
favor of revicing FOLL_NUMA.
--
Cheers,
David / dhildenb