On 09.08.24 18:54, Peter Xu wrote:
On Fri, Aug 09, 2024 at 06:20:06PM +0200, David Hildenbrand wrote:
On 09.08.24 18:08, Peter Xu wrote:
Pfnmaps can always be identified with special bits in the ptes/pmds/puds.
However that's unnecessary if the vma is stable, and when it's mapped under
VM_PFNMAP | VM_IO.
Instead of adding similar checks in all the levels for huge pfnmaps, let
folio_walk_start() fail even earlier for these mappings. It's also
something gup-slow already does, so make them match.
Cc: David Hildenbrand <david@xxxxxxxxxx>
Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
---
mm/pagewalk.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index cd79fb3b89e5..fd3965efe773 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -727,6 +727,11 @@ struct folio *folio_walk_start(struct folio_walk *fw,
p4d_t *p4dp;
mmap_assert_locked(vma->vm_mm);
+
+ /* It has no folio backing the mappings at all.. */
+ if (vma->vm_flags & (VM_IO | VM_PFNMAP))
+ return NULL;
+
That is in general not what we want, and we still have some places that
wrongly hard-code that behavior.
In a MAP_PRIVATE mapping you might have anon pages that we can happily walk.
vm_normal_page() / vm_normal_page_pmd() [and as commented as a TODO,
vm_normal_page_pud()] should be able to identify PFN maps and reject them,
no?
Yep, I think we can also rely on special bit.
When I was working on this whole series I must confess I am already
confused on the real users of MAP_PRIVATE pfnmaps. E.g. we probably don't
need either PFNMAP for either mprotect/fork/... at least for our use case,
then VM_PRIVATE is even one step further.
Yes, it's rather a corner case indeed.
Here I chose to follow gup-slow, and I suppose you meant that's also wrong?
I assume just nobody really noticed, just like nobody noticed that
walk_page_test() skips VM_PFNMAP (but not VM_IO :) ).
Your process memory stats will likely miss anon folios on COW PFNMAP
mappings ... in the rare cases where they exist (e.g., mmap() of /dev/mem).
If so, would it make sense we keep them aligned as of now, and change them
altogether? Or do you think we should just rely on the special bits?
GUP already refuses to work on a lot of other stuff, so likely not a
good use of time unless somebody complains.
But yes, long-term we should make all code either respect that it could
happen (and bury less awkward checks in page table walkers) or rip
support for MAP_PRIVATE PFNMAP out completely.
And, just curious: is there any use case you're aware of that can benefit
from caring PRIVATE pfnmaps yet so far, especially in this path?
In general MAP_PRIVATE pfnmaps is not really useful on things like MMIO.
There was a discussion (in VM_PAT) some time ago whether we could remove
MAP_PRIVATE PFNMAPs completely [1]. At least some users still use COW
mappings on /dev/mem, although not many (and they might not actually
write to these areas).
I'm happy if someone wants to try ripping that out, I'm not brave enough :)
[1]
https://lkml.kernel.org/r/1f2a8ed4-aaff-4be7-b3b6-63d2841a2908@xxxxxxxxxx
As far as I read, none of folio_walk_start() users so far should even
stumble on top of a pfnmap, share or private. But that's a fairly quick
glimps only.
do_pages_stat()->do_pages_stat_array() should be able to trigger it, if
you pass "nodes=NULL" to move_pages().
Maybe s390x could be tricked into it, but likely as you say, most code
shouldn't trigger it. The function itself should be handling it
correctly as of today, though.
--
Cheers,
David / dhildenb