On 14.08.24 15:05, Jason Gunthorpe wrote:
On Fri, Aug 09, 2024 at 07:25:36PM +0200, David Hildenbrand wrote:
That is in general not what we want, and we still have some places that
wrongly hard-code that behavior.
In a MAP_PRIVATE mapping you might have anon pages that we can happily walk.
vm_normal_page() / vm_normal_page_pmd() [and as commented as a TODO,
vm_normal_page_pud()] should be able to identify PFN maps and reject them,
no?
Yep, I think we can also rely on special bit.
It is more than just relying on the special bit..
VM_PFNMAP/VM_MIXEDMAP should really only be used inside
vm_normal_page() because thay are, effectively, support for a limited
emulation of the special bit on arches that don't have them. There are
a bunch of weird rules that are used to try and make that work
properly that have to be followed.
On arches with the sepcial bit they should possibly never be checked
since the special bit does everything you need.
Arguably any place reading those flags out side of vm_normal_page/etc
is suspect.
IIUC, your opinion matches mine: VM_PFNMAP/VM_MIXEDMAP and
pte_special()/... usage should be limited to
vm_normal_page/vm_normal_page_pmd/ ... of course, GUP-fast is special
(one of the reason for "pte_special()" and friends after all).
Here I chose to follow gup-slow, and I suppose you meant that's also wrong?
I assume just nobody really noticed, just like nobody noticed that
walk_page_test() skips VM_PFNMAP (but not VM_IO :) ).
Like here..
And, just curious: is there any use case you're aware of that can benefit
from caring PRIVATE pfnmaps yet so far, especially in this path?
In general MAP_PRIVATE pfnmaps is not really useful on things like MMIO.
There was a discussion (in VM_PAT) some time ago whether we could remove
MAP_PRIVATE PFNMAPs completely [1]. At least some users still use COW
mappings on /dev/mem, although not many (and they might not actually write
to these areas).
I've squashed many bugs where kernel drivers don't demand userspace
use MAP_SHARED when asking for a PFNMAP, and of course userspace has
gained the wrong flags too. I don't know if anyone needs this, but it
has crept wrongly into the API.
Maybe an interesting place to start is a warning printk about using an
obsolete feature and see where things go from there??
Maybe we should start with some way to pr_warn_ONCE() whenever we get a
COW/unshare-fault in such a MAP_PRIVATE mapping, and essentially
populate the fresh anon folio.
Then we don't only know who mmaps() something like that, but who
actually relies on getting anon folios in there.
--
Cheers,
David / dhildenb