Re: [PATCH 06/19] mm/pagewalk: Check pfnmap early for folio_walk_start()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 16.08.24 16:21, Peter Xu wrote:
On Fri, Aug 16, 2024 at 11:30:31AM +0200, David Hildenbrand wrote:
On 14.08.24 15:05, Jason Gunthorpe wrote:
On Fri, Aug 09, 2024 at 07:25:36PM +0200, David Hildenbrand wrote:

That is in general not what we want, and we still have some places that
wrongly hard-code that behavior.

In a MAP_PRIVATE mapping you might have anon pages that we can happily walk.

vm_normal_page() / vm_normal_page_pmd() [and as commented as a TODO,
vm_normal_page_pud()] should be able to identify PFN maps and reject them,
no?

Yep, I think we can also rely on special bit.

It is more than just relying on the special bit..

VM_PFNMAP/VM_MIXEDMAP should really only be used inside
vm_normal_page() because thay are, effectively, support for a limited
emulation of the special bit on arches that don't have them. There are
a bunch of weird rules that are used to try and make that work
properly that have to be followed.

On arches with the sepcial bit they should possibly never be checked
since the special bit does everything you need.

Arguably any place reading those flags out side of vm_normal_page/etc
is suspect.

IIUC, your opinion matches mine: VM_PFNMAP/VM_MIXEDMAP and pte_special()/...
usage should be limited to vm_normal_page/vm_normal_page_pmd/ ... of course,
GUP-fast is special (one of the reason for "pte_special()" and friends after
all).

The issue is at least GUP currently doesn't work with pfnmaps, while
there're potentially users who wants to be able to work on both page +
!page use cases.  Besides access_process_vm(), KVM also uses similar thing,
and maybe more; these all seem to be valid use case of reference the vma
flags for PFNMAP and such, so they can identify "it's pfnmap" or more
generic issues like "permission check error on pgtable".

What at least VFIO does is first try GUP, and if that fails, try follow_fault_pfn()->follow_pte(). There is a VM_PFNMAP check in there, yes.

Ideally, follow_pte() would never return refcounted/normal pages, then the PFNMAP check might only be a performance improvement (maybe).


The whole private mapping thing definitely made it complicated.

Yes, and follow_pte() for now could even return ordinary anon pages. I spotted that when I was working on that VM_PAT stuff, but I was to unsure what to do (see below that KVM with MAP_PRIVATE /dev/mem might just work, no idea if there are use cases?).

Fortunately, vfio calls is_invalid_reserved_pfn() and refuses anything that has a struct page.

I think KVM does something nasty: if it something with a "struct page", and it's not PageReserved, it would take a reference (if I get kvm_pfn_to_refcounted_page()) independent if it's a "normal" or "not normal" page -- it essentially ignores the vm_normal_page() information in the page tables ...

So anon pages in pivate mappings from follow_pte() might currently work with KVM ... because of the way KVM uses follow_pte().

I did not play with it, so I'm not sure if I am missing some detail.

--
Cheers,

David / dhildenb





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux