On Wed, Jun 02, 2021 at 05:51:02PM +0000, Sean Christopherson wrote: > > Omitting FOLL_GUEST for shared memory doesn't look like a right approach. > > IIUC, it would require the kernel to track what memory is share and what > > private, which defeat the purpose of the rework. I would rather enforce > > !PageGuest() when share SEPT is populated in addition to enforcing > > PageGuest() fro private SEPT. > > Isn't that what omitting FOLL_GUEST would accomplish? For shared memory, > including mapping memory into the shared EPT, KVM will omit FOLL_GUEST and thus > require the memory to be readable/writable according to the guest access type. Ah. I guess I see what you're saying: we can pipe down the shared bit from GPA from direct_page_fault() (or whatever handles the fault) down to hva_to_pfn_slow() and omit FOLL_GUEST if the shared bit is set. Right? I guest it's doable, but codeshuffling going to be ugly. > By definition, that excludes PageGuest() because PageGuest() pages must always > be unmapped, e.g. PROTNONE. And for private EPT, because PageGuest() is always > PROTNONE or whatever, it will require FOLL_GUEST to retrieve the PTE/PMD/Pxx. > > On a semi-related topic, I don't think can_follow_write_pte() is the correct > place to hook PageGuest(). TDX's S-EPT has a quirk where all private guest > memory must be mapped writable, but that quirk doesn't hold true for non-TDX > guests. It should be legal to map private guest memory as read-only. Hm. The point of the change in can_follow_write_pte() is to only allow to write to a PageGuest() page if FOLL_GUEST is used and the mapping is writable. Without the change gup(FOLL_GUEST|FOLL_WRITE) would fail. It doesn't prevent using read-only guest mappings as read-only. But if you want to write to it it has to writable (in addtion to FOLL_GUEST). > And I believe the below snippet in follow_page_pte() will be problematic > too, since FOLL_NUMA is added unless FOLL_FORCE is set. I suspect the > correct approach is to handle FOLL_GUEST as an exception to > pte_protnone(), though that might require adjusting pte_protnone() to be > meaningful even when CONFIG_NUMA_BALANCING=n. > > if ((flags & FOLL_NUMA) && pte_protnone(pte)) > goto no_page; > if ((flags & FOLL_WRITE) && !can_follow_write_pte(pte, flags)) { > pte_unmap_unlock(ptep, ptl); > return NULL; > } Good catch. I'll look into how to untangle NUMA balancing and PageGuest(). It shouldn't be hard. PageGuest() pages should be subject for balancing. > > Do you see any problems with this? > > > > > Oh, and the other nicety is that I think it would avoid having to explicitly > > > handle PageGuest() memory that is being accessed from kernel/KVM, i.e. if all > > > memory exposed to KVM must be !PageGuest(), then it is also eligible for > > > copy_{to,from}_user(). > > > > copy_{to,from}_user() enforce by setting PTE entries to PROT_NONE. > > But KVM does _not_ want those PTEs PROT_NONE. If KVM is accessing memory that > is also accessible by the the guest, then it must be shared. And if it's shared, > it must also be accessible to host userspace, i.e. something other than PROT_NONE, > otherwise the memory isn't actually shared with anything. > > As above, any guest-accessible memory that is accessed by the host must be > shared, and so must be mapped with the required permissions. I don't see contradiction here: copy_{to,from}_user() would fail with -EFAULT on PROT_NONE PTE. By saying in initial posting that inserting PageGuest() into shared is fine, I didn't mean it's usefule, just allowed. -- Kirill A. Shutemov