Re: [RFC PATCH v1 00/26] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support

Hi,

I have a question regarding memory shared between the host and a protected
guest. I scanned the series, and the pKVM patches this series is based on,
but I couldn't easily find the answer.

When a page is shared, that page is not mapped in the stage 2 tables that
the host maintains for a regular VM (kvm->arch.mmu), right? It wouldn't
make much sense for KVM to maintain its own stage 2 that is never used, but
I thought I should double check that to make sure I'm not missing
something.

Thanks,
Alex

On Thu, Feb 22, 2024 at 04:10:21PM +0000, Fuad Tabba wrote:
> This series adds restricted mmap() support to guest_memfd [1], as
> well as support for guest_memfd on pKVM/arm64.
> 
> This series is based on Linux 6.8-rc4 + our pKVM core series [2].
> The KVM core patches apply to Linux 6.8-rc4 (patches 1-6), but
> the remainder (patches 7-26) require the pKVM core series. A git
> repo with this series applied can be found here [3]. We have a
> (WIP) kvmtool port capable of running the code in this series
> [4]. For a technical deep dive into pKVM, please refer to Quentin
> Perret's KVM Forum Presentation [5, 6].
> 
> I've covered some of the issues presented here in my LPC 2023
> presentation [7].
> 
> We haven't started using this in Android yet, but we aim to move
> away from anonymous memory to guest_memfd once we have the
> necessary support merged upstream. Others (e.g., Gunyah [8]) are
> also looking into guest_memfd for reasons similar to ours.
> 
> By design, guest_memfd cannot be mapped, read, or written by the
> host userspace. In pKVM, memory shared between a protected guest
> and the host is shared in-place, unlike the other confidential
> computing solutions that guest_memfd was originally envisaged for
> (e.g., TDX). When initializing a guest, as well as when accessing
> memory shared by the guest to the host, it would be useful to
> support mapping that memory at the host to avoid copying its
> contents.
> 
> One of the benefits of guest_memfd is that it prevents a
> misbehaving host process from crashing the system when attempting
> to access (deliberately or accidentally) protected guest memory,
> since this memory isn't mapped to begin with. Without
> guest_memfd, the hypervisor would still prevent such accesses,
> but in certain cases the host kernel wouldn't be able to recover,
> causing the system to crash.
> 
> Support for mmap() in this patch series maintains the invariant
> that only memory shared with the host, either explicitly by the
> guest or implicitly before the guest has started running (in
> order to populate its memory), is allowed to be mapped. At no time
> should private memory be mapped at the host.
> 
> This patch series is divided into two parts:
> 
> The first part is to the KVM core code (patches 1-6), and is
> based on guest_memfd as of Linux 6.8-rc4. It adds opt-in support
> for mapping guest memory only as long as it is shared. For that,
> the host needs to know the sharing status of guest memory.
> Therefore, the series adds a new KVM memory attribute, accessible
> only by the host kernel, that specifies whether the memory is
> allowed to be mapped by the host userspace.
> 
> The second part of the series (patches 7-26) adds guest_memfd
> support for pKVM/arm64, and is based on the latest version of our
> pKVM series [2]. It uses guest_memfd instead of the current
> approach in Android (not upstreamed) of maintaining a long-term
> GUP on anonymous memory donated to the guest. These patches
> handle faulting in guest memory for a guest, as well as handling
> sharing and unsharing of guest memory while maintaining the
> invariant mentioned earlier.
> 
> In addition to general feedback, we would like feedback on how we
> handle mmap() and faulting in guest pages at the host (KVM: Add
> restricted support for mapping guest_memfd by the host).
> 
> We enforce the invariant that only memory shared with the host
> can be mapped by the host userspace not in
> file_operations::mmap(), but in vm_operations_struct::fault(). On
> vm_operations_struct::fault(), we check whether the page is
> shared with the host. If not, we deliver a SIGBUS to the current
> task. The reason for enforcing this at fault() is that mmap()
> does not elevate page_count(); it's the faulting in of the
> page which does. Even if we were to check at mmap() whether an
> address can be mapped, we would still need to check again on
> fault(), since between mmap() and fault() the status of the page
> can change.
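
To check that I'm reading the paragraph above correctly, here is a small
userspace model of the fault-time check. It is only a sketch under my own
assumptions, not code from the series: every name in it (model_gmem,
model_mmap(), model_fault()) is hypothetical, and the per-page "shared"
flag stands in for the new kernel-only memory attribute.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_PAGES 4

enum model_fault_result { MODEL_FAULT_MAPPED, MODEL_FAULT_SIGBUS };

/* Hypothetical stand-in for a guest_memfd instance. */
struct model_gmem {
	bool shared[MODEL_NR_PAGES];	/* true: host is allowed to map the page */
	int host_refs[MODEL_NR_PAGES];	/* references taken by host faults */
};

/*
 * Models mmap(): only the range is validated. The sharing status is
 * deliberately not checked here and no page reference is taken.
 */
bool model_mmap(const struct model_gmem *g, size_t first, size_t count)
{
	(void)g;
	return count > 0 && first < MODEL_NR_PAGES &&
	       count <= MODEL_NR_PAGES - first;
}

/*
 * Models the fault handler: the sharing status is checked at the moment
 * of the fault, and only a successful fault takes a reference.
 */
enum model_fault_result model_fault(struct model_gmem *g, size_t idx)
{
	if (idx >= MODEL_NR_PAGES || !g->shared[idx])
		return MODEL_FAULT_SIGBUS;

	g->host_refs[idx]++;
	return MODEL_FAULT_MAPPED;
}
```

In this model, model_mmap() over the whole range succeeds even when some
pages are private, and the outcome of model_fault() on a given page tracks
whatever the sharing status is at that moment, which is why a check at
mmap() alone would not be enough.
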
> 
> This creates the situation where access to successfully mmap()'d
> memory might SIGBUS at page fault. There is precedent for
> similar behavior in the kernel, I believe, with MADV_HWPOISON and
> the hugetlbfs cgroups controller, which could SIGBUS at page
> fault time depending on the accounting limit.
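
A case that is easy to reproduce from userspace, and which is not one of
the two mentioned above, is a shared file mapping whose file is truncated
after mmap(): the mapping is still there, but touching a page beyond the
new end of file raises SIGBUS. The sketch below assumes a writable /tmp;
the function name is made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf sigbus_env;

static void on_sigbus(int sig)
{
	(void)sig;
	siglongjmp(sigbus_env, 1);
}

/*
 * Returns 1 if reading the mapping raised SIGBUS, 0 if the read
 * succeeded, and -1 if the setup failed.
 */
int read_mapping_after_truncate(void)
{
	char path[] = "/tmp/sigbus-demo-XXXXXX";
	struct sigaction sa, old;
	volatile char *map;
	long page = sysconf(_SC_PAGESIZE);
	int result = -1;
	int fd;

	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);

	if (page <= 0 || ftruncate(fd, page) != 0)
		goto out_close;

	/* mmap() succeeds: the file is one page long at this point. */
	map = mmap(NULL, (size_t)page, PROT_READ, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out_close;

	/* Shrink the file: the mapping stays, but nothing backs it now. */
	if (ftruncate(fd, 0) != 0)
		goto out_unmap;

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_sigbus;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGBUS, &sa, &old) != 0)
		goto out_unmap;

	if (sigsetjmp(sigbus_env, 1) == 0) {
		(void)map[0];	/* first touch: this is where the fault happens */
		result = 0;
	} else {
		result = 1;	/* arrived here from the SIGBUS handler */
	}

	sigaction(SIGBUS, &old, NULL);
out_unmap:
	munmap((void *)map, (size_t)page);
out_close:
	close(fd);
	return result;
}
```

On Linux I would expect this to return 1, so userspace that maps files
already has to cope with a mapping that was valid at mmap() time and
faults later.
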
> 
> Another pKVM-specific aspect we would like feedback on is how to
> handle memory mapped by the host being unshared by a guest. The
> approach we've taken is that on an unshare call from the guest,
> the host userspace is notified that the memory has been unshared,
> in order to allow it to unmap it and mark it as PRIVATE as
> acknowledgment. If the host does not unmap the memory, the
> unshare call issued by the guest fails, which the guest is
> informed about on return.
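
If I follow the unshare flow described above, it can be modelled roughly
as below. This is a sketch under my own assumptions and not the code in
the series: the structure, the function names and the host_cooperates
parameter are hypothetical, and rejecting an unshare of a page that is not
currently shared is my guess rather than something stated in the cover
letter.

```c
#include <stdbool.h>

enum unshare_result { UNSHARE_OK, UNSHARE_FAILED };

/* Hypothetical per-page state, as seen by the host side. */
struct unshare_page {
	bool shared;		/* guest currently shares the page with the host */
	bool host_mapped;	/* host userspace still has the page mapped */
	bool marked_private;	/* host userspace set PRIVATE as acknowledgment */
};

/* What a cooperating host userspace does when it is notified. */
void host_ack_unshare(struct unshare_page *p)
{
	p->host_mapped = false;		/* unmap first ... */
	p->marked_private = true;	/* ... then mark PRIVATE to acknowledge */
}

/*
 * Models the guest's unshare call. host_cooperates says whether host
 * userspace reacts to the notification before the guest is resumed.
 */
enum unshare_result guest_unshare(struct unshare_page *p, bool host_cooperates)
{
	if (!p->shared)
		return UNSHARE_FAILED;	/* assumption: nothing to unshare */

	if (host_cooperates)
		host_ack_unshare(p);

	/* The call fails if the host has not unmapped and acknowledged. */
	if (p->host_mapped || !p->marked_private)
		return UNSHARE_FAILED;

	p->shared = false;
	return UNSHARE_OK;
}
```

With a page that starts out shared and mapped, guest_unshare(&page, false)
fails and leaves the page shared, while guest_unshare(&page, true)
succeeds and leaves it private and unmapped.
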
> 
> Cheers,
> /fuad
> 
> [1] https://lore.kernel.org/all/20231105163040.14904-1-pbonzini@xxxxxxxxxx/
> 
> [2] https://android-kvm.googlesource.com/linux/+/refs/heads/for-upstream/pkvm-core
> 
> [3] https://android-kvm.googlesource.com/linux/+/refs/heads/tabba/guestmem-6.8-rfc-v1
> 
> [4] https://android-kvm.googlesource.com/kvmtool/+/refs/heads/tabba/guestmem-6.8
> 
> [5] Protected KVM on arm64 (slides)
> https://static.sched.com/hosted_files/kvmforum2022/88/KVM%20forum%202022%20-%20pKVM%20deep%20dive.pdf
> 
> [6] Protected KVM on arm64 (video)
> https://www.youtube.com/watch?v=9npebeVFbFw
> 
> [7] Supporting guest private memory in Protected KVM on Android (presentation)
> https://lpc.events/event/17/contributions/1487/
> 
> [8] Drivers for Gunyah (patch series)
> https://lore.kernel.org/all/20240109-gunyah-v16-0-634904bf4ce9@xxxxxxxxxxx/
> 
> Fuad Tabba (20):
>   KVM: Split KVM memory attributes into user and kernel attributes
>   KVM: Introduce kvm_gmem_get_pfn_locked(), which retains the folio lock
>   KVM: Add restricted support for mapping guestmem by the host
>   KVM: Don't allow private attribute to be set if mapped by host
>   KVM: Don't allow private attribute to be removed for unmappable memory
>   KVM: Implement kvm_(read|/write)_guest_page for private memory slots
>   KVM: arm64: Create hypercall return handler
>   KVM: arm64: Refactor code around handling return from host to guest
>   KVM: arm64: Rename kvm_pinned_page to kvm_guest_page
>   KVM: arm64: Add a field to indicate whether the guest page was pinned
>   KVM: arm64: Do not allow changes to private memory slots
>   KVM: arm64: Skip VMA checks for slots without userspace address
>   KVM: arm64: Handle guest_memfd()-backed guest page faults
>   KVM: arm64: Track sharing of memory from protected guest to host
>   KVM: arm64: Mark a protected VM's memory as unmappable at
>     initialization
>   KVM: arm64: Handle unshare on way back to guest entry rather than exit
>   KVM: arm64: Check that host unmaps memory unshared by guest
>   KVM: arm64: Add handlers for kvm_arch_*_set_memory_attributes()
>   KVM: arm64: Enable private memory support when pKVM is enabled
>   KVM: arm64: Enable private memory kconfig for arm64
> 
> Keir Fraser (3):
>   KVM: arm64: Implement MEM_RELINQUISH SMCCC hypercall
>   KVM: arm64: Strictly check page type in MEM_RELINQUISH hypercall
>   KVM: arm64: Avoid unnecessary unmap walk in MEM_RELINQUISH hypercall
> 
> Quentin Perret (1):
>   KVM: arm64: Turn llist of pinned pages into an rb-tree
> 
> Will Deacon (2):
>   KVM: arm64: Add initial support for KVM_CAP_EXIT_HYPERCALL
>   KVM: arm64: Allow userspace to receive SHARE and UNSHARE notifications
> 
>  arch/arm64/include/asm/kvm_host.h             |  17 +-
>  arch/arm64/include/asm/kvm_pkvm.h             |   1 +
>  arch/arm64/kvm/Kconfig                        |   2 +
>  arch/arm64/kvm/arm.c                          |  32 ++-
>  arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   2 +
>  arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |   1 +
>  arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  24 +-
>  arch/arm64/kvm/hyp/nvhe/mem_protect.c         |  67 +++++
>  arch/arm64/kvm/hyp/nvhe/pkvm.c                |  89 +++++-
>  arch/arm64/kvm/hyp/nvhe/switch.c              |   1 +
>  arch/arm64/kvm/hypercalls.c                   | 117 +++++++-
>  arch/arm64/kvm/mmu.c                          | 138 +++++++++-
>  arch/arm64/kvm/pkvm.c                         |  83 +++++-
>  include/linux/arm-smccc.h                     |   7 +
>  include/linux/kvm_host.h                      |  34 +++
>  include/uapi/linux/kvm.h                      |   4 +
>  virt/kvm/Kconfig                              |   4 +
>  virt/kvm/guest_memfd.c                        |  89 +++++-
>  virt/kvm/kvm_main.c                           | 260 ++++++++++++++++--
>  19 files changed, 904 insertions(+), 68 deletions(-)
> 
> -- 
> 2.44.0.rc1.240.g4c46232300-goog
> 
> 



