Re: [RFC PATCH v1 00/26] KVM: Restricted mapping of guest_memfd at the host and pKVM/arm64 support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]


On Thu, Feb 22, 2024 at 04:10:21PM +0000, Fuad Tabba wrote:
> This series adds restricted mmap() support to guest_memfd [1], as
> well as support guest_memfd on pKVM/arm64.
> We haven't started using this in Android yet, but we aim to move
> away from anonymous memory to guest_memfd once we have the
> necessary support merged upstream. Others (e.g., Gunyah [8]) are
> also looking into guest_memfd for similar reasons as us.

I'm especially interested to see if we can factor out much of the common
implementation bits between KVM and Gunyah. In principle, we're doing
same thing: the difference is the exact mechanics to interact with the
hypervisor which (I think) could be easily extracted into an ops


> In addition to general feedback, we would like feedback on how we
> handle mmap() and faulting-in guest pages at the host (KVM: Add
> restricted support for mapping guest_memfd by the host).
> We don't enforce the invariant that only memory shared with the
> host can be mapped by the host userspace in
> file_operations::mmap(), but in vm_operations_struct:fault(). On
> vm_operations_struct::fault(), we check whether the page is
> shared with the host. If not, we deliver a SIGBUS to the current
> task. The reason for enforcing this at fault() is that mmap()
> does not elevate the pagecount(); it's the faulting in of the
> page which does. Even if we were to check at mmap() whether an
> address can be mapped, we would still need to check again on
> fault(), since between mmap() and fault() the status of the page
> can change.
> This creates the situation where access to successfully mmap()'d
> memory might SIGBUS at page fault. There is precedence for
> similar behavior in the kernel I believe, with MADV_HWPOISON and
> the hugetlbfs cgroups controller, which could SIGBUS at page
> fault time depending on the accounting limit.

I added a test: folio_mmapped() [1] which checks if there's a vma
covering the corresponding offset into the guest_memfd. I use this
test before trying to make page private to guest and I've been able to
ensure that userspace can't even mmap() private guest memory. If I try
to make memory private, I can test that it's not mmapped and not allow
memory to become private. In my testing so far, this is enough to
prevent SIGBUS from happening.

This test probably should be moved outside Gunyah specific code, and was
looking for maintainer to suggest the right home for it :)


> Another pKVM specific aspect we would like feedback on, is how to
> handle memory mapped by the host being unshared by a guest. The
> approach we've taken is that on an unshare call from the guest,
> the host userspace is notified that the memory has been unshared,
> in order to allow it to unmap it and mark it as PRIVATE as
> acknowledgment. If the host does not unmap the memory, the
> unshare call issued by the guest fails, which the guest is
> informed about on return.
> Cheers,
> /fuad
> [1]
> [2]
> [3]
> [4]
> [5] Protected KVM on arm64 (slides)
> [6] Protected KVM on arm64 (video)
> [7] Supporting guest private memory in Protected KVM on Android (presentation)
> [8] Drivers for Gunyah (patch series)

As of 5 min ago when I send this, there's a v17:


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux