Hi Elliot, On Thu, Feb 22, 2024 at 11:44 PM Elliot Berman <quic_eberman@xxxxxxxxxxx> wrote: > > On Thu, Feb 22, 2024 at 04:10:21PM +0000, Fuad Tabba wrote: > > This series adds restricted mmap() support to guest_memfd [1], as > > well as support guest_memfd on pKVM/arm64. > > > > We haven't started using this in Android yet, but we aim to move > > away from anonymous memory to guest_memfd once we have the > > necessary support merged upstream. Others (e.g., Gunyah [8]) are > > also looking into guest_memfd for similar reasons as us. > > I'm especially interested to see if we can factor out much of the common > implementation bits between KVM and Gunyah. In principle, we're doing > same thing: the difference is the exact mechanics to interact with the > hypervisor which (I think) could be easily extracted into an ops > structure. I agree. We should share and reuse as much code as possible. I'll sync with you before the V2 of this series. > [...] > > > In addition to general feedback, we would like feedback on how we > > handle mmap() and faulting-in guest pages at the host (KVM: Add > > restricted support for mapping guest_memfd by the host). > > > > We don't enforce the invariant that only memory shared with the > > host can be mapped by the host userspace in > > file_operations::mmap(), but in vm_operations_struct:fault(). On > > vm_operations_struct::fault(), we check whether the page is > > shared with the host. If not, we deliver a SIGBUS to the current > > task. The reason for enforcing this at fault() is that mmap() > > does not elevate the pagecount(); it's the faulting in of the > > page which does. Even if we were to check at mmap() whether an > > address can be mapped, we would still need to check again on > > fault(), since between mmap() and fault() the status of the page > > can change. > > > > This creates the situation where access to successfully mmap()'d > > memory might SIGBUS at page fault. There is precedence for > > similar behavior in the kernel I believe, with MADV_HWPOISON and > > the hugetlbfs cgroups controller, which could SIGBUS at page > > fault time depending on the accounting limit. > > I added a test: folio_mmapped() [1] which checks if there's a vma > covering the corresponding offset into the guest_memfd. I use this > test before trying to make page private to guest and I've been able to > ensure that userspace can't even mmap() private guest memory. If I try > to make memory private, I can test that it's not mmapped and not allow > memory to become private. In my testing so far, this is enough to > prevent SIGBUS from happening. > > This test probably should be moved outside Gunyah specific code, and was > looking for maintainer to suggest the right home for it :) > > [1]: https://lore.kernel.org/all/20240222-gunyah-v17-20-1e9da6763d38@xxxxxxxxxxx/ Let's see what the mm-folks think about this [*]. Thanks! /fuad [*] https://lore.kernel.org/all/ZdfoR3nCEP3HTtm1@xxxxxxxxxxxxxxxxxxxx/ > > > > Another pKVM specific aspect we would like feedback on, is how to > > handle memory mapped by the host being unshared by a guest. The > > approach we've taken is that on an unshare call from the guest, > > the host userspace is notified that the memory has been unshared, > > in order to allow it to unmap it and mark it as PRIVATE as > > acknowledgment. If the host does not unmap the memory, the > > unshare call issued by the guest fails, which the guest is > > informed about on return. > > > > Cheers, > > /fuad > > > > [1] https://lore.kernel.org/all/20231105163040.14904-1-pbonzini@xxxxxxxxxx/ > > > > [2] https://android-kvm.googlesource.com/linux/+/refs/heads/for-upstream/pkvm-core > > > > [3] https://android-kvm.googlesource.com/linux/+/refs/heads/tabba/guestmem-6.8-rfc-v1 > > > > [4] https://android-kvm.googlesource.com/kvmtool/+/refs/heads/tabba/guestmem-6.8 > > > > [5] Protected KVM on arm64 (slides) > > https://static.sched.com/hosted_files/kvmforum2022/88/KVM%20forum%202022%20-%20pKVM%20deep%20dive.pdf > > > > [6] Protected KVM on arm64 (video) > > https://www.youtube.com/watch?v=9npebeVFbFw > > > > [7] Supporting guest private memory in Protected KVM on Android (presentation) > > https://lpc.events/event/17/contributions/1487/ > > > > [8] Drivers for Gunyah (patch series) > > https://lore.kernel.org/all/20240109-gunyah-v16-0-634904bf4ce9@xxxxxxxxxxx/ > > As of 5 min ago when I send this, there's a v17: > https://lore.kernel.org/all/20240222-gunyah-v17-0-1e9da6763d38@xxxxxxxxxxx/ > > Thanks, > Elliot >