Hello all,

There is a significant architecture change in the latest version to
support demand paging. Demand paging allows Linux to provide memory to
the guest lazily: only when the guest tries to access that memory.

I wanted to post a version to collect some early feedback. In particular,
our implementation of guestmemfd is similar to KVM's guest memfd: I'm
curious to get the community's thoughts on how (if at all) to refactor
both to share the common, generic bits. The primary difference I see
between KVM's and Gunyah's implementations is how we determine whether a
page can be mapped to userspace, so I think there is a strong case here
to share at least some of the implementation.

Also, I'd like feedback on whether the way folios are tracked for a
virtual machine seems correct: folios are initially allocated using the
filemap associated with the guest memfd. When the guest tries to access
them, I use a maple tree to track folios that have been shared: the
indices are guest frame numbers and the values are folio pointers. The
folio's ->private field is used to note whether the folio is available to
be mapped by Linux.

Some areas that still need testing/work are multi-threading and memory
reclaim while a VM is running; the important bits are present in the
implementation and design, but not fully tested. The series is capable of
booting a Linux virtual machine, so while some areas aren't finished, it
is fairly complete.

-

Gunyah is a Type-1 hypervisor independent of any high-level OS kernel,
and runs in a higher CPU privilege level. It does not depend on any
lower-privileged OS kernel/code for its core functionality. This
increases its security and allows a much smaller trusted computing base
than a Type-2 hypervisor.

Gunyah is designed for isolated virtual machine use cases and to support
launching trusted+isolated virtual machines from a relatively less
trusted host virtual machine.

Gunyah is an open source hypervisor.
The source repo is available at
https://github.com/quic/gunyah-hypervisor.

The diagram below shows the architecture.

::

         VM A                    VM B
     +-----+ +-----+  | +-----+ +-----+ +-----+
     |     | |     |  | |     | |     | |     |
 EL0 | APP | | APP |  | | APP | | APP | | APP |
     |     | |     |  | |     | |     | |     |
     +-----+ +-----+  | +-----+ +-----+ +-----+
 ---------------------|-------------------------
     +--------------+ | +----------------------+
     |              | | |                      |
 EL1 | Linux Kernel | | |Linux kernel/Other OS |   ...
     |              | | |                      |
     +--------------+ | +----------------------+
 --------hvc/smc------|------hvc/smc------------
     +----------------------------------------+
     |                                        |
 EL2 |            Gunyah Hypervisor           |
     |                                        |
     +----------------------------------------+

Gunyah provides the following features.

- Threads and Scheduling: The scheduler schedules virtual CPUs (VCPUs) on
  physical CPUs and enables time-sharing of the CPUs.
- Memory Management: Gunyah tracks memory ownership and use of all memory
  under its control. Memory partitioning between VMs is a fundamental
  security feature.
- Interrupt Virtualization: All interrupts are handled in the hypervisor
  and routed to the assigned VM.
- Inter-VM Communication: There are several different mechanisms provided
  for communicating between VMs.
- Device Virtualization: Para-virtualization of devices is supported
  using inter-VM communication. Low level system features and devices
  such as interrupt controllers are supported with emulation where
  required.

This series adds the basic framework for detecting that Linux is running
under Gunyah as a virtual machine, communication with the Gunyah Resource
Manager, and a sample virtual machine manager capable of launching
virtual machines.

Changes in v15:
 - First implementation of virtual machines backed by guestmemfd and
   using demand paging to provide memory on access instead of all up
   front.
 - Use message queue hypercalls directly instead of traversing through
   the mailbox framework.
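
As a rough illustration of the demand-paging and folio-tracking scheme
described earlier in this letter, here is a user-space toy model. It is a
sketch under stated assumptions, not code from the series: every name
(toy_vm, toy_folio, toy_guest_fault, toy_host_can_map, TOY_NR_PAGES) is
hypothetical, plain arrays stand in for the filemap and the maple tree,
the GFN is assumed to equal the file page offset, and the policy that a
folio stops being host-mappable once the guest touches it is assumed
purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NR_PAGES 16 /* hypothetical size of the guest memfd, in pages */

/* Stand-in for struct folio; host_mappable plays the role of ->private. */
struct toy_folio {
	bool allocated;
	bool host_mappable;
};

struct toy_vm {
	/* Stand-in for the guest memfd's filemap, indexed by page offset. */
	struct toy_folio filemap[TOY_NR_PAGES];
	/* Stand-in for the maple tree: index is the GFN, value a folio pointer. */
	struct toy_folio *shared[TOY_NR_PAGES];
	size_t nr_shared;
};

/* Allocate the folio lazily in the "filemap" on first use. */
struct toy_folio *toy_filemap_get(struct toy_vm *vm, size_t index)
{
	struct toy_folio *folio = &vm->filemap[index];

	if (!folio->allocated) {
		folio->allocated = true;
		/* Assumption: the host may map a folio until the guest gets it. */
		folio->host_mappable = true;
	}
	return folio;
}

/* Handle a guest access to gfn: memory is provided only now. */
struct toy_folio *toy_guest_fault(struct toy_vm *vm, size_t gfn)
{
	struct toy_folio *folio;

	if (gfn >= TOY_NR_PAGES)
		return NULL;
	if (vm->shared[gfn])
		return vm->shared[gfn]; /* already tracked for this GFN */

	folio = toy_filemap_get(vm, gfn); /* assumption: GFN == file offset */
	folio->host_mappable = false;
	vm->shared[gfn] = folio;
	vm->nr_shared++;
	assert(vm->nr_shared <= TOY_NR_PAGES);
	return folio;
}

/* Would the host be allowed to map the page backing this GFN? */
bool toy_host_can_map(struct toy_vm *vm, size_t gfn)
{
	if (gfn >= TOY_NR_PAGES)
		return false;
	return toy_filemap_get(vm, gfn)->host_mappable;
}
```

With a zero-initialized struct toy_vm, a first toy_guest_fault(&vm, 3)
records one tracked folio; a second fault on the same GFN returns the
same pointer and leaves the count unchanged.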

Changes in v14:
https://lore.kernel.org/all/20230613172054.3959700-1-quic_eberman@xxxxxxxxxxx/
 - Coding/cosmetic tweaks suggested by Alex
 - Mark IRQs as wake-up capable

Changes in v13:
https://lore.kernel.org/all/20230509204801.2824351-1-quic_eberman@xxxxxxxxxxx/
 - Tweaks to message queue driver to address race condition between IRQ
   and mailbox registration
 - Allow removal of VM functions by function-specific comparison --
   specifically to allow removing irqfd by label only and not requiring
   original FD to be provided.

Changes in v12:
https://lore.kernel.org/all/20230424231558.70911-1-quic_eberman@xxxxxxxxxxx/
 - Stylistic/cosmetic tweaks suggested by Alex
 - Remove patch "virt: gunyah: Identify hypervisor version" and squash
   the check that we're running under a reasonable Gunyah hypervisor into
   RM driver
 - Refactor platform hooks into a separate module per suggestion from
   Srini
 - GFP_KERNEL_ACCOUNT and account_locked_vm() for page pinning
 - enum-ify related constants

Changes in v11:
https://lore.kernel.org/all/20230304010632.2127470-1-quic_eberman@xxxxxxxxxxx/
 - Rename struct gh_vm_dtb_config:gpa -> guest_phys_addr & overflow
   checks for this
 - More docstrings throughout
 - Make resp_buf and resp_buf_size optional
 - Replace deprecated idr with xarray
 - Refcounting on misc device instead of RM's platform device
 - Renaming variables, structs, etc. from gunyah_ -> gh_
 - Drop removal of user mem regions
 - Drop mem_lend functionality; to converge with restricted_memfd later

Changes in v10:
https://lore.kernel.org/all/20230214211229.3239350-1-quic_eberman@xxxxxxxxxxx/
 - Fix bisectability (end result of series is same, --fixups applied to
   wrong commits)
 - Convert GH_ERROR_* and GH_RM_ERROR_* to enums
 - Correct race condition between allocating/freeing user memory
 - Replace offsetof with struct_size
 - Series-wide renaming of functions to be more consistent
 - VM shutdown & restart support added in vCPU and VM Manager patches
 - Convert VM function name (string) to type (number)
 - Convert VM function argument to value (which could be a pointer) to
   remove memory wastage for arguments
 - Remove defensive checks of hypervisor correctness
 - Clean ups to ioeventfd as suggested by Srivatsa

Changes in v9:
https://lore.kernel.org/all/20230120224627.4053418-1-quic_eberman@xxxxxxxxxxx/
 - Refactor Gunyah API flags to be exposed as feature flags at kernel
   level
 - Move mbox client cleanup into gunyah_msgq_remove()
 - Simplify gh_rm_call return value and response payload
 - Missing clean-up/error handling/little endian fixes as suggested by
   Srivatsa and Alex in v8 series

Changes in v8:
https://lore.kernel.org/all/20221219225850.2397345-1-quic_eberman@xxxxxxxxxxx/
 - Treat VM manager as a library of RM
 - Add patches 21-28 as RFC to support proxy-scheduled vCPUs and
   necessary bits to support virtio from Gunyah userspace

Changes in v7:
https://lore.kernel.org/all/20221121140009.2353512-1-quic_eberman@xxxxxxxxxxx/
 - Refactor to remove gunyah RM bus
 - Refactor to allow multiple RM device instances
 - Bump UAPI to start at 0x0
 - Refactor QCOM SCM's platform hooks to allow
   CONFIG_QCOM_SCM=Y/CONFIG_GUNYAH=M combinations

Changes in v6:
https://lore.kernel.org/all/20221026185846.3983888-1-quic_eberman@xxxxxxxxxxx/
 - *Replace gunyah-console with gunyah VM Manager*
 - Move include/asm-generic/gunyah.h into include/linux/gunyah.h
 - s/gunyah_msgq/gh_msgq/
 - Minor tweaks and documentation tidying based on comments from Jiri,
   Greg, Arnd, Dmitry, and Bagas.

Changes in v5:
https://lore.kernel.org/all/20221011000840.289033-1-quic_eberman@xxxxxxxxxxx/
 - Dropped sysfs nodes
 - Switch from aux bus to Gunyah RM bus for the subdevices
 - Cleaning up RM console

Changes in v4:
https://lore.kernel.org/all/20220928195633.2348848-1-quic_eberman@xxxxxxxxxxx/
 - Tidied up documentation throughout based on questions/feedback
   received
 - Switched message queue implementation to use mailboxes
 - Renamed "gunyah_device" as "gunyah_resource"

Changes in v3:
https://lore.kernel.org/all/20220811214107.1074343-1-quic_eberman@xxxxxxxxxxx/
 - /Maintained/Supported/ in MAINTAINERS
 - Tidied up documentation throughout based on questions/feedback
   received
 - Moved hypercalls into arch/arm64/gunyah/; following hyper-v's
   implementation
 - Drop opaque typedefs
 - Move sysfs nodes under /sys/hypervisor/gunyah/
 - Moved Gunyah console driver to drivers/tty/
 - Reworked gh_device design to drop the Gunyah bus.

Changes in v2:
https://lore.kernel.org/all/20220801211240.597859-1-quic_eberman@xxxxxxxxxxx/
 - DT bindings clean up
 - Switch hypercalls to follow SMCCC

v1:
https://lore.kernel.org/all/20220223233729.1571114-1-quic_eberman@xxxxxxxxxxx/

Signed-off-by: Elliot Berman <quic_eberman@xxxxxxxxxxx>
---
Elliot Berman (30):
      docs: gunyah: Introduce Gunyah Hypervisor
      dt-bindings: Add binding for gunyah hypervisor
      gunyah: Common types and error codes for Gunyah hypercalls
      virt: gunyah: Add hypercalls to identify Gunyah
      virt: gunyah: Add hypervisor driver
      virt: gunyah: msgq: Add hypercalls to send and receive messages
      gunyah: rsc_mgr: Add resource manager RPC core
      gunyah: rsc_mgr: Add VM lifecycle RPC
      gunyah: vm_mgr: Introduce basic VM Manager
      gunyah: vm_mgr: Add ioctls to support basic non-proxy VM boot
      gunyah: vm_mgr: Add framework for VM Functions
      virt: gunyah: Translate gh_rm_hyp_resource into gunyah_resource
      virt: gunyah: Add resource tickets
      virt: gunyah: Add IO handlers
      gunyah: Add hypercalls for demand paging
      virt: gunyah: Add interfaces to map memory into guest address space
      gunyah: rsc_mgr: Add platform ops on mem_lend/mem_reclaim
      virt: gunyah: Add IO handlers
      virt: gunyah: Add proxy-scheduled vCPUs
      virt: gunyah: Implement guestmemfd
      virt: gunyah: Add ioctl to bind guestmem to VMs
      virt: gunyah: guestmem: Initialize RM mem parcels from guestmem
      virt: gunyah: Allow userspace to initialize context of primary vCPU
      virt: gunyah: Share guest VM dtb configuration to Gunyah
      virt: gunyah: Enable demand paging
      virt: gunyah: Add Qualcomm Gunyah platform ops
      virt: gunyah: Add hypercalls for sending doorbell
      virt: gunyah: Add irqfd interface
      virt: gunyah: Add ioeventfd
      MAINTAINERS: Add Gunyah hypervisor drivers section

 .../bindings/firmware/gunyah-hypervisor.yaml       |  82 ++
 Documentation/userspace-api/ioctl/ioctl-number.rst |   1 +
 Documentation/virt/gunyah/index.rst                | 121 +++
 Documentation/virt/gunyah/message-queue.rst        |  69 ++
 Documentation/virt/index.rst                       |   1 +
 MAINTAINERS                                        |  13 +
 arch/arm64/Kbuild                                  |   1 +
 arch/arm64/gunyah/Makefile                         |   3 +
 arch/arm64/gunyah/gunyah_hypercall.c               | 209 +++++
 arch/arm64/include/asm/gunyah.h                    |  57 ++
 drivers/virt/Kconfig                               |   2 +
 drivers/virt/Makefile                              |   1 +
 drivers/virt/gunyah/Kconfig                        |  47 +
 drivers/virt/gunyah/Makefile                       |   9 +
 drivers/virt/gunyah/guest_memfd.c                  | 826 ++++++++++++++++++
 drivers/virt/gunyah/gunyah.c                       |  52 ++
 drivers/virt/gunyah/gunyah_ioeventfd.c             | 132 +++
 drivers/virt/gunyah/gunyah_irqfd.c                 | 191 ++++
 drivers/virt/gunyah/gunyah_platform_hooks.c        | 115 +++
 drivers/virt/gunyah/gunyah_qcom.c                  | 218 +++++
 drivers/virt/gunyah/gunyah_vcpu.c                  | 579 +++++++++++++
 drivers/virt/gunyah/rsc_mgr.c                      | 948 ++++++++++++++++++++
 drivers/virt/gunyah/rsc_mgr.h                      |  28 +
 drivers/virt/gunyah/rsc_mgr_rpc.c                  | 584 +++++++++++++
 drivers/virt/gunyah/vm_mgr.c                       | 963 +++++++++++++++++++++
 drivers/virt/gunyah/vm_mgr.h                       | 104 +++
 drivers/virt/gunyah/vm_mgr_mem.c                   | 326 +++++++
 include/linux/gunyah.h                             | 250 ++++++
 include/linux/gunyah_rsc_mgr.h                     | 208 +++++
 include/linux/gunyah_vm_mgr.h                      | 165 ++++
 include/uapi/linux/gunyah.h                        | 378 ++++++++
 31 files changed, 6683 insertions(+)
---
base-commit: 17cb8a20bde66a520a2ca7aad1063e1ce7382240
change-id: 20231208-gunyah-952aca7668e0

Best regards,
--
Elliot Berman <quic_eberman@xxxxxxxxxxx>