On Fri, Mar 01, 2024 at 09:28:42AM -0800, isaku.yamahata@xxxxxxxxx wrote: > From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx> > > The objective of this RFC patch series is to develop a uAPI aimed at > (pre)populating guest memory for various use cases and underlying VM > technologies. > > - Pre-populate guest memory to mitigate excessive KVM page faults during guest > boot [1], a need not limited to any specific technology. > > - Pre-populating guest memory (including encryption and measurement) for > confidential guests [2]. SEV-SNP, TDX, and SW-PROTECTED VM. Potentially > other technologies and pKVM. > > The patches are organized as follows. > - 1: documentation on uAPI KVM_MAP_MEMORY. > - 2: archtechture-independent implementation part. > - 3-4: refactoring of x86 KVM MMU as preparation. > - 5: x86 Helper function to map guest page. > - 6: x86 KVM arch implementation. > - 7: Add x86-ops necessary for TDX and SEV-SNP. > - 8: selftest for validation. > > Discussion point: > > uAPI design: > - access flags > Access flags are needed for the guest memory population. We have options for > their exposure to uAPI. > - option 1. Introduce access flags, possibly with the addition of a private > access flag. > - option 2. Omit access flags from UAPI. > Allow the kernel to deduce the necessary flag based on the memory slot and > its memory attributes. > > - SEV-SNP and byte vs. page size > The SEV correspondence is SEV_LAUNCH_UPDATE_DATA. Which dictates memory > regions to be in 16-byte alignment, not page size. Should we define struct > kvm_memory_mapping in bytes rather than page size? For SNP it's multiples of page size only, and I'm not sure it's worth trying to work in legacy SEV support, since that pull in other requirements like how the backing memory also needs to be pre-pinned via a separate KVM ioctl that would need to be documented as an SEV-specific requirement for this interface... it just doesn't seem worth it. And SEV would still benefit from the more basic functionality of pre-mapping pages just like any other guest so it still benefits in that regard. That said, it would be a shame if we needed to clutter up the API as soon as some user can along that required non-page-sized values. That seems unlikely for the pre-mapping use case, but for CoCo maybe it's worth checking if pKVM would have any requirements like that? Or just going with byte-size parameters to play it safe? > > struct kvm_sev_launch_update_data { > __u64 uaddr; > __u32 len; > }; > > - TDX and measurement > The TDX correspondence is TDH.MEM.PAGE.ADD and TDH.MR.EXTEND. TDH.MEM.EXTEND > extends its measurement by the page contents. > Option 1. Add an additional flag like KVM_MEMORY_MAPPING_FLAG_EXTEND to issue > TDH.MEM.EXTEND > Option 2. Don't handle extend. Let TDX vendor specific API > KVM_EMMORY_ENCRYPT_OP to handle it with the subcommand like > KVM_TDX_EXTEND_MEMORY. For SNP this happens unconditionally via SNP_LAUNCH_UPDATE, and with some additional measurements via SNP_LAUNCH_FINISH, and down the road when live migration support is added that flow will be a bit different. So personally I think it's better to leave separate for now. Maybe down the road some common measurement/finalize interface can deprecate some of the other MEMORY_ENCRYPT ioctls. Even with this more narrowly-defined purpose it's really unfortunate we have to bake any CoCo stuff into this interface at all. It would be great if all it did was simply pre-map entries into nested page tables to boost boot performance, and we had some separate CoCo-specific API to handle intial loading/encryption of guest data. I understand with SecureEPT considerations why we sort of need it here for TDX, but it already ends up being awkward for the SNP_LAUNCH_UPDATE use-case because there's not really any good reason for that handling to live inside KVM MMU hooks like with TDX, so we'll probably end up putting it in a pre/post hook where all the handling is completely separate from TDX flow and in the process complicating the userspace API to with the additional parameters needed for that even though other guest types are likely to never need them. (Alternatively we can handle SNP_LAUNCH_UPDATE via KVM MMU hooks like with tdx_mem_page_add(), but then we'll need to plumb additional parameters down into the KVM MMU code for stuff like the SNP page type. But maybe it's worth it to have some level of commonality for x86 at least?) But I'd be hesitant to bake more requirements into this pre-mapping interface, it feels like we're already overloading it as is. > > - TDX and struct kvm_memory_mapping:source > While the current patch series doesn't utilize > kvm_memory_mapping::source member. TDX needs it to specify the source of > memory contents. SNP will need it too FWIW. -Mike > > Implementation: > - x86 KVM MMU > In x86 KVM MMU, I chose to use kvm_mmu_do_page_fault(). It's not confined to > KVM TDP MMU. We can restrict it to KVM TDP MMU and introduce an optimized > version. > > [1] https://lore.kernel.org/all/65262e67-7885-971a-896d-ad9c0a760907@xxxxxxxxx/ > [2] https://lore.kernel.org/all/6a4c029af70d41b63bcee3d6a1f0c2377f6eb4bd.1690322424.git.isaku.yamahata@xxxxxxxxx > > Thanks, > > Isaku Yamahata (8): > KVM: Document KVM_MAP_MEMORY ioctl > KVM: Add KVM_MAP_MEMORY vcpu ioctl to pre-populate guest memory > KVM: x86/mmu: Introduce initialier macro for struct kvm_page_fault > KVM: x86/mmu: Factor out kvm_mmu_do_page_fault() > KVM: x86/mmu: Introduce kvm_mmu_map_page() for prepopulating guest > memory > KVM: x86: Implement kvm_arch_{, pre_}vcpu_map_memory() > KVM: x86: Add hooks in kvm_arch_vcpu_map_memory() > KVM: selftests: x86: Add test for KVM_MAP_MEMORY > > Documentation/virt/kvm/api.rst | 36 +++++ > arch/x86/include/asm/kvm-x86-ops.h | 2 + > arch/x86/include/asm/kvm_host.h | 6 + > arch/x86/kvm/mmu.h | 3 + > arch/x86/kvm/mmu/mmu.c | 30 ++++ > arch/x86/kvm/mmu/mmu_internal.h | 70 +++++---- > arch/x86/kvm/x86.c | 83 +++++++++++ > include/linux/kvm_host.h | 4 + > include/uapi/linux/kvm.h | 15 ++ > tools/include/uapi/linux/kvm.h | 14 ++ > tools/testing/selftests/kvm/Makefile | 1 + > .../selftests/kvm/x86_64/map_memory_test.c | 136 ++++++++++++++++++ > virt/kvm/kvm_main.c | 74 ++++++++++ > 13 files changed, 448 insertions(+), 26 deletions(-) > create mode 100644 tools/testing/selftests/kvm/x86_64/map_memory_test.c > > > base-commit: 6a108bdc49138bcaa4f995ed87681ab9c65122ad > -- > 2.25.1 > >