On Tue, Mar 19, 2024, Isaku Yamahata wrote:
> On Wed, Mar 06, 2024 at 05:51:51PM -0800,
> > Yes. We'd like to map exact gpa range for SNP or TDX case. We don't want to map
> > zero at around range. For SNP or TDX, we map page to GPA, it's one time
> > operation. It updates measurement.
> >
> > Say, we'd like to populate GPA1 and GPA2 with initial guest memory image. And
> > they are within same 2M range. Map GPA1 first. If GPA2 is also mapped with zero
> > with 2M page, the following mapping of GPA2 fails. Even if mapping of GPA2
> > succeeds, measurement may be updated when mapping GPA1.
> >
> > It's user space VMM responsibility to map GPA range only once at most for SNP or
> > TDX. Is this too strict requirement for default VM use case to mitigate KVM
> > page fault at guest boot up? If so, what about a flag like EXACT_MAPPING or
> > something?
>
> I'm thinking as follows. What do you think?
>
> - Allow mapping larger than requested with gmem_max_level hook:

I don't see any reason to allow userspace to request a mapping level. If the
prefetch is defined to have read fault semantics, KVM has all the wiggle room it
needs to do the optimal/sane thing, without having to worry about reconciling
userspace's desired mapping level.

>   Depend on the following patch. [1]
>   The gmem_max_level hook allows vendor-backend to determine max level.
>   By default (for default VM or sw-protected), it allows KVM_MAX_HUGEPAGE_LEVEL
>   mapping. TDX allows only 4KB mapping.
>
>   [1] https://lore.kernel.org/kvm/20231230172351.574091-31-michael.roth@xxxxxxx/
>   [PATCH v11 30/35] KVM: x86: Add gmem hook for determining max NPT mapping level
>
> - Pure mapping without coco operation:
>   As Sean suggested at [2], make KVM_MAP_MEMORY pure mapping without coco
>   operation. In the case of TDX, the API doesn't issue TDX specific operation
>   like TDH.PAGE.ADD() and TDH.EXTEND.MR(). We need TDX specific API.
>
>   [2] https://lore.kernel.org/kvm/Ze-XW-EbT9vXaagC@xxxxxxxxxx/
>
> - KVM_MAP_MEMORY on already mapped area potentially with large page:
>   It succeeds. Not error. It doesn't care whether the GPA is backed by large
>   page or not. Because the use case is pre-population before guest running, it
>   doesn't matter if the given GPA was mapped or not, and what large page level
>   it backs.
>
>   Do you want error like -EEXIST?

No error. As above, I think the ioctl() should behave like a read fault, i.e. be
an expensive nop if there's nothing to be done.

For VMA-based memory, userspace can operate on the userspace address. E.g. if
userspace wants to break CoW, it can do that by writing from userspace. And if
userspace wants to "request" a certain mapping level, it can do that by MADV_*.

For guest_memfd, there are no protections (everything is RWX, for now), and when
hugepage support comes along, userspace can simply manipulate the guest_memfd
instance as needed.
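
To make the read-fault semantics above concrete, here is a rough userspace toy
model. It is not KVM code and not a proposed implementation: every toy_* name is
made up for illustration, the per-VM max_level field merely stands in for
whatever a gmem_max_level-style hook would return, and the "guest" is a flat
array of 4K pages. It only shows the two properties argued for above: mapping an
already-mapped GPA is a successful nop regardless of the level backing it, and
the mapping level is chosen by the backend, never by the caller.

```c
#include <errno.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT   12
#define TOY_PAGES_PER_2M 512u
#define TOY_NR_PAGES     (4u * TOY_PAGES_PER_2M)   /* toy guest: 8 MiB */

enum toy_level { TOY_UNMAPPED = 0, TOY_LEVEL_4K, TOY_LEVEL_2M };

struct toy_vm {
	enum toy_level max_level;            /* assumption: stand-in for a max-level hook */
	enum toy_level mapped[TOY_NR_PAGES]; /* level backing each 4K gfn */
	unsigned int installs;               /* mappings actually installed */
};

/* Read-fault-like prefetch of one gfn: install only if nothing is there. */
int toy_prefetch_gfn(struct toy_vm *vm, uint64_t gfn)
{
	uint64_t base, i;

	if (gfn >= TOY_NR_PAGES)
		return -EINVAL;
	if (vm->mapped[gfn] != TOY_UNMAPPED)
		return 0;	/* already mapped at some level: nop, not -EEXIST */

	base = gfn & ~(uint64_t)(TOY_PAGES_PER_2M - 1);
	if (vm->max_level == TOY_LEVEL_2M) {
		for (i = 0; i < TOY_PAGES_PER_2M; i++)
			if (vm->mapped[base + i] != TOY_UNMAPPED)
				break;
		if (i == TOY_PAGES_PER_2M) {
			/* Whole 2M region is free: map more than was asked for. */
			for (i = 0; i < TOY_PAGES_PER_2M; i++)
				vm->mapped[base + i] = TOY_LEVEL_2M;
			vm->installs++;
			return 0;
		}
	}

	/* Backend allows only 4K, or the 2M region is partially populated. */
	vm->mapped[gfn] = TOY_LEVEL_4K;
	vm->installs++;
	return 0;
}

/* Toy analogue of the ioctl(): prefetch every gfn in [gpa, gpa + size). */
int toy_map_memory(struct toy_vm *vm, uint64_t gpa, uint64_t size)
{
	uint64_t gfn, last;
	int ret;

	if (!size || gpa + size < gpa)
		return -EINVAL;

	gfn = gpa >> TOY_PAGE_SHIFT;
	last = (gpa + size - 1) >> TOY_PAGE_SHIFT;
	for (; gfn <= last; gfn++) {
		ret = toy_prefetch_gfn(vm, gfn);
		if (ret)
			return ret;
	}
	return 0;
}
```

With max_level = TOY_LEVEL_2M (the "default VM" case), mapping GPA1 = 0x1000
installs a single 2M mapping, and a later request for GPA2 = 0x5000 in the same
2M range returns 0 without installing anything. With max_level = TOY_LEVEL_4K
(the TDX-like case), the same two calls install two separate 4K mappings and
leave the pages in between untouched, so nothing outside the requested range is
mapped.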