On Mon, Apr 15, 2024 at 02:17:02PM -0700, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > - Return error on guest mode or SMM mode: Without this patch.
> > >   Pros: No additional patch.
> > >   Cons: Difficult to use.
> > 
> > Hmm... For the non-TDX use cases this is just an optimization, right?  For TDX
> > there shouldn't be an issue.  If so, maybe this last one is not so horrible.
> 
> And the fact there are so many variables to control (MAXPHYADDR, SMM, and
> guest_mode) basically invalidates the argument that returning an error makes
> the ioctl() hard to use.  I can imagine it might be hard to squeeze this
> ioctl() into QEMU's existing code, but I don't buy that the ioctl() itself is
> hard to use.
> 
> Literally the only thing userspace needs to do is set CPUID to implicitly
> select between 4-level and 5-level paging.  If userspace wants to pre-map
> memory during live migration, or when jump-starting the guest with pre-defined
> state, simply pre-map memory before stuffing guest state.  In and of itself,
> that doesn't seem difficult, e.g. at a quick glance, QEMU could add a hook
> somewhere in kvm_vcpu_thread_fn() without too much trouble (though that comes
> with a huge disclaimer that I only know enough about how QEMU manages vCPUs to
> be dangerous).
> 
> I would describe the overall cons for this patch versus returning an error
> differently.  Switching MMU state puts the complexity in the kernel.
> Returning an error punts any complexity to userspace.  Specifically, anything
> that KVM can do regarding vCPU state to get the right MMU, userspace can do
> too.
> 
> Add on that silently doing things that effectively ignore guest state usually
> ends badly, and I don't see a good argument for this patch (or any variant
> thereof).

Ok, here is an experimental patch on top of 7/10 to return an error.  Is this
the right direction, or do we want to invoke the KVM page fault handler without
any check?  I can see the following options.
- Error if vCPU is in SMM mode or guest mode: This patch.
  Defer the decision until the use cases come up.  We can utilize
  KVM_CAP_MAP_MEMORY and struct kvm_map_memory.flags for future enhancement.
  Pro: Keeps room for future enhancement while the use cases are unclear.
  Con: The userspace VMM has to check/switch the vCPU mode.

- No check of vCPU mode, and carry on:
  Pro: It works.
  Con: Unclear what the uAPI should be without concrete use cases.

- Always populate with L1 GPA: This is a bad idea.

---
 arch/x86/kvm/x86.c | 32 +++++++++-----------------------
 1 file changed, 9 insertions(+), 23 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8ba9c1720ac9..2f3ceda5c225 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -5871,10 +5871,8 @@ static int kvm_vcpu_ioctl_enable_cap(struct kvm_vcpu *vcpu,
 int kvm_arch_vcpu_map_memory(struct kvm_vcpu *vcpu,
			     struct kvm_memory_mapping *mapping)
 {
-	struct kvm_mmu *mmu = NULL, *walk_mmu = NULL;
	u64 end, error_code = 0;
	u8 level = PG_LEVEL_4K;
-	bool is_smm;
	int r;
 
	/*
@@ -5884,25 +5882,21 @@ int kvm_arch_vcpu_map_memory(struct kvm_vcpu *vcpu,
	if (!tdp_enabled)
		return -EOPNOTSUPP;
 
-	/* Force to use L1 GPA despite of vcpu MMU mode. */
-	is_smm = !!(vcpu->arch.hflags & HF_SMM_MASK);
-	if (is_smm ||
-	    vcpu->arch.mmu != &vcpu->arch.root_mmu ||
-	    vcpu->arch.walk_mmu != &vcpu->arch.root_mmu) {
-		vcpu->arch.hflags &= ~HF_SMM_MASK;
-		mmu = vcpu->arch.mmu;
-		walk_mmu = vcpu->arch.walk_mmu;
-		vcpu->arch.mmu = &vcpu->arch.root_mmu;
-		vcpu->arch.walk_mmu = &vcpu->arch.root_mmu;
-		kvm_mmu_reset_context(vcpu);
-	}
+	/*
+	 * SMM mode results in populating SMM memory space with memslots id = 1.
+	 * guest mode results in populating with L2 GPA.
+	 * Don't support those cases for now and punt them for the future
+	 * discussion.
+	 */
+	if (is_smm(vcpu) || is_guest_mode(vcpu))
+		return -EOPNOTSUPP;
 
	/* reload is optimized for repeated call. */
	kvm_mmu_reload(vcpu);
 
	r = kvm_tdp_map_page(vcpu, mapping->base_address, error_code, &level);
	if (r)
-		goto out;
+		return r;
 
	/* mapping->base_address is not necessarily aligned to level-hugepage. */
	end = (mapping->base_address & KVM_HPAGE_MASK(level)) +
@@ -5910,14 +5904,6 @@ int kvm_arch_vcpu_map_memory(struct kvm_vcpu *vcpu,
	mapping->size -= end - mapping->base_address;
	mapping->base_address = end;
 
-out:
-	/* Restore MMU state. */
-	if (is_smm || mmu) {
-		vcpu->arch.hflags |= is_smm ? HF_SMM_MASK : 0;
-		vcpu->arch.mmu = mmu;
-		vcpu->arch.walk_mmu = walk_mmu;
-		kvm_mmu_reset_context(vcpu);
-	}
	return r;
 }
-- 
2.43.2

-- 
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>