On Mon, Mar 25, 2024 at 03:18:36PM -0700, Isaku Yamahata <isaku.yamahata@xxxxxxxxx> wrote:
> On Mon, Mar 25, 2024 at 07:55:04PM +0000,
> "Edgecombe, Rick P" <rick.p.edgecombe@xxxxxxxxx> wrote:
> 
> > On Mon, 2024-03-25 at 12:05 -0700, Isaku Yamahata wrote:
> > > Right, the guest has to accept it on VE.  If the unmap was intentional by the
> > > guest, that's fine.  If the unmap is unintentional (with vMTRR), the guest
> > > doesn't expect a VE for that GPA.
> > > 
> > > > But, I guess we should punt to userspace if the guest tries to use
> > > > MTRRs, not that userspace can handle it happening in a TD...  But it
> > > > seems cleaner and safer than skipping the zapping of some pages inside
> > > > the zapping code.
> > > > 
> > > > I'm still not sure I understand the intention and constraints fully,
> > > > so please correct me.  This (skipping the zapping for some operations)
> > > > is a theoretical correctness issue, right?  It doesn't resolve a TD
> > > > crash?
> > > 
> > > For lapic, it's a safeguard.  Because TDX KVM disables APICv with
> > > APICV_INHIBIT_REASON_TDX, apicv won't call kvm_zap_gfn_range().
> > 
> > Ah, I see it:
> > https://lore.kernel.org/lkml/38e2f8a77e89301534d82325946eb74db3e47815.1708933498.git.isaku.yamahata@xxxxxxxxx/
> > 
> > Then a warning would be more appropriate if we are worried there might still
> > be a way to call it.  If we are confident it can't be called, we can just
> > ignore this case.
> > 
> > > For MTRR, the purpose is to make the guest boot (without a guest kernel
> > > command line option like clearcpuid=mtrr).
> > > If we can assume the guest won't touch the MTRR registers somehow, KVM can
> > > return an error to TDG.VP.VMCALL<RDMSR, WRMSR>(MTRR registers) so that it
> > > doesn't call kvm_zap_gfn_range().  Or we can use KVM_EXIT_X86_{RDMSR, WRMSR}
> > > as you suggested.
> > 
> > My understanding is that Sean prefers to exit to userspace when KVM can't
> > handle something, versus making up behavior that keeps known guests alive.
> > So I would think we should change this patch to only be about not using the
> > zapping-roots optimization.  Then a separate patch should exit to userspace
> > on any attempt to use MTRRs.  And we ignore the APIC one.
> > 
> > This is trying to guess what the maintainers would want here.  I'm less sure
> > what Paolo prefers.

When we hit KVM_MSR_FILTER, the current implementation ignores it and returns an
error to the guest.  Surely we should use KVM_EXIT_X86_{RDMSR, WRMSR} instead.
That aligns with the existing implementation (default VM and SW-protected) and
is more flexible.

Something like this for "112/130 KVM: TDX: Handle TDX PV rdmsr/wrmsr hypercall".
Compile-tested only at this point.

diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index f891de30a2dd..4d9ae5743e24 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -1388,31 +1388,67 @@ static int tdx_emulate_mmio(struct kvm_vcpu *vcpu)
 	return 1;
 }
 
+static int tdx_complete_rdmsr(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->run->msr.error)
+		tdvmcall_set_return_code(vcpu, TDVMCALL_INVALID_OPERAND);
+	else {
+		tdvmcall_set_return_code(vcpu, TDVMCALL_SUCCESS);
+		tdvmcall_set_return_val(vcpu, vcpu->run->msr.data);
+	}
+	return 1;
+}
+
 static int tdx_emulate_rdmsr(struct kvm_vcpu *vcpu)
 {
 	u32 index = tdvmcall_a0_read(vcpu);
 	u64 data;
 
-	if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ) ||
-	    kvm_get_msr(vcpu, index, &data)) {
+	if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_READ)) {
+		trace_kvm_msr_read_ex(index);
+		tdvmcall_set_return_code(vcpu, TDVMCALL_INVALID_OPERAND);
+		return kvm_msr_user_space(vcpu, index, KVM_EXIT_X86_RDMSR, 0,
+					  tdx_complete_rdmsr,
+					  KVM_MSR_RET_FILTERED);
+	}
+
+	if (kvm_get_msr(vcpu, index, &data)) {
 		trace_kvm_msr_read_ex(index);
 		tdvmcall_set_return_code(vcpu, TDVMCALL_INVALID_OPERAND);
 		return 1;
 	}
 
-	trace_kvm_msr_read(index, data);
+	trace_kvm_msr_read(index, data);
 	tdvmcall_set_return_code(vcpu, TDVMCALL_SUCCESS);
 	tdvmcall_set_return_val(vcpu, data);
 	return 1;
 }
 
+static int tdx_complete_wrmsr(struct kvm_vcpu *vcpu)
+{
+	if (vcpu->run->msr.error)
+		tdvmcall_set_return_code(vcpu, TDVMCALL_INVALID_OPERAND);
+	else
+		tdvmcall_set_return_code(vcpu, TDVMCALL_SUCCESS);
+	return 1;
+}
+
 static int tdx_emulate_wrmsr(struct kvm_vcpu *vcpu)
 {
 	u32 index = tdvmcall_a0_read(vcpu);
 	u64 data = tdvmcall_a1_read(vcpu);
 
-	if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE) ||
-	    kvm_set_msr(vcpu, index, data)) {
+	if (!kvm_msr_allowed(vcpu, index, KVM_MSR_FILTER_WRITE)) {
+		trace_kvm_msr_write_ex(index, data);
+		tdvmcall_set_return_code(vcpu, TDVMCALL_INVALID_OPERAND);
+		if (kvm_msr_user_space(vcpu, index, KVM_EXIT_X86_WRMSR, data,
+				       tdx_complete_wrmsr,
+				       KVM_MSR_RET_FILTERED))
+			return 1;
+		return 0;
+	}
+
+	if (kvm_set_msr(vcpu, index, data)) {
 		trace_kvm_msr_write_ex(index, data);
 		tdvmcall_set_return_code(vcpu, TDVMCALL_INVALID_OPERAND);
 		return 1;
-- 
Isaku Yamahata <isaku.yamahata@xxxxxxxxx>