Re: [PATCH v12 20/22] x86/virt/tdx: Allow SEAMCALL to handle #UD and #GP

Peter Zijlstra <peterz@xxxxxxxxxxxxx> · Wed, 5 Jul 2023 14:19:21 +0200

On Wed, Jul 05, 2023 at 11:34:53AM +0000, Huang, Kai wrote:

> Yeah I think from long-term's view, since SEAMCALLs to support live migration
> pretty much uses all RCX/RDX/R8-R15 as input/output, it seems reasonable to
> unify all of them, although I guess there might be some special handling to
> VP.VMCALL and/or VP.ENTER, e.g., below:
> 
>         /* TDVMCALL leaf return code is in R10 */                              
>         movq %r10, %rax
> 
> So long-termly, I don't have objection to that.  But my thinking is for the
> first version of TDX host support, we don't have to support all SEAMCALLs but
> only those involved in basic TDX support. 

Since those calls are out now, we should look at them now, there is no
point in delaying the pain. That then gives us two options:

 - we accept them and their wonky calling convention and our code should
   be ready for it.

 - we reject them and send the TDX team a message to please try again
   but with a saner calling convention.

Sticking our head in the sand and pretending like they don't exist isn't
really a viable option at this point.

> Also, the new SEAMCALLs to handle live migration all seem to have below
> statement:
> 
> 	AVX, AVX2	May be reset to the architectural INIT state
> 	and
> 	AVX512
> 	state
> 
> Which means those SEAMCALLs need to preserve AVX* states too?

Yes, we need to ensure the userspace 'FPU' state is saved before
we call them. But I _think_ that KVM already does much of that.

> And reading the spec, the VP.VMCALL and VP.ENTER also can use XMM0 - XMM15 as
> input/output.  Linux VP.VMCALL seems doesn't support using XMM0 - XMM15 as
> input/output, but KVM can run other guest OSes too so I think KVM VP.ENTER needs
> to handle XMM0-XMM15 as input/output too.

Why would KVM accept VMCALLs it doesn't know about? Just trash the
guest and call it a day.

> That being said, I think although we can provide a common asm macro to cover
> VP.ENTER, I suspect KVM still needs to do additional assembly around the macro
> too.  So I am not sure whether we should try to cover VP.ENTER.

Not sure about asm, we have interfaces to save the XMM/AVX regs.
kernel_fpu_begin() comes to mind, but I know there's more of that,
including some for KVM specifically.

> > I don't think they should be special, they're really just yet another
> > leaf call. Yes, they have a shit calling convention, and yes VP.ENTER is
> > terminally broken for unconditionally clobbering BP :-(
> > 
> > That really *must* be fixed.
> 
> Sure I don't have objection to this, and for VP.ENTER please see above.
> 
> But I'd like to say that, generally speaking, from virtualization's point of
> view, guest has its own BP and conceptually the hypervisor needs to restore
> guest's BP before jumping to the guest.  E.g., for normal VMX guest, KVM always
> restores guest's BP before VMENTER (arch/x86/kvm/vmx/vmenter.S):
> 
> SYM_FUNC_START(__vmx_vcpu_run)
>         push %_ASM_BP
>         mov  %_ASM_SP, %_ASM_BP
> 	
> 	...
> 	mov VCPU_RBP(%_ASM_AX), %_ASM_BP
> 	...
> 	vmenter/vmresume
> 	...
> SYM_INNER_LABEL(vmx_vmexit, SYM_L_GLOBAL)
> 	.....
> 	mov %_ASM_BP, VCPU_RBP(%_ASM_AX)
> 	...
> 	pop %_ASM_BP
>         RET

That's disgusting :/ So what happens if we get an NMI after VMENTER and
before POP? Then it sees a garbage BP value.

Why is all this stuff such utter crap?