Re: [PATCH 1/5] KVM: nSVM: Sync next_rip field from vmcb12 to vmcb02

On Fri, 2022-04-01 at 21:51 +0000, Sean Christopherson wrote:
> On Fri, Apr 01, 2022, Maciej S. Szmigiero wrote:
> > On 1.04.2022 20:32, Sean Christopherson wrote:
> > > On Thu, Mar 10, 2022, Maciej S. Szmigiero wrote:
> > > > +	/* The return address pushed on stack by the CPU for some injected events */
> > > > +	svm->vmcb->control.next_rip            = svm->nested.ctl.next_rip;
> > > 
> > > This needs to be gated by nrips being enabled _and_ exposed to L1, i.e.
> > > 
> > > 	if (svm->nrips_enabled)
> > > 		vmcb02->control.next_rip    = svm->nested.ctl.next_rip;
> > 
> > It can be done, however what if we run on a nrips-capable CPU,
> > but don't expose this capability to the L1?
> 
> Oh, right, because the field will be populated by the CPU on VM-Exit.  Ah, the
> correct behavior is to grab RIP from vmcb12 to emulate nrips=0 hardware simply
> not updating RIP.  E.g. zeroing it out would send L2 into the weeds on IRET due
> to the CPU pushing '0' on the stack when vectoring the injected event.
> 
> 	if (svm->nrips_enabled)
> 		vmcb02->control.next_rip    = svm->nested.ctl.next_rip;
> 	else if (boot_cpu_has(X86_FEATURE_NRIPS))
> 		vmcb02->control.next_rip    = vmcb12_rip;
> 
> > The CPU will then push whatever value was left in this field as
> > the return address for some L1 injected events.

This makes sense.
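
To spell out the selection logic proposed above, here is a minimal standalone
sketch. It is only an illustration and not KVM code: pick_vmcb02_next_rip()
and its parameter names are made up, and the "keep the current value" fallback
is my assumption for hardware that does not consume NextRIP at all.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not a KVM function): choose the NextRIP value to
 * place in vmcb02 before running L2.
 *
 * nrips_exposed_to_l1: L1 sees the NRIPS CPUID bit, so vmcb12's next_rip
 *                      is meaningful and is passed through.
 * hw_has_nrips:        the host CPU consumes NextRIP when injecting events.
 * current:             value already in vmcb02, kept when the hardware
 *                      ignores the field (assumption of this sketch).
 */
static uint64_t pick_vmcb02_next_rip(bool nrips_exposed_to_l1, bool hw_has_nrips,
				     uint64_t vmcb12_next_rip, uint64_t vmcb12_rip,
				     uint64_t current)
{
	if (nrips_exposed_to_l1)
		return vmcb12_next_rip;

	/*
	 * L1 believes there is no NextRIP, but the CPU will still push the
	 * field as the return address, so emulate "RIP not advanced".
	 */
	if (hw_has_nrips)
		return vmcb12_rip;

	return current;
}
```

The middle branch is the interesting one: it prevents a stale or zero value
from being pushed on the stack as the return RIP.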

Note that even AMD's APM (Architecture Programmer's Manual) has a note about this:

"
15.20 Event Injection
...
Software interrupts cannot be properly injected if the processor does not support the NextRIP field.
Support is indicated by CPUID Fn8000_000A_EDX[NRIPS] = 1. Hypervisor software should
emulate the event injection of software interrupts if NextRIP is not supported.
"



> > 
> > Although without nrips feature the L1 shouldn't even attempt event
> > injection, copying this field anyway will make it work if L1 just
> > expects this capability based on the current CPU model rather than
> > by checking specific CPUID feature bits.

The guest really ought to check the CPUID bits. Besides, the CPU model itself
is usually virtualized too (for example, for named machine types in QEMU).
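
A guest-side check could look roughly like the sketch below. It assumes a
GCC or Clang toolchain (for <cpuid.h> and __get_cpuid()), and that NRIPS is
bit 3 of CPUID Fn8000_000A_EDX and SVM is bit 2 of CPUID Fn8000_0001_ECX,
which is how I read the APM; the helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

#define EXT_FEATURE_LEAF	0x80000001u	/* CPUID Fn8000_0001 */
#define SVM_FEATURE_LEAF	0x8000000Au	/* CPUID Fn8000_000A */
#define ECX_SVM_BIT		2		/* assumed bit position */
#define EDX_NRIPS_BIT		3		/* assumed bit position */

/* Pure helper: does this EDX value from Fn8000_000A advertise NRIPS? */
static bool edx_has_nrips(uint32_t edx)
{
	return (edx >> EDX_NRIPS_BIT) & 1u;
}

#if defined(__x86_64__) || defined(__i386__)
/* Query the (possibly virtualized) CPU instead of guessing from the model. */
static bool cpu_has_nrips(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* Fn8000_000A is only meaningful when SVM itself is reported. */
	if (!__get_cpuid(EXT_FEATURE_LEAF, &eax, &ebx, &ecx, &edx))
		return false;
	if (!((ecx >> ECX_SVM_BIT) & 1u))
		return false;

	if (!__get_cpuid(SVM_FEATURE_LEAF, &eax, &ebx, &ecx, &edx))
		return false;
	return edx_has_nrips(edx);
}
#endif
```

If cpu_has_nrips() returns false, the hypervisor should emulate software
interrupt injection itself, as the APM note says.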

> 
> L1 may still inject the exception, it just advances the RIP manually.  As above,
> the really messy thing is that, because there's no flag to say "don't use NextRIP!",
> the CPU will still consume NextRIP and push '0' on the stack for the return RIP
> from the INTn/INT3/INTO.  Yay.
> 
> I found that out the hard way (patch in-progress).  The way to handle event
> injection if KVM is loaded with nrips=0 but nrips is supported in hardware is to
> stuff NextRIP on event injection even if nrips=0, otherwise the guest is hosed.
> 
> > > > +	u64 next_rip;
> > > >   	u64 nested_cr3;
> > > >   	u64 virt_ext;
> > > >   	u32 clean;
> > > 
> > > I don't know why this struct has
> > > 
> > > 	u8 reserved_sw[32];
> > > 
> > > but presumably it's for padding, i.e. probably should be reduced to 24 bytes.
> > 
> > Apparently the "reserved_sw" field stores Hyper-V enlightenments state -
> > see commit 66c03a926f18 ("KVM: nSVM: Implement Enlightened MSR-Bitmap feature")
> > and nested_svm_vmrun_msrpm() in nested.c.
> 
> Argh, that's a terrible name.  Thanks for doing the homework, I was being lazy.


That was added around commit

1183646a67d01 ("KVM: SVM: hyper-v: Direct Virtual Flush support")

It seems to be used by Hyper-V to store 'struct hv_enlightenments',
but I am not 100% sure that this is the only thing that can be stored
in this area.
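
To illustrate why blindly shrinking such an area is risky, here is a small
sketch with a compile-time size check. Everything in it is hypothetical:
demo_enlightenments is NOT the real layout of 'struct hv_enlightenments',
and demo_control_area is not the real control area; only the 32-byte size
of reserved_sw is taken from this thread.

```c
#include <stdint.h>
#include <string.h>

#define RESERVED_SW_SIZE 32	/* size of reserved_sw quoted in this thread */

/* Hypothetical payload; the real struct hv_enlightenments differs. */
struct demo_enlightenments {
	uint32_t flags;
	uint32_t pad;
	uint64_t opaque;
};

/* Hypothetical stand-in for the cached control area. */
struct demo_control_area {
	uint64_t next_rip;
	uint8_t reserved_sw[RESERVED_SW_SIZE];
};

/* Fails the build if the payload no longer fits after resizing the area. */
_Static_assert(sizeof(struct demo_enlightenments) <= RESERVED_SW_SIZE,
	       "payload must fit in the software-reserved area");

/* Store the payload in the opaque byte area. */
static void demo_store(struct demo_control_area *ctl,
		       const struct demo_enlightenments *in)
{
	memcpy(ctl->reserved_sw, in, sizeof(*in));
}

/* Read the payload back out of the opaque byte area. */
static void demo_load(const struct demo_control_area *ctl,
		      struct demo_enlightenments *out)
{
	memcpy(out, ctl->reserved_sw, sizeof(*out));
}
```

A check like this documents what lives in the area far better than the
field name does.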



Best regards,
	Maxim Levitsky

