On Wed, Aug 03, 2011 at 01:34:58AM -0700, Zachary Amsden wrote: > Caution: this requires more care. > > Pretty sure this breaks userspace suspend at the cost of supporting a > not-so-reasonable hardware feature. > > On Tue, Aug 2, 2011 at 5:55 AM, Nadav Har'El <nyh@xxxxxxxxxx> wrote: > > When the TSC MSR is read by an L2 guest (when L1 allowed this MSR to be > > read without exit), we need to return L2's notion of the TSC, not L1's. > > > > The current code incorrectly returned L1 TSC, because svm_get_msr() was also > > used in x86.c where this was assumed, but now that these places call the new > > svm_read_l1_tsc(), the MSR read can be fixed. > > > > Signed-off-by: Nadav Har'El <nyh@xxxxxxxxxx> > > --- > > arch/x86/kvm/svm.c | 4 +--- > > 1 file changed, 1 insertion(+), 3 deletions(-) > > > > --- .before/arch/x86/kvm/svm.c 2011-08-02 15:51:02.000000000 +0300 > > +++ .after/arch/x86/kvm/svm.c 2011-08-02 15:51:02.000000000 +0300 > > @@ -2907,9 +2907,7 @@ static int svm_get_msr(struct kvm_vcpu * > > > > switch (ecx) { > > case MSR_IA32_TSC: { > > - struct vmcb *vmcb = get_host_vmcb(svm); > > - > > - *data = vmcb->control.tsc_offset + > > + *data = svm->vmcb->control.tsc_offset + > > svm_scale_tsc(vcpu, native_read_tsc()); > > > > break; > > > > > This is correct. Now you properly return the correct MSR value for > the TSC to the guest in all cases. > > However, there is a BIG PROBLEM (yes, it is that bad). Sorry I did > not think of it before. > > When the guest gets suspended and frozen by userspace, to be restarted > later, what qemu is going to do is come along and read all of the MSRs > as part of the saved state. One of those happens to be the TSC MSR. > > Since you can't guarantee suspend will take place when only L1 is > running, you may have mixed L1/L2 TSC MSRs now being returned to > userspace. Now, when you resume this guest, you will have mixed L1/L2 > TSC values written into the MSRs. > > Those will almost certainly fail to be matched by the TSC offset > matching code, and thus, with multiple VCPUs, you will end up with > slightly unsynchronized TSCs, and with that, all the problems > associated with unstable TSC come back to haunt you again. Basically, > all bets for stability are off. TSC synchronization is the least of your problems if you attempt to save/restore state a guest while any vcpu is in L2 mode. So to keep consistency between the remaining MSRs, i agree with Nadav's patch. Apparently SVM's original patches to return L1 TSC were aimed at fixing the problem VMX is facing now, which is fixed by introduction read_l1_tsc helpers. > Apart from recording the L1 and L2 TSC through separate MSRs, there > isn't a good way to solve this. Walking through all the steps again, > I'm pretty sure this is why I thought it would be better for L0 > kvm_get_msr() to return the L1 values even if L2 was running. > > In the end, it may not be worth the hassle to support this mode of > operation that to our current knowledge, is in fact, unused. I would > much prefer to see a bad virtualization of a known insecure and unused > guest mode than to have a well used feature such as L1 guest suspend / > resume continue to cause clock de-synchronization. > > Zach -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html