Re: [PATCH 3/3] Fix TSC MSR read in nested SVM

Caution: this requires more care.

Pretty sure this breaks userspace suspend for the sake of supporting a
not-so-reasonable hardware feature.

On Tue, Aug 2, 2011 at 5:55 AM, Nadav Har'El <nyh@xxxxxxxxxx> wrote:
> When the TSC MSR is read by an L2 guest (when L1 allowed this MSR to be
> read without exit), we need to return L2's notion of the TSC, not L1's.
>
> The current code incorrectly returned L1 TSC, because svm_get_msr() was also
> used in x86.c where this was assumed, but now that these places call the new
> svm_read_l1_tsc(), the MSR read can be fixed.
>
> Signed-off-by: Nadav Har'El <nyh@xxxxxxxxxx>
> ---
>  arch/x86/kvm/svm.c |    4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> --- .before/arch/x86/kvm/svm.c  2011-08-02 15:51:02.000000000 +0300
> +++ .after/arch/x86/kvm/svm.c   2011-08-02 15:51:02.000000000 +0300
> @@ -2907,9 +2907,7 @@ static int svm_get_msr(struct kvm_vcpu *
>
>        switch (ecx) {
>        case MSR_IA32_TSC: {
> -               struct vmcb *vmcb = get_host_vmcb(svm);
> -
> -               *data = vmcb->control.tsc_offset +
> +               *data = svm->vmcb->control.tsc_offset +
>                        svm_scale_tsc(vcpu, native_read_tsc());
>
>                break;
>


This is correct.  You now return the proper TSC MSR value to the guest
in all cases.

However, there is a BIG PROBLEM (yes, it is that bad).  Sorry I did
not think of it before.

When the guest is suspended and frozen by userspace, to be restarted
later, qemu will come along and read all of the MSRs as part of the
saved state.  One of those happens to be the TSC MSR.

Since you can't guarantee that suspend will take place while only L1 is
running, a mix of L1 and L2 TSC values may now be returned to
userspace, and when you resume this guest, that same mix of L1/L2 TSC
values gets written back into the MSRs.

Those will almost certainly fail to be matched by the TSC offset
matching code, and thus, with multiple VCPUs, you will end up with
slightly unsynchronized TSCs, and with that, all the problems
associated with unstable TSC come back to haunt you again.  Basically,
all bets for stability are off.

Apart from recording the L1 and L2 TSC through separate MSRs, there
isn't a good way to solve this.  Walking through all the steps again,
I'm pretty sure this is why I thought it would be better for L0
kvm_get_msr() to return the L1 values even if L2 was running.

In the end, it may not be worth the hassle to support this mode of
operation, which to our current knowledge is in fact unused.  I would
much rather see a poor virtualization of a known-insecure, unused
guest mode than have a well-used feature such as L1 guest suspend /
resume continue to cause clock de-synchronization.

Zach

