On Thu, Apr 23, 2015 at 01:46:55PM +0200, Paolo Bonzini wrote: > From: Radim Krčmář <rkrcmar@xxxxxxxxxx> > > The kvmclock spec says that the host will increment a version field to > an odd number, then update stuff, then increment it to an even number. > The host is buggy and doesn't do this, and the result is observable > when one vcpu reads another vcpu's kvmclock data. > > There's no good way for a guest kernel to keep its vdso from reading > a different vcpu's kvmclock data, but we don't need to care about > changing VCPUs as long as we read a consistent data from kvmclock. > (VCPU can change outside of this loop too, so it doesn't matter if we > return a value not fit for this VCPU.) > > Based on a patch by Radim Krčmář. > > Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx> > --- > arch/x86/kvm/x86.c | 33 ++++++++++++++++++++++++++++----- > 1 file changed, 28 insertions(+), 5 deletions(-) > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index ed31c31b2485..c73efcd03e29 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -1669,12 +1669,28 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) > &guest_hv_clock, sizeof(guest_hv_clock)))) > return 0; > > - /* > - * The interface expects us to write an even number signaling that the > - * update is finished. Since the guest won't see the intermediate > - * state, we just increase by 2 at the end. > + /* This VCPU is paused, but it's legal for a guest to read another > + * VCPU's kvmclock, so we really have to follow the specification where > + * it says that version is odd if data is being modified, and even after > + * it is consistent. > + * > + * Version field updates must be kept separate. This is because > + * kvm_write_guest_cached might use a "rep movs" instruction, and > + * writes within a string instruction are weakly ordered. So there > + * are three writes overall. > + * > + * As a small optimization, only write the version field in the first > + * and third write. The vcpu->pv_time cache is still valid, because the > + * version field is the first in the struct. > */ > - vcpu->hv_clock.version = guest_hv_clock.version + 2; > + BUILD_BUG_ON(offsetof(struct pvclock_vcpu_time_info, version) != 0); > + > + vcpu->hv_clock.version = guest_hv_clock.version + 1; > + kvm_write_guest_cached(v->kvm, &vcpu->pv_time, > + &vcpu->hv_clock, > + sizeof(vcpu->hv_clock.version)); > + > + smp_wmb(); > > /* retain PVCLOCK_GUEST_STOPPED if set in guest copy */ > pvclock_flags = (guest_hv_clock.flags & PVCLOCK_GUEST_STOPPED); > @@ -1695,6 +1711,13 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) > kvm_write_guest_cached(v->kvm, &vcpu->pv_time, > &vcpu->hv_clock, > sizeof(vcpu->hv_clock)); > + > + smp_wmb(); > + > + vcpu->hv_clock.version++; > + kvm_write_guest_cached(v->kvm, &vcpu->pv_time, > + &vcpu->hv_clock, > + sizeof(vcpu->hv_clock.version)); > return 0; > } > > -- > 1.8.3.1 Acked-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx> -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html