2015-03-24 15:33-0700, Andy Lutomirski: > On Tue, Mar 24, 2015 at 8:34 AM, Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote: > > What is the problem? > > The kvmclock spec says that the host will increment a version field to > an odd number, then update stuff, then increment it to an even number. > The host is buggy and doesn't do this, and the result is observable > when one vcpu reads another vcpu's kvmclock data. > > Since there's no good way for a guest kernel to keep its vdso from > reading a different vcpu's kvmclock data, this is a real corner-case > bug. This patch allows the vdso to retry when this happens. I don't > think it's a great solution, but it should mostly work. Great explanation, thank you. Reverting the patch protects us from any migration, but I don't think we need to care about changing VCPUs as long as we read a consistent data from kvmclock. (VCPU can change outside of this loop too, so it doesn't matter if we return a value not fit for this VCPU.) I think we could drop the second __getcpu if our kvmclock was being handled better; maybe with a patch like the one below: diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index cc2c759f69a3..8658599e0024 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1658,12 +1658,24 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) &guest_hv_clock, sizeof(guest_hv_clock)))) return 0; - /* - * The interface expects us to write an even number signaling that the - * update is finished. Since the guest won't see the intermediate - * state, we just increase by 2 at the end. + /* A guest can read other VCPU's kvmclock; specification says that + * version is odd if data is being modified and even after it is + * consistent. + * We write three times to be sure. + * 1) update version to odd number + * 2) write modified data (version is still odd) + * 3) update version to even number + * + * TODO: optimize + * - only two writes should be enough -- version is first + * - the second write could update just version */ - vcpu->hv_clock.version = guest_hv_clock.version + 2; + guest_hv_clock.version += 1; + kvm_write_guest_cached(v->kvm, &vcpu->pv_time, + &guest_hv_clock, + sizeof(guest_hv_clock)); + + vcpu->hv_clock.version = guest_hv_clock.version; /* retain PVCLOCK_GUEST_STOPPED if set in guest copy */ pvclock_flags = (guest_hv_clock.flags & PVCLOCK_GUEST_STOPPED); @@ -1684,6 +1696,11 @@ static int kvm_guest_time_update(struct kvm_vcpu *v) kvm_write_guest_cached(v->kvm, &vcpu->pv_time, &vcpu->hv_clock, sizeof(vcpu->hv_clock)); + + vcpu->hv_clock.version += 1; + kvm_write_guest_cached(v->kvm, &vcpu->pv_time, + &vcpu->hv_clock, + sizeof(vcpu->hv_clock)); return 0; } -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html