On Thu, Feb 25, 2010 at 04:17:22PM +0100, Jan Kiszka wrote: > Marcelo Tosatti wrote: > > On Thu, Feb 25, 2010 at 09:48:47AM +0100, Jan Kiszka wrote: > >> Marcelo Tosatti wrote: > >>> On Thu, Feb 25, 2010 at 12:58:26AM +0100, Jan Kiszka wrote: > >>>> Marcelo Tosatti wrote: > >>>>> On Thu, Feb 25, 2010 at 12:45:55AM +0100, Jan Kiszka wrote: > >>>>>> Marcelo Tosatti wrote: > >>>>>>> On Wed, Feb 24, 2010 at 03:17:55PM +0100, Jan Kiszka wrote: > >>>>>>>> Drop kvm_load_tsc in favor of level-dependent writeback in > >>>>>>>> kvm_arch_load_regs. KVM's PV clock MSRs fall in the same category and > >>>>>>>> should therefore only be written back on full sync. > >>>>>>>> > >>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@xxxxxxxxxxx> > >>>>>>>> --- > >>>>>>>> qemu-kvm-x86.c | 19 +++++-------------- > >>>>>>>> qemu-kvm.h | 4 ---- > >>>>>>>> target-i386/machine.c | 5 ----- > >>>>>>>> 3 files changed, 5 insertions(+), 23 deletions(-) > >>>>>>>> > >>>>>>>> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c > >>>>>>>> index 840c1c9..84fd7fa 100644 > >>>>>>>> --- a/qemu-kvm-x86.c > >>>>>>>> +++ b/qemu-kvm-x86.c > >>>>>>>> @@ -965,8 +965,11 @@ void kvm_arch_load_regs(CPUState *env, int level) > >>>>>>>> set_msr_entry(&msrs[n++], MSR_LSTAR , env->lstar); > >>>>>>>> } > >>>>>>>> #endif > >>>>>>>> - set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr); > >>>>>>>> - set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr); > >>>>>>>> + if (level == KVM_PUT_FULL_STATE) { > >>>>>>>> + set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc); > >>>>>>>> + set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr); > >>>>>>>> + set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr); > >>>>>>>> + } > >>>>>>> As things stand today, the TSC should only be written on migration. See > >>>>>>> 53f658b3c33616a4997ee254311b335e59063289 in the kernel. > >>>>>> Migration and power-up - that's what this patch ensures (=> > >>>>>> KVM_PUT_FULL_STATE). Or where do you see any problem? > >>>>>> > >>>>>> Jan > >>>>>> > >>>>> The problem is it should not write on power up (the kernel attempts > >>>>> to synchronize the TSCs in that case, see the commit). > >>>>> > >>>> OK, need to read this more carefully. > >>>> > >>>> I do not yet understand the difference from user space POV: it tries to > >>>> transfer the identical TSC values to all currently stopped VCPU threads. > >>> guest tsc = host tsc + offset > >>> > >>> So at the time you set_msr(TSC), the guest visible TSC starts ticking. > >>> For SMP guests, this does not happen exactly at the same time for all > >>> vcpus. > >> Ouch. > >> > >>>> That should not be different if we are booting a fresh VM or loading a > >>>> complete state of a migrated image. If it does, it looks like a KVM > >>>> kernel deficit on first glance. > >>> Yes it is a deficit. After migration TSCs of SMP guests go out of sync. > >>> Zachary is working on that. > >>> > >> OK, so we need a workaround, ideally without reintroducing hooks. Is > >> this one acceptable? > >> > >> > >> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c > >> index 84fd7fa..285c05a 100644 > >> --- a/qemu-kvm-x86.c > >> +++ b/qemu-kvm-x86.c > >> @@ -966,7 +966,15 @@ void kvm_arch_load_regs(CPUState *env, int level) > >> } > >> #endif > >> if (level == KVM_PUT_FULL_STATE) { > >> - set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc); > >> + /* > >> + * KVM is yet unable to synchronize TSC values of multiple VCPUs on > >> + * writeback. Until this is fixed, we only write the offset to SMP > >> + * guests after migration, desynchronizing the VCPUs, but avoiding > >> + * huge jump-backs that would occur without any writeback at all. > >> + */ > >> + if (smp_cpus == 1 || env->tsc != 0) { > >> + set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc); > >> + } > >> set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr); > >> set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr); > >> } > > > > Well, migration of the SMP TSCs is not precise, but it is better than > > zeroing the TSCs on migration. So something like a new state (migration > > only), or a hack that mimicks that behaviour, is needed :( > > It is not precise, but the above code shouldn't behave differently > compared to the existing one: tcp != 0 => we are loading values from > some VM that ran before, i.e. we are migrating. > > If somehow possible, I do not want to introduce a new writeback level > for an x86-only kernel quirk. > > Jan I can't read. Looks OK. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html