Marcelo Tosatti wrote: > On Thu, Feb 25, 2010 at 09:48:47AM +0100, Jan Kiszka wrote: >> Marcelo Tosatti wrote: >>> On Thu, Feb 25, 2010 at 12:58:26AM +0100, Jan Kiszka wrote: >>>> Marcelo Tosatti wrote: >>>>> On Thu, Feb 25, 2010 at 12:45:55AM +0100, Jan Kiszka wrote: >>>>>> Marcelo Tosatti wrote: >>>>>>> On Wed, Feb 24, 2010 at 03:17:55PM +0100, Jan Kiszka wrote: >>>>>>>> Drop kvm_load_tsc in favor of level-dependent writeback in >>>>>>>> kvm_arch_load_regs. KVM's PV clock MSRs fall in the same category and >>>>>>>> should therefore only be written back on full sync. >>>>>>>> >>>>>>>> Signed-off-by: Jan Kiszka <jan.kiszka@xxxxxxxxxxx> >>>>>>>> --- >>>>>>>> qemu-kvm-x86.c | 19 +++++-------------- >>>>>>>> qemu-kvm.h | 4 ---- >>>>>>>> target-i386/machine.c | 5 ----- >>>>>>>> 3 files changed, 5 insertions(+), 23 deletions(-) >>>>>>>> >>>>>>>> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c >>>>>>>> index 840c1c9..84fd7fa 100644 >>>>>>>> --- a/qemu-kvm-x86.c >>>>>>>> +++ b/qemu-kvm-x86.c >>>>>>>> @@ -965,8 +965,11 @@ void kvm_arch_load_regs(CPUState *env, int level) >>>>>>>> set_msr_entry(&msrs[n++], MSR_LSTAR , env->lstar); >>>>>>>> } >>>>>>>> #endif >>>>>>>> - set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr); >>>>>>>> - set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr); >>>>>>>> + if (level == KVM_PUT_FULL_STATE) { >>>>>>>> + set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc); >>>>>>>> + set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr); >>>>>>>> + set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr); >>>>>>>> + } >>>>>>> As things stand today, the TSC should only be written on migration. See >>>>>>> 53f658b3c33616a4997ee254311b335e59063289 in the kernel. >>>>>> Migration and power-up - that's what this patch ensures (=> >>>>>> KVM_PUT_FULL_STATE). Or where do you see any problem? >>>>>> >>>>>> Jan >>>>>> >>>>> The problem is it should not write on power up (the kernel attempts >>>>> to synchronize the TSCs in that case, see the commit). >>>>> >>>> OK, need to read this more carefully. >>>> >>>> I do not yet understand the difference from user space POV: it tries to >>>> transfer the identical TSC values to all currently stopped VCPU threads. >>> guest tsc = host tsc + offset >>> >>> So at the time you set_msr(TSC), the guest visible TSC starts ticking. >>> For SMP guests, this does not happen exactly at the same time for all >>> vcpus. >> Ouch. >> >>>> That should not be different if we are booting a fresh VM or loading a >>>> complete state of a migrated image. If it does, it looks like a KVM >>>> kernel deficit on first glance. >>> Yes it is a deficit. After migration TSCs of SMP guests go out of sync. >>> Zachary is working on that. >>> >> OK, so we need a workaround, ideally without reintroducing hooks. Is >> this one acceptable? >> >> >> diff --git a/qemu-kvm-x86.c b/qemu-kvm-x86.c >> index 84fd7fa..285c05a 100644 >> --- a/qemu-kvm-x86.c >> +++ b/qemu-kvm-x86.c >> @@ -966,7 +966,15 @@ void kvm_arch_load_regs(CPUState *env, int level) >> } >> #endif >> if (level == KVM_PUT_FULL_STATE) { >> - set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc); >> + /* >> + * KVM is yet unable to synchronize TSC values of multiple VCPUs on >> + * writeback. Until this is fixed, we only write the offset to SMP >> + * guests after migration, desynchronizing the VCPUs, but avoiding >> + * huge jump-backs that would occur without any writeback at all. >> + */ >> + if (smp_cpus == 1 || env->tsc != 0) { >> + set_msr_entry(&msrs[n++], MSR_IA32_TSC, env->tsc); >> + } >> set_msr_entry(&msrs[n++], MSR_KVM_SYSTEM_TIME, env->system_time_msr); >> set_msr_entry(&msrs[n++], MSR_KVM_WALL_CLOCK, env->wall_clock_msr); >> } > > Well, migration of the SMP TSCs is not precise, but it is better than > zeroing the TSCs on migration. So something like a new state (migration > only), or a hack that mimicks that behaviour, is needed :( It is not precise, but the above code shouldn't behave differently compared to the existing one: tcp != 0 => we are loading values from some VM that ran before, i.e. we are migrating. If somehow possible, I do not want to introduce a new writeback level for an x86-only kernel quirk. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html