On Mon, Nov 14, 2016 at 04:00:38PM +0100, Paolo Bonzini wrote: > > > On 14/11/2016 15:50, Marcelo Tosatti wrote: > > Well, i didnt want to mix the meaning of the variables: > > > > + /* whether machine supports reliable KVM_GET_CLOCK */ > > + bool mach_use_reliable_get_clock; > > + > > + /* whether source host supported reliable KVM_GET_CLOCK */ > > + bool src_use_reliable_get_clock; > > > > See the comments on top (later if you look at the variable, > > then have to think: well it has one name, but its disabled > > by that other path as well, so its more than its > > name,etc...). > > > >> I'm thinking "mach_use_reliable_get_clock is just for migration, > > > > Thats whether the machine supports it. New machines have it enabled, > > olders don't. > > Yes. > > >> src_use_reliable_get_clock is the state". > > > > Thats whether the migration source supported it. > > But it's not used only for migration. It's used on every vmstate change > (running->stop and stop->running, isn't it? No. Its used only if s->clock_valid == false, which means either: migration/savevm/init. Now for savevm there is a bug: > I think that, apart from > the migration case, it's better to use s->clock if kvmclock is stable, > even on older machine types. Yes, its already doing that: /* local (running VM) restore */ if (s->clock_valid) { /* * if host does not support reliable KVM_GET_CLOCK, * read kvmclock value from memory */ if (!kvm_has_adjust_clock_stable()) { time_at_migration = kvmclock_current_nsec(s); } It only uses kvmclock_current_nsec() if kvm_has_adjust_clock_stable is false. > >>>>> +static bool kvmclock_src_use_reliable_get_clock(void *opaque) > >>>>> +{ > >>>>> + KVMClockState *s = opaque; > >>>>> + > >>>>> + /* > >>>>> + * On machine types that support reliable KVM_GET_CLOCK, > >>>>> + * if host kernel does provide reliable KVM_GET_CLOCK, > >>>>> + * set src_use_reliable_get_clock=true so that destination > >>>>> + * avoids reading kvmclock from memory. > >>>>> + */ > >>>>> + if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) { > >>>>> + s->src_use_reliable_get_clock = true; > >>>>> + } > >>>>> + > >>>>> + return s->src_use_reliable_get_clock; > >>>>> +} > >>>> > >>>> Here you can just return s->mach_use_reliable_get_clock. > >>> > >>> mach_use_reliable_get_clock can be true but host might not support it. > >> > >> Yes, but the "needed" function is only required to avoid breaking > >> pc-i440fx-2.7 and earlier. > > > > "needed" is required so that the migration between: > > > > SRC DEST BEHAVIOUR > > ~support supports on migration read from guest, > > on stop/cont use > > kvm_get_clock/kvm_set_clock > > > > Destination does not use KVM_GET_CLOCK value (which is > > broken and should not be used). > > If needed returns false, the destination will see > src_use_reliable_get_clock = false anyway. Causing it to read the clock from memory, thats what should happen. > If needed returns true, the destination can still see > src_use_reliable_get_clock = false if that's the value. You are asking me to change from: +static bool kvmclock_src_use_reliable_get_clock(void *opaque) +{ + KVMClockState *s = opaque; + + /* + * On machine types that support reliable KVM_GET_CLOCK, + * if host kernel does provide reliable KVM_GET_CLOCK, + * set src_use_reliable_get_clock=true so that destination + * avoids reading kvmclock from memory. + */ + if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) { + s->src_use_reliable_get_clock = true; + } + + return s->src_use_reliable_get_clock; +} to static bool kvmclock_src_use_reliable_get_clock(void *opaque) { KVMClockState *s = opaque; /* * On machine types that support reliable KVM_GET_CLOCK, * if host kernel does provide reliable KVM_GET_CLOCK, * set src_use_reliable_get_clock=true so that destination * avoids reading kvmclock from memory. */ if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) { s->src_use_reliable_get_clock = true; } return s->mach_use_reliable_get_clock; } Ah, OK, done. > needed src_use_reliable_get_clock effect > false false use kvmclock_current_nsec > false true use kvmclock_current_nsec > true false use kvmclock_current_nsec > true true use s->clock > > > So the idea is: > > - set s->clock and s->reliable_get_clock on every KVM_GET_CLOCK Already do that (apart from s->clock_valid which is necessary). > - on migration source, use subsections and > s->mach_use_reliable_get_clock to avoid breaking migration on pre-2.8 > machine types Migration is already broken when you use different machine types. So s->src_use_reliable_get_clock is only used to indicate to the destination that: "you can use KVM_GET_CLOCK value, its safe". > - on migration destination, use .pre_load so that s->reliable_get_clock > is initialized to false on older machine types Thats what mach_use_reliable_get_clock does already: static Property kvmclock_properties[] = { DEFINE_PROP_BOOL("mach_use_reliable_get_clock", KVMClockState, mach_use_reliable_get_clock, true), DEFINE_PROP_END_OF_LIST(), }; + .driver = "kvmclock",\ + .property = "mach_use_reliable_get_clock",\ + .value = "off",\ + },\ + {\ > > >> If you return true here, you can still > >> migrate a "false" value for src_use_reliable_get_clock. > > > > But the source only uses a reliable KVM_GET_CLOCK if > > both conditions are true. > > > > And the subsection is only needed if the source > > uses a reliable KVM_GET_CLOCK. > > Yes, but the point of the subsection is just to avoid breaking migration > format. Its a new creative use of the subsection. > >> It is the same as "is using masterclock", which is actually a stricter > >> condition than the KVM_CHECK_EXTENSION return value. The right check to > >> use is whether masterclock is in use, > > > > Actually its "has a reliable KVM_GET_CLOCK" (which returns > > get_kernel_clock() + (rdtsc() - tsc_timestamp), > > > > "broken KVM_GET_CLOCK" = get_kernel_clock() > > Yes. And these end up being the same. No. You can have masterclock enabled, KVM_TSC_STABLE set in pvclock flags, and unreliable KVM_GET_CLOCK (without your patch to KVM_GET_CLOCK). Paolo not sure if there is anything further you want me to change at this point ? Index: qemu-mig-advance-clock/hw/i386/kvm/clock.c =================================================================== --- qemu-mig-advance-clock.orig/hw/i386/kvm/clock.c 2016-11-14 10:40:39.748116312 -0200 +++ qemu-mig-advance-clock/hw/i386/kvm/clock.c 2016-11-14 13:38:29.299955042 -0200 @@ -36,6 +36,12 @@ uint64_t clock; bool clock_valid; + + /* whether machine supports reliable KVM_GET_CLOCK */ + bool mach_use_reliable_get_clock; + + /* whether source host supported reliable KVM_GET_CLOCK */ + bool src_use_reliable_get_clock; } KVMClockState; struct pvclock_vcpu_time_info { @@ -81,6 +87,19 @@ return nsec + time.system_time; } +static uint64_t kvm_get_clock(void) +{ + struct kvm_clock_data data; + int ret; + + ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data); + if (ret < 0) { + fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret)); + abort(); + } + return data.clock; +} + static void kvmclock_vm_state_change(void *opaque, int running, RunState state) { @@ -91,15 +110,37 @@ if (running) { struct kvm_clock_data data = {}; - uint64_t time_at_migration = kvmclock_current_nsec(s); + uint64_t pvclock_via_mem = 0; - s->clock_valid = false; + /* local (running VM) restore */ + if (s->clock_valid) { + /* + * if host does not support reliable KVM_GET_CLOCK, + * read kvmclock value from memory + */ + if (!kvm_has_adjust_clock_stable()) { + pvclock_via_mem = kvmclock_current_nsec(s); + } + /* migration/savevm/init restore */ + } else { + /* + * use s->clock in case machine uses reliable + * get clock and source host supported + * reliable get clock + */ + if (!(s->mach_use_reliable_get_clock && + s->src_use_reliable_get_clock)) { + pvclock_via_mem = kvmclock_current_nsec(s); + } + } - /* We can't rely on the migrated clock value, just discard it */ - if (time_at_migration) { - s->clock = time_at_migration; + /* We can't rely on the saved clock value, just discard it */ + if (pvclock_via_mem) { + s->clock = pvclock_via_mem; } + s->clock_valid = false; + data.clock = s->clock; ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data); if (ret < 0) { @@ -120,8 +161,6 @@ } } } else { - struct kvm_clock_data data; - int ret; if (s->clock_valid) { return; @@ -129,13 +168,7 @@ kvm_synchronize_all_tsc(); - ret = kvm_vm_ioctl(kvm_state, KVM_GET_CLOCK, &data); - if (ret < 0) { - fprintf(stderr, "KVM_GET_CLOCK failed: %s\n", strerror(ret)); - abort(); - } - s->clock = data.clock; - + s->clock = kvm_get_clock(); /* * If the VM is stopped, declare the clock state valid to * avoid re-reading it on next vmsave (which would return @@ -152,22 +185,69 @@ qemu_add_vm_change_state_handler(kvmclock_vm_state_change, s); } +static bool kvmclock_src_use_reliable_get_clock(void *opaque) +{ + KVMClockState *s = opaque; + + /* + * On machine types that support reliable KVM_GET_CLOCK, + * if host kernel does provide reliable KVM_GET_CLOCK, + * set src_use_reliable_get_clock=true so that destination + * avoids reading kvmclock from memory. + */ + if (s->mach_use_reliable_get_clock && kvm_has_adjust_clock_stable()) { + s->src_use_reliable_get_clock = true; + } + + return s->mach_use_reliable_get_clock; +} + +static const VMStateDescription kvmclock_reliable_get_clock = { + .name = "kvmclock/src_use_reliable_get_clock", + .version_id = 1, + .minimum_version_id = 1, + .needed = kvmclock_src_use_reliable_get_clock, + .fields = (VMStateField[]) { + VMSTATE_BOOL(src_use_reliable_get_clock, KVMClockState), + VMSTATE_END_OF_LIST() + } +}; + +static void kvmclock_pre_save(void *opaque) +{ + KVMClockState *s = opaque; + + s->clock = kvm_get_clock(); +} + static const VMStateDescription kvmclock_vmsd = { .name = "kvmclock", .version_id = 1, .minimum_version_id = 1, + .pre_save = kvmclock_pre_save, .fields = (VMStateField[]) { VMSTATE_UINT64(clock, KVMClockState), VMSTATE_END_OF_LIST() + }, + .subsections = (const VMStateDescription * []) { + &kvmclock_reliable_get_clock, + NULL } }; +static Property kvmclock_properties[] = { + DEFINE_PROP_BOOL("mach_use_reliable_get_clock", KVMClockState, + mach_use_reliable_get_clock, true), + DEFINE_PROP_END_OF_LIST(), +}; + static void kvmclock_class_init(ObjectClass *klass, void *data) { DeviceClass *dc = DEVICE_CLASS(klass); dc->realize = kvmclock_realize; dc->vmsd = &kvmclock_vmsd; + dc->props = kvmclock_properties; } static const TypeInfo kvmclock_info = { Index: qemu-mig-advance-clock/include/hw/i386/pc.h =================================================================== --- qemu-mig-advance-clock.orig/include/hw/i386/pc.h 2016-11-14 10:40:39.748116312 -0200 +++ qemu-mig-advance-clock/include/hw/i386/pc.h 2016-11-14 10:40:48.059128649 -0200 @@ -370,6 +370,11 @@ #define PC_COMPAT_2_7 \ HW_COMPAT_2_7 \ {\ + .driver = "kvmclock",\ + .property = "mach_use_reliable_get_clock",\ + .value = "off",\ + },\ + {\ .driver = TYPE_X86_CPU,\ .property = "l3-cache",\ .value = "off",\ Index: qemu-mig-advance-clock/target-i386/kvm.c =================================================================== --- qemu-mig-advance-clock.orig/target-i386/kvm.c 2016-11-14 10:40:39.750116314 -0200 +++ qemu-mig-advance-clock/target-i386/kvm.c 2016-11-14 10:40:48.061128652 -0200 @@ -117,6 +117,13 @@ return kvm_check_extension(kvm_state, KVM_CAP_X86_SMM); } +bool kvm_has_adjust_clock_stable(void) +{ + int ret = kvm_check_extension(kvm_state, KVM_CAP_ADJUST_CLOCK); + + return (ret == KVM_CLOCK_TSC_STABLE); +} + bool kvm_allows_irq0_override(void) { return !kvm_irqchip_in_kernel() || kvm_has_gsi_routing(); Index: qemu-mig-advance-clock/target-i386/kvm_i386.h =================================================================== --- qemu-mig-advance-clock.orig/target-i386/kvm_i386.h 2016-11-14 10:40:39.751116316 -0200 +++ qemu-mig-advance-clock/target-i386/kvm_i386.h 2016-11-14 10:40:48.061128652 -0200 @@ -17,6 +17,7 @@ bool kvm_allows_irq0_override(void); bool kvm_has_smm(void); +bool kvm_has_adjust_clock_stable(void); void kvm_synchronize_all_tsc(void); void kvm_arch_reset_vcpu(X86CPU *cs); void kvm_arch_do_init_vcpu(X86CPU *cs); -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html