On Mon, Dec 12, 2011 at 02:37:15PM +0100, Vasilis Liaskovitis wrote:
> Hotplugging a vCPU with kvmclock enabled can cause a guest stall/hang. When
> the stall happens, pvclock_clocksource_read() is called for the new vCPU and
> pvclock_get_nsec_offset calculates native_read_tsc() - shadow->tsc_timestamp.
> shadow->tsc_timestamp contains a value larger than native_read_tsc(), so the
> result is a very large 64-bit unsigned value. The global tsc variable
> last_value gets updated with this, causing system stall/freeze:
> "rcu_sched_state detected stalls on CPUs/tasks ..."
>
> The large shadow->tsc_timestamp value observed in the hanged cases is the tsc
> written into the "boot clock" on VM startup.
> Is the "boot clock" persistent in the guest? Can it get accessed by a vCPU
> other than vCPU 0, if its own hv_clock struct has not yet been registered
> or if the host has not yet updated the new hv_clock with a valid tsc_timestamp
> in kvm_guest_time_update() ?

When a CPU is hotplugged, its TSC starts counting at 0. We should cope
with that fact and fix this bug in the boot clock handling.

From the guest's perspective, shadow->tsc_timestamp should be updated to
reflect the current vcpu (which is not the case when it's reading the value
from the boot clock).

That said, I am not sure what the best path to fix this is, but the
workaround below is ugly.

>
> Fix temporarily by returning a zero offset if the delta in
> pvclock_get_nsec_offset() is negative.
>
> Tested on 3.0.6 guest kernel.
> Testing this patch requires qemu-kvm from:
> git://git.kiszka.org/qemu-kvm.git queues/cpu-hotplug
>
> ---
>  arch/x86/kernel/pvclock.c |   11 ++++++++---
>  1 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 42eb330..9d31144 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -43,9 +43,14 @@ void pvclock_set_flags(u8 flags)
>
>  static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
>  {
> -	u64 delta = native_read_tsc() - shadow->tsc_timestamp;
> -	return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
> -				   shadow->tsc_shift);
> +	u64 current_read_tsc = native_read_tsc();
> +	if (current_read_tsc > shadow->tsc_timestamp) {
> +		u64 delta = current_read_tsc - shadow->tsc_timestamp;
> +		return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
> +					   shadow->tsc_shift);
> +	}
> +	/* tsc value can be smaller than tsc_timestamp on a vCPU hotplug */
> +	else return 0;
>  }
>
>  /*
> --
> 1.7.7.3
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html