On 2011-12-12 14:37, Vasilis Liaskovitis wrote:
> Hotplugging a vCPU with kvmclock enabled can cause a guest stall/hang. When
> the stall happens, pvclock_clocksource_read() is called for the new vCPU and
> pvclock_get_nsec_offset calculates native_read_tsc() - shadow->tsc_timestamp.
> shadow->tsc_timestamp contains a value larger than native_read_tsc(), so the
> result is a very large 64-bit unsigned value. The global tsc variable
> last_value gets updated with this, causing system stall/freeze:
> "rcu_sched_state detected stalls on CPUs/tasks ..."
>
> The large shadow->tsc_timestamp value observed in the hanged cases is the tsc
> written into the "boot clock" on VM startup.
> Is the "boot clock" persistent in the guest? Can it get accessed by a vCPU
> other than vCPU 0, if its own hv_clock struct has not yet been registered
> or if the host has not yet updated the new hv_clock with a valid tsc_timestamp
> in kvm_guest_time_update() ?
>
> Fix temporarily by returning a zero offset if the delta in
> pvclock_get_nsec_offset() is negative.
>
> Tested on 3.0.6 guest kernel. Testing this patch requires qemu-kvm from:
> git://git.kiszka.org/qemu-kvm.git queues/cpu-hotplug
>

Fixing up Glommer's address (in case he has time) and adding Zach to CC.

> ---
>  arch/x86/kernel/pvclock.c |   11 ++++++++---
>  1 files changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kernel/pvclock.c b/arch/x86/kernel/pvclock.c
> index 42eb330..9d31144 100644
> --- a/arch/x86/kernel/pvclock.c
> +++ b/arch/x86/kernel/pvclock.c
> @@ -43,9 +43,14 @@ void pvclock_set_flags(u8 flags)
>
>  static u64 pvclock_get_nsec_offset(struct pvclock_shadow_time *shadow)
>  {
> -        u64 delta = native_read_tsc() - shadow->tsc_timestamp;
> -        return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
> -                                   shadow->tsc_shift);
> +        u64 current_read_tsc = native_read_tsc();
> +        if (current_read_tsc > shadow->tsc_timestamp) {
> +                u64 delta = current_read_tsc - shadow->tsc_timestamp;
> +                return pvclock_scale_delta(delta, shadow->tsc_to_nsec_mul,
> +                                           shadow->tsc_shift);
> +        }
> +        /* tsc value can be smaller than tsc_timestamp on a vCPU hotplug */
> +        else return 0;
>  }
>
>  /*

Can't comment on the semantics, but your patch is whitespace damaged and
doesn't follow kernel coding style. But I assume it's not for application
yet, right?

Would be cool if we find a fix for the kvmclock hotplug issue. There are
some good patches on the way to finally make this a proper upstream
feature.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
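
For illustration only, the wraparound described in the quoted report can be
reproduced in userspace. This is a minimal sketch, not kernel code: the
function names tsc_delta_unchecked() and tsc_delta_clamped() are hypothetical
stand-ins for the subtraction before and after the quoted patch, and
uint64_t stands in for the kernel's u64.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the unpatched calculation: a plain unsigned
 * subtraction. If now < timestamp, the result wraps modulo 2^64 and
 * becomes a very large value instead of a negative one.
 */
static uint64_t tsc_delta_unchecked(uint64_t now, uint64_t timestamp)
{
	return now - timestamp;
}

/*
 * Hypothetical stand-in for the behaviour of the quoted patch: return a
 * zero delta whenever the current reading is not ahead of the timestamp.
 */
static uint64_t tsc_delta_clamped(uint64_t now, uint64_t timestamp)
{
	if (now > timestamp)
		return now - timestamp;
	return 0;
}
```

With now = 100 and timestamp = 150, the unchecked version yields 2^64 - 50,
while the clamped version yields 0. Whether clamping to zero is the right
semantic fix for kvmclock is a separate question that this sketch does not
address.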