On 2010-08-02 22:26, Zachary Amsden wrote: > On 08/02/2010 04:43 AM, Glauber Costa wrote: >> On Sat, Jul 31, 2010 at 01:55:10PM -1000, Zachary Amsden wrote: >> >>> On 07/31/2010 06:36 AM, Arjan Koers wrote: >>> >>>> On 2010-07-31 13:53, Arjan Koers wrote: >>>> >>>>> The kernel boots successfully when CONFIG_PRINTK_TIME is not set. >>>>> >>>>> >>>> The problem occurs when this message is printed: >>>> >>>> [ 0.016000] kvm-clock: cpu 1, msr 0:1511c01, secondary cpu clock >>>> >>>> When I disable that printk, the kernel boots with >>>> CONFIG_PRINTK_TIME=y >>>> >>>> --- a/arch/x86/kernel/kvmclock.c >>>> +++ b/arch/x86/kernel/kvmclock.c >>>> @@ -131,8 +131,8 @@ static int kvm_register_clock(char *txt) >>>> int low, high; >>>> low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1; >>>> high = ((u64)__pa(&per_cpu(hv_clock, cpu))>> 32); >>>> - printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n", >>>> - cpu, high, low, txt); >>>> + /*printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n", >>>> + cpu, high, low, txt);*/ >>>> >>>> return native_write_msr_safe(msr_kvm_system_time, low, high); >>>> } >>>> >>>> So the problem appears to be that the clock of the second CPU >>>> is used too soon (or that clock setup should finish earlier). >>>> >>> That's almost hilarious. The printk from setting up the kvm clock >>> is invoking the kvm clock before it is setup. >>> >>> There's no reason other printks couldn't do the same thing, however. >>> I think it's safest to keep an initialized flag and check for it >>> before attempting to return a meaningful value. >>> >> I was on vacations, just got back. >> >> I think it is safe to just patch our own use of it. Before that, all >> other >> printks will be handled by the main cpu anyway, since it'll be the >> only one active >> at the moment. The only possible offenders for this are us, and the >> cpu initialization >> code, which is already fragile in multiple ways anyway. >> >> A flag would only make things more complicated and dirty >> > Can we just do this? Sorry, the patch doesn't help. See line 68 in my debug log: 65: ffff880001411c00 1b68905d7 156558001892 6e10a 1b6c0631e 375d47 13c5ce 15655813de60 66: ffff880001411c00 1b68905d7 156558001892 6e10a 1b6c0653b 375f64 13c68f 15655813df21 67: ffff880001411c00 1b68905d7 156558001892 6e10a 1b6c06746 37616f 13c74a 15655813dfdc 68: ffff880001511c00 1967ac192 15654c8d826a 63c6c 3bf58bf0ea18 3bf3f5762886 15695466a1e5 2acea0f4244f 69: ffff880001411c00 1b6f3fbda 156558264b1e 6e10e 1b6f424aa 28d0 e93 1565582659b1 70: ffff880001411c00 1b6f3fbda 156558264b1e 6e10e 1b6f4a1e0 a606 3b4b 156558268669 71: ffff880001411c00 1b6f3fbda 156558264b1e 6e10e 1b6f4ba63 be89 440b 156558268f29 72: ffff880001411c00 1b6f3fbda 156558264b1e 6e10e 1b6f4d8e7 dd0d 4ef1 156558269a0f 73: ffff880001511c00 3bf58bf16356 15655825e74b 40496 3bf58bf4d52c 371d6 13aef 15655827223a 74: ffff880001511c00 3bf58bf16356 15655825e74b 40496 3bf58bf4ebec 38896 1430f 156558272a5a I don't think that pvclock_clocksource_read is receiving completely random uninitialized data. The values in shadow are wrong, but could be interpreted as valid data (shadow.tsc_to_nsec_mul = b6dc43b6, shadow.tsc_shift = ffffffff, shadow.flags = 0 and shadow.version is always even). I've attached the printk patches for 2.6.34.1 and 2.6.35, in case anyone needs them...
Move a printk that's using the clock before it's ready Fix a hang during SMP kernel boot on KVM that showed up after commit 489fb490dbf8dab0249ad82b56688ae3842a79e8 (2.6.35) and 59aab522154a2f17b25335b63c1cf68a51fb6ae0 (2.6.34.1). The problem only occurs when CONFIG_PRINTK_TIME is set. Signed-off-by: Arjan Koers <0h61vkll2ly8@xxxxxxxxxx> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index feaeb0d..71bf2df 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -125,12 +125,15 @@ static struct clocksource kvm_clock = { static int kvm_register_clock(char *txt) { int cpu = smp_processor_id(); - int low, high; + int low, high, ret; + low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1; high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32); + ret = native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high); printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n", cpu, high, low, txt); - return native_write_msr_safe(MSR_KVM_SYSTEM_TIME, low, high); + + return ret; } #ifdef CONFIG_X86_LOCAL_APIC
Move a printk that's using the clock before it's ready Fix a hang during SMP kernel boot on KVM that showed up after commit 489fb490dbf8dab0249ad82b56688ae3842a79e8 (2.6.35) and 59aab522154a2f17b25335b63c1cf68a51fb6ae0 (2.6.34.1). The problem only occurs when CONFIG_PRINTK_TIME is set. Signed-off-by: Arjan Koers <0h61vkll2ly8@xxxxxxxxxx> diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index eb9b76c..ca43ce3 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -128,13 +128,15 @@ static struct clocksource kvm_clock = { static int kvm_register_clock(char *txt) { int cpu = smp_processor_id(); - int low, high; + int low, high, ret; + low = (int)__pa(&per_cpu(hv_clock, cpu)) | 1; high = ((u64)__pa(&per_cpu(hv_clock, cpu)) >> 32); + ret = native_write_msr_safe(msr_kvm_system_time, low, high); printk(KERN_INFO "kvm-clock: cpu %d, msr %x:%x, %s\n", cpu, high, low, txt); - return native_write_msr_safe(msr_kvm_system_time, low, high); + return ret; } #ifdef CONFIG_X86_LOCAL_APIC