Re: 2.6.35-rc1 regression with pvclock and smp guests

Arjan Koers <0h61vkll2ly8@xxxxxxxxxx> · Thu, 30 Sep 2010 20:49:44 +0200

On 2010-09-30 17:32, Zachary Amsden wrote:
> On 09/30/2010 05:12 AM, Michael Tokarev wrote:
>> 30.09.2010 17:54, Zachary Amsden wrote:
>> []
>>   
>>> The printk movement is just a bandaid patch, correct?  Anything which
>>> does printk before kvmclock is registered could trigger the same bug.
>>>      
>> Well, I'd not say it's just a bandaid patch, it's real bug -- either
>> we can read kvmclock (so it's initialized), or we don't touch it (at
>> least before registration).
>>    
> 
> Yes, that's the bug, but moving the printk doesn't fix that, it just
> hides it.

Correct. It's just luck that it works for my  64-bit 2.6.34.* and
2.6.35.* kernels. The working kernels will break when compiled to
print additional debug information.

Here's a partial boot log of 2.6.32.23 with smpboot.c compiled
with DEBUG define. I modified printk to display the CPU# (printk_cpu).
All lines on CPU 1 up to 0.136487 are using the invalid clock and
will cause the kernel to hang later (if I hadn't patched pvclock
to correct the clock backwards).
...
[0:    0.124221] Booting processor 1 APIC 0x1 ip 0x6000
[0:    0.124579] Setting warm reset code and vector.
[0:    0.124585] 1.
[0:    0.124587] 2.
[0:    0.124588] 3.
[0:    0.124601] Asserting INIT.
[0:    0.124613] Waiting for send to finish...
[0:    0.134490] Deasserting INIT.
[0:    0.134497] Waiting for send to finish...
[0:    0.134501] #startup loops: 2.
[0:    0.134503] Sending STARTUP #1.
[0:    0.134508] After apic_write.
[1:    0.008000] Initializing CPU#1
[1:    0.008000] CPU#1 (phys ID: 1) waiting for CALLOUT
[0:    0.134826] Startup point 1.
[0:    0.135133] Waiting for send to finish...
[0:    0.135340] Sending STARTUP #2.
[0:    0.135346] After apic_write.
[0:    0.135650] Startup point 1.
[0:    0.135651] Waiting for send to finish...
[0:    0.135858] After Startup.
[0:    0.135859] Before Callout 1.
[0:    0.135861] After Callout 1.
[1:    0.008000] CALLIN, before setup_local_APIC().
[1:    0.008000] Stack at about ffff88001f889f44
[1:    0.008000] CPU: L1 I Cache: 64K (64 bytes/line), D cache 64K (64 bytes/line)
[1:    0.008000] CPU: L2 Cache: 512K (64 bytes/line)
[1:    0.008000] kvm-clock: cpu 1, msr 0:1511601, secondary cpu clock
[0:    0.136461] OK.
[0:    0.136463] CPU1: AMD Athlon(tm) II X2 240 Processor stepping 02
[0:    0.136465] CPU has booted.
[0:    0.136488] Brought up 2 CPUs
[0:    0.136489] Boot done.
[0:    0.136490] Before bogomips.
[0:    0.136491] Total of 2 processors activated (11202.17 BogoMIPS).
[0:    0.136493] Before bogocount - setting activated=1.
[1:    0.136487] x86 PAT enabled: cpu 1, old 0x0, new 0x7010600070106
[0:    0.144007] CPU0 attaching sched-domain:
[0:    0.144010]  domain 0: span 0-1 level CPU
[0:    0.144012]   groups: 0 1
[0:    0.144016] CPU1 attaching sched-domain:
[0:    0.144018]  domain 0: span 0-1 level CPU
[0:    0.144020]   groups: 1 0
[0:    0.144219] NET: Registered protocol family 16
[0:    0.148091] PCI: Using configuration type 1 for base access
[0:    0.148451] PCI: Using configuration type 1 for extended access
[0:    0.148870] mtrr: your CPUs had inconsistent variable MTRR settings
[0:    0.148870] mtrr: your CPUs had inconsistent MTRRdefType settings
[0:    0.148870] mtrr: probably your BIOS does not setup all CPUs.
[0:    0.149185] mtrr: corrected configuration.
[0:    0.156112] bio: create slab <bio-0> at 0
[0:    0.156635] vgaarb: loaded
[0:    0.156635] PCI: Probing PCI hardware
[0:    0.156635] PCI: Probing PCI hardware (bus 00)
[0:    0.156635] pci 0000:00:01.1: reg 20 io port: [0xc000-0xc00f]
[0:    0.156773] pci 0000:00:01.3: quirk: region b000-b03f claimed by PIIX4 ACPI
[0:    0.160012] pci 0000:00:01.3: quirk: region b100-b10f claimed by PIIX4 SMB
[0:    0.163379] pci 0000:00:02.0: reg 10 32bit mmio pref: [0xf0000000-0xf1ffffff]
[0:    0.164660] pci 0000:00:02.0: reg 14 32bit mmio: [0xf2000000-0xf2000fff]
[0:    0.170537] pci 0000:00:03.0: reg 10 io port: [0xc020-0xc03f]
[0:    0.170629] pci 0000:00:03.0: reg 14 32bit mmio: [0xf2001000-0xf2001fff]
[0:    0.171037] pci 0000:00:04.0: reg 10 io port: [0xc040-0xc05f]
[0:    0.171373] pci 0000:00:05.0: reg 10 io port: [0xc080-0xc0bf]
[0:    0.172273] pci 0000:00:06.0: reg 10 io port: [0xc0c0-0xc0ff]
[0:    0.173099] vgaarb: device added: PCI:0000:00:02.0,decodes=io+mem,owns=io+mem,locks=none
[0:    0.176131] pci 0000:00:01.0: PIIX/ICH IRQ router [8086:7000]
[0:    0.177112] Switching to clocksource kvm-clock
[1:    0.181401] pci_bus 0000:00: resource 0 io:  [0x00-0xffff]
[1:    0.181412] pci_bus 0000:00: resource 1 mem: [0x000000-0xffffffffffffffff]
[1:    0.181825] NET: Registered protocol family 2
...

>>> Can you try with printk timing disabled and see if the bug disappears?
>>>      
>> Yes it disappears so far, at last I can't trigger it anymore, tried
>> numerous boots including the 2.6.35.6 32bit kernel (patched with the
>> printk registration patch!) which shows the prob in almost every boot.
>>    
> 
> So, looks like we need to do the real fix.

Your ideas to zero hv_clock or to use an initialized flag may be usable.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html