On Wed, Jun 1, 2011 at 6:49 PM, john stultz <johnstul@xxxxxxxxxx> wrote: > On Wed, 2011-06-01 at 18:31 -0600, Bjorn Helgaas wrote: >> On Wed, Jun 1, 2011 at 5:58 PM, john stultz <johnstul@xxxxxxxxxx> wrote: >> > My first instinct is "don't do that!" to whatever driver is disabling >> > irqs for so long. Do you know what's actually causing these long irq off >> > periods? >> > >> > I assume you're noticing this offset by seeing that CLOCK_REALTIME is >> > off from the RTC right after boot? How severe is this? The RTC read is >> > only second granular, so there's a fair amount of error (~1 second) >> > possible right at boot, so this then must be many seconds worth of lost >> > ticks to be noticeable, right? >> >> I'm using 2.6.34, so not really new. I think the major offender is >> kernel serial printk, which is done in polled mode. A *lot* of it, >> e.g., 30+ seconds' worth. > > CC'ing the linux-serial list and Alan for their thoughts: Does the > serial port is disabling irqs for 30+ seconds during boot seem at all > sane? > > I would think it would periodically enable irqs in the polling loop to > let things be processed, but maybe not? > >> I wonder if there's some reasonably clean >> way to resync with the RTC, say at the time we register a clocksource >> better than jiffies, or in clocksource_done_booting(), or something. > > I definitely think we need to address the long irq disable periods > instead of trying to hack around the issue in the timekeeping code. OK, I have a theory. Here are snippets from my dmesg log (this is from a current upstream kernel). The "BH" lines are instrumentation I added to read the RTC at a few places: Linux version 3.0.0-smp-DEV ... BH now rtc 1306992452 (start_kernel, before setup_arch) Printk 230K of numa=fake debug stuff (more than seems strictly necessary to me, but there it is). All this data goes into the log buffer, not to the UART, because the console hasn't been initialized yet. Command line: auto BOOT_IMAGE=300smpD ro root=/dev/hda1,/dev/sda1 ignore_loglevel console=ttyS0,115200n8 swiotlb=16000 numa=fake=16M BH now rtc 1306992457 (start_kernel, after mm init) BH btime rtc 1306992457 xt 1306992457 wtm -1306992457 (timekeeping_init) Snapshot btime and init xtime and wall_to_monotonic. So theoretically, we should see 1306992457 in /proc/stat btime. Console: colour dummy device 80x25 console [ttyS0] enabled Now ttyS0 is enabled, so we dump the log buffer to the UART. I think this happens in console_unlock(), with interrupts disabled for the whole buffer. BH now rtc 1306992481 xt 1306992459 wtm -1306992457 clocksource_register jiffies This RTC read is in clocksource_register(); note that xtime is now 22 seconds behind the RTC. The UART is running at 115200 baud, and 230K/(115200/10) = about 20 seconds, so this sort of matches the time I expect it to take to dump the buffer. HPET: 3 timers in total, 0 timers will be used for per-cpu timer hpet0: at MMIO 0xfed00000, IRQs 2, 8, 0 hpet0: 3 comparators, 64-bit 14.318180 MHz counter Switching to clocksource hpet Refined TSC clocksource calibration: 2400.084 MHz. Switching to clocksource tsc BH <set rtc 1306992488 xt 1306992465 wtm -1306992457 to 1306992488 BH settimeofday delta 23 BH set> rtc 1306992488 xt 1306992488 wtm -1306992480 This settimeofday() call happens when userland runs "hwclock --hctosys". We adjust xtime and wall_to_monotonic by 23 seconds. Finally, we read /proc/stat. Since we only re-synced the system clock to the RTC (we didn't sync to an external source), btime ought to be the same as it was when we collected it, i.e., 1306992457. But instead, we get this, which I assume is the same wall_to_monotonic we computed in settimeofday(), but rounded: btime 1306992479 Things that look interesting to me: - numa=fake prints an excessive amount of debug info! - We register the ttyS0 console before registering the jiffies clocksource. It seems like the jiffies clocksource ought to be registered about the same time we initialize xtime in timekeeping_init(). - Interrupts are disabled the entire time we dump the log buffer, which could be a very long time. Maybe we could restore/disable interrupts periodically there. Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-serial" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html