On Thu, Oct 08, 2009 at 06:06:39PM +0400, Michael Tokarev wrote:
> Thomas Gleixner wrote:
>> On Thu, 8 Oct 2009, Michael Tokarev wrote:
>>
>>> Thomas Gleixner wrote:
>>>> On Thu, 8 Oct 2009, Michael Tokarev wrote:
>>>>> Yesterday I was "lucky" enough to actually watch what's
>>>>> going on when the delay actually happens.
>>>>>
>>>>> I run desktop environment on a kvm virtual machine here.
>>>>> The server is on diskless terminal, and the rest, incl.
>>>>> the window manager etc, is started from a VM.
>>>>>
>>>>> And yesterday, during normal system load (nothing extra,
>>>>> and not idle either, and all the other guests were running
>>>>> under normal load too), I had a stall of everything on this
>>>>> X session for about 2..3, maybe 5 seconds.
>>>>>
>>>>> It felt like completely stuck machine. Nothing was moving
>>>>> on the screen, no reaction to the keyboard etc.
>>>>>
>>>>> And after several seconds it returned to normal. With
>>>>> the familiar message in dmesg -- increasing hrtimer etc,
>>>>> to the next 50%. (Without a patch from Marcelo at this
>>>>> time it should increase min_delta to a large number).
>>>>>
>>>>> To summarize: there's something, well, more interesting
>>>>> going on here. In addition to the scheduling issues that
>>>>> causes timers to be calculated on the "wrong" CPU etc as
>>>> Care to elaborate ?
>>> Such huge delays (in terms of seconds, not ms or ns) - I don't
>>> understand how such delays can be explained by scheduling to the
>>> different cpu etc. That's what I mean. I know very little about
>>> all this low-level stuff so I may be completely out of context,
>>> but such explanation does not look right to me, simple as that.
>>> By "scheduling mistakes" we can get mistakes in range of millisecs,
>>> but not secs.
>>
>> I'm really missing the big picture here.
>>
>> What means "causes timers to be calculated on the "wrong" CPU etc" ?
>> And what do you consider a "scheduling mistake" ?
>
> From the initial diagnostics by Marcelo:
>
> > It seems the way hrtimer_interrupt_hanging calculates min_delta is
> > wrong (especially to virtual machines). The guest vcpu can be scheduled
> > out during the execution of the hrtimer callbacks (and the callbacks
> > themselves can do operations that translate to blocking operations in
> > the hypervisor).
> >
> > So high min_delta values can be calculated if, for example, a single
> > hrtimer_interrupt run takes two host time slices to execute, while some
> > other higher priority task runs for N slices in between.
>
> From this I conclude that the huge min_delta is due to some other task(s)
> on the host being run while this guest is in hrtimer callback. But I
> fail to see why that process on the host takes SO MUCH time, to warrant
> resulting min_delta to 0.5s, or to cause delays for 3..5 seconds in
> guest. It's ok to have delays in range of several extra milliseconds,
> but for *seconds* is too much.
>
> Note again that neither host nor guest is under high load when
> this jump happens. Also note that there are no high-priority processes
> running on the host, all are of the same priority level, including
> all the guests.
>
> Note also that so far I only see it on SMP guests, never on UP
> guests. And only on guests with kvm_clock, not with acpi_pm
> clocksource.
>
> What I'm trying to say is that it looks like there's something
> else wrong here in the guest code. Huge stalls, huge delays
> while in hrtimer callback (i think it happens always when such
> delay is happening, it's just noticed by hrtimer code) -- that's
> the root cause of all this, (probably) wrong logic in hrtimer
> calibration just shows the results of something that's wrong
> elsewhere.

True.

It would be useful to collect sar output (sar -B -b -u) at one-second
intervals on both the host and the guest. You already mentioned that load
was low, but this should give more details.

Was there swapping going on?
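For reference, the sar collection asked for above could be scripted
roughly as follows. This is only a sketch, assuming sysstat's sar is
installed on both sides; the function name collect_sar, the log file
names and the default sample count are illustrative and do not come
from this thread.

```shell
#!/bin/sh
# Sketch only: sample paging (-B), I/O (-b) and CPU (-u) statistics
# once per second with sar. The function name, log file name and
# default sample count are assumptions made for illustration.
collect_sar() {
    role="$1"            # "host" or "guest"; used only to name the log
    count="${2:-600}"    # number of 1-second samples (assumed default)
    log="sar-${role}.log"
    # interval = 1 second, $count samples; reports go to the log file
    sar -B -b -u 1 "$count" > "$log"
    # print the log file name so the caller knows where to look
    echo "$log"
}
```

Running "collect_sar host" on the host and "collect_sar guest" inside
the guest at the same time should give two logs whose timestamps can be
lined up against the moment of the stall. For the swapping question,
sar's -W option (swap-in/swap-out rates) could be added to the option
list.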