On Sat, Oct 10, 2009 at 02:34:22AM +0400, Michael Tokarev wrote: > Frederic Weisbecker wrote: >> On Sat, Oct 10, 2009 at 01:22:16AM +0400, Michael Tokarev wrote: >>> Marcelo Tosatti wrote: >>> [snip] >>> >>>> Would be useful to collect sar (sar -B -b -u) output every one second >>>> in both host/guest. You already mentioned load was low, but this should >>>> give more details. >>> Here we go: http://www.corpit.ru/mjt/hrtimer-interrupt-too-slow/ >>> >>> Two incindents - cases when hrtimer: interrupt is too slow were >>> reported in the guest (with Marcelo's patch so that min_delta is >>> increased to 50% only), happened at 11:44:48 and 21:46:56 (as >>> shown in guest-dmesg file). For both, there's `sar -BWbd' output >>> for a 2-minute interval (starting one minute before the delay >>> and ending one minute after) from both host and guest. >>> >>>> Was there swapping going on? >>> Not as far as I can see, and sar output agrees. >> >> But I can read this from you guest traces: >> >> pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff >> >> 11:44:45 0.00 32.32 174.75 0.00 176.77 0.00 0.00 0.00 0.00 >> 11:44:46 0.00 16.00 789.00 0.00 323.00 0.00 0.00 0.00 0.00 >> 11:44:47 0.00 32.32 907.07 0.00 277.78 0.00 0.00 0.00 0.00 >> 11:44:48 27.59 22.99 44.83 0.00 150.57 0.00 0.00 0.00 0.00 >> 11:44:49 0.00 33.68 22.11 0.00 218.95 0.00 0.00 0.00 0.00 >> 11:44:50 0.00 101.01 17.17 0.00 151.52 0.00 0.00 0.00 0.00 >> 11:44:51 0.00 15.69 16.67 0.00 126.47 0.00 0.00 0.00 0.00 >> [...] >> 21:46:52 0.00 40.00 17.00 0.00 82.00 0.00 0.00 0.00 0.00 >> 21:46:53 0.00 31.68 18.81 0.00 94.06 0.00 0.00 0.00 0.00 >> 21:46:54 0.00 31.68 16.83 0.00 90.10 0.00 0.00 0.00 0.00 >> 21:46:55 0.00 108.00 17.00 0.00 89.00 0.00 0.00 0.00 0.00 >> 21:46:56 9.76 482.93 3890.24 0.00 439.02 0.00 0.00 0.00 0.00 >> 21:46:57 0.00 760.00 8627.00 0.00 1133.00 0.00 0.00 0.00 0.00 >> 21:46:58 0.00 84.85 2612.12 0.00 138.38 0.00 0.00 0.00 0.00 >> 21:46:59 0.00 16.00 17.00 0.00 90.00 0.00 0.00 0.00 0.00 >> 21:47:00 0.00 36.36 17.17 0.00 90.91 0.00 0.00 0.00 0.00 >> >> >> So it looks like there was some swapping in when the hrtimer (spuriously) >> hanged. > > It's not swapping. Swapping is in a separate table, with columns titled > pswpin/s and pswpout/s -- first table. > > On my home machine with no swap at all, 4gig memory and 2gig free, > pgpgin and pgpgout are increasing too. > > Also, while in the second case (21:46:56) there's actually some > noticeable activity (page faults at least), in first case that > activity is modest. Yes. It still could be scheduling though (but you mention the host is idle, you forgot cpu stats in sar). It might be a QEMU bug, can you share your qemu-kvm command line? There's always (guest) disk i/o when the hang happens. Also please give the latest hrtimer_interrupt patch a try. Thanks! -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html