On Tue, 2009-01-20 at 13:56 +0100, Ingo Molnar wrote:
> * Kevin Shanahan <kmshanah@xxxxxxxxxxx> wrote:
>
> > > This suggests some sort of KVM-specific problem. Scheduler latencies
> > > in the seconds that occur under normal load situations are noticed and
> > > reported quickly - and there are no such open regressions currently.
> >
> > It at least suggests a problem with interaction between the scheduler
> > and kvm, otherwise reverting that scheduler patch wouldn't have made the
> > regression go away.
>
> the scheduler affects almost everything, so almost by definition a
> scheduler change can tickle a race or other timing bug in just about any
> code - and reverting that change in the scheduler can make the bug go
> away. But yes, it could also be a genuine scheduler bug - that is always a
> possibility.

Okay, I understand.

> Could you please run a cfs-debug-info.sh session on a CONFIG_SCHED_DEBUG=y
> and CONFIG_SCHEDSTATS=y kernel, while you are experiencing those
> latencies:
>
>   http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh
>
> and post that (relatively large) somewhere, or send it as a reply after
> bzip2 -9 compressing it? It will include a lot of information about the
> delays your tasks are experiencing.

Running it while the problem is occurring will be tricky, as it only
lasts for a few seconds at a time. Is it going to be useful at all to
just see those statistics if the system is running normally?

I might need to modify the script a little. Am I right that everything
above "gathering statistics..." is pretty much static information? I
could run top, vmstat and cat /proc/sched_debug in a loop until the
problem occurs and then trim it.
Something like:

  i=0
  while true; do
      i=$((i + 1))
      date >> "$FILE"
      echo "-- top: --" >> "$FILE"
      # -n takes a whole number of iterations; take one batch-mode snapshot
      top -H -c -b -n 1 >> "$FILE" 2>/dev/null
      echo "-- vmstat: --" >> "$FILE"
      vmstat >> "$FILE" 2>/dev/null
      echo "-- sched_debug #$i: --" >> "$FILE"
      cat /proc/sched_debug >> "$FILE" 2>/dev/null
      sleep 0.5    # fractional sleep needs GNU sleep
  done

That should take a snapshot every half second or so.

Regards,
Kevin.

P.S. Please keep kmshanah@xxxxxxxxxxxxxxxxx out of the CC list (it won't
route properly anyway). I don't know how it got added - the only place
it would have appeared was in the "revert" commit message when I was
testing 2.6.28 with the commit I bisected down to removed.
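
For the "then trim it" step, a minimal sketch (not from the message
above) might look like the following. It assumes every snapshot in the
log starts with one date line followed by a "-- top: --" header line,
as the loop writes them. trim_snapshots is a hypothetical helper name,
not part of cfs-debug-info.sh.

```shell
# trim_snapshots LOGFILE N: print only the last N snapshots of LOGFILE.
# Assumption: each snapshot is one `date` line followed by "-- top: --".
trim_snapshots() {
    log=$1
    keep=$2
    # Line number of the "-- top: --" header opening the first snapshot to keep.
    start=$(grep -n -x -e '-- top: --' "$log" | tail -n "$keep" | head -n 1 | cut -d: -f1)
    # No header found (or N is 0): nothing to print.
    [ -n "$start" ] || return 1
    # Step back one line to include the date stamp, but never before line 1.
    [ "$start" -gt 1 ] && start=$((start - 1))
    tail -n "+$start" "$log"
}
```

Once the latency has been seen and the loop stopped, something like
trim_snapshots "$FILE" 20 > trimmed.log would keep only the last 20
snapshots for posting.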