On Wed, Jan 30, 2013 at 11:21:08AM +0300, Andrey Korolyov wrote:
> On Wed, Jan 30, 2013 at 3:15 AM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> > On Tue, Jan 29, 2013 at 02:35:02AM +0300, Andrey Korolyov wrote:
> >> On Mon, Jan 28, 2013 at 5:56 PM, Andrey Korolyov <andrey@xxxxxxx> wrote:
> >> > On Mon, Jan 28, 2013 at 3:14 AM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> >> >> On Mon, Jan 28, 2013 at 12:04:50AM +0300, Andrey Korolyov wrote:
> >> >>> On Sat, Jan 26, 2013 at 12:49 AM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> >> >>> > On Fri, Jan 25, 2013 at 10:45:02AM +0300, Andrey Korolyov wrote:
> >> >>> >> On Thu, Jan 24, 2013 at 4:20 PM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> >> >>> >> > On Thu, Jan 24, 2013 at 01:54:03PM +0300, Andrey Korolyov wrote:
> >> >>> >> >> Thank you, Marcelo.
> >> >>> >> >>
> >> >>> >> >> The host node is still locking up, though somewhat later than
> >> >>> >> >> yesterday; please see the attached dmesg. The stuck process looks like
> >> >>> >> >>
> >> >>> >> >> root 19251 0.0 0.0 228476 12488 ? D 14:42 0:00
> >> >>> >> >> /usr/bin/kvm -no-user-config -device ? -device pci-assign,? -device
> >> >>> >> >> virtio-blk-pci,? -device
> >> >>> >> >>
> >> >>> >> >> on the fourth VM by count.
> >> >>> >> >>
> >> >>> >> >> Should I try an upstream kernel instead of applying the patch to
> >> >>> >> >> the latest 3.4, or is that useless?
> >> >>> >> >
> >> >>> >> > If you can upgrade to an upstream kernel, please do that.
> >> >>> >> >
> >> >>> >>
> >> >>> >> With vanilla 3.7.4 there is almost no change, and the NMI started
> >> >>> >> firing again. The external symptoms look like this: starting from
> >> >>> >> some VM count, maybe the third or the sixth, the qemu-kvm process
> >> >>> >> allocates its memory very slowly and in jumps, 20M-200M-700M-1.6G
> >> >>> >> in minutes. The patch helps, of course - on both patched 3.4 and
> >> >>> >> vanilla 3.7 I am able to kill the stuck kvm processes and the node
> >> >>> >> returns to normal, whereas on 3.2 sending SIGKILL to the process
> >> >>> >> leaves zombies and a hung ``ps'' output (the problem, and a
> >> >>> >> workaround for when no scheduler is involved, are described here:
> >> >>> >> http://www.spinics.net/lists/kvm/msg84799.html).
> >> >>> >
> >> >>> > Try disabling pause loop exiting with the ple_gap=0 kvm-intel.ko
> >> >>> > module parameter.
> >> >>> >
> >> >>>
> >> >>> Hi Marcelo,
> >> >>>
> >> >>> thanks, this parameter helped to increase the number of working VMs
> >> >>> by half an order of magnitude, from 3-4 to 10-15. A very high SY
> >> >>> load, 10 to 15 percent, persists at those counts for a long time,
> >> >>> whereas Linux guests in the same configuration do not jump over one
> >> >>> percent even under a stress benchmark. After I disabled HT, the
> >> >>> crash happens only on long runs, and now it is a kernel panic :)
> >> >>> The stair-like memory allocation behaviour disappeared, but another
> >> >>> symptom leading to the crash, which I had not noticed previously,
> >> >>> persists: if the VM count is ``enough'' for a crash, some qemu
> >> >>> processes start to eat a whole core each, and they will panic the
> >> >>> system after tens of minutes in that state, or as soon as I try to
> >> >>> attach a debugger to one of them. If needed, I can log the entire
> >> >>> crash output via netconsole; for now I have a tail, almost the same
> >> >>> every time:
> >> >>> http://xdel.ru/downloads/btwin.png
> >> >>
> >> >> Yes, please log entire crash output, thanks.
> >> >>
> >> >
> >> > Here please, 3.7.4-vanilla, 16 VMs, ple_gap=0:
> >> >
> >> > http://xdel.ru/downloads/oops-default-kvmintel.txt
> >>
> >> Just an update: I was able to reproduce this on pure Linux VMs using
> >> qemu-1.3.0 and the ``stress'' benchmark running in them - the panic
> >> occurs when a VM starts (with ten machines already working at that
> >> moment). Qemu-1.1.2 generally is not able to reproduce it, but the
> >> host node with the older version crashes with fewer Windows VMs
> >> (three to six instead of ten to fifteen) than with 1.3; please see
> >> the trace below:
> >>
> >> http://xdel.ru/downloads/oops-old-qemu.txt
> >
> > Single bit memory error, apparently. Try:
> >
> > 1. memtest86.
> > 2. Boot with slub_debug=ZFPU kernel parameter.
> > 3. Reproduce on different machine.
> >
>
> Hi Marcelo,
>
> I always follow the rule - if some weird bug exists, check it on an
> ECC-enabled machine and check the IPMI logs too before starting to
> complain :) I have finally managed to ``fix'' the problem, but my
> solution seems a bit strange:
> - I noticed that if the virtual machines are started without any
> cgroup setting, they do not trigger this bug under any conditions,
> - I assumed, quite wrongly, that CONFIG_SCHED_AUTOGROUP would regroup
> only tasks that are not in any cgroup and would not touch tasks
> already inside an existing cpu cgroup. A first look at the 200-line
> patch shows that autogrouping always applies to all tasks, so I tried
> to disable it,
> - wild magic: the VMs do not crash the host any more; even at a count
> of 30+ they work fine.
> I still don't know what exactly triggered this, or whether I will
> face it again under different conditions, so my solution is more
> likely a patch of mud in the wall of the dam than a proper fix.
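For what it's worth, autogrouping can also be toggled without
rebuilding the kernel, which makes it cheap to confirm that it really
is the trigger - a minimal sketch, assuming a kernel with
CONFIG_SCHED_AUTOGROUP compiled in:

    # check whether autogrouping is currently active (1 = enabled)
    cat /proc/sys/kernel/sched_autogroup_enabled

    # turn it off at runtime
    echo 0 > /proc/sys/kernel/sched_autogroup_enabled

    # or keep it off across reboots by booting with the
    # ``noautogroup'' kernel parameter

That way the same kernel can be flipped back and forth instead of
comparing two different builds.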
> There seem to be two possible origins for such an error - either a
> very hideous race condition involving cgroups and processes like
> qemu-kvm that cause frequent context switches, or a simple
> incompatibility between NUMA, the logic of CONFIG_SCHED_AUTOGROUP and
> qemu VMs already doing work in a cgroup, since I have not observed
> these errors on a single NUMA node (i.e. a desktop) under relatively
> heavier load.

Yes, it would be important to track it down though. Enabling the
slub_debug=ZFPU kernel parameter should help.
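For example, on a Debian-style GRUB2 setup (the update command is
distro-specific; grub2-mkconfig on others):

    # in /etc/default/grub, append the option to the existing ones:
    #   GRUB_CMDLINE_LINUX_DEFAULT="... slub_debug=ZFPU"

    # regenerate the configuration and reboot
    update-grub && reboot

    # verify that it took effect
    grep slub_debug /proc/cmdline

Z turns on red zoning, F consistency checks, P object poisoning and U
alloc/free owner tracking for all slab caches, so a single corrupted
byte should be reported in dmesg together with the affected cache and
callers, instead of surfacing later as a random oops.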