On Sat, Jan 26, 2013 at 12:49 AM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> On Fri, Jan 25, 2013 at 10:45:02AM +0300, Andrey Korolyov wrote:
>> On Thu, Jan 24, 2013 at 4:20 PM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
>> > On Thu, Jan 24, 2013 at 01:54:03PM +0300, Andrey Korolyov wrote:
>> >> Thank you Marcelo,
>> >>
>> >> The host node now sometimes locks up later than it did yesterday, but
>> >> the problem is still here; please see the attached dmesg. The stuck
>> >> process looks like this:
>> >> root 19251 0.0 0.0 228476 12488 ? D 14:42 0:00
>> >> /usr/bin/kvm -no-user-config -device ? -device pci-assign,? -device
>> >> virtio-blk-pci,? -device
>> >>
>> >> on the fourth VM by count.
>> >>
>> >> Should I try an upstream kernel instead of applying the patch to the
>> >> latest 3.4, or is that useless?
>> >
>> > If you can upgrade to an upstream kernel, please do that.
>> >
>>
>> With vanilla 3.7.4 there are almost no changes, and the NMI started
>> firing again. The external symptoms look like this: starting from some
>> VM count, maybe the third or the sixth VM, the qemu-kvm process
>> allocates its memory very slowly and in jumps, 20M-200M-700M-1.6G over
>> several minutes. The patch helps, of course: on both patched 3.4 and
>> vanilla 3.7 I'm able to kill stuck kvm processes and the node returns
>> to normal, whereas on 3.2 sending SIGKILL to the process caused zombies
>> and hung ``ps'' output (the problem and a workaround for the case where
>> no scheduler is involved are described here:
>> http://www.spinics.net/lists/kvm/msg84799.html).
>
> Try disabling pause loop exiting with the ple_gap=0 kvm-intel.ko module parameter.
>

Hi Marcelo,

thanks, this parameter increased the number of working VMs by about half an order of magnitude, from 3-4 to 10-15. Very high SY load, 10 to 15 percent, persists at these VM counts for a long time, whereas Linux guests in the same configuration do not go above one percent even under a stress benchmark.
After I disabled HT, the crash happens only in long runs, and now it is a kernel panic :) The stair-like memory allocation behaviour disappeared, but another symptom leading to the crash, which I had not noticed previously, persists: if the VM count is ``enough'' for a crash, some qemu processes start to consume a full core, and they panic the system after running in that state for tens of minutes, or when I try to attach a debugger to one of them. If needed, I can log the entire crash output via netconsole; for now I have only the tail, which is almost the same every time: http://xdel.ru/downloads/btwin.png