Hi,

Thanks for the reply. Well, although there is plenty of RAM left (about
100MB), some swap space was used during the operation:

Mem:   8193472k total,  8089788k used,   103684k free,     5768k buffers
Swap: 11716412k total,    36636k used, 11679776k free,   103112k cached

I am not sure why, though. Are you saying that there are bursts of
memory usage that push some pages out to swap, and that those pages are
not swapped back in even though they are used? I will try to replicate
the problem now and send you a better printout from the moment the
problem happens. I did not notice anything unusual while I was watching
the system - there was plenty of RAM free and only a few megabytes in
swap...

Is there any kind of check I can try while the problem is occurring? Or
should I free 50-100MB worth of hugepages, so that the system is stable
again?
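In case it is useful, this is the kind of check I have in mind - a
minimal sketch of my own (not from any particular tool), assuming the
standard /proc/vmstat and /proc/meminfo fields (pswpin, pswpout,
HugePages_Free, HugePages_Rsvd, SwapTotal, SwapFree). I would run it on
the host and keep the output from around the moment the load spikes:

#!/usr/bin/env python
# Sample host swap activity and hugepage counters once per second, so
# that a swapping burst can be matched against the moment the guests
# lock up. Field names are the standard ones on 2.6.3x kernels.
import time

def read_counters(path):
    # Parse "key: value ..." (/proc/meminfo) or "key value"
    # (/proc/vmstat) lines into a {key: int} dictionary, keeping the
    # first numeric field of each line.
    counters = {}
    with open(path) as f:
        for line in f:
            parts = line.replace(':', ' ').split()
            if len(parts) >= 2 and parts[1].isdigit():
                counters[parts[0]] = int(parts[1])
    return counters

prev = read_counters('/proc/vmstat')
while True:
    time.sleep(1)
    cur = read_counters('/proc/vmstat')
    mem = read_counters('/proc/meminfo')
    print('%s swapin/s=%d swapout/s=%d huge_free=%d huge_rsvd=%d swap_used_kb=%d' % (
        time.strftime('%H:%M:%S'),
        cur['pswpin'] - prev['pswpin'],    # pages read back from swap
        cur['pswpout'] - prev['pswpout'],  # pages written out to swap
        mem['HugePages_Free'],
        mem['HugePages_Rsvd'],
        mem['SwapTotal'] - mem['SwapFree']))
    prev = cur

My thinking is that if swapin/s stays above zero for minutes at a time
while the guests are stuck, that would point at the host paging qemu's
non-hugepage memory in and out.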
Thanks,
Dmitry

On Sat, Oct 2, 2010 at 1:30 AM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
> On Thu, Sep 30, 2010 at 12:07:15PM +0300, Dmitry Golubev wrote:
>> Hi,
>>
>> I am not sure what is really happening, but every few hours
>> (unpredictably) two virtual machines (Linux 2.6.32) start to generate
>> huge CPU loads. It looks like some kind of loop is unable to complete
>> or something...
>>
>> So the setup is:
>>
>> 1. I have two Linux 2.6.32 x64 (openvz, proxmox project) guests
>> running on a Linux 2.6.35 x64 (ubuntu maverick) host with a Q6600
>> Core2Quad, on qemu-kvm 0.12.5 and libvirt 0.8.3, plus another small
>> 32-bit Linux virtual machine (16MB of RAM) with a router inside (I
>> doubt it contributes to the problem).
>>
>> 2. All these machines use hugetlbfs. The server has 8GB of RAM; I
>> reserved 3696 huge pages (page size is 2MB) on the server, and I am
>> running the two main guests with 3550MB of virtual memory each. The
>> third guest, as I wrote before, takes 16MB of virtual memory.
>>
>> 3. Once run, the guests reserve huge pages for themselves normally.
>> As mem-prealloc is the default, they grab all the memory they should
>> have, leaving 6 pages unreserved (HugePages_Free - HugePages_Rsvd = 6)
>> at all times - so as I understand it, they should not need to get any
>> more, right?
>>
>> 4. All virtual machines run perfectly normally, without any
>> disturbances, for a few hours. They do not, however, use all their
>> memory, so maybe the issue arises when they pass some kind of a
>> threshold.
>>
>> 5. At some point both guests exhibit CPU load through the roof
>> (16-24). At the same time the host works perfectly well, showing a
>> load of 8, with both kvm processes using the CPU equally and fully.
>> This point in time is unpredictable - it can be anything from one to
>> twenty hours away, but it will be less than a day. Sometimes the load
>> disappears in a moment, but usually it stays like that, and everything
>> works extremely slowly (even a 'ps' command takes some 2-5 minutes to
>> execute).
>>
>> 6. If I am patient, I can start rebooting the guest systems - once
>> they have restarted, everything returns to normal. If I destroy one of
>> the guests (virsh destroy), the other one starts working normally at
>> once (!).
>>
>> I am relatively new to KVM and I am absolutely lost here. I had not
>> experienced such problems before, but recently I upgraded from ubuntu
>> lucid (I think it was Linux 2.6.32, qemu-kvm 0.12.3 and libvirt 0.7.5)
>> and started to use hugepages. These two virtual machines are not
>> normally run on the same host system (I have a corosync/pacemaker
>> cluster with DRBD storage), but when one of the hosts is not
>> available, they end up running on the same host. That is the reason I
>> had not noticed this earlier.
>>
>> Unfortunately, I don't have any spare hardware to experiment with,
>> and this is a production system, so my debugging options are rather
>> limited.
>>
>> Do you have any ideas what could be wrong?
>
> Is there swapping activity on the host when this happens?
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html