KVM with hugepages generates huge load with two guests

Dmitry Golubev <lastguru@xxxxxxxxx> · Thu, 30 Sep 2010 12:07:15 +0300

Hi,

I am not sure what is really happening, but every few hours
(unpredictably) two virtual machines (Linux 2.6.32) start to generate
huge CPU loads. It looks as if some kind of loop is unable to
complete.

Here is the setup:

1. I have two Linux 2.6.32 x64 (OpenVZ, Proxmox project) guests
running on a Linux 2.6.35 x64 (Ubuntu Maverick) host with a Q6600
Core2Quad, qemu-kvm 0.12.5 and libvirt 0.8.3, plus one small 32-bit
Linux virtual machine (16MB of RAM) with a router inside (I doubt it
contributes to the problem).

2. All these machines use hugetlbfs. The server has 8GB of RAM, I
reserved 3696 huge pages (page size is 2MB) on the server, and I am
running the two main guests with 3550MB of virtual memory each. The
third guest, as I wrote before, takes 16MB of virtual memory.

3. Once started, the guests reserve huge pages for themselves
normally. As mem-prealloc is the default, they grab all the memory
they should have, leaving 6 pages unreserved (HugePages_Free -
HugePages_Rsvd = 6) at all times. So, as I understand it, they should
not want to get any more, right?

4. All virtual machines run perfectly normally, without any
disturbances, for a few hours. They do not, however, use all their
memory, so maybe the issue arises when they pass some kind of
threshold.

5. At some point both guests exhibit CPU load over the top (16-24).
At the same time, the host works perfectly well, showing a load of 8
and both kvm processes using CPU equally and fully. The moment is
unpredictable: it can be anything from one to twenty hours, but it
will be less than a day. Sometimes the load disappears in a moment,
but usually it stays like that, and everything works extremely slowly
(even a 'ps' command takes some 2-5 minutes to run).

6. If I am patient, I can start rebooting the guest systems. Once
they have restarted, everything returns to normal. If I destroy one of
the guests (virsh destroy), the other one starts working normally at
once (!).
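
For reference, the hugepage accounting from points 2 and 3 can be
sketched as below. This is only a minimal Python sketch, not something
taken from qemu-kvm or libvirt: the function names are my own, and the
HugePages_Free and HugePages_Rsvd values in SAMPLE are made-up
placeholders chosen so that the difference is 6. Only the total of
3696 pages, the 2MB page size and the guest sizes come from the setup
above. It counts guest RAM only and does not model anything else
qemu-kvm may map from hugetlbfs.

```python
import math
import re


def parse_hugepages(meminfo_text):
    """Extract the HugePages_* counters and Hugepagesize (in kB)
    from text in the format of /proc/meminfo."""
    fields = {}
    for line in meminfo_text.splitlines():
        m = re.match(r"(HugePages_\w+|Hugepagesize):\s+(\d+)", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields


def unreserved_pages(fields):
    """Pages that are free and not already promised to a mapping."""
    return fields["HugePages_Free"] - fields["HugePages_Rsvd"]


def pages_for_guest(mem_mb, page_kb=2048):
    """Huge pages needed to back mem_mb megabytes of guest RAM."""
    return math.ceil(mem_mb * 1024 / page_kb)


# Illustrative input only: Free and Rsvd are placeholder values,
# not numbers copied from the affected host.
SAMPLE = """\
HugePages_Total:    3696
HugePages_Free:     3000
HugePages_Rsvd:     2994
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""

fields = parse_hugepages(SAMPLE)
guest_ram_pages = 2 * pages_for_guest(3550) + pages_for_guest(16)
print("unreserved pages:", unreserved_pages(fields))
print("pages for guest RAM alone:", guest_ram_pages)
```

On a live host the same functions can be fed the contents of
/proc/meminfo instead of SAMPLE, for example
parse_hugepages(open("/proc/meminfo").read()).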

I am relatively new to kvm and I am absolutely lost here. I have not
experienced such problems before, but recently I upgraded from Ubuntu
Lucid (I think it was Linux 2.6.32, qemu-kvm 0.12.3 and libvirt 0.7.5)
and started to use hugepages. These two virtual machines do not
normally run on the same host system (I have a corosync/pacemaker
cluster with drbd storage), but when one of the hosts is not
available, they end up running on the same host. That is the reason I
have not noticed this earlier.

Unfortunately, I don't have any spare hardware to experiment on, and
this is a production system, so my debugging options are rather
limited.

Do you have any idea what could be wrong?

Thanks,
Dmitry