OK, I have reproduced the problem. The two machines were working fine for a few hours without some services running (these would take up roughly a gigabyte more in total); I started those services again, and some 40 minutes later the problem reappeared (that may be a coincidence, but I don't think so). The top output looks like this:

top - 03:38:10 up 2 days, 20:08,  1 user,  load average: 9.60, 6.92, 5.36
Tasks: 143 total,   3 running, 140 sleeping,   0 stopped,   0 zombie
Cpu(s): 85.7%us,  4.2%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi, 10.0%si,  0.0%st
Mem:   8193472k total,  8056700k used,   136772k free,     4912k buffers
Swap: 11716412k total,    64884k used, 11651528k free,    55640k cached

  PID USER     PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
21306 libvirt- 20   0 3781m  10m 2408 S  190  0.1  31:36.09 kvm
 4984 libvirt- 20   0 3771m  19m 1440 S  180  0.2 390:30.04 kvm

Compared to the previous snapshot I sent (taken a few hours earlier), you will not see much difference, in my opinion. Note that I have 8GB of RAM and the two VMs together take up 7GB. Nothing else is running on the server except the VMs and the cluster software (drbd, pacemaker etc.). Right now the drbd sync process is taking some CPU, which is why the kvm processes do not show 200% (physically it is a quad-core processor). Is almost 1GB really not enough for KVM to support two 3.5GB guests? I see 136MB of free memory right now, and it is not even being used...

Thanks,
Dmitry

On Sat, Oct 2, 2010 at 2:50 AM, Dmitry Golubev <lastguru@xxxxxxxxx> wrote:
> Hi,
>
> Thanks for the reply. Although there is plenty of RAM left (about
> 100MB), some swap space was used during operation:
>
> Mem:   8193472k total,  8089788k used,   103684k free,     5768k buffers
> Swap: 11716412k total,    36636k used, 11679776k free,   103112k cached
>
> I am not sure why, though. Are you saying that there are bursts of
> memory usage that push some pages to swap, and that those pages are
> not swapped back in even when used?
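[Editor's note: for what it's worth, the "almost 1GB" figure can be checked with back-of-the-envelope arithmetic, using only the numbers quoted above (8193472kB MemTotal, 3696 reserved 2MB huge pages):]

```shell
#!/bin/sh
# Back-of-the-envelope: how much memory remains outside the hugetlbfs
# pool for the host kernel, qemu-kvm's own overhead, drbd, page cache
# etc. Numbers come from the top output and hugepage setup above.
mem_total_kb=8193472                 # MemTotal on the host
huge_pages=3696                      # reserved huge pages
huge_pool_kb=$((huge_pages * 2048))  # 2MB page size
left_kb=$((mem_total_kb - huge_pool_kb))
echo "hugetlbfs pool:           ${huge_pool_kb} kB"
echo "left for everything else: ${left_kb} kB"   # prints 624064 kB
```

That works out to roughly 610MB, not 1GB, left outside the hugepage pool, and the non-guest allocations of both qemu-kvm processes, drbd buffers and the page cache all have to fit there.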
> I will try to replicate the problem now and send you a better
> printout from the moment the problem happens. I have not noticed
> anything unusual while watching the system - there was plenty of RAM
> free and only a few megabytes in swap... Is there any kind of check I
> can run while the problem is occurring? Or should I free 50-100MB
> from hugepages so that the system is stable again?
>
> Thanks,
> Dmitry
>
> On Sat, Oct 2, 2010 at 1:30 AM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
>> On Thu, Sep 30, 2010 at 12:07:15PM +0300, Dmitry Golubev wrote:
>>> Hi,
>>>
>>> I am not sure what is really happening, but every few hours
>>> (unpredictably) two virtual machines (Linux 2.6.32) start to
>>> generate huge CPU loads. It looks like some kind of loop is unable
>>> to complete or something...
>>>
>>> So the situation is:
>>>
>>> 1. I have two Linux 2.6.32 x64 (OpenVZ, Proxmox project) guests
>>> running on a Linux 2.6.35 x64 (Ubuntu Maverick) host with a Q6600
>>> Core2Quad, on qemu-kvm 0.12.5 and libvirt 0.8.3, plus one other
>>> small 32-bit Linux virtual machine (16MB of RAM) with a router
>>> inside (I doubt it contributes to the problem).
>>>
>>> 2. All these machines use hugetlbfs. The server has 8GB of RAM; I
>>> reserved 3696 huge pages (page size is 2MB) on the server, and I am
>>> running the two main guests with 3550MB of virtual memory each. The
>>> third guest, as I wrote before, takes 16MB of virtual memory.
>>>
>>> 3. Once started, the guests reserve huge pages for themselves
>>> normally. As mem-prealloc is the default, they grab all the memory
>>> they should have, leaving 6 pages unreserved (HugePages_Free -
>>> HugePages_Rsvd = 6) at all times - so as I understand it, they
>>> should not want any more, right?
>>>
>>> 4. All virtual machines run perfectly normally without any
>>> disturbance for a few hours. They do not, however, use all of their
>>> memory, so maybe the issue arises when they pass some kind of
>>> threshold.
>>>
>>> 5.
>>> At some point both guests exhibit CPU load that goes over the top
>>> (16-24). At the same time the host works perfectly well, showing a
>>> load of 8, with both kvm processes using the CPU equally and fully.
>>> This point in time is unpredictable - it can be anything from one
>>> to twenty hours, but it will be less than a day. Sometimes the load
>>> disappears in a moment, but usually it stays like that, and
>>> everything works extremely slowly (even a 'ps' command takes some
>>> 2-5 minutes to execute).
>>>
>>> 6. If I am patient, I can start rebooting the guest systems - once
>>> they have restarted, everything returns to normal. If I destroy one
>>> of the guests (virsh destroy), the other one starts working
>>> normally at once (!).
>>>
>>> I am relatively new to KVM and I am absolutely lost here. I had not
>>> experienced such problems before, but recently I upgraded from
>>> Ubuntu Lucid (I think it was Linux 2.6.32, qemu-kvm 0.12.3 and
>>> libvirt 0.7.5) and started to use hugepages. These two virtual
>>> machines are not normally run on the same host (I have a
>>> corosync/pacemaker cluster with drbd storage), but when one of the
>>> hosts is not available, they both run on the same host. That is why
>>> I had not noticed this earlier.
>>>
>>> Unfortunately, I don't have any spare hardware to experiment with,
>>> and this is a production system, so my debugging options are rather
>>> limited.
>>>
>>> Do you have any ideas about what could be wrong?
>>
>> Is there swapping activity on the host when this happens?
>>
>>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
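[Editor's note: one minimal way to answer Marcelo's question while a stall is in progress is to log the host's swap and hugepage counters periodically. This is a sketch, not something from the thread; the counter names are standard /proc fields, and how you schedule it (cron, watch) is up to you:]

```shell
#!/bin/sh
# One-shot snapshot of host swap and hugepage state. Run it from cron
# or under watch(1) so there is a record from the moment the guests
# stall. All values are read straight from /proc.
snapshot() {
    date '+%F %T'
    # Current swap usage and hugepage pool accounting:
    grep -E '^(SwapTotal|SwapFree|HugePages_)' /proc/meminfo
    # Cumulative pages swapped in/out since boot; if these keep
    # climbing during the stall, the host really is swapping:
    grep -E '^(pswpin|pswpout)' /proc/vmstat
}
snapshot
```

If pswpin/pswpout stay flat while the load is high, host swapping can probably be ruled out and the pressure is elsewhere.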