Re: KVM with hugepages generate huge load with two guests

Dmitry Golubev <lastguru@xxxxxxxxx> · Wed, 17 Nov 2010 04:19:32 +0200

Hi,

Maybe you remember that I wrote a few weeks ago about a KVM CPU load
problem with hugepages. The problem was left hanging, but I now have
some new information. The description remains the same, except that I
have decreased both the guest memory and the number of hugepages:

Ram = 8GB, hugepages = 3546

Total of 2 virtual machines:
1. router with 32MB of RAM (hugepages) and 1VCPU
2. linux guest with 3500MB of RAM (hugepages) and 4VCPU

Everything works fine until I start a second Linux guest, also with
3500MB of guest RAM in hugepages and 4 VCPUs. The rest of the
description is the same as before: after a while the host shows a
load average of about 8 (on a Core2Quad) and both big guests seem to
consume exactly the same amount of resources. The host seems
responsive, though. Inside the guests, however, things are not so
good: the load skyrockets to at least 20. The guests are not
responsive and even a 'ps' executes extremely slowly (it may take a
few minutes). Here, however, the load builds up and the machine seems
to become slower with time, unlike the host, which shows the jump in
resource consumption instantly. It also seems that the more memory the
guests use, the faster the problem appears. Still, at least a gig of
RAM is free on each guest and there is no swap activity inside the
guest.
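
As a sanity check on the numbers above, here is a minimal Python sketch
of the hugepage budget. It assumes 2MB huge pages (the page size given
in the earlier message quoted below) and a nominal 8GB of host RAM; the
guest names and the helper pages_needed are made up for illustration,
and a real qemu-kvm process may need slightly more than the configured
guest RAM.

```python
# Hugepage budget check (illustrative sketch, not part of any KVM tooling).
HUGEPAGE_MB = 2  # assumption: 2MB huge pages, as in the earlier message


def pages_needed(guest_ram_mb, page_mb=HUGEPAGE_MB):
    """Huge pages needed to back guest RAM, rounded up to whole pages."""
    return -(-guest_ram_mb // page_mb)  # ceiling division


reserved_pages = 3546  # hugepages reserved on the host
# Guest names are hypothetical labels; sizes are the ones from this message.
guests_mb = {"router": 32, "linux-1": 3500, "linux-2": 3500}

used_pages = sum(pages_needed(mb) for mb in guests_mb.values())
spare_pages = reserved_pages - used_pages

host_ram_mb = 8 * 1024  # assumption: nominal 8GB of host RAM
non_huge_mb = host_ram_mb - reserved_pages * HUGEPAGE_MB

print(f"pages needed by guests: {used_pages}")
print(f"spare huge pages:       {spare_pages}")
print(f"host RAM outside pool:  {non_huge_mb} MB")
```

With these figures the three guests need 3516 pages, leaving 30 of the
3546 reserved pages spare, and roughly 1100MB of host RAM outside the
hugepage pool.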

The most important thing, and the reason I went back and quoted an
older message rather than the last one, is that there is no more swap
activity on the host, so the previous train of thought may also be
wrong and I am back at the beginning. There is plenty of RAM now and
swap usage on the host is always 0, as seen in 'top'. And there is
100% CPU load, equally shared between the two large guests. To stop
the load I can destroy either large guest. Additionally, I have just
discovered that suspending either large guest works as well. Moreover,
after a resume, the load does not come back for a while. Both methods
stop the high load instantly (in less than a second). As you were
asking for a 'top' inside the guest, here it is:

top - 03:27:27 up 42 min,  1 user,  load average: 18.37, 7.68, 3.12
Tasks: 197 total,  23 running, 174 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us, 89.2%sy,  0.0%ni, 10.5%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   3510912k total,  1159760k used,  2351152k free,    62568k buffers
Swap:  4194296k total,        0k used,  4194296k free,   484492k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
12303 root      20   0     0    0    0 R  100  0.0   0:33.72 vpsnetclean
11772 99        20   0  149m  11m 2104 R   82  0.3   0:15.10 httpd
10906 99        20   0  149m  11m 2124 R   73  0.3   0:11.52 httpd
10247 99        20   0  149m  11m 2128 R   31  0.3   0:05.39 httpd
 3916 root      20   0 86468  11m 1476 R   16  0.3   0:15.14 cpsrvd-ssl
10919 99        20   0  149m  11m 2124 R    8  0.3   0:03.43 httpd
11296 99        20   0  149m  11m 2112 R    7  0.3   0:03.26 httpd
12265 99        20   0  149m  11m 2088 R    7  0.3   0:08.01 httpd
12317 root      20   0 99.6m 1384  716 R    7  0.0   0:06.57 crond
12326 503       20   0  8872   96   72 R    7  0.0   0:01.13 php
 3634 root      20   0 74804 1176  596 R    6  0.0   0:12.15 crond
11864 32005     20   0 87224  13m 2528 R    6  0.4   0:30.84 cpsrvd-ssl
12275 root      20   0 30628 9976 1364 R    6  0.3   0:24.68 cpgs_chk
11305 99        20   0  149m  11m 2104 R    6  0.3   0:02.53 httpd
12278 root      20   0  8808 1328  968 R    6  0.0   0:04.63 sim
 1534 root      20   0     0    0    0 S    6  0.0   0:03.29 flush-254:2
 3626 root      20   0  149m  13m 5324 R    6  0.4   0:27.62 httpd
12279 32008     20   0 87472 7668 2480 R    6  0.2   0:27.63 munin-update
10243 99        20   0  149m  11m 2128 R    5  0.3   0:08.47 httpd
12321 root      20   0 99.6m 1460  792 R    5  0.0   0:07.43 crond
12325 root      20   0 74804  672   92 R    5  0.0   0:00.76 crond
 1531 root      20   0     0    0    0 S    2  0.0   0:02.26 kjournald
    1 root      20   0 10316  756  620 S    0  0.0   0:02.10 init
    2 root      20   0     0    0    0 S    0  0.0   0:00.01 kthreadd
    3 root      RT   0     0    0    0 S    0  0.0   0:01.08 migration/0
    4 root      20   0     0    0    0 S    0  0.0   0:00.02 ksoftirqd/0
    5 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/0
    6 root      RT   0     0    0    0 S    0  0.0   0:00.47 migration/1
    7 root      20   0     0    0    0 S    0  0.0   0:00.03 ksoftirqd/1
    8 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/1

The tasks keep changing in the 'top' view, so it is nothing like a
single task hanging - it is more like a machine working from swap.
The problem, however, is that according to vmstat there is no swap
activity during this time. Should I try to decrease the RAM I give to
my guests even more? Is it too much to have 3 guests with hugepages?
Should I try something else? Unfortunately it is a production system
and I can't play with it very much.
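
One way to double-check the "no swap activity" observation
independently of vmstat's si/so columns is to compare the pswpin and
pswpout counters from /proc/vmstat at two points in time. The Python
sketch below is only an illustration: the function names and the
sample text are hypothetical, and on a real system the two snapshots
would be read from /proc/vmstat a few seconds apart.

```python
def swap_counters(vmstat_text):
    """Extract the pswpin/pswpout counters from /proc/vmstat-style text."""
    counters = {}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_delta(before_text, after_text):
    """Pages swapped in/out between two snapshots (0 means no activity)."""
    before = swap_counters(before_text)
    after = swap_counters(after_text)
    return {key: after.get(key, 0) - before.get(key, 0)
            for key in ("pswpin", "pswpout")}


# Made-up sample snapshots; real ones would come from reading
# /proc/vmstat twice, a few seconds apart.
sample_before = "nr_free_pages 30424\npswpin 120\npswpout 340\n"
sample_after = "nr_free_pages 29810\npswpin 120\npswpout 340\n"

print(swap_delta(sample_before, sample_after))
```

If both deltas stay at zero while the load is high, swapping can be
ruled out on whichever machine (host or guest) the snapshots were
taken.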

Here is 'top' on the host:

top - 03:32:12 up 25 days, 23:38,  2 users,  load average: 8.50, 5.07, 10.39
Tasks: 133 total,   1 running, 132 sleeping,   0 stopped,   0 zombie
Cpu(s): 99.1%us,  0.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
Mem:   8193472k total,  8071776k used,   121696k free,    45296k buffers
Swap: 11716412k total,        0k used, 11714844k free,   197236k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8426 libvirt-  20   0 3771m  27m 3904 S  199  0.3  10:28.33 kvm
 8374 libvirt-  20   0 3815m  32m 3908 S  199  0.4   8:11.53 kvm
 1557 libvirt-  20   0  225m 7720 2092 S    1  0.1 436:54.45 kvm
   72 root      20   0     0    0    0 S    0  0.0   6:22.54 kondemand/3
  379 root      20   0     0    0    0 S    0  0.0  58:20.99 md3_raid5
    1 root      20   0 23768 1944 1228 S    0  0.0   0:00.95 init
    2 root      20   0     0    0    0 S    0  0.0   0:00.24 kthreadd
    3 root      20   0     0    0    0 S    0  0.0   0:12.66 ksoftirqd/0
    4 root      RT   0     0    0    0 S    0  0.0   0:07.58 migration/0
    5 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/0
    6 root      RT   0     0    0    0 S    0  0.0   0:15.05 migration/1
    7 root      20   0     0    0    0 S    0  0.0   0:19.64 ksoftirqd/1
    8 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/1
    9 root      RT   0     0    0    0 S    0  0.0   0:07.21 migration/2
   10 root      20   0     0    0    0 S    0  0.0   0:41.74 ksoftirqd/2
   11 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/2
   12 root      RT   0     0    0    0 S    0  0.0   0:13.62 migration/3
   13 root      20   0     0    0    0 S    0  0.0   0:24.63 ksoftirqd/3
   14 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/3
   15 root      20   0     0    0    0 S    0  0.0   1:17.11 events/0
   16 root      20   0     0    0    0 S    0  0.0   1:33.30 events/1
   17 root      20   0     0    0    0 S    0  0.0   4:15.28 events/2
   18 root      20   0     0    0    0 S    0  0.0   1:13.49 events/3
   19 root      20   0     0    0    0 S    0  0.0   0:00.00 cpuset
   20 root      20   0     0    0    0 S    0  0.0   0:00.02 khelper
   21 root      20   0     0    0    0 S    0  0.0   0:00.00 netns
   22 root      20   0     0    0    0 S    0  0.0   0:00.00 async/mgr
   23 root      20   0     0    0    0 S    0  0.0   0:00.00 pm
   25 root      20   0     0    0    0 S    0  0.0   0:02.47 sync_supers
   26 root      20   0     0    0    0 S    0  0.0   0:03.86 bdi-default

Please help...

Thanks,
Dmitry

On Sat, Oct 2, 2010 at 1:30 AM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
>
> On Thu, Sep 30, 2010 at 12:07:15PM +0300, Dmitry Golubev wrote:
> > Hi,
> >
> > I am not sure what's really happening, but every few hours
> > (unpredictable) two virtual machines (Linux 2.6.32) start to generate
> > huge cpu loads. It looks like some kind of loop is unable to complete
> > or something...
> >
> > So the idea is:
> >
> > 1. I have two linux 2.6.32 x64 (openvz, proxmox project) guests
> > running on linux 2.6.35 x64 (ubuntu maverick) host with a Q6600
> > Core2Quad on qemu-kvm 0.12.5 and libvirt 0.8.3 and another one small
> > 32bit linux virtual machine (16MB of ram) with a router inside (i
> > doubt it contributes to the problem).
> >
> > 2. All these machines use hugetlbfs. The server has 8GB of RAM, I
> > reserved 3696 huge pages (page size is 2MB) on the server, and I am
> > running the main guests each having 3550MB of virtual memory. The
> > third guest, as I wrote before, takes 16MB of virtual memory.
> >
> > 3. Once run, the guests reserve huge pages for themselves normally. As
> > mem-prealloc is default, they grab all the memory they should have,
> > leaving 6 pages unreserved (HugePages_Free - HugePages_Rsvd = 6) all
> > times - so as I understand they should not want to get any more,
> > right?
> >
> > 4. All virtual machines run perfectly normal without any disturbances
> > for few hours. They do not, however, use all their memory, so maybe
> > the issue arises when they pass some kind of a threshold.
> >
> > 5. At some point of time both guests exhibit cpu load over the top
> > (16-24). At the same time, host works perfectly well, showing load of
> > 8 and that both kvm processes use CPU equally and fully. This point of
> > time is unpredictable - it can be anything from one to twenty hours,
> > but it will be less than a day. Sometimes the load disappears in a
> > moment, but usually it stays like that, and everything works extremely
> > slow (even a 'ps' command executes some 2-5 minutes).
> >
> > 6. If I am patient, I can start rebooting the guest systems - once
> > they have restarted, everything returns to normal. If I destroy one of
> > the guests (virsh destroy), the other one starts working normally at
> > once (!).
> >
> > I am relatively new to kvm and I am absolutely lost here. I have not
> > experienced such problems before, but recently I upgraded from ubuntu
> > lucid (I think it was linux 2.6.32, qemu-kvm 0.12.3 and libvirt 0.7.5)
> > and started to use hugepages. These two virtual machines are not
> > normally run on the same host system (i have a corosync/pacemaker
> > cluster with drbd storage), but when one of the hosts is not
> > available, they start running on the same host. That is the reason I
> > have not noticed this earlier.
> >
> > Unfortunately, I don't have any spare hardware to experiment and this
> > is a production system, so my debugging options are rather limited.
> >
> > Do you have any ideas, what could be wrong?
>
> Is there swapping activity on the host when this happens?
>