Re: KVM with hugepages generate huge load with two guests

Hi,

Seems that nobody is interested in this bug :(

Anyway I wanted to add a bit more to this investigation.

Once I put "nohz=off highres=off clocksource=acpi_pm" in guest kernel
options, the guests started to behave better - they do not stay in the
slow state, but rather get there for some seconds (usually up to
minute, but sometimes 2-3 minutes) and then get out of it (this cycle
repeats once in a while - every approx 3-6 minutes). Once the
situation became stable, so that I am able to leave the guests without
very much worries, I also noticed that sometimes the predicted
swapping occurs, although rarely (I waited about half an hour to catch
the first swapping on the host). Here is a fragment of vmstat. Note
that when the first column shows 8-9 - the slowness and huge load
happens. You can also see how is appears and disappears (with nohz and
kvm-clock it did not go out of slowness period, but with tsc clock the
probability of getting out is significantly lower):

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 8  0      0  60456  19708 253688    0    0     6   170 5771 1712 97  3  0  0
 9  5      0  58752  19708 253688    0    0    11    57 6457 1500 96  4  0  0
 8  0      0  58192  19708 253688    0    0    55   106 5112 1588 98  3  0  0
 8  0      0  58068  19708 253688    0    0    21     0 2609 1498 100  0  0  0
 8  2      0  57728  19708 253688    0    0     9    96 2645 1620 100  0  0  0
 8  0      0  53852  19716 253680    0    0     2   186 6321 1935 97  4  0  0
 8  0      0  49636  19716 253688    0    0     0    45 3482 1484 99  1  0  0
 8  0      0  49452  19716 253688    0    0     0    34 3253 1851 100  0  0  0
 4  1   1468 126252  16780 182256   53  317   393   788 29318 3498 79 21  0  0
 4  0   1468 135596  16780 182332    0    0     7   360 26782 2459 79 21  0  0
 1  0   1468 169720  16780 182340    0    0    75    81 22024 3194 40 15 42  3
 3  0   1464 167608  16780 182340    6    0    26  1579 9404 5526 22  8 35 35
 0  0   1460 164232  16780 182504    0    0    85   170 4955 3345 21  5 69  5
 0  0   1460 163636  16780 182504    0    0     0    90 1288 1855  5  2 90  3
 1  0   1460 164836  16780 182504    0    0     0    34 1166 1789  4  2 93  1
 1  0   1452 165628  16780 182504    0    0   285    70 1981 2692 10  2 83  4
 1  0   1452 160044  16952 184840    6    0   832   146 5046 3303 11  6 76  7
 1  0   1452 161416  16960 184840    0    0    19   170 1732 2577 10  2 74 13
 0  1   1452 161920  16960 184840    0    0   111    53 1084 1986  0  1 96  3
 0  0   1452 161332  16960 184840    0    0   254    34  856 1505  2  1 95  3
 1  0   1452 159168  16960 184840    0    0   366    46 2137 2774  3  2 94  1
 1  0   1452 157408  16968 184840    0    0     0    69 2423 2991  9  5 84  2
 0  0   1444 157876  16968 184840    0    0     0    45 6343 3079 24 10 65  1
 0  0   1428 159644  16968 184844    6    0     8    52  724 1276  0  0 98  2
 0  0   1428 160336  16968 184844    0    0    31    98 1115 1835  1  1 92  6
 1  0   1428 161360  16968 184844    0    0     0    45 1333 1849  2  1 95  2
 0  0   1428 162092  16968 184844    0    0     0   408 3517 4267 11  2 78  8
 1  1   1428 163868  16968 184844    0    0    24   121 1714 2036 10  2 86  2
 1  3   1428 161292  16968 184844    0    0     3   143 2906 3503 16  4 77  3
 0  0   1428 156448  16976 184836    0    0     1   781 5661 4464 16  7 74  3
 1  0   1428 156924  16976 184844    0    0   588    92 2341 3845  7  2 87  4
 0  0   1428 158816  16976 184844    0    0    27   119 2052 3830  5  1 89  4
 0  0   1428 161420  16976 184844    0    0     1    56 3923 3132 26  4 68  1
 0  0   1428 162724  16976 184844    0    0    10   107 2806 3558 10  2 86  2
 1  0   1428 165244  16976 184844    0    0    34   155 2084 2469  8  2 78 12
 0  0   1428 165204  16976 184844    0    0   390   282 9568 4924 17 11 55 17
 1  0   1392 163864  16976 185064  102    0   218   411 11762 16591  6  9 68 17
 8  0   1384 164992  16984 185056    0    0     9    88 7540 5761 73  6 17  4
 8  0   1384 163620  16984 185076    0    0     1    89 21936 45040 90 10  0  0
 8  0   1384 165324  16992 185076    0    0     5   194 3330 1678 99  1  0  0
 8  0   1384 165704  16992 185076    0    0     1    54 2651 1457 99  1  0  0
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 8  0   1384 163016  17000 185076    0    0     0   126 4988 1536 97  3  0  0
 9  1   1384 162608  17000 185076    0    0    34   477 20106 2351 83 17  0  0
 0  0   1384 184052  17000 185076    0    0   102  1198 48951 3628 48 38  6  8
 0  0   1384 183088  17008 185076    0    0     8   156 1228 1419  2  2 82 14
 0  0   1384 184436  17008 185164    0    0    28   113 3176 2785 12  7 75  6
 0  0   1384 184568  17008 185164    0    0    30   107 1547 1821  4  3 87  6
 4  2   1228 228808  17008 185212   34    0   243     9 1591 1212 10 14 76  1
 9  0   1228 223644  17016 185164    0    0  2872   857 18515 5134 45 20  9 26
 0  3   1228 224840  17016 185164    0    0  1080   786 8281 5490 35 12 21 33
 2  0   1228 222032  17016 185164    0    0  1184    99 21056 3713 26 17 48  9
 1  0   1228 221784  17016 185164    0    0  2075    69 3089 3749  9  7 73 11
 3  0   1228 220544  17016 185164    0    0  1501   150 3815 3520  7  8 73 12
 3  0   1228 219736  17024 185164    0    0  1129   103 7726 4177 20 11 60  9
 0  4   1228 217224  17024 185164    0    0  2844   211 6068 4643  9  7 60 23
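
For completeness, this is roughly how I set those options in the
guests - a sketch assuming an Ubuntu/Debian-style guest with GRUB 2
(with legacy GRUB the options go on the kernel line in menu.lst
instead):

# in the guest, /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="nohz=off highres=off clocksource=acpi_pm"
# then regenerate the config and reboot:
update-grub && reboot
# afterwards, verify which clocksource the guest actually uses:
cat /sys/devices/system/clocksource/clocksource0/current_clocksource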

Thanks,
Dmitry

On Thu, Nov 18, 2010 at 8:53 AM, Dmitry Golubev <lastguru@xxxxxxxxx> wrote:
> Hi,
>
> Sorry to bother you again. I have more info:
>
>> 1. router with 32MB of RAM (hugepages) and 1VCPU
> ...
>> Is it too much to have 3 guests with hugepages?
>
> OK, this router is also out of the equation - I disabled hugepages
> for it. There should also be additional pages available to the
> guests because of that. I think this should be fairly easy to
> reproduce... Two identical 64bit Linux 2.6.32 guests with 3500MB of
> virtual RAM and 4 VCPU each, running on a Core2Quad (4 real cores)
> machine with 8GB of RAM and 3546 2MB hugepages on a 64bit Linux
> 2.6.35 host (libvirt 0.8.3) from Ubuntu Maverick.
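>
> (For completeness, my back-of-the-envelope memory arithmetic, in
> case I am simply overcommitting: 3546 hugepages x 2MB = 7092MB
> reserved on the host; the two guests need 2 x 3500MB = 7000MB, i.e.
> 3500 pages, which leaves about 46 pages / 92MB of hugepage headroom
> and roughly 1GB of normal pages for the host itself and the qemu
> processes. Please correct me if that reasoning is off.)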
>
> Still no swapping, and the effect is pretty much the same: one guest
> runs well, two guests work for some minutes - then slow down a few
> hundred times, showing huge load both inside (rapid, seemingly
> unbounded growth of the load average) and outside (the host load
> does not make it unresponsive, though - but it is loaded to the
> max). The load growth on the host is instantaneous and levels off
> (the change in the 'r' column indicates this sudden rise):
>
> # vmstat 5
> procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
>  r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
>  1  3      0 194220  30680  76712    0    0   319    28 2633 1960  6  6 67 20
>  1  2      0 193776  30680  76712    0    0     4   231 55081 78491  3 39 17 41
> 10  1      0 185508  30680  76712    0    0     4    87 53042 34212 55 27  9  9
> 12  0      0 185180  30680  76712    0    0     2    95 41007 21990 84 16  0  0
>
> Thanks,
> Dmitry
>
> On Wed, Nov 17, 2010 at 4:19 AM, Dmitry Golubev <lastguru@xxxxxxxxx> wrote:
>> Hi,
>>
>> Maybe you remember that I wrote a few weeks ago about a KVM CPU
>> load problem with hugepages. The thread was left hanging, but I now
>> have some new information. The description below still applies,
>> except that I have decreased both the guest memory and the number
>> of hugepages:
>>
>> Ram = 8GB, hugepages = 3546
>>
>> A total of 2 virtual machines:
>> 1. router with 32MB of RAM (hugepages) and 1VCPU
>> 2. linux guest with 3500MB of RAM (hugepages) and 4VCPU
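>>
>> (In case the guest definitions matter: hugepage backing is enabled
>> through the usual libvirt memoryBacking element - roughly like
>> this, from memory, with the size in KiB and the numbers being mine:
>>
>>   <memory>3584000</memory>
>>   <memoryBacking>
>>     <hugepages/>
>>   </memoryBacking>
>>
>> and the host has hugetlbfs mounted, with libvirt pointed at the
>> mount via hugetlbfs_mount in qemu.conf.)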
>>
>> Everything works fine until I start the second linux guest with the
>> same 3500MB of guest RAM, also in hugepages, and also 4 VCPU. The
>> rest of the description is the same as before: after a while the
>> host shows a load average of about 8 (on a Core2Quad), and both big
>> guests seem to consume exactly the same amount of resources. The
>> host stays responsive, though. Inside the guests, however, things
>> are not so good - the load skyrockets to at least 20. The guests
>> are unresponsive and even a 'ps' executes unreasonably slowly (it
>> may take a few minutes; inside the guest the load builds up
>> gradually and the machine seems to get slower over time, unlike the
>> host, which shows the jump in resource consumption instantly). It
>> also seems that the more memory the guests use, the faster the
>> problem appears. Still, at least a gigabyte of RAM is free in each
>> guest and there is no swap activity inside the guests.
>>
>> The most important point - and the reason I went back and quoted an
>> older message rather than the last one - is that there is no longer
>> any swap activity on the host, so the previous line of thought may
>> also be wrong and I am back at the beginning. There is plenty of
>> RAM now and swap on the host stays at 0 as seen in 'top'. Yet there
>> is 100% CPU load, shared equally between the two large guests. To
>> stop the load I can destroy either large guest. Additionally, I
>> have just discovered that suspending either large guest works as
>> well, and after a resume the load does not come back for a while
>> (the exact commands I use are noted further below). Both methods
>> stop the high load instantly (in under a second). As you were
>> asking for a 'top' from inside the guest, here it is:
>>
>> top - 03:27:27 up 42 min,  1 user,  load average: 18.37, 7.68, 3.12
>> Tasks: 197 total,  23 running, 174 sleeping,   0 stopped,   0 zombie
>> Cpu(s):  0.0%us, 89.2%sy,  0.0%ni, 10.5%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
>> Mem:   3510912k total,  1159760k used,  2351152k free,    62568k buffers
>> Swap:  4194296k total,        0k used,  4194296k free,   484492k cached
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> 12303 root      20   0     0    0    0 R  100  0.0   0:33.72 vpsnetclean
>> 11772 99        20   0  149m  11m 2104 R   82  0.3   0:15.10 httpd
>> 10906 99        20   0  149m  11m 2124 R   73  0.3   0:11.52 httpd
>> 10247 99        20   0  149m  11m 2128 R   31  0.3   0:05.39 httpd
>>  3916 root      20   0 86468  11m 1476 R   16  0.3   0:15.14 cpsrvd-ssl
>> 10919 99        20   0  149m  11m 2124 R    8  0.3   0:03.43 httpd
>> 11296 99        20   0  149m  11m 2112 R    7  0.3   0:03.26 httpd
>> 12265 99        20   0  149m  11m 2088 R    7  0.3   0:08.01 httpd
>> 12317 root      20   0 99.6m 1384  716 R    7  0.0   0:06.57 crond
>> 12326 503       20   0  8872   96   72 R    7  0.0   0:01.13 php
>>  3634 root      20   0 74804 1176  596 R    6  0.0   0:12.15 crond
>> 11864 32005     20   0 87224  13m 2528 R    6  0.4   0:30.84 cpsrvd-ssl
>> 12275 root      20   0 30628 9976 1364 R    6  0.3   0:24.68 cpgs_chk
>> 11305 99        20   0  149m  11m 2104 R    6  0.3   0:02.53 httpd
>> 12278 root      20   0  8808 1328  968 R    6  0.0   0:04.63 sim
>>  1534 root      20   0     0    0    0 S    6  0.0   0:03.29 flush-254:2
>>  3626 root      20   0  149m  13m 5324 R    6  0.4   0:27.62 httpd
>> 12279 32008     20   0 87472 7668 2480 R    6  0.2   0:27.63 munin-update
>> 10243 99        20   0  149m  11m 2128 R    5  0.3   0:08.47 httpd
>> 12321 root      20   0 99.6m 1460  792 R    5  0.0   0:07.43 crond
>> 12325 root      20   0 74804  672   92 R    5  0.0   0:00.76 crond
>>  1531 root      20   0     0    0    0 S    2  0.0   0:02.26 kjournald
>>     1 root      20   0 10316  756  620 S    0  0.0   0:02.10 init
>>     2 root      20   0     0    0    0 S    0  0.0   0:00.01 kthreadd
>>     3 root      RT   0     0    0    0 S    0  0.0   0:01.08 migration/0
>>     4 root      20   0     0    0    0 S    0  0.0   0:00.02 ksoftirqd/0
>>     5 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/0
>>     6 root      RT   0     0    0    0 S    0  0.0   0:00.47 migration/1
>>     7 root      20   0     0    0    0 S    0  0.0   0:00.03 ksoftirqd/1
>>     8 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/1
>>
>>
>> The tasks shown in 'top' keep changing, so it is nothing like a
>> single task hanging - it looks more like a machine thrashing in
>> swap. The problem, however, is that according to vmstat there is no
>> swap activity during this time. Should I decrease the RAM I give to
>> my guests even more? Is it too much to have 3 guests with
>> hugepages? Should I try something else? Unfortunately it is a
>> production system and I can't experiment with it very much.
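>>
>> (The suspend/resume workaround I mentioned above is simply this on
>> the host - the guest names are of course mine:
>>
>>   virsh suspend bigguest2   # load on the other guest drops within a second
>>   virsh resume bigguest2    # both guests then behave normally for a while
>>
>> and 'virsh destroy' on either large guest has the same immediate
>> effect.)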
>>
>> Here is 'top' on the host:
>>
>> top - 03:32:12 up 25 days, 23:38,  2 users,  load average: 8.50, 5.07, 10.39
>> Tasks: 133 total,   1 running, 132 sleeping,   0 stopped,   0 zombie
>> Cpu(s): 99.1%us,  0.7%sy,  0.0%ni,  0.0%id,  0.0%wa,  0.0%hi,  0.2%si,  0.0%st
>> Mem:   8193472k total,  8071776k used,   121696k free,    45296k buffers
>> Swap: 11716412k total,        0k used, 11714844k free,   197236k cached
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>  8426 libvirt-  20   0 3771m  27m 3904 S  199  0.3  10:28.33 kvm
>>  8374 libvirt-  20   0 3815m  32m 3908 S  199  0.4   8:11.53 kvm
>>  1557 libvirt-  20   0  225m 7720 2092 S    1  0.1 436:54.45 kvm
>>    72 root      20   0     0    0    0 S    0  0.0   6:22.54 kondemand/3
>>   379 root      20   0     0    0    0 S    0  0.0  58:20.99 md3_raid5
>>     1 root      20   0 23768 1944 1228 S    0  0.0   0:00.95 init
>>     2 root      20   0     0    0    0 S    0  0.0   0:00.24 kthreadd
>>     3 root      20   0     0    0    0 S    0  0.0   0:12.66 ksoftirqd/0
>>     4 root      RT   0     0    0    0 S    0  0.0   0:07.58 migration/0
>>     5 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/0
>>     6 root      RT   0     0    0    0 S    0  0.0   0:15.05 migration/1
>>     7 root      20   0     0    0    0 S    0  0.0   0:19.64 ksoftirqd/1
>>     8 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/1
>>     9 root      RT   0     0    0    0 S    0  0.0   0:07.21 migration/2
>>    10 root      20   0     0    0    0 S    0  0.0   0:41.74 ksoftirqd/2
>>    11 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/2
>>    12 root      RT   0     0    0    0 S    0  0.0   0:13.62 migration/3
>>    13 root      20   0     0    0    0 S    0  0.0   0:24.63 ksoftirqd/3
>>    14 root      RT   0     0    0    0 S    0  0.0   0:00.00 watchdog/3
>>    15 root      20   0     0    0    0 S    0  0.0   1:17.11 events/0
>>    16 root      20   0     0    0    0 S    0  0.0   1:33.30 events/1
>>    17 root      20   0     0    0    0 S    0  0.0   4:15.28 events/2
>>    18 root      20   0     0    0    0 S    0  0.0   1:13.49 events/3
>>    19 root      20   0     0    0    0 S    0  0.0   0:00.00 cpuset
>>    20 root      20   0     0    0    0 S    0  0.0   0:00.02 khelper
>>    21 root      20   0     0    0    0 S    0  0.0   0:00.00 netns
>>    22 root      20   0     0    0    0 S    0  0.0   0:00.00 async/mgr
>>    23 root      20   0     0    0    0 S    0  0.0   0:00.00 pm
>>    25 root      20   0     0    0    0 S    0  0.0   0:02.47 sync_supers
>>    26 root      20   0     0    0    0 S    0  0.0   0:03.86 bdi-default
>>
>>
>> Please help...
>>
>> Thanks,
>> Dmitry
>>
>> On Sat, Oct 2, 2010 at 1:30 AM, Marcelo Tosatti <mtosatti@xxxxxxxxxx> wrote:
>>>
>>> On Thu, Sep 30, 2010 at 12:07:15PM +0300, Dmitry Golubev wrote:
>>> > Hi,
>>> >
>>> > I am not sure what's really happening, but every few hours
>>> > (unpredictable) two virtual machines (Linux 2.6.32) start to generate
>>> > huge cpu loads. It looks like some kind of loop is unable to complete
>>> > or something...
>>> >
>>> > So the idea is:
>>> >
>>> > 1. I have two linux 2.6.32 x64 (openvz, proxmox project) guests
>>> > running on a linux 2.6.35 x64 (ubuntu maverick) host with a Q6600
>>> > Core2Quad, on qemu-kvm 0.12.5 and libvirt 0.8.3, plus another
>>> > small 32bit linux virtual machine (16MB of RAM) with a router
>>> > inside (I doubt it contributes to the problem).
>>> >
>>> > 2. All these machines use hugetlbfs. The server has 8GB of RAM,
>>> > I reserved 3696 huge pages (page size is 2MB) on the server, and
>>> > I am running the main guests each with 3550MB of virtual memory.
>>> > The third guest, as I wrote before, takes 16MB of virtual memory.
>>> >
>>> > 3. Once running, the guests reserve huge pages for themselves
>>> > normally. As mem-prealloc is the default, they grab all the
>>> > memory they are supposed to have, leaving 6 pages unreserved
>>> > (HugePages_Free - HugePages_Rsvd = 6) at all times - so, as I
>>> > understand it, they should not want to grab any more, right?
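>>> >
>>> > (For reference, the accounting above is what the host shows in
>>> > /proc/meminfo - "grep Huge /proc/meminfo" lists HugePages_Total,
>>> > HugePages_Free, HugePages_Rsvd and Hugepagesize.)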
>>> >
>>> > 4. All virtual machines run perfectly normally, without any
>>> > disturbance, for a few hours. They do not, however, use all of
>>> > their memory, so maybe the issue arises when they pass some kind
>>> > of threshold.
>>> >
>>> > 5. At some point both guests exhibit CPU load through the roof
>>> > (16-24). At the same time the host works perfectly well, showing
>>> > a load of 8, with both kvm processes using the CPU equally and
>>> > fully. The point in time is unpredictable - it can be anything
>>> > from one to twenty hours, but it is always less than a day.
>>> > Sometimes the load disappears in a moment, but usually it stays
>>> > like that, and everything works extremely slowly (even a 'ps'
>>> > command takes some 2-5 minutes).
>>> >
>>> > 6. If I am patient, I can start rebooting the guest systems -
>>> > once they have restarted, everything returns to normal. If I
>>> > destroy one of the guests (virsh destroy), the other one starts
>>> > working normally at once (!).
>>> >
>>> > I am relatively new to kvm and I am absolutely lost here. I have
>>> > not experienced such problems before, but recently I upgraded
>>> > from ubuntu lucid (I think it was linux 2.6.32, qemu-kvm 0.12.3
>>> > and libvirt 0.7.5) and started to use hugepages. These two
>>> > virtual machines do not normally run on the same host system (I
>>> > have a corosync/pacemaker cluster with drbd storage), but when
>>> > one of the hosts is not available, they end up running on the
>>> > same host. That is the reason I have not noticed this earlier.
>>> >
>>> > Unfortunately, I don't have any spare hardware to experiment and this
>>> > is a production system, so my debugging options are rather limited.
>>> >
>>> > Do you have any ideas, what could be wrong?
>>>
>>> Is there swapping activity on the host when this happens?
>>>
>>
>

