Hello,
The issue reproduced again and it doesn't look like a swap problem. Some details:
On the baremetal, from top:
top - 08:08:52 up 5 days, 16:43, 3 users, load average: 36.19, 36.05, 36.05
Tasks: 493 total, 1 running, 492 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.5 us, 87.9 sy, 0.0 ni, 8.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 12357451+total, 14296000 free, 65634428 used, 43644088 buff/cache
KiB Swap: 4194300 total, 4073868 free, 120432 used. 56953888 avail Mem
19158 qemu 20 0 0.098t 0.041t 10476 S 3650 35.6 13048:24 qemu-kvm
The compute node has 36 CPUs and the usage is now 100%. There are more than 50 GB of memory still available on the baremetal. The swap is barely used, 120 MB.
On the compute node, from top:
top - 05:11:58 up 1 day, 15:08, 2 users, load average: 40.46, 40.49, 40.74
%Cpu(s): 99.1 us, 0.7 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.1 si, 0.1 st
KiB Mem : 10296246+total, 78079936 free, 23671360 used, 1211160 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 78939968 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6032 qemu 20 0 10.601g 1.272g 12964 S 400.0 1.3 588:40.39 qemu-kvm
5673 qemu 20 0 10.602g 1.006g 13020 S 399.7 1.0 1161:47 qemu-kvm
5998 qemu 20 0 10.601g 1.192g 13028 S 367.9 1.2 1544:30 qemu-kvm
5951 qemu 20 0 10.601g 1.246g 13020 S 348.3 1.3 1547:38 qemu-kvm
5750 qemu 20 0 10.599g 990136 13060 S 339.1 1.0 1152:25 qemu-kvm
5752 qemu 20 0 10.598g 1.426g 13040 S 313.9 1.5 663:13.65 qemu-kvm
....
There are more than 70 GB of memory available on the compute node. All VMs are using 100% of their CPUs and they are no longer accessible.
Laurentiu
More details on the subject:
I suppose it is a nested KVM issue because it appeared after I enabled the nested KVM feature. Without nested KVM, however, the second-level VMs are too slow to be usable.
I am using CentOS 7 with:
kernel: 3.10.0-327.22.2.el7.x86_64
qemu-kvm: 1.5.3-105.el7_2.4
libvirt: 1.2.17-13.el7_2.5
on both the baremetal and the compute VM.
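Since the problem is suspected to be tied to nested KVM, it may help to confirm the feature state on both levels. The following is only a sketch: the helper names nested_param and virt_cpu_count are made up for illustration, and the paths are the usual sysfs and procfs locations. On the baremetal the "nested" parameter should read Y (or 1 on newer kernels); inside the compute VM the CPU flags should include vmx (Intel) or svm (AMD).

```shell
#!/bin/sh
# Sketch: report nested-virtualization state. Helper names are illustrative.

# Print the "nested" parameter of kvm_intel / kvm_amd, if the module is loaded.
# $1 = module directory (defaults to /sys/module)
nested_param() {
    for mod in kvm_intel kvm_amd; do
        f="${1:-/sys/module}/$mod/parameters/nested"
        [ -r "$f" ] && echo "$mod nested=$(cat "$f")"
    done
    return 0
}

# Count logical CPUs whose flags line advertises vmx or svm.
# $1 = cpuinfo file (defaults to /proc/cpuinfo)
virt_cpu_count() {
    grep '^flags' "${1:-/proc/cpuinfo}" 2>/dev/null | grep -c -w -E 'vmx|svm'
}

# Run on the baremetal (L0):
nested_param
# Run inside the compute VM (L1):
echo "CPUs with vmx/svm: $(virt_cpu_count)"
```

If the count inside the compute VM is 0, the guest was not given the virtualization extensions and second-level VMs would fall back to plain emulation.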
Please post:
1) # virsh dumpxml VM-L1 (where L1 is the level at which you expect nested KVM to appear)
2) Log in to VM-L1 and run:
# lsmod | grep kvm
3) I need the following outputs from VM-L1 (in case it is the compute node):
# cat /etc/nova/nova.conf | grep virt_type
# cat /etc/nova/nova.conf | grep cpu_mode
Boris.
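The outputs requested above could be collected inside VM-L1 in one pass. This is a sketch under assumptions: nova_opt is a hypothetical helper, NOVA_CONF defaults to the path used in the commands above, and unlike a plain grep it skips commented-out lines.

```shell
#!/bin/sh
# Sketch: gather the requested details from inside VM-L1.
# NOVA_CONF and nova_opt are illustrative names, not part of nova.
NOVA_CONF="${NOVA_CONF:-/etc/nova/nova.conf}"

# Print the value of every uncommented "key = value" line for one option.
# $1 = config file, $2 = option name
nova_opt() {
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1"
}

# Loaded KVM modules (kvm_intel or kvm_amd should appear if KVM is usable).
if command -v lsmod >/dev/null 2>&1; then
    lsmod | grep kvm || echo "no kvm modules loaded"
fi

# Hypervisor settings from nova.conf, if the file is readable.
if [ -r "$NOVA_CONF" ]; then
    echo "virt_type: $(nova_opt "$NOVA_CONF" virt_type)"
    echo "cpu_mode: $(nova_opt "$NOVA_CONF" cpu_mode)"
fi
```

The output of virsh dumpxml still has to be taken on the baremetal, since that is where the VM-L1 domain is defined.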
The only workaround for now is to shut down the compute VM and start it again from the baremetal with virsh start.
A simple restart of the compute node doesn't help. It looks like the qemu-kvm process corresponding to the compute VM is the problem.
Laurentiu
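The workaround described above (a full power cycle of the compute VM from the baremetal, rather than a reboot from inside it) might be scripted roughly as follows. This is a sketch, not a tested procedure: cold_restart is a hypothetical helper, and the domain name and timeout are placeholders to be supplied by the caller.

```shell
#!/bin/sh
# Sketch: cold-restart a libvirt domain so that it gets a fresh qemu-kvm process.
# $1 = domain name, $2 = maximum seconds to wait for shutdown (default 120)
cold_restart() {
    dom="$1"; timeout="${2:-120}"; waited=0
    # Ask the guest to shut down cleanly.
    virsh shutdown "$dom" || return 1
    # Poll until libvirt reports the domain as powered off.
    while [ "$(virsh domstate "$dom")" != "shut off" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    # Start the domain again from the baremetal.
    virsh start "$dom"
}
```

It would be called as cold_restart NAME, with NAME being the compute VM's domain name as shown by virsh list. If the hung guest does not react to the shutdown request, the function returns non-zero after the timeout; in that case virsh destroy (a forced power-off) followed by virsh start is the usual fallback.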
Hello,
I have an OpenStack setup in virtual environment on CentOS 7.
The baremetal has nested KVM enabled and 1 compute node as a VM.
Inside the compute node I have multiple VMs running.
About every 3 days the VMs become inaccessible and the compute node reports high CPU usage. The qemu-kvm process for each VM inside the compute node reports full CPU usage.
Please help me with some hints to debug this issue.
Thanks,
Laurentiu
_______________________________________________
CentOS-virt mailing list
CentOS-virt@xxxxxxxxxx
https://lists.centos.org/mailman/listinfo/centos-virt