Hi,
my problem can be described simply: libvirt can't handle starting dozens of VMs at the same time.
(technically, it can, but it's really slow.)
We have an AMD machine with 256 logical cores and 1.5 TB of RAM.
There are roughly 200 VMs on that machine.
All VMs are configured the same: 8 GB of RAM and 4 vCPUs. Half of them run Win7 x86, the other half Win7 x64.
The VMs use qcow2 disk images, which reside on a ramdisk (tmpfs).
We use these machines for automated malware analysis, so our workload consists of this cycle:
- revert the VM to a running state
- execute a sample inside the VM for ~1-2 minutes
- shut down the VM
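The cycle above can be sketched roughly as follows with the libvirt Python bindings. This is a minimal sketch, not our actual framework: `run_cycle` and the `run_sample` hook are hypothetical names, and using `destroy()` (a hard power-off) for the shutdown step is an assumption. The function returns how long the revert took, which is the number that degrades for us.

```python
import time


def run_cycle(dom, snapshot_name, run_sample):
    """Run one analysis cycle on a libvirt domain.

    dom           -- a libvirt.virDomain (or any object with the same methods)
    snapshot_name -- name of the running-state snapshot (assumed to exist)
    run_sample    -- hypothetical callable that executes the sample in the VM
    Returns the time spent in revertToSnapshot, in seconds.
    """
    snap = dom.snapshotLookupByName(snapshot_name)

    start = time.monotonic()
    dom.revertToSnapshot(snap)        # step 1: revert to a running state
    revert_seconds = time.monotonic() - start

    try:
        run_sample(dom)               # step 2: run the sample (~1-2 minutes)
    finally:
        dom.destroy()                 # step 3: power off (assumption: hard stop)

    return revert_seconds


# Example wiring (requires libvirt-python and a running libvirtd;
# the domain and snapshot names are placeholders):
#   import libvirt
#   conn = libvirt.open("qemu:///system")
#   dom = conn.lookupByName("win7-x64-001")
#   print(run_cycle(dom, "clean", lambda d: time.sleep(90)))
```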
Of course, this results in multiple VMs trying to start at the same time.
At first, reverts/starts are really fast: a second or two.
After about a minute, "revertToSnapshot" suddenly takes 10-15 seconds, which is unacceptable for our use case.
For comparison, we're running the same scenario on Proxmox, where revertToSnapshot usually takes 2 seconds.
A few notes:
- Because of this fast cycle (~2-3 minutes) and because VMs take 10-15 seconds to start, there are barely more than 25-30 VMs running at once.
We would really love to use the full potential of this beast of a machine and have at least ~100 VMs running at any given time.
- While this is running, the average CPU load isn't higher than 25%, and only about 280 GB of RAM is used. So we are not limited by hardware resources.
- When the framework is running and libvirt is doing its best to start our VMs, every libvirt operation suddenly becomes very slow.
Even a simple "virsh list [--all]" takes a few seconds to complete, even though it finishes instantly when no VM is running/starting.
I searched for this issue, but didn't find anything besides this presentation:
https://events19.linuxfoundation.org/wp-content/uploads/2017/12/Scalability-and-Stability-of-libvirt-Experiences-with-Very-Large-Hosts-Marc-Hartmayer-IBM-1.pdf
However, I couldn't find the commits it mentions in upstream libvirt.
Is this a known issue? Or is there some setting I don't know of which would magically make the VMs start faster?
As for steps to reproduce, I don't think anything special is needed: just start/destroy several VMs in a loop.
The presentation above even provides a one-liner for that.
```
# For multiple domains:
# while virsh start $vm && virsh destroy $vm; do : ; done
# → ~30s hang ups of the libvirtd main loop
```
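To run that loop for many domains at once and record how long each start takes, something like the following could be used. It is only a sketch under assumptions: the helper names (`virsh`, `cycle_domain`, `cycle_all`) are made up for illustration, it uses one thread per domain, and it shells out to `virsh start` / `virsh destroy` exactly like the one-liner.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def virsh(*args):
    """Run a virsh subcommand; return True if it exited with status 0."""
    return subprocess.run(["virsh", *args], capture_output=True).returncode == 0


def cycle_domain(name, iterations, run=virsh):
    """Start and destroy one domain repeatedly.

    Returns the duration of each successful 'virsh start' in seconds and
    stops early if a command fails, like the && chain in the one-liner.
    """
    timings = []
    for _ in range(iterations):
        begin = time.monotonic()
        if not run("start", name):
            break
        timings.append(time.monotonic() - begin)
        if not run("destroy", name):
            break
    return timings


def cycle_all(names, iterations, run=virsh):
    """Cycle all domains concurrently, one thread per domain."""
    if not names:
        return {}
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        futures = {n: pool.submit(cycle_domain, n, iterations, run) for n in names}
        return {n: f.result() for n, f in futures.items()}


# Example (domain names are placeholders; needs virsh and defined domains):
#   results = cycle_all(["vm001", "vm002", "vm003"], iterations=20)
#   for name, times in results.items():
#       print(name, ["%.1fs" % t for t in times])
```

If the start times grow with the number of domains passed to `cycle_all`, that would match what we see with revertToSnapshot.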
Best Regards,
Petr