On Mon, May 09, 2022 at 06:52:32PM +0000, Petr Beneš wrote:
> Hi,
>
> my problem can be described simply: libvirt can't handle starting dozens of VMs at the same time.
>
> (technically, it can, but it's really slow.)
>
> We have an AMD machine with 256 logical cores and 1.5T ram.
> On that machine there are roughly 200 VMs.
> Each VM is the same: 8GB of RAM, 4 VCPUs. Half of them are Win7 x86, the other half are Win7 x64.
> VMs are using qcow2 as the disk image. These images reside in the ramdisk (tmpfs).
>
> We use these machines for automatic malware analysis, so our scenario consists of this cycle:
> - reverting VM to a running state
> - execute sample inside of the VM for ~1-2 minutes
> - shutdown the VM
>
> Of course, this results in multiple VMs trying to start at the same time.
> At first, reverts/starts are really fast - second or two.
> After about a minute, the "revertToSnapshot" suddenly takes 10-15 seconds, which is really unacceptable.
> For comparison, we're running the same scenario on Proxmox, where the revertToSnapshot usually takes 2 seconds.

Can you share the XML configuration of one of your guests, assuming
they all have the same basic configuration?

My gut feeling is that it is initially fast because it benefits from
the host I/O cache, but then slows down once data has to be flushed
to disk or read fresh from disk. This could be the case if the disk
cache mode is set to certain values, and the XML config will show us
that.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
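
For reference, the cache mode mentioned above is the `cache` attribute on each disk's `<driver>` element in the domain XML. Below is a minimal sketch of how one might pull it out of `virsh dumpxml` output. The sample XML, the guest name, the image path and the helper name `disk_cache_modes` are all made up for illustration and are not taken from the reporter's setup.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down domain XML standing in for the output of
# `virsh dumpxml <guest>`. The names and paths here are placeholders.
SAMPLE_XML = """
<domain type='kvm'>
  <name>example-guest</name>
  <devices>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='writeback'/>
      <source file='/path/to/image.qcow2'/>
      <target dev='vda' bus='virtio'/>
    </disk>
    <disk type='file' device='cdrom'>
      <driver name='qemu' type='raw'/>
      <target dev='sda' bus='sata'/>
    </disk>
  </devices>
</domain>
"""


def disk_cache_modes(domain_xml):
    """Map each disk's target device name to its cache attribute.

    The value is None when the <driver> element is missing or carries
    no cache attribute, i.e. no cache mode is set explicitly.
    """
    root = ET.fromstring(domain_xml)
    modes = {}
    for disk in root.findall("./devices/disk"):
        target = disk.find("target")
        driver = disk.find("driver")
        dev = target.get("dev") if target is not None else "?"
        modes[dev] = driver.get("cache") if driver is not None else None
    return modes


if __name__ == "__main__":
    for dev, cache in disk_cache_modes(SAMPLE_XML).items():
        print(dev, cache or "(not set explicitly)")
```

With real guests, the same function could be fed the text of each guest's dumped XML instead of the sample string.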