Re: Slow VM start/revert, when trying to start/revert dozens of VMs in parallel

Sure.
As for flushing to or reading from disk: as I said, all VMs reside on a ramdisk.
I'd also like to add that the VMs are "linked clones" sharing the same underlying base qcow2 image, which is also on the ramdisk.

```xml
<domain type='kvm'>
  <name>win7-x86-1-101</name>
  <uuid>dc7296c0-228a-44e8-bd47-db019a1f6344</uuid>
  <metadata>
    <libosinfo:libosinfo xmlns:libosinfo="http://libosinfo.org/xmlns/libvirt/domain/1.0">
      <libosinfo:os id="http://microsoft.com/win/10"/>
    </libosinfo:libosinfo>
  </metadata>
  <memory unit='KiB'>8388608</memory>
  <currentMemory unit='KiB'>8388608</currentMemory>
  <vcpu placement='static'>4</vcpu>
  <os>
    <type arch='x86_64' machine='pc-q35-6.2'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
    <hyperv mode='custom'>
      <relaxed state='on'/>
      <vapic state='on'/>
      <spinlocks state='on' retries='8191'/>
    </hyperv>
    <vmcoreinfo state='on'/>
  </features>
  <cpu mode='host-passthrough' check='none' migratable='on'>
    <topology sockets='1' dies='1' cores='4' threads='1'/>
  </cpu>
  <clock offset='localtime'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
    <timer name='hypervclock' present='yes'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>preserve</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/bin/qemu-system-x86_64</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/mnt/ramdisk/vms/clones/win7-x86-1-101/image.qcow2'/>
      <target dev='hda' bus='sata'/>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </disk>
    <controller type='usb' index='0' model='qemu-xhci' ports='15'>
      <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x0'/>
    </controller>
    <controller type='sata' index='0'>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x1f' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pcie-root'/>
    <controller type='pci' index='1' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='1' port='0x10'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0' multifunction='on'/>
    </controller>
    <controller type='pci' index='2' model='pcie-to-pci-bridge'>
      <model name='pcie-pci-bridge'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </controller>
    <controller type='pci' index='3' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='3' port='0x11'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x1'/>
    </controller>
    <controller type='pci' index='4' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='4' port='0x12'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x2'/>
    </controller>
    <controller type='pci' index='5' model='pcie-root-port'>
      <model name='pcie-root-port'/>
      <target chassis='5' port='0x13'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x3'/>
    </controller>
    <interface type='network'>
      <mac address='52:54:00:bf:7b:01'/>
      <source network='cuckoonet1'/>
      <model type='e1000'/>
      <address type='pci' domain='0x0000' bus='0x02' slot='0x01' function='0x0'/>
    </interface>
    <serial type='pty'>
      <target type='isa-serial' port='0'>
        <model name='isa-serial'/>
      </target>
    </serial>
    <console type='pty'>
      <target type='serial' port='0'/>
    </console>
    <input type='tablet' bus='usb'>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'/>
    <input type='keyboard' bus='ps2'/>
    <graphics type='vnc' port='21101' autoport='no' websocket='11101' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <audio id='1' type='none'/>
    <video>
      <model type='qxl' ram='65536' vram='65536' vgamem='16384' heads='1' primary='yes'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <address type='pci' domain='0x0000' bus='0x04' slot='0x00' function='0x0'/>
    </memballoon>
    <panic model='hyperv'/>
  </devices>
</domain>
```
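Regarding the cache-mode question quoted below: the `<driver>` element in the XML above carries no `cache` attribute, so the hypervisor default applies. As a minimal sketch of where an explicit mode would go if it is worth testing, the fragment below repeats the disk definition with the attribute added. The value `writeback` is only an illustrative assumption, not something taken from the running configuration; libvirt accepts `default`, `none`, `writethrough`, `writeback`, `directsync` and `unsafe` here.

```xml
<disk type='file' device='disk'>
  <!-- cache='writeback' is an illustrative value only, not a recommendation -->
  <driver name='qemu' type='qcow2' cache='writeback'/>
  <source file='/mnt/ramdisk/vms/clones/win7-x86-1-101/image.qcow2'/>
  <target dev='hda' bus='sata'/>
</disk>
```

Since the image is on tmpfs, modes that open the file with O_DIRECT (`none`, `directsync`) may not be usable, depending on the host kernel.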


From: Daniel P. Berrangé <berrange@xxxxxxxxxx>
Sent: Tuesday, May 10, 2022 10:08
To: Petr Beneš <w.benny@xxxxxxxxxxx>
Cc: libvirt-users@xxxxxxxxxx <libvirt-users@xxxxxxxxxx>
Subject: Re: Slow VM start/revert, when trying to start/revert dozens of VMs in parallel
 
On Mon, May 09, 2022 at 06:52:32PM +0000, Petr Beneš wrote:
> Hi,
>
> my problem can be described simply: libvirt can't handle starting dozens of VMs at the same time.
>
> (technically, it can, but it's really slow.)
>
> We have an AMD machine with 256 logical cores and 1.5 TB of RAM.
> On that machine there are roughly 200 VMs.
> Each VM is the same: 8 GB of RAM, 4 vCPUs. Half of them are Win7 x86, the other half are Win7 x64.
> The VMs use qcow2 disk images. These images reside on a ramdisk (tmpfs).
>
> We use these machines for automatic malware analysis, so our scenario consists of this cycle:
> - revert the VM to a running state
> - execute a sample inside the VM for ~1-2 minutes
> - shut down the VM
>
> Of course, this results in multiple VMs trying to start at the same time.
> At first, reverts/starts are really fast: a second or two.
> After about a minute, "revertToSnapshot" suddenly takes 10-15 seconds, which is really unacceptable.
> For comparison, we're running the same scenario on Proxmox, where revertToSnapshot usually takes 2 seconds.

Can you share the XML configuration of one of your guests - assuming
they all have the same basic configuration.

As a gut feeling it sounds to me like it could be initially fast due to
utilization of host I/O cache, but then slows down due to having to
flush data to disk / read fresh from disk. This could be the case if
the disk configuration cache mode is set to certain values, so the XML
config will show us this info.

With regards,
Daniel
--
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|

