RE: [Qemu-devel] vm performance degradation after kvm live migration or save-restore with EPT enabled

"Zhanghaoyu (A)" <haoyu.zhang@xxxxxxxxxx> · Sat, 31 Aug 2013 07:45:28 +0000

I tested the following combinations of QEMU and kernel:
+------------------------+-----------------+-------------+
|        kernel          |      QEMU       |  migration  |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6 |   qemu-1.6.0    |    GOOD     |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6 |   qemu-1.6.0*   |    BAD      |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6 |   qemu-1.5.1    |    BAD      |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6*|   qemu-1.5.1    |    GOOD     |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6 |   qemu-1.5.1*   |    GOOD     |
+------------------------+-----------------+-------------+
| SLES11SP2+kvm-kmod-3.6 |   qemu-1.5.2    |    BAD      |
+------------------------+-----------------+-------------+
| kvm-3.11-2             |   qemu-1.5.1    |    BAD      |
+------------------------+-----------------+-------------+
NOTE:
1. kvm-3.11-2 : the complete kernel at tag kvm-3.11-2, downloaded from https://git.kernel.org/pub/scm/virt/kvm/kvm.git
2. SLES11SP2+kvm-kmod-3.6 : our release kernel, i.e. SLES11SP2 (kernel version 3.0.13-0.27) with its default kvm-kmod replaced by kvm-kmod-3.6
3. qemu-1.6.0* : qemu-1.6.0 with commit 211ea74022f51164a7729030b28eec90b6c99a08 reverted
4. kvm-kmod-3.6* : kvm-kmod-3.6 with EPT disabled
5. qemu-1.5.1* : qemu-1.5.1 with the patch below applied, which deletes the qemu_madvise() statement in the ram_load() function

--- qemu-1.5.1/arch_init.c      2013-06-27 05:47:29.000000000 +0800
+++ qemu-1.5.1_fix3/arch_init.c 2013-08-28 19:43:42.000000000 +0800
@@ -842,7 +842,6 @@ static int ram_load(QEMUFile *f, void *o
             if (ch == 0 &&
                 (!kvm_enabled() || kvm_has_sync_mmu()) &&
                 getpagesize() <= TARGET_PAGE_SIZE) {
-                qemu_madvise(host, TARGET_PAGE_SIZE, QEMU_MADV_DONTNEED);
             }
 #endif
         } else if (flags & RAM_SAVE_FLAG_PAGE) {

If I apply the patch above to qemu-1.5.1 to delete the qemu_madvise() statement, the test result for the combination of SLES11SP2+kvm-kmod-3.6 and qemu-1.5.1 is good.
Why do we call qemu_madvise(QEMU_MADV_DONTNEED) for those zero pages?
Does qemu_madvise() have a lasting effect on that range of virtual addresses? In other words, does qemu_madvise() have a lasting effect on VM performance?
If a range of virtual addresses that has been advised DONTNEED is later read and written frequently, could that cause performance degradation?
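
For reference, here is a minimal sketch of what MADV_DONTNEED does on the host side, assuming Linux and a private anonymous mapping. The helper names dontneed_demo() and minor_faults() are hypothetical and are not QEMU code. It dirties some pages, discards them with madvise(), then touches them again and reports whether they read back as zero and how many minor faults the re-touch took. It only illustrates the primary-MMU behaviour; it does not by itself show whether the EPT case is affected.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults taken by this process so far. */
static long minor_faults(void)
{
    struct rusage ru;

    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

/*
 * Hypothetical helper, for illustration only.
 * Maps npages of private anonymous memory, dirties them, discards them
 * with MADV_DONTNEED, then reads and writes each page again.
 *   *zero_after : set to 1 if every page read back as zero after the discard
 *   *refaults   : minor faults counted while re-touching the pages
 * Returns 0 on success, -1 if mmap() or madvise() failed.
 */
int dontneed_demo(size_t npages, int *zero_after, long *refaults)
{
    size_t psz = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * psz;
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;

    /* Fault every page in and give it non-zero contents. */
    memset(p, 0xAA, len);

    /* Drop the backing pages; the mapping itself stays valid. */
    if (madvise(p, len, MADV_DONTNEED) != 0) {
        munmap(p, len);
        return -1;
    }

    long before = minor_faults();

    *zero_after = 1;
    for (size_t i = 0; i < npages; i++) {
        volatile unsigned char *q = p + i * psz;

        if (*q != 0)        /* read: the old 0xAA contents should be gone */
            *zero_after = 0;
        *q = 1;             /* write: forces a private page to be allocated */
    }
    *refaults = minor_faults() - before;

    munmap(p, len);
    return 0;
}
```

Calling dontneed_demo(16, &zero, &refaults) from a small main() and printing the two results is enough to try it. On the host side the discard is a one-time operation on the pages rather than a persistent attribute of the address range, so the cost seen here is the re-fault on the next access.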

The combination of SLES11SP2+kvm-kmod-3.6 and qemu-1.6.0 is good because of commit 211ea74022f51164a7729030b28eec90b6c99a08.
If I revert that commit on qemu-1.6.0, the test result for the combination of SLES11SP2+kvm-kmod-3.6 and qemu-1.6.0 is bad: the performance degradation happens there, too.

Thanks,
Zhang Haoyu

>> >>> The QEMU command line (/var/log/libvirt/qemu/[domain name].log):
>> >>> LC_ALL=C PATH=/bin:/sbin:/usr/bin:/usr/sbin HOME=/ QEMU_AUDIO_DRV=none
>> >>> /usr/local/bin/qemu-system-x86_64 -name ATS1 -S -M pc-0.12 -cpu qemu32
>> >>> -enable-kvm -m 12288 -smp 4,sockets=4,cores=1,threads=1
>> >>> -uuid 0505ec91-382d-800e-2c79-e5b286eb60b5 -no-user-config -nodefaults
>> >>> -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/ATS1.monitor,server,nowait
>> >>> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
>> >>> -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
>> >>> -drive file=/opt/ne/vm/ATS1.img,if=none,id=drive-virtio-disk0,format=raw,cache=none
>> >>> -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
>> >>> -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=21
>> >>> -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:e0:fc:00:0f:00,bus=pci.0,addr=0x3,bootindex=2
>> >>> -netdev tap,fd=22,id=hostnet1,vhost=on,vhostfd=23
>> >>> -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:e0:fc:01:0f:00,bus=pci.0,addr=0x4
>> >>> -netdev tap,fd=24,id=hostnet2,vhost=on,vhostfd=25
>> >>> -device virtio-net-pci,netdev=hostnet2,id=net2,mac=00:e0:fc:02:0f:00,bus=pci.0,addr=0x5
>> >>> -netdev tap,fd=26,id=hostnet3,vhost=on,vhostfd=27
>> >>> -device virtio-net-pci,netdev=hostnet3,id=net3,mac=00:e0:fc:03:0f:00,bus=pci.0,addr=0x6
>> >>> -netdev tap,fd=28,id=hostnet4,vhost=on,vhostfd=29
>> >>> -device virtio-net-pci,netdev=hostnet4,id=net4,mac=00:e0:fc:0a:0f:00,bus=pci.0,addr=0x7
>> >>> -netdev tap,fd=30,id=hostnet5,vhost=on,vhostfd=31
>> >>> -device virtio-net-pci,netdev=hostnet5,id=net5,mac=00:e0:fc:0b:0f:00,bus=pci.0,addr=0x9
>> >>> -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
>> >>> -vnc *:0 -k en-us -vga cirrus
>> >>> -device i6300esb,id=watchdog0,bus=pci.0,addr=0xb -watchdog-action poweroff
>> >>> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0xa
>> >>> 
>> >>Which QEMU version is this? Can you try with e1000 NICs instead of virtio?
>> >>
>> >This QEMU version is 1.0.0, but I also tested QEMU 1.5.2 and the same problem exists, including both the performance degradation and the flood of read-only GFNs.
>> >I also tried e1000 NICs instead of virtio with QEMU 1.5.2; the performance degradation and the read-only GFN flood still occur.
>> >With either e1000 or virtio NICs, the GFN flood starts in the post-restore stage (i.e. the running stage): it begins as soon as the restore completes.
>> >
>> >Thanks,
>> >Zhang Haoyu
>> >
>> >>--
>> >>			Gleb.
>> 
>> Should we focus on the first bad commit (612819c3c6e67bac8fceaa7cc402f13b1b63f7e4) and the surprising flood of GFNs?
>> 
>Not really. There is no point in debugging a very old version compiled with kvm-kmod; there are too many variables in the environment. I cannot reproduce the GFN flooding on upstream, so the problem may be gone, or it may be the result of a kvm-kmod problem or of something different in how I invoke qemu. So the best way to proceed is for you to reproduce it with the upstream version; then at least I will be sure that we are using the same code.
>
>> I applied the patch below to __direct_map():
>>
>> @@ -2223,6 +2223,8 @@ static int __direct_map(struct kvm_vcpu
>>         int pt_write = 0;
>>         gfn_t pseudo_gfn;
>> 
>> +        map_writable = true;
>> +
>>         for_each_shadow_entry(vcpu, (u64)gfn << PAGE_SHIFT, iterator) {
>>                 if (iterator.level == level) {
>>                         unsigned pte_access = ACC_ALL;
>>
>> Then I rebuilt kvm-kmod and re-insmod'ed it.
>> After I started a VM, the host seemed to be abnormal: many programs could not be started successfully and reported segmentation faults.
>> In my opinion, with the above patch applied, commit 612819c3c6e67bac8fceaa7cc402f13b1b63f7e4 should have no effect, but the test result proved me wrong.
>> Does the way the map_writable value is obtained in hva_to_pfn() have an effect on the result?
>> 
>If hva_to_pfn() returns map_writable == false, it means that the page is mapped read-only in the primary MMU, so it should not be mapped writable in the secondary MMU either. This should not usually happen.
>
>--
>			Gleb.