>>> On 7/11/2013 at 03:36 AM, "Zhanghaoyu (A)" <haoyu.zhang@xxxxxxxxxx> wrote:
> hi all,
>
> I met a problem similar to these while performing live-migration and
> save-restore tests on the KVM platform (qemu 1.4.0, host: suse11sp2,
> guest: suse11sp2), running a telecommunication software suite in the guest:
> https://lists.gnu.org/archive/html/qemu-devel/2013-05/msg00098.html
> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/102506
> http://thread.gmane.org/gmane.comp.emulators.kvm.devel/100592
> https://bugzilla.kernel.org/show_bug.cgi?id=58771
>
> After live migration or virsh restore [savefile], one process's CPU
> utilization went up by about 30%, which resulted in throughput degradation
> of that process.
>
> oprofile report on this process in the guest,
> pre live migration:
> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> Profiling through timer interrupt
> samples  %        app name         symbol name
> 248      12.3016  no-vmlinux       (no symbols)
> 78        3.8690  libc.so.6        memset
> 68        3.3730  libc.so.6        memcpy
> 30        1.4881  cscf.scu         SipMmBufMemAlloc
> 29        1.4385  libpthread.so.0  pthread_mutex_lock
> 26        1.2897  cscf.scu         SipApiGetNextIe
> 25        1.2401  cscf.scu         DBFI_DATA_Search
> 20        0.9921  libpthread.so.0  __pthread_mutex_unlock_usercnt
> 16        0.7937  cscf.scu         DLM_FreeSlice
> 16        0.7937  cscf.scu         receivemessage
> 15        0.7440  cscf.scu         SipSmCopyString
> 14        0.6944  cscf.scu         DLM_AllocSlice
>
> post live migration:
> CPU: CPU with timer interrupt, speed 0 MHz (estimated)
> Profiling through timer interrupt
> samples  %        app name         symbol name
> 1586     42.2370  libc.so.6        memcpy
> 271       7.2170  no-vmlinux       (no symbols)
> 83        2.2104  libc.so.6        memset
> 41        1.0919  libpthread.so.0  __pthread_mutex_unlock_usercnt
> 35        0.9321  cscf.scu         SipMmBufMemAlloc
> 29        0.7723  cscf.scu         DLM_AllocSlice
> 28        0.7457  libpthread.so.0  pthread_mutex_lock
> 23        0.6125  cscf.scu         SipApiGetNextIe
> 17        0.4527  cscf.scu         SipSmCopyString
> 16        0.4261  cscf.scu         receivemessage
> 15        0.3995  cscf.scu         SipcMsgStatHandle
> 14        0.3728  cscf.scu         Urilex
> 12        0.3196  cscf.scu         DBFI_DATA_Search
> 12        0.3196  cscf.scu         SipDsmGetHdrBitValInner
> 12        0.3196  cscf.scu         SipSmGetDataFromRefString
>
> So memcpy costs far more CPU cycles after live migration. When I restart
> the process, the problem disappears. save-restore shows the same problem.
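(A note for anyone trying to reproduce the guest-side measurement: a
per-symbol profile like the one quoted above can be gathered with oprofile's
classic opcontrol interface. This is only a sketch, assuming oprofile 0.9.x
running in timer-interrupt mode as the report header indicates; the binary
path is a placeholder.)

    # inside the guest, as root
    opcontrol --reset                 # discard samples from any earlier session
    opcontrol --no-vmlinux --start    # sample without resolving kernel symbols
    sleep 60                          # let the workload run under normal load
    opcontrol --stop
    opreport --symbols /path/to/cscf.scu   # per-symbol sample counts as above
    opcontrol --shutdown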
>
> perf stat output for the vcpu thread on the host,
> pre live migration:
> Performance counter stats for thread id '21082':
>
>              0 page-faults
>              0 minor-faults
>              0 major-faults
>          31616 cs
>            506 migrations
>              0 alignment-faults
>              0 emulation-faults
>     5075957539 L1-dcache-loads                                          [21.32%]
>      324685106 L1-dcache-load-misses     #  6.40% of all L1-dcache hits [21.85%]
>     3681777120 L1-dcache-stores                                         [21.65%]
>       65251823 L1-dcache-store-misses    #  1.77%                       [22.78%]
>              0 L1-dcache-prefetches                                     [22.84%]
>              0 L1-dcache-prefetch-misses                                [22.32%]
>     9321652613 L1-icache-loads                                          [22.60%]
>     1353418869 L1-icache-load-misses     # 14.52% of all L1-icache hits [21.92%]
>      169126969 LLC-loads                                                [21.87%]
>       12583605 LLC-load-misses           #  7.44% of all LL-cache hits  [ 5.84%]
>      132853447 LLC-stores                                               [ 6.61%]
>       10601171 LLC-store-misses          #  7.9%                        [ 5.01%]
>       25309497 LLC-prefetches            # 30%                          [ 4.96%]
>        7723198 LLC-prefetch-misses                                      [ 6.04%]
>     4954075817 dTLB-loads                                               [11.56%]
>       26753106 dTLB-load-misses          #  0.54% of all dTLB cache hits [16.80%]
>     3553702874 dTLB-stores                                              [22.37%]
>        4720313 dTLB-store-misses         #  0.13%                       [21.46%]
>  <not counted> dTLB-prefetches
>  <not counted> dTLB-prefetch-misses
>
>   60.000920666 seconds time elapsed
>
> post live migration:
> Performance counter stats for thread id '1579':
>
>              0 page-faults                                              [100.00%]
>              0 minor-faults                                             [100.00%]
>              0 major-faults                                             [100.00%]
>          34979 cs                                                       [100.00%]
>            441 migrations                                               [100.00%]
>              0 alignment-faults                                         [100.00%]
>              0 emulation-faults
>     6903585501 L1-dcache-loads                                          [22.06%]
>      525939560 L1-dcache-load-misses     #  7.62% of all L1-dcache hits [21.97%]
>     5042552685 L1-dcache-stores                                         [22.20%]
>       94493742 L1-dcache-store-misses    #  1.8%                        [22.06%]
>              0 L1-dcache-prefetches                                     [22.39%]
>              0 L1-dcache-prefetch-misses                                [22.47%]
>    13022953030 L1-icache-loads                                          [22.25%]
>     1957161101 L1-icache-load-misses     # 15.03% of all L1-icache hits [22.47%]
>      348479792 LLC-loads                                                [22.27%]
>       80662778 LLC-load-misses           # 23.15% of all LL-cache hits  [ 5.64%]
>      198745620 LLC-stores                                               [ 5.63%]
>       14236497 LLC-store-misses          #  7.1%                        [ 5.41%]
>       20757435 LLC-prefetches                                           [ 5.42%]
>        5361819 LLC-prefetch-misses       # 25%                          [ 5.69%]
>     7235715124 dTLB-loads                                               [11.26%]
>       49895163 dTLB-load-misses          #  0.69% of all dTLB cache hits [16.96%]
>     5168276218 dTLB-stores                                              [22.44%]
>        6765983 dTLB-store-misses         #  0.13%                       [22.24%]
>  <not counted> dTLB-prefetches
>  <not counted> dTLB-prefetch-misses
>
> The LLC-load-miss rate went up by about 16 percentage points (from 7.44%
> to 23.15% of all LL-cache hits).
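(The host-side counters above can be reproduced by attaching perf stat to
the vcpu thread for a fixed interval. A sketch; thread id 21082 is the
pre-migration vcpu thread from the output above, and the event list is
trimmed to the events discussed here.)

    # on the host; the vcpu thread id can be found with e.g. 'ps -eLf | grep qemu'
    perf stat -e cs,migrations \
              -e L1-dcache-loads,L1-dcache-load-misses \
              -e L1-icache-loads,L1-icache-load-misses \
              -e LLC-loads,LLC-load-misses \
              -e dTLB-loads,dTLB-load-misses \
              -t 21082 sleep 60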
> Then I restarted the process in the guest, and the perf data went back to
> normal:
> Performance counter stats for thread id '1579':
>
>              0 page-faults                                              [100.00%]
>              0 minor-faults                                             [100.00%]
>              0 major-faults                                             [100.00%]
>          30594 cs                                                       [100.00%]
>            327 migrations                                               [100.00%]
>              0 alignment-faults                                         [100.00%]
>              0 emulation-faults
>     7707091948 L1-dcache-loads                                          [22.10%]
>      559829176 L1-dcache-load-misses     #  7.26% of all L1-dcache hits [22.28%]
>     5976654983 L1-dcache-stores                                         [23.22%]
>      160436114 L1-dcache-store-misses                                   [22.80%]
>              0 L1-dcache-prefetches                                     [22.51%]
>              0 L1-dcache-prefetch-misses                                [22.53%]
>    13798415672 L1-icache-loads                                          [22.28%]
>     2017724676 L1-icache-load-misses     # 14.62% of all L1-icache hits [22.49%]
>      254598008 LLC-loads                                                [22.86%]
>       16035378 LLC-load-misses           #  6.30% of all LL-cache hits  [ 5.36%]
>      307019606 LLC-stores                                               [ 5.60%]
>       13665033 LLC-store-misses                                         [ 5.43%]
>       17715554 LLC-prefetches                                           [ 5.57%]
>        4187006 LLC-prefetch-misses                                      [ 5.44%]
>     7811502895 dTLB-loads                                               [10.72%]
>       40547330 dTLB-load-misses          #  0.52% of all dTLB cache hits [16.31%]
>     6144202516 dTLB-stores                                              [21.58%]
>        6313363 dTLB-store-misses                                        [21.91%]
>  <not counted> dTLB-prefetches
>  <not counted> dTLB-prefetch-misses
>
>   60.000812523 seconds time elapsed
>
> With EPT disabled, this problem is gone.
>
> I suspect that the kvm hypervisor has something to do with this problem.
> Based on this suspicion, I want to find the two adjacent versions of kvm-kmod
> between which this problem first appears (e.g. 2.6.39, 3.0-rc1), and either
> analyze the differences between those two versions or apply the patches
> between them by bisection, to finally find the key patches.
>
> Any better ideas?
>
> Thanks,
> Zhang Haoyu

I've attempted to duplicate this on a number of machines that are as similar
to yours as I am able to get my hands on, and so far I have not been able to
see any performance degradation. And from what I've read in the above links,
huge pages do not seem to be part of the problem.

So, if you are in a position to bisect the kernel changes, that would
probably be the best avenue to pursue, in my opinion.

Bruce
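Regarding the EPT observation in the quoted mail: on an Intel host, whether
guests run with EPT is controlled by a kvm_intel module parameter, so it can
be checked and toggled as below (a sketch; all guests must be shut down
before the module can be reloaded):

    # on the host
    cat /sys/module/kvm_intel/parameters/ept   # 'Y' means guests use EPT
    modprobe -r kvm_intel
    modprobe kvm_intel ept=0                   # guests fall back to shadow paging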
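And the bisection suggested above would look roughly like this with git, run
against the kernel's kvm tree rather than kvm-kmod itself (a sketch; the
choice of v2.6.39 as good and v3.0-rc1 as bad is only illustrative, and the
rebuild/retest step stands in for the live-migration test described in the
quoted mail):

    git clone git://git.kernel.org/pub/scm/virt/kvm/kvm.git
    cd kvm
    git bisect start
    git bisect bad v3.0-rc1      # assumed to show the regression
    git bisect good v2.6.39      # assumed to be unaffected
    # at each step: rebuild and reload the kvm modules, rerun the
    # migration test, then record the verdict:
    git bisect good              # or 'git bisect bad'
    # repeat until git names the first bad commit; 'git bisect reset' when done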