On 05/04/2010 03:56 PM, Takuya Yoshikawa wrote:
[Performance test]

We measured the TSC ticks needed by the ioctl()s that get dirty logs in the kernel.

Test environment: AMD Phenom(tm) 9850 Quad-Core Processor with 8GB memory

1. GUI test (running an Ubuntu guest in graphical mode)

  sudo qemu-system-x86_64 -hda dirtylog_test.img -boot c -m 4192 -net ...

Below is a relatively stable part of the results, to compare how much time the basic parts of the dirty log ioctl need (values are TSC ticks):

                            get.org   get.opt  switch.opt
  slots[7].len=32768         278379     66398       64024
  slots[8].len=32768         181246       270         160
  slots[7].len=32768         263961     64673       64494
  slots[8].len=32768         181655       265         160
  slots[7].len=32768         263736     64701       64610
  slots[8].len=32768         182785       267         160
  slots[7].len=32768         260925     65360       65042
  slots[8].len=32768         182579       264         160
  slots[7].len=32768         267823     65915       65682
  slots[8].len=32768         186350       271         160

At a glance, we can see that our optimization is a significant improvement over the original get dirty log ioctl. This is true for both get.opt and switch.opt. This has a really big impact on personal KVM users who run KVM in GUI mode on their everyday PCs.

Next, we notice that switch.opt improved by a hundred nanoseconds or so for these slots. Although this may sound like a tiny improvement, we can feel it as a difference in GUI responsiveness, such as mouse reactions.
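The GUI-test table can be condensed by averaging the five runs per slot. The C sketch below does only that arithmetic on the numbers quoted above; the array layout and the mean()/speedup() helpers are illustrative and not part of the patch series.

```c
#include <stddef.h>

/* Row index into the arrays below: one row per ioctl variant. */
enum { GET_ORG, GET_OPT, SWITCH_OPT, NR_VARIANTS };

#define NR_RUNS 5

/* TSC ticks for slots[7] (len=32768), copied from the GUI-test table. */
static const long slot7[NR_VARIANTS][NR_RUNS] = {
    {278379, 263961, 263736, 260925, 267823}, /* get.org    */
    { 66398,  64673,  64701,  65360,  65915}, /* get.opt    */
    { 64024,  64494,  64610,  65042,  65682}, /* switch.opt */
};

/* TSC ticks for slots[8] (len=32768), same row order. */
static const long slot8[NR_VARIANTS][NR_RUNS] = {
    {181246, 181655, 182785, 182579, 186350}, /* get.org    */
    {   270,    265,    267,    264,    271}, /* get.opt    */
    {   160,    160,    160,    160,    160}, /* switch.opt */
};

/* Arithmetic mean of n samples. */
static double mean(const long *v, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)v[i];
    return sum / (double)n;
}

/* How many times faster `after` is than `before`, comparing the means. */
static double speedup(const long *before, const long *after, size_t n)
{
    return mean(before, n) / mean(after, n);
}
```

On these numbers the means are 266964.8 / 65409.4 / 64770.4 ticks for slots[7] and 182923 / 267.4 / 160 ticks for slots[8] (get.org / get.opt / switch.opt), so get.opt is about 4x faster than get.org for slots[7] and about 684x faster for slots[8].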
100 ns... this is a bit on the low side (and if you can measure it interactively you have much better reflexes than I).
To feel the difference, please try GUI on your PC with our patch series!
No doubt get.org -> get.opt is measurable, but get.opt -> switch.opt is problematic. Have you tried profiling to see where the time is spent? (Well, I can guess: clearing the write access from the sptes.)
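The guess above refers to write-protecting the shadow page table entries (sptes) of a slot, so that later guest writes fault and can be logged as dirty. The sketch below is a simplified, hypothetical model of that step over a flat array; it is not KVM's kvm_mmu_slot_remove_write_access(), which walks the real shadow page tables. It assumes only that bit 1 of an x86 page-table entry is the R/W (writable) bit. The point it illustrates is that the loop visits every entry of the slot, so its cost grows with the slot size.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit 1 of an x86 page-table entry is the R/W (writable) bit. */
#define SKETCH_PTE_WRITABLE ((uint64_t)1 << 1)

/*
 * Hypothetical sketch: drop write access from every entry of a slot so the
 * next guest write to each page faults and can be recorded as dirty.
 * Returns how many entries were actually changed.  The loop is O(nr_sptes),
 * i.e. it scales with the slot size.
 */
static size_t remove_write_access(uint64_t *sptes, size_t nr_sptes)
{
    size_t changed = 0;

    for (size_t i = 0; i < nr_sptes; i++) {
        if (sptes[i] & SKETCH_PTE_WRITABLE) {
            sptes[i] &= ~SKETCH_PTE_WRITABLE;
            changed++;
        }
    }
    return changed;
}
```

A real implementation also has to flush the TLBs after dropping write access, otherwise stale writable translations could still be used; the sketch leaves that out.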
2. Live-migration test (4GB guest, write loop with 1GB buf)

We also did a live-migration test (values are TSC ticks):

                            get.org   get.opt  switch.opt
  slots[0].len=655360        797383    261144      222181
  slots[1].len=3757047808   2186721   1965244     1842824
  slots[2].len=637534208    1433562   1012723     1031213
  slots[3].len=131072        216858       331         331
  slots[4].len=131072        121635       225         164
  slots[5].len=131072        120863       356         164
  slots[6].len=16777216      121746      1133         156
  slots[7].len=32768         120415       230         278
  slots[8].len=32768         120368       216         149

  slots[0].len=655360        806497    194710      223582
  slots[1].len=3757047808   2142922   1878025     1895369
  slots[2].len=637534208    1386512   1021309     1000345
  slots[3].len=131072        221118       459         296
  slots[4].len=131072        121516       272         166
  slots[5].len=131072        122652       244         173
  slots[6].len=16777216      123226     99185         149
  slots[7].len=32768         121803       457         505
  slots[8].len=32768         121586       216         155

  slots[0].len=655360        766113    211317      213179
  slots[1].len=3757047808   2155662   1974790     1842361
  slots[2].len=637534208    1481411   1020004     1031352
  slots[3].len=131072        223100       351         295
  slots[4].len=131072        122982       436         164
  slots[5].len=131072        122100       300         503
  slots[6].len=16777216      123653       779         151
  slots[7].len=32768         122617       284         157
  slots[8].len=32768         122737       253         149

For slots other than 0, 1, and 2, we see a similar improvement.

Since switch.opt does not depend on the bitmap length except in kvm_mmu_slot_remove_write_access(), that function is probably what accounts for the usec-to-msec times of slots 0, 1, and 2; some context switches might also be involved.

Note, however, that this test used a workload that kept dirtying memory throughout the live migration. Under usual workloads, the number of dirty pages varies a lot from iteration to iteration, and we should gain a great deal in relatively clean cases.
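The observation that switch.opt's cost does not depend on the bitmap length can be illustrated with a double-buffered bitmap: handing the collected bits to the caller is a pointer swap rather than a copy. This is a hypothetical sketch (struct dirty_log, dirty_log_switch() and the other names are made up for illustration), not the code from the patch series, and it omits the write-protection step, which still scales with slot size.

```c
#include <stdlib.h>
#include <string.h>

#define BITS_PER_ULONG (8 * sizeof(unsigned long))

/*
 * Hypothetical double-buffered dirty bitmap.  Writers set bits in `active`;
 * `spare` is kept cleared so it can be swapped in at any time.
 */
struct dirty_log {
    unsigned long *active;
    unsigned long *spare;
    size_t nr_longs;
};

/* Allocate two zeroed bitmaps covering nr_pages pages; 0 on success. */
static int dirty_log_init(struct dirty_log *log, size_t nr_pages)
{
    log->nr_longs = (nr_pages + BITS_PER_ULONG - 1) / BITS_PER_ULONG;
    log->active = calloc(log->nr_longs, sizeof(unsigned long));
    log->spare = calloc(log->nr_longs, sizeof(unsigned long));
    if (!log->active || !log->spare) {
        free(log->active);
        free(log->spare);
        return -1;
    }
    return 0;
}

/* Record a write to `page` in the active bitmap. */
static void mark_dirty(struct dirty_log *log, size_t page)
{
    log->active[page / BITS_PER_ULONG] |= 1UL << (page % BITS_PER_ULONG);
}

/*
 * Swap the buffers in O(1) and return the one that collected the dirty bits.
 * The caller must call dirty_log_release() before the next switch.
 */
static const unsigned long *dirty_log_switch(struct dirty_log *log)
{
    unsigned long *full = log->active;

    log->active = log->spare;
    log->spare = full;
    return full;
}

/* Clear the bitmap returned by the last switch so it can be reused. */
static void dirty_log_release(struct dirty_log *log)
{
    memset(log->spare, 0, log->nr_longs * sizeof(unsigned long));
}

static void dirty_log_destroy(struct dirty_log *log)
{
    free(log->active);
    free(log->spare);
}
```

In this sketch the clearing cost has not disappeared; it moves into dirty_log_release(), which can run after the caller has consumed the bitmap instead of inside the switch itself.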
Can you post such a test, for an idle large guest?