Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

Avi Kivity <avi@xxxxxxxxxx> · Sun, 06 Sep 2009 11:21:15 +0300

On 09/04/2009 05:48 PM, Andrew Theurer wrote:

Still not using idle=poll?  It may shave off 0.2%.

Won't this affect SMT in a negative way?  (OK, I am not running SMT now,
but eventually we will be.)  A long time ago we tested P4s with HT, and
a polling idle loop in one thread always hurt performance in the
sibling thread.

Sorry, I meant idle=halt.  idle=poll is too wasteful to be used.

FWIW, I did try idle=halt, and it was slightly worse.

Interesting.  I've heard that mwait latency is bad for spinlocks; I guess
it's fine for idle.

profile1 is qemu-kvm-87
profile2 is qemu-master
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 10000000
total samples (ts1) for profile1 is 1616921
total samples (ts2) for profile2 is 1752347 (includes multiplier of 0.995420)
functions which have a abs(pct2-pct1)<  0.06 are not displayed

                               pct2:   pct1:
                                100*    100*  pct2
        s1        s2   s2/s1  s2/ts1  s1/ts1  -pct1 symbol                     bin
--------- --------- ------- ------- ------- ------ ------                     ---
    879611    907883  1.03/1  56.149  54.400  1.749 vmx_vcpu_run               kvm
       614     11553 18.82/1   0.715   0.038  0.677 gfn_to_memslot_unali    kvm.ko
     34511     44922  1.30/1   2.778   2.134  0.644 phys_page_find_alloc      qemu
      2866      9334  3.26/1   0.577   0.177  0.400 paging64_walk_addr      kvm.ko
     11139     17200  1.54/1   1.064   0.689  0.375 copy_user_generic_st   vmlinux
      3100      7108  2.29/1   0.440   0.192  0.248 x86_decode_insn         kvm.ko
      8169     11873  1.45/1   0.734   0.505  0.229 virtqueue_avail_byte      qemu
      1103      4540  4.12/1   0.281   0.068  0.213 kvm_read_guest          kvm.ko
     17427     20401  1.17/1   1.262   1.078  0.184 memcpy                    libc
         0      2905           0.180   0.000  0.180 gfn_to_pfn              kvm.ko
      1831      4328  2.36/1   0.268   0.113  0.154 x86_emulate_insn        kvm.ko
        65      2431 37.41/1   0.150   0.004  0.146 emulator_read_emulat    kvm.ko
     14922     17196  1.15/1   1.064   0.923  0.141 qemu_get_ram_ptr          qemu
       545      2724  5.00/1   0.168   0.034  0.135 emulate_instruction     kvm.ko
       599      2464  4.11/1   0.152   0.037  0.115 kvm_read_guest_page     kvm.ko
       503      2355  4.68/1   0.146   0.031  0.115 gfn_to_hva              kvm.ko
      1076      2918  2.71/1   0.181   0.067  0.114 memcpy_c               vmlinux
       594      2241  3.77/1   0.139   0.037  0.102 next_segment            kvm.ko
      1680      3248  1.93/1   0.201   0.104  0.097 pipe_poll              vmlinux
         0      1463           0.090   0.000  0.090 subpage_readl             qemu
         0      1363           0.084   0.000  0.084 msix_enabled              qemu
       527      1883  3.57/1   0.116   0.033  0.084 paging64_gpte_to_gfn    kvm.ko
       962      2223  2.31/1   0.138   0.059  0.078 do_insn_fetch           kvm.ko
       348      1605  4.61/1   0.099   0.022  0.078 is_rsvd_bits_set        kvm.ko
       520      1763  3.39/1   0.109   0.032  0.077 unalias_gfn             kvm.ko
         1      1163 1163.65/1   0.072   0.000  0.072 tdp_page_fault          kvm.ko
      3827      4912  1.28/1   0.304   0.237  0.067 __down_read            vmlinux
         0      1014           0.063   0.000  0.063 mapping_level           kvm.ko
       973         0           0.000   0.060 -0.060 pm_ioport_readl           qemu
      1635       528  1/3.09   0.033   0.101 -0.068 ioport_read               qemu
      2179      1017  1/2.14   0.063   0.135 -0.072 kvm_emulate_pio         kvm.ko
     25141     23722  1/1.06   1.467   1.555 -0.088 native_write_msr_saf   vmlinux
      1560         0           0.000   0.096 -0.096 eventfd_poll           vmlinux
                             ------- ------- ------
                             105.100  97.450  7.650
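
The derived columns follow from the header: pct2 = 100*s2/ts1, pct1 = 100*s1/ts1, and rows with abs(pct2-pct1) < 0.06 are suppressed.  A minimal C sketch of those definitions is below.  It assumes the s2 counts already include the 0.995420 multiplier (the fractional 1163.65/1 ratio for tdp_page_fault suggests they do); the struct and function names are hypothetical, and this is not the script that produced the table.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* One symbol's sample counts from the two profiles. */
struct profile_row {
    const char *symbol;
    double s1; /* samples in profile1 (qemu-kvm-87) */
    double s2; /* samples in profile2 (qemu-master), assumed already scaled */
};

/* Derived columns; per the header, both percentages divide by ts1. */
struct derived {
    double pct2; /* 100 * s2 / ts1 */
    double pct1; /* 100 * s1 / ts1 */
    double diff; /* pct2 - pct1 */
};

static struct derived derive(const struct profile_row *r, double ts1)
{
    assert(ts1 > 0);
    struct derived d;
    d.pct2 = 100.0 * r->s2 / ts1;
    d.pct1 = 100.0 * r->s1 / ts1;
    d.diff = d.pct2 - d.pct1;
    return d;
}

/* A row is printed only if abs(pct2 - pct1) reaches the threshold. */
static int is_displayed(const struct derived *d, double threshold)
{
    double mag = d->diff < 0 ? -d->diff : d->diff;
    return mag >= threshold;
}

/* Ratio column: "N/1" when the count grew, "1/N" when it shrank, and
 * blank when either side is zero (e.g. gfn_to_pfn, eventfd_poll). */
static void format_ratio(const struct profile_row *r, char *buf, size_t len)
{
    if (r->s1 == 0 || r->s2 == 0)
        buf[0] = '\0';
    else if (r->s2 >= r->s1)
        snprintf(buf, len, "%.2f/1", r->s2 / r->s1);
    else
        snprintf(buf, len, "1/%.2f", r->s1 / r->s2);
}
```

With the gfn_to_memslot_unali counts (614 and 11553) and ts1 = 1616921, this reproduces the 0.715, 0.038 and 18.82/1 entries in the table.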

18x more samples for gfn_to_memslot_unali*, 37x for
emulator_read_emula*, and more CPU time in guest mode.

And 5x more instructions emulated.  I wonder where that comes from.
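
The 18x, 37x and 5x figures are read off the s2/s1 column.  A small hypothetical helper (not part of any profiling tool) that ranks rows by that ratio makes such outliers surface first; a symbol with no profile1 samples, such as gfn_to_pfn, is treated as unbounded growth.

```c
#include <assert.h>
#include <math.h>
#include <stdlib.h>

struct sample {
    const char *symbol;
    long s1; /* samples in profile1 */
    long s2; /* samples in profile2 */
};

/* Growth factor s2/s1; a symbol absent from profile1 counts as infinite. */
static double growth(const struct sample *s)
{
    assert(s->s1 >= 0 && s->s2 >= 0);
    return s->s1 == 0 ? HUGE_VAL : (double)s->s2 / (double)s->s1;
}

/* qsort comparator: largest growth first. */
static int by_growth_desc(const void *a, const void *b)
{
    double ga = growth(a), gb = growth(b);
    return (ga < gb) - (ga > gb);
}

/* Sort rows in place so the biggest relative increases come first. */
static void rank_by_growth(struct sample *rows, size_t n)
{
    qsort(rows, n, sizeof rows[0], by_growth_desc);
}
```

Fed the vmx_vcpu_run, gfn_to_memslot_unali, memcpy, emulator_read_emulat, gfn_to_pfn and emulate_instruction rows from the table, it orders them gfn_to_pfn, emulator_read_emulat, gfn_to_memslot_unali, emulate_instruction, memcpy, vmx_vcpu_run.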

One other thing:  So far I have not been using preadv/pwritev.  I assume
I need a more recent glibc (I'm on 2.5 now) for qemu to take advantage of
them?

Yes, but it should be easy to write an LD_PRELOAD hack that will work
with your current glibc.  It should certainly improve things.
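
One possible shape for such a hack, sketched here as an illustration and not taken from this thread: glibc gained the preadv()/pwritev() wrappers in 2.10, while the system calls themselves appeared in Linux 2.6.30, so a preloaded library can supply the missing wrappers by calling the syscalls directly.  This assumes an x86_64 host, a kernel of 2.6.30 or later, and a qemu binary that was actually built to call preadv()/pwritev(); getting qemu's build to enable that path against glibc 2.5 headers is not covered.  The file name preadv-shim.c and the preadv_shim_selftest() helper are hypothetical.

```c
/*
 * preadv-shim.c: hypothetical LD_PRELOAD shim supplying preadv() and
 * pwritev() on a glibc without the wrappers, by invoking the Linux
 * system calls directly. Assumes x86_64 and a kernel with the syscalls.
 */
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Old headers may lack these; the fallback numbers are x86_64 only. */
#if defined(__x86_64__)
# ifndef SYS_preadv
#  define SYS_preadv 295
# endif
# ifndef SYS_pwritev
#  define SYS_pwritev 296
# endif
#endif
#if !defined(SYS_preadv) || !defined(SYS_pwritev)
# error "no preadv/pwritev syscall numbers for this architecture"
#endif

/* The syscalls take the file offset as separate low and high words. */
static_assert(sizeof(off_t) <= sizeof(uint64_t), "offset must fit in 64 bits");
#define OFF_LO(off) ((unsigned long)(uint64_t)(off))
#define OFF_HI(off) ((unsigned long)((uint64_t)(off) >> 32))

/* Arguments are widened to long so each fills a whole syscall register;
 * syscall() returns -1 and sets errno on failure, as the wrapper would. */
ssize_t preadv(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{
    return syscall(SYS_preadv, (long)fd, iov, (long)iovcnt,
                   OFF_LO(offset), OFF_HI(offset));
}

ssize_t pwritev(int fd, const struct iovec *iov, int iovcnt, off_t offset)
{
    return syscall(SYS_pwritev, (long)fd, iov, (long)iovcnt,
                   OFF_LO(offset), OFF_HI(offset));
}

/* Optional round trip through a temporary file; returns 0 on success. */
int preadv_shim_selftest(void)
{
    char path[] = "/tmp/preadv-shim-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* the file disappears once fd is closed */

    char a[] = "hello ", b[] = "world";
    struct iovec out[2] = { { a, 6 }, { b, 5 } };
    char x[5], y[5];
    struct iovec in[2] = { { x, 5 }, { y, 5 } };

    /* Bytes 4..14 become "hello world"; reading 10 bytes at offset 5
     * yields "ello " in x and "world" in y. */
    int ok = pwritev(fd, out, 2, 4) == 11 &&
             preadv(fd, in, 2, 5) == 10 &&
             memcmp(x, "ello ", 5) == 0 &&
             memcmp(y, "world", 5) == 0;
    close(fd);
    return ok ? 0 : -1;
}
```

To try it, build with gcc -shared -fPIC -o preadv-shim.so preadv-shim.c and start qemu with LD_PRELOAD=$PWD/preadv-shim.so in its environment.  Calling the syscalls directly keeps the shim independent of the installed glibc version; the only coupling is to the kernel ABI.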

--
error compiling committee.c: too many arguments to function
