Re: [PATCH] KVM: Use thread debug register storage instead of kvm specific data

On Friday 04 September 2009 09:48:17 am Andrew Theurer wrote:
<snip>
> >
> > Still not idle=poll, it may shave off 0.2%.
> 
> Won't this affect SMT in a negative way?  (OK, I am not running SMT now,
> but eventually we will be.)  A long time ago, we tested P4s with HT, and
> a polling idle in one thread always negatively impacted performance in
> the sibling thread.
> 
> FWIW, I did try idle=halt, and it was slightly worse.
> 
> I did get a chance to try the latest qemu (master and next heads).  I
> have been running into a problem with the virtIO stor driver for Windows on
> anything much newer than kvm-87.  I compiled the driver from the new git
> tree and it installed OK, but I still had the same error.  Finally, I removed
> the serial number feature from virtio-blk in qemu, and I can now get
> the driver to work in Windows.

What were the symptoms you were seeing (i.e., define "a problem")?

> 
> So, not really any good news on performance with the latest qemu builds.
> Performance is slightly worse:
> 
> qemu-kvm-87
> user  nice  system   irq  softirq guest   idle  iowait
> 5.79  0.00    9.28  0.08     1.00 20.81  58.78    4.26
> total busy: 36.97
> 
> qemu-kvm-88-905-g6025b2d (master)
> user  nice  system   irq  softirq guest   idle  iowait
> 6.57  0.00   10.86  0.08     1.02 21.35  55.90    4.21
> total busy: 39.89
> 
> qemu-kvm-88-910-gbf8a05b (next)
> user  nice  system   irq  softirq guest   idle  iowait
> 6.60  0.00  10.91   0.09     1.03 21.35  55.71    4.31
> total busy: 39.98
> 
> diff of profiles, p1=qemu-kvm-87, p2=qemu-master
> 
<snip>
> 
> 18x more samples for gfn_to_memslot_unali*, 37x for
> emulator_read_emula*, and more CPU time in guest mode.
> 
> One other thing I decided to try was some CPU binding.  I know this is
> not practical for production, but I wanted to see if there's any benefit
> at all.  One reason was that a coworker here tried binding the qemu
> thread for the vcpu and the qemu IO thread to the same CPU.  On a
> networking test, guest->localhost, throughput was up about 2x.
> Obviously there was a nice effect of being on the same cache.  I
> wondered whether, even without full-bore throughput tests, we could see
> any benefit here.  So, I bound each pair of VMs to a dedicated core.
> What I saw was about a 6% improvement in performance.  For a system
> which has pretty incredible memory performance and is not that busy, I
> was surprised that I got 6%.  I am not advocating binding, but what I do
> wonder is:  on 1-way VMs, if we keep all the qemu threads together on
> the same CPU, while still allowing the scheduler to move them (all of
> them at once) to different cpus over time, would we see the same benefit?
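
For reference, pinning a task like that comes down to a sched_setaffinity()
call (taskset(1) does the same thing from the shell).  A rough sketch, with
the core number picked arbitrarily:

/* Rough sketch: pin the calling task to one CPU (core 2 here, purely as
 * an example).  This is essentially what taskset(1) does under the hood.
 */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(2, &set);                      /* example core only */

    /* pid 0 means "the calling thread" */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("bound to cpu 2\n");
    return 0;
}
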
> 
> One other thing:  So far I have not been using preadv/pwritev.  I assume
> I need a more recent glibc (on 2.5 now) for qemu to take advantage of
> this?

Getting p(read|write)v working almost doubled my virtio-net throughput in a
Linux guest; not quite as much in Windows guests.  Yes, you need glibc-2.10.
I think some distros might have backported it to 2.9.  You will also need
support for it in your system includes.
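
If you want a quick sanity check that your glibc and headers have it,
something like the following should build and run (just a rough sketch; the
file and buffer sizes are arbitrary):

/* Rough check that preadv() is present in glibc and the system headers.
 * Build with: gcc -o preadv-check preadv-check.c
 * On an older glibc this will not even compile/link.
 */
#define _GNU_SOURCE
#include <sys/types.h>
#include <sys/uio.h>
#include <fcntl.h>
#include <stdio.h>

int main(void)
{
    char a[16], b[16];
    struct iovec iov[2];
    ssize_t n;
    int fd;

    iov[0].iov_base = a;  iov[0].iov_len = sizeof(a);
    iov[1].iov_base = b;  iov[1].iov_len = sizeof(b);

    fd = open("/etc/hosts", O_RDONLY);     /* any readable file will do */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* scatter-read up to 32 bytes from offset 0 in a single call */
    n = preadv(fd, iov, 2, 0);
    if (n < 0)
        perror("preadv");
    else
        printf("preadv read %zd bytes\n", n);
    return 0;
}

If that compiles and prints a byte count, qemu's configure should pick up
preadv/pwritev as well.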

--Iggy

> 
> Thanks!
> 
> -Andrew
> 

