On Friday 04 September 2009 09:48:17 am Andrew Theurer wrote:
<snip>
> > > > Still not idle=poll, it may shave off 0.2%.
>
> Won't this affect SMT in a negative way? (OK, I am not running SMT now,
> but eventually we will be.) A long time ago, we tested P4s with HT, and
> a polling idle in one thread always negatively impacted performance in
> the sibling thread.
>
> FWIW, I did try idle=halt, and it was slightly worse.
>
> I did get a chance to try the latest qemu (master and next heads). I
> have been running into a problem with the virtio stor driver for
> Windows on anything much newer than kvm-87. I compiled the driver from
> the new git tree and it installed OK, but I still had the same error.
> Finally, I removed the serial number feature from virtio-blk in qemu,
> and I can now get the driver to work in Windows.

What were the symptoms you were seeing (i.e., define "a problem")?

> So, not really any good news on performance with the latest qemu
> builds. Performance is slightly worse (figures are CPU utilization
> percentages):
>
> qemu-kvm-87
>    user   nice  system    irq  softirq  guest    idle  iowait
>    5.79   0.00    9.28   0.08     1.00  20.81   58.78    4.26
>    total busy: 36.97
>
> qemu-kvm-88-905-g6025b2d (master)
>    user   nice  system    irq  softirq  guest    idle  iowait
>    6.57   0.00   10.86   0.08     1.02  21.35   55.90    4.21
>    total busy: 39.89
>
> qemu-kvm-88-910-gbf8a05b (next)
>    user   nice  system    irq  softirq  guest    idle  iowait
>    6.60   0.00   10.91   0.09     1.03  21.35   55.71    4.31
>    total busy: 39.98
>
> diff of profiles, p1=qemu-kvm-87, p2=qemu-master
> <snip>
>
> 18x more samples for gfn_to_memslot_unali*, 37x for
> emulator_read_emula*, and more CPU time in guest mode.
>
> One other thing I decided to try was some cpu binding. I know this is
> not practical for production, but I wanted to see if there is any
> benefit at all. One reason was that a coworker here tried binding the
> qemu thread for the vcpu and the qemu IO thread to the same cpu. On a
> networking test, guest->local-host, throughput was up about 2x.
> Obviously there was a nice effect of being on the same cache. I
> wondered whether, even without full-bore throughput tests, we could
> see any benefit here. So, I bound each pair of VMs to a dedicated
> core. What I saw was about a 6% improvement in performance. For a
> system which has pretty incredible memory performance and is not that
> busy, I was surprised that I got 6%. I am not advocating binding, but
> I do wonder: on 1-way VMs, if we kept all the qemu threads together on
> the same CPU, while still allowing the scheduler to move them (all of
> them at once) to different cpus over time, would we see the same
> benefit?
>
> One other thing: so far I have not been using preadv/pwritev. I assume
> I need a more recent glibc (I am on 2.5 now) for qemu to take
> advantage of this?

Getting p(read|write)v working almost doubled my virtio-net throughput
in a Linux guest, though not quite as much in Windows guests. Yes, you
need glibc 2.10; I think some distros might have backported it to 2.9.
You will also need support for it in your system includes.

--Iggy

> Thanks!
>
> -Andrew
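
A quick way to confirm that a host's glibc and kernel actually expose
preadv()/pwritev() before rebuilding qemu is a tiny standalone test.
This is only an illustrative sketch (the file path and buffer sizes are
arbitrary, and it is not qemu's own configure check): if it fails to
compile or link, the glibc wrappers are missing; a runtime error may
point at an older kernel. Build with something like
"cc -o preadv-test preadv-test.c".

/* preadv-test.c: minimal check that preadv() is usable
 * (needs glibc >= 2.10 and a reasonably recent kernel). */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/uio.h>
#include <unistd.h>

int main(void)
{
    char buf1[64], buf2[64];
    struct iovec iov[2] = {
        { .iov_base = buf1, .iov_len = sizeof(buf1) },
        { .iov_base = buf2, .iov_len = sizeof(buf2) },
    };

    int fd = open("/etc/passwd", O_RDONLY);   /* any readable file will do */
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* Scatter-read into both buffers from offset 0 without moving the
     * file position. */
    ssize_t n = preadv(fd, iov, 2, 0);
    if (n < 0) {
        perror("preadv");
        close(fd);
        return 1;
    }

    printf("preadv OK, read %zd bytes\n", n);
    close(fd);
    return 0;
}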
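
For reference, the kind of per-core pinning described above (a guest's
vcpu thread plus its qemu I/O thread on one core) is normally done with
sched_setaffinity() or the taskset utility on the thread IDs. A minimal
sketch, with the thread IDs and core number taken as placeholder
arguments rather than anything from the test above:

/* pin-threads.c: pin two thread (or process) IDs to a single core. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

static int pin_to_core(pid_t tid, int core)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(core, &set);
    /* sched_setaffinity() accepts a kernel thread ID, so qemu's vcpu
     * and I/O threads can be pinned individually. */
    if (sched_setaffinity(tid, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return -1;
    }
    return 0;
}

int main(int argc, char **argv)
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s <vcpu-tid> <io-tid> <core>\n", argv[0]);
        return 1;
    }

    int core = atoi(argv[3]);
    if (pin_to_core((pid_t)atoi(argv[1]), core) != 0 ||
        pin_to_core((pid_t)atoi(argv[2]), core) != 0)
        return 1;

    printf("pinned threads %s and %s to core %d\n", argv[1], argv[2], core);
    return 0;
}

From a shell, "taskset -pc <core> <tid>" does the same thing. Whether
the shared-cache win survives once the scheduler is free to migrate the
whole group is exactly the open question raised above.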