"Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote on 02/02/2011 12:38:47 PM: > On Tue, Jan 25, 2011 at 03:09:34PM -0600, Steve Dobbelstein wrote: > > > > I am working on a KVM network performance issue found in our lab running > > the DayTrader benchmark. The benchmark throughput takes a significant hit > > when running the application server in a KVM guest verses on bare metal. > > We have dug into the problem and found that DayTrader's use of small > > packets exposes KVM's overhead of handling network packets. I have been > > able to reproduce the performance hit with a simpler setup using the > > netperf benchmark with the TCP_RR test and the request and response sizes > > set to 256 bytes. I run the benchmark between two physical systems, each > > using a 1GB link. In order to get the maximum throughput for the system I > > have to run 100 instances of netperf. When I run the netserver processes > > in a guest, I see a maximum throughput that is 51% of what I get if I run > > the netserver processes directly on the host. The CPU utilization in the > > guest is only 85% at maximum throughput, whereas it is 100% on bare metal. > > You are stressing the scheduler pretty hard with this test :) > Is your real benchmark also using a huge number of threads? Yes. The real benchmark has 60 threads handling client requests and 48 threads talking to a database server. > If it's not, you might be seeing a different issue. > IOW, the netperf degradation might not be network-related at all, > but have to do with speed of context switch in guest. > Thoughts? Yes, context switches can add to the overhead. We have that data captured, and I can look at it. What makes me think that's not the issue is that the CPU utilization in the guest is only about 85% at maximum throughput. Throughput/CPU is comparable to a different hypervisor, but that hypervisor runs at full CPU utilization and gets better throughput. I can't help but think KVM would get better throughput if it could just keep the guest VCPUs busy. Recently I have been playing with different CPU pinnings for the guest VCPUs and the vhost thread. Certain combinations can get us up to a 35% improvement in throughput with the same throughput/CPU ratio. CPU utilization was 94% -- not full CPU utilization, but it does illustrate that we can get better throughput if we keep the guest VCPUs busy. At this point it's looking more like a scheduler issue. We're starting to dig through the scheduler code for clues. Steve D. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html