Re: Network performance with small packets

"Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote on 02/02/2011 12:38:47 PM:

> On Tue, Jan 25, 2011 at 03:09:34PM -0600, Steve Dobbelstein wrote:
> >
> > I am working on a KVM network performance issue found in our lab
> > running the DayTrader benchmark.  The benchmark throughput takes a
> > significant hit when running the application server in a KVM guest
> > versus on bare metal.  We have dug into the problem and found that
> > DayTrader's use of small packets exposes KVM's overhead of handling
> > network packets.  I have been able to reproduce the performance hit
> > with a simpler setup using the netperf benchmark with the TCP_RR
> > test and the request and response sizes set to 256 bytes.  I run the
> > benchmark between two physical systems, each using a 1 Gb link.  In
> > order to get the maximum throughput for the system I have to run 100
> > instances of netperf.  When I run the netserver processes in a
> > guest, I see a maximum throughput that is 51% of what I get if I run
> > the netserver processes directly on the host.  The CPU utilization
> > in the guest is only 85% at maximum throughput, whereas it is 100%
> > on bare metal.
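
For anyone who wants to reproduce this, a minimal sketch of a driver for
the 100-instance TCP_RR run follows (not my exact scripts: the target
address and run length are placeholders, netserver is assumed to already
be running on the system under test, and the parsing assumes netperf's
default output format).

#!/usr/bin/env python
# Minimal driver for the TCP_RR reproduction described above.  Assumes
# netperf 2.x on this box and netserver already running on TARGET.
import subprocess

TARGET = "192.168.1.10"   # placeholder: address of the system under test
INSTANCES = 100           # concurrent netperf instances
DURATION = 60             # seconds per run (placeholder)

procs = [subprocess.Popen(
             ["netperf", "-H", TARGET, "-t", "TCP_RR", "-l", str(DURATION),
              "--", "-r", "256,256"],     # 256-byte request and response
             stdout=subprocess.PIPE)
         for _ in range(INSTANCES)]

# The last line of netperf's default TCP_RR output ends with the
# transaction rate; sum the rates for the aggregate throughput.
total = 0.0
for p in procs:
    out = p.communicate()[0].decode()
    total += float(out.strip().splitlines()[-1].split()[-1])
print("aggregate transactions/sec: %.1f" % total)
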
>
> You are stressing the scheduler pretty hard with this test :)
> Is your real benchmark also using a huge number of threads?

Yes.  The real benchmark has 60 threads handling client requests and 48
threads talking to a database server.

> If it's not, you might be seeing a different issue.
> IOW, the netperf degradation might not be network-related at all,
> but have to do with speed of context switch in guest.
> Thoughts?

Yes, context switches can add to the overhead.  We have that data captured,
and I can look at it.  What makes me think that's not the issue is that the
CPU utilization in the guest is only about 85% at maximum throughput.
Throughput/CPU is comparable to that of a different hypervisor, but that
hypervisor runs at full CPU utilization and gets better throughput.  I can't
help but
think KVM would get better throughput if it could just keep the guest VCPUs
busy.

Recently I have been playing with different CPU pinnings for the guest
VCPUs and the vhost thread.  Certain combinations can get us up to a 35%
improvement in throughput with the same throughput/CPU ratio.  CPU
utilization was 94% -- not full CPU utilization, but it does illustrate
that we can get better throughput if we keep the guest VCPUs busy.  At this
point it's looking more like a scheduler issue.  We're starting to dig
through the scheduler code for clues.
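
In case it is useful, the pinning runs are easy to script; a rough sketch
is below.  The CPU numbers are placeholders rather than the combination
that gave the 35%, it assumes a single guest on the host, and it pins
every qemu thread instead of picking out just the VCPU threads.

#!/usr/bin/env python
# Rough sketch of one pinning experiment.  Assumes a single guest, pgrep
# and taskset on the host, and that pinning every qemu thread is close
# enough; isolating just the VCPU threads would need /proc/<tid>/comm or
# the QEMU monitor.
import os
import subprocess

def first_pid(pattern):
    # First PID whose process/thread name matches 'pattern'.
    return subprocess.check_output(["pgrep", pattern]).split()[0].decode()

qemu_pid = first_pid("qemu")                      # the guest's qemu process
qemu_tids = os.listdir("/proc/%s/task" % qemu_pid)

# Spread the qemu threads (the VCPUs among them) across host CPUs 0-2.
for i, tid in enumerate(sorted(qemu_tids)):
    subprocess.call(["taskset", "-p", "-c", str(i % 3), tid])

# Give the vhost kernel thread (named vhost-<qemu pid>) a CPU of its own
# so it is not competing with the VCPU threads for time.
vhost_pid = first_pid("vhost-%s" % qemu_pid)
subprocess.call(["taskset", "-p", "-c", "3", vhost_pid])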

Steve D.
