"Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote on 09/12/2010 05:10:25 PM:

> > SINGLE vhost (Guest -> Host):
> >     1 netperf:   BW: 10.7%   SD: -1.4%
> >     4 netperfs:  BW: 3%      SD: 1.4%
> >     8 netperfs:  BW: 17.7%   SD: -10%
> >     16 netperfs: BW: 4.7%    SD: -7.0%
> >     32 netperfs: BW: -6.1%   SD: -5.7%
> > BW and SD both improves (guest multiple txqs help). For 32
> > netperfs, SD improves.
> >
> > But with multiple vhosts, guest is able to send more packets
> > and BW increases much more (SD too increases, but I think
> > that is expected).
>
> Why is this expected?

Results with the original kernel:
_____________________________
#      BW      SD      RSD
______________________________
1      20903   1       6
2      21963   6       25
4      22042   23      102
8      21674   97      419
16     22281   379     1663
24     22521   857     3748
32     22976   1528    6594
40     23197   2390    10239
48     22973   3542    15074
64     23809   6486    27244
80     23564   10169   43118
96     22977   14954   62948
128    23649   27067   113892
________________________________

With a higher number of threads running in parallel, SD increased. In
this case, most threads run in parallel only till __dev_xmit_skb
(#numtxqs=1). With the mq TX patch, a higher number of threads run in
parallel through ndo_start_xmit. I *think* the increase in SD is due
to the higher number of threads running through the larger code path.

From the numbers I posted with the patch (cut-n-paste of only the %
parts), BW increased much more than SD, sometimes more than twice the
increase in SD:

N#     BW%     SD%     RSD%
4      54.30   40.00   -1.16
8      71.79   46.59   -2.68
16     71.89   50.40   -2.50
32     72.24   34.26   -14.52
48     70.10   31.51   -14.35
64     69.01   38.81   -9.66
96     70.68   71.26   10.74

I also think the SD calculation gets skewed for guest -> local host
testing. For this test, I ran a guest with numtxqs=16. The first
result below is with my patch, which creates 16 vhosts.
The second result is with a modified patch which creates only 2 vhosts
(testing with #netperfs = 64):

#vhosts   BW%     SD%      RSD%
16        20.79   186.01   149.74
2         30.89   34.55    18.44

The remote SD increases with the number of vhost threads, but that
number seems to correlate with guest SD. So though BW% increased
slightly, from 20% to 30%, SD fell drastically, from 186% to 34%. I
think it could be a calculation skew with host SD, which also fell
from 150% to 18%. I am planning to submit a 2nd patch rev with a
restricted number of vhosts.

> > Likely cause for the 1 stream degradation with multiple
> > vhost patch:
> >
> > 1. Two vhosts run handling the RX and TX respectively.
> >    I think the issue is related to cache ping-pong esp
> >    since these run on different cpus/sockets.
>
> Right. With TCP I think we are better off handling
> TX and RX for a socket by the same vhost, so that
> packet and its ack are handled by the same thread.
> Is this what happens with RX multiqueue patch?
> How do we select an RX queue to put the packet on?

My (unsubmitted) RX patch doesn't do this yet; that is something I
will check.

Thanks,

- KK
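[Editor's note] One way to keep TX and RX for a TCP socket on the same
thread, as discussed above, is to hash the connection 4-tuple
symmetrically so that a packet and its ACK (which swap src and dst)
map to the same queue. Below is a minimal illustrative sketch in
Python; the real implementation would be C in the kernel/vhost path
(along the lines of skb_tx_hash()), and the function name and tuple
layout here are assumptions, not the patch's API:

```python
import hashlib

def select_queue(src_ip, src_port, dst_ip, dst_port, num_queues):
    """Pick a queue index from the flow 4-tuple (illustrative only).

    Sorting the two endpoints before hashing makes the hash
    symmetric: the forward packet and its reversed ACK produce the
    same key, so both directions of the flow land on the same queue
    (and hence could be handled by the same vhost thread).
    """
    endpoints = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    key = repr(endpoints).encode()
    # A stable hash (unlike Python's built-in hash()) keeps the
    # mapping consistent across runs; take 4 bytes and reduce
    # modulo the number of queues.
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:4], "big") % num_queues
```

For example, select_queue("10.0.0.1", 5000, "10.0.0.2", 80, 16) and
the reversed call with src/dst swapped return the same queue index,
which is the property needed to avoid the TX/RX cache ping-pong
between threads.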