On Tue, Dec 02, 2014 at 08:15:02AM +0008, Jason Wang wrote:
> 
> 
> On Tue, Dec 2, 2014 at 11:15 AM, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> >
> >
> >On Mon, Dec 1, 2014 at 6:42 PM, Michael S. Tsirkin <mst@xxxxxxxxxx> wrote:
> >>On Mon, Dec 01, 2014 at 06:17:03PM +0800, Jason Wang wrote:
> >>> Hello:
> >>> We used to orphan packets before transmission in virtio-net. This breaks
> >>> socket accounting and causes several functions to stop working, e.g.:
> >>> - Byte Queue Limits depends on tx completion notifications to work.
> >>> - Packet Generator depends on the tx completion notification for the last
> >>>   transmitted packet to complete.
> >>> - TCP Small Queues depends on proper accounting of sk_wmem_alloc to work.
> >>> This series tries to solve the issue by enabling tx interrupts. To minimize
> >>> the performance impact of this, several optimizations were used:
> >>> - On the guest side, virtqueue_enable_cb_delayed() was used to delay the tx
> >>>   interrupt until 3/4 of the pending packets have been sent.
> >>> - On the host side, interrupt coalescing was used to reduce tx interrupts.
> >>> Performance test results[1] (tx-frames 16 tx-usecs 16) show:
> >>> - For guest receiving: no obvious regression in throughput was noticed.
> >>>   Higher cpu utilization was noticed in a few cases.
> >>> - For guest transmission: a very large improvement in throughput for small
> >>>   packet transmission was noticed. This is expected since TSQ and other
> >>>   optimizations for small packet transmission only work with tx interrupts,
> >>>   but more cpu is used for large packets.
> >>> - For TCP_RR, a regression (10% on transaction rate and cpu utilization) was
> >>>   found. A tx interrupt won't help but only causes overhead in this case.
> >>>   Using more aggressive coalescing parameters may help to reduce the
> >>>   regression.
> >>
> >>OK, you do have posted coalescing patches - does it help any?
> >
> >Helps a lot.
> >
> >For RX, it saves about 5% - 10% cpu. (reduces tx intrs by 60%-90%)
> >For small packet TX, it increases throughput by 33% - 245%. (reduces tx intrs
> >by about 60%)
> >For TCP_RR, it increases trans. rate by 3%-10%. (reduces tx intrs by 40%-80%)
> >
> >>
> >>I'm not sure the regression is due to interrupts.
> >>It would make sense for CPU, but why would it
> >>hurt transaction rate?
> >
> >Anyway the guest needs to take some cycles to handle tx interrupts,
> >and the transaction rate does increase if we coalesce more tx interrupts.
> >>
> >>
> >>It's possible that we are deferring kicks too much due to BQL.
> >>
> >>As an experiment: do we get any of it back if we do
> >>-	if (kick || netif_xmit_stopped(txq))
> >>-		virtqueue_kick(sq->vq);
> >>+	virtqueue_kick(sq->vq);
> >>?
> >
> >
> >I will try, but during TCP_RR at most 1 packet was pending,
> >so I doubt BQL can help in this case.
> 
> Looks like this helps a lot with multiple sessions of TCP_RR.

So what's faster:
BQL + a kick for each packet, or
no BQL?

> How about moving the BQL patch out of this series?
> 
> Let's first converge on tx interrupts and then introduce it?

(e.g. with kicking after queuing X bytes?)

Sounds good.

_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
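
For readers following the thread, here is a rough sketch of where the two mechanisms
under discussion (the delayed tx interrupt from the cover letter and the kick that
MST's experiment changes) would sit in the guest transmit path. It is paraphrased
rather than taken from the posted patches: the helper xmit_skb(), the
send_queue/virtnet_info layout and the exact placement of the BQL and
virtqueue_enable_cb_delayed() calls are assumptions for illustration only.

/* Illustration only -- not the posted patch.  A trimmed-down guest tx
 * path for virtio-net, showing where the delayed tx interrupt, the BQL
 * accounting and the kick decision discussed above would sit.  Error
 * handling and locking are omitted.
 */
static netdev_tx_t start_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct virtnet_info *vi = netdev_priv(dev);
	int qnum = skb_get_queue_mapping(skb);
	struct send_queue *sq = &vi->sq[qnum];
	struct netdev_queue *txq = netdev_get_tx_queue(dev, qnum);
	bool kick = !skb->xmit_more;

	/* Add the skb to the tx virtqueue. */
	xmit_skb(sq, skb);

	/* BQL accounting: these bytes are only released by
	 * netdev_tx_completed_queue() when a tx completion arrives,
	 * which is why BQL needs tx interrupts in the first place. */
	netdev_tx_sent_queue(txq, skb->len);

	/* Guest-side optimization from the cover letter: ask the host
	 * for a tx interrupt only after about 3/4 of the pending
	 * buffers have been used, not after every packet. */
	virtqueue_enable_cb_delayed(sq->vq);

	/* (a) The series as posted: defer the kick unless the stack
	 * wants one (no further packets pending) or the queue stopped. */
	if (kick || netif_xmit_stopped(txq))
		virtqueue_kick(sq->vq);

	/* (b) MST's experiment replaces the two lines above with an
	 * unconditional virtqueue_kick(sq->vq) on every packet. */

	return NETDEV_TX_OK;
}

Variant (b) presumably trades more VM exits (one kick per packet) for lower latency
on lightly loaded queues, which would be consistent with the multi-session TCP_RR
improvement Jason reports above.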