On 10/15/2014 05:51 AM, Michael S. Tsirkin wrote: > On Sat, Oct 11, 2014 at 03:16:46PM +0800, Jason Wang wrote: >> > We free transmitted packets in ndo_start_xmit() in the past to get better >> > performance in the past. One side effect is that skb_orphan() needs to be >> > called in ndo_start_xmit() which makes sk_wmem_alloc not accurate in >> > fact. For TCP protocol, this means several optimization could not work well >> > such as TCP small queue and auto corking. This can lead extra low >> > throughput of small packets stream. >> > >> > Thanks to the urgent descriptor support. This patch tries to solve this >> > issue by enable the tx interrupt selectively for stream packets. This means >> > we don't need to orphan TCP stream packets in ndo_start_xmit() but enable >> > tx interrupt for those packets. After we get tx interrupt, a tx napi was >> > scheduled to free those packets. >> > >> > With this method, sk_wmem_alloc of TCP socket were more accurate than in >> > the past which let TCP can batch more through TSQ and auto corking. >> > >> > Signed-off-by: Jason Wang <jasowang@xxxxxxxxxx> >> > --- >> > drivers/net/virtio_net.c | 164 ++++++++++++++++++++++++++++++++++++----------- >> > 1 file changed, 128 insertions(+), 36 deletions(-) >> > >> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >> > index 5810841..b450fc4 100644 >> > --- a/drivers/net/virtio_net.c >> > +++ b/drivers/net/virtio_net.c >> > @@ -72,6 +72,8 @@ struct send_queue { >> > >> > /* Name of the send queue: output.$index */ >> > char name[40]; >> > + >> > + struct napi_struct napi; >> > }; >> > >> > /* Internal representation of a receive virtqueue */ >> > @@ -217,15 +219,40 @@ static struct page *get_a_page(struct receive_queue *rq, gfp_t gfp_mask) >> > return p; >> > } >> > >> > +static int free_old_xmit_skbs(struct send_queue *sq, int budget) >> > +{ >> > + struct sk_buff *skb; >> > + unsigned int len; >> > + struct virtnet_info *vi = sq->vq->vdev->priv; >> > + struct virtnet_stats *stats = this_cpu_ptr(vi->stats); >> > + int sent = 0; >> > + >> > + while (sent < budget && >> > + (skb = virtqueue_get_buf(sq->vq, &len)) != NULL) { >> > + pr_debug("Sent skb %p\n", skb); >> > + >> > + u64_stats_update_begin(&stats->tx_syncp); >> > + stats->tx_bytes += skb->len; >> > + stats->tx_packets++; >> > + u64_stats_update_end(&stats->tx_syncp); >> > + >> > + dev_kfree_skb_any(skb); >> > + sent++; >> > + } >> > + >> > + return sent; >> > +} >> > + >> > static void skb_xmit_done(struct virtqueue *vq) >> > { >> > struct virtnet_info *vi = vq->vdev->priv; >> > + struct send_queue *sq = &vi->sq[vq2txq(vq)]; >> > >> > - /* Suppress further interrupts. */ >> > - virtqueue_disable_cb(vq); >> > - >> > - /* We were probably waiting for more output buffers. */ >> > - netif_wake_subqueue(vi->dev, vq2txq(vq)); >> > + if (napi_schedule_prep(&sq->napi)) { >> > + virtqueue_disable_cb(vq); >> > + virtqueue_disable_cb_urgent(vq); > This disable_cb is no longer safe in xmit_done callback, > since queue can be running at the same time. > > You must do it under tx lock. And yes, this likely will not work > work well without event_idx. We'll probably need extra > synchronization for such old hosts. > > > Yes, and the virtqueue_enable_cb_prepare() in virtnet_poll_tx() needs to be synced with virtqueue_enabled_cb_dealyed(). Otherwise an old idx will be published and we may still see tx interrupt storm. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html