On 2023/3/8 20:21, Michael S. Tsirkin wrote: > On Wed, Mar 08, 2023 at 04:13:12PM +0800, Yunsheng Lin wrote: >> On 2023/3/8 15:14, Xuan Zhuo wrote: >>> On Wed, 8 Mar 2023 14:59:36 +0800, Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote: >>>> On 2023/3/8 10:49, Xuan Zhuo wrote: >>>>> If the queue of xdp xmit is not an independent queue, then when the xdp >>>>> xmit used all the desc, the xmit from the __dev_queue_xmit() may encounter >>>>> the following error. >>>>> >>>>> net ens4: Unexpected TXQ (0) queue failure: -28 >>>>> >>>>> This patch adds a check whether sq is full in xdp xmit. >>>>> >>>>> Fixes: 56434a01b12e ("virtio_net: add XDP_TX support") >>>>> Reported-by: Yichun Zhang <yichun@xxxxxxxxxxxxx> >>>>> Signed-off-by: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> >>>>> Reviewed-by: Alexander Duyck <alexanderduyck@xxxxxx> >>>>> Acked-by: Michael S. Tsirkin <mst@xxxxxxxxxx> >>>>> --- >>>>> drivers/net/virtio_net.c | 3 +++ >>>>> 1 file changed, 3 insertions(+) >>>>> >>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c >>>>> index 46bbddaadb0d..1a309cfb4976 100644 >>>>> --- a/drivers/net/virtio_net.c >>>>> +++ b/drivers/net/virtio_net.c >>>>> @@ -767,6 +767,9 @@ static int virtnet_xdp_xmit(struct net_device *dev, >>>>> } >>>>> ret = nxmit; >>>>> >>>>> + if (!is_xdp_raw_buffer_queue(vi, sq - vi->sq)) >>>>> + check_sq_full_and_disable(vi, dev, sq); >>>>> + >>>> >>>> Sorry if I missed something obvious here. >>>> >>>> As the comment in start_xmit(), the current skb is added to the sq->vq, so >>>> NETDEV_TX_BUSY can not be returned. >>>> >>>> /* If running out of space, stop queue to avoid getting packets that we >>>> * are then unable to transmit. >>>> * An alternative would be to force queuing layer to requeue the skb by >>>> * returning NETDEV_TX_BUSY. However, NETDEV_TX_BUSY should not be >>>> * returned in a normal path of operation: it means that driver is not >>>> * maintaining the TX queue stop/start state properly, and causes >>>> * the stack to do a non-trivial amount of useless work. >>>> * Since most packets only take 1 or 2 ring slots, stopping the queue >>>> * early means 16 slots are typically wasted. >>>> */ >>>> >>>> It there any reason not to check the sq->vq->num_free at the begin of start_xmit(), >>>> if the space is not enough for the current skb, TX queue is stopped and NETDEV_TX_BUSY >>>> is return to the stack to requeue the current skb. >>>> >>>> It seems it is the pattern that most network driver follow, and it seems we can avoid >>>> calling check_sq_full_and_disable() in this patch and not wasting 16 slots as mentioned >>>> in the comment above. >>>> >>> >>> >>> >>> * netdev_tx_t (*ndo_start_xmit)(struct sk_buff *skb, >>> * struct net_device *dev); >>> * Called when a packet needs to be transmitted. >>> * Returns NETDEV_TX_OK. Can return NETDEV_TX_BUSY, but you should stop >>> * the queue before that can happen; it's for obsolete devices and weird >>> * corner cases, but the stack really does a non-trivial amount >>> * of useless work if you return NETDEV_TX_BUSY. >>> * Required; cannot be NULL. >> >> Thanks for the pointer. It is intersting, it seems most driver is not flollowing >> the suggestion. > > Yes - I don't know why. I digged a little deeper. And there was a long discussion about this. https://patchwork.ozlabs.org/project/netdev/patch/200906031247.05591.rusty@xxxxxxxxxxxxxxx/#70283 > >> I found out why the above comment was added, but I am not sure I understand >> what does "non-trivial amount of useless work" means yet. >> https://lists.linuxfoundation.org/pipermail/virtualization/2015-April/029718.html > > dev_requeue_skb > That's the part I did not understand, the implemention of dev_requeue_skb() does not seem "non-trivial" to me, maybe the implemention before dev_requeue_skb() was non-trivial. But now "non-trivial amount of useless work" just seems to add confusion here. Anyway, according to the discussion between Herbert and Rusty, the problems seems to be: 1. tcpdump may see the packet twice due to requeuing. 2. higher priority skb is not able to preempt the requeued low priority skb. For 1, it seems fixable. For 2, I am not a QoS expert, but does the big TCP support added recently and the batching support in dev_hard_start_xmit() adds more noise to this qos problem?