Re: [PATCH net-next] virtio-net: switch napi_tx without downing nic

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Heng Qi wrote:
> 
> 
> 在 2023/12/20 下午10:45, Willem de Bruijn 写道:
> > Heng Qi wrote:
> >> virtio-net has two ways to switch napi_tx: one is through the
> >> module parameter, and the other is through coalescing parameter
> >> settings (provided that the nic status is down).
> >>
> >> Sometimes we face performance regression caused by napi_tx,
> >> then we need to switch napi_tx when debugging. However, the
> >> existing methods are a bit troublesome, such as needing to
> >> reload the driver or turn off the network card. So try to make
> >> this update.
> >>
> >> Signed-off-by: Heng Qi <hengqi@xxxxxxxxxxxxxxxxx>
> >> Reviewed-by: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx>
> > The commit does not explain why it is safe to do so.
> 
> virtnet_napi_tx_disable ensures that already scheduled tx napi ends and 
> no new tx napi will be scheduled.
> 
> Afterwards, if the __netif_tx_lock_bh lock is held, the stack cannot 
> send the packet.
> 
> Then we can safely toggle the weight to indicate where to clear the buffers.
> 
> >
> > The tx-napi weights are not really weights: it is a boolean whether
> > napi is used for transmit cleaning, or whether packets are cleaned
> > in ndo_start_xmit.
> 
> Right.
> 
> >
> > There certainly are some subtle issues with regard to pausing/waking
> > queues when switching between modes.
> 
> What are "subtle issues" and if there are any, we find them.

A single runtime test is not sufficient to exercise all edge cases.

Please don't leave it to reviewers to establish the correctness of a
patch.

The napi_tx and non-napi code paths differ in how they handle at least
the following structures:

1. skb: non-napi orphans these in ndo_start_xmit. Without napi this is
needed as delay until the next ndo_start_xmit and thus completion is
unbounded.

When switching to napi mode, orphaned skbs may now be cleaned by the
napi handler. This is indeed safe.

When switching from napi to non-napi, the unbound latency resurfaces.
It is a small edge case, and I think a potentially acceptable risk, if
the user of this knob is aware of the risk.

2. virtqueue callback ("interrupt" masking). The non-napi path enables
the interrupt (disables the mask) when available descriptors falls
beneath a low watermark, and reenables when it recovers above a high
watermark. Napi disables when napi is scheduled, and reenables on
napi complete.

3. dev_queue->state (QUEUE_STATE_DRV_XOFF). if the ring falls below
a low watermark, the driver stops the stack for queuing more packets.
In napi mode, it schedules napi to clean packets. See the calls to
netif_xmit_stopped, netif_stop_subqueue, netif_start_subqueue and
netif_tx_wake_queue.

Some if this can be assumed safe by looking at existing analogous
code, such as the queue stop/start in virtnet_tx_resize.

But that all virtqueue callback and dev_queue->state transitions are
correct when switching between modes at runtime is not trivial to
establish, deserves some thought and explanation in the commit
message.

> So far my test results show it's working fine.
> 
> >
> > Calling napi_enable/napi_disable without bringing down the device is
> > allowed. The actually napi.weight field is only updated when neither
> > napi nor ndo_start_xmit is running.
> 
> YES.
> 
> > So I don't see an immediate issue.
> 
> Switching napi_tx requires reloading the driver or downing the nic, 
> which is troublesome.
> I think it would be better if we could find a better way.
> 
> Thanks!
> 
> >
> >
> >> ---
> >>   drivers/net/virtio_net.c | 81 ++++++++++++++++++----------------------
> >>   1 file changed, 37 insertions(+), 44 deletions(-)
> >>
> >> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >> index 10614e9f7cad..12f8e1f9971c 100644
> >> --- a/drivers/net/virtio_net.c
> >> +++ b/drivers/net/virtio_net.c
> >> @@ -3559,16 +3559,37 @@ static int virtnet_coal_params_supported(struct ethtool_coalesce *ec)
> >>   	return 0;
> >>   }
> >>   
> >> -static int virtnet_should_update_vq_weight(int dev_flags, int weight,
> >> -					   int vq_weight, bool *should_update)
> >> +static void virtnet_switch_napi_tx(struct virtnet_info *vi, u32 qstart,
> >> +				   u32 qend, u32 tx_frames)
> >>   {
> >> -	if (weight ^ vq_weight) {
> >> -		if (dev_flags & IFF_UP)
> >> -			return -EBUSY;
> >> -		*should_update = true;
> >> -	}
> >> +	struct net_device *dev = vi->dev;
> >> +	int new_weight, cur_weight;
> >> +	struct netdev_queue *txq;
> >> +	struct send_queue *sq;
> >>   
> >> -	return 0;
> >> +	new_weight = tx_frames ? NAPI_POLL_WEIGHT : 0;
> >> +	for (; qstart < qend; qstart++) {
> >> +		sq = &vi->sq[qstart];
> >> +		cur_weight = sq->napi.weight;
> >> +		if (!(new_weight ^ cur_weight))
> >> +			continue;
> >> +
> >> +		if (!(dev->flags & IFF_UP)) {
> >> +			sq->napi.weight = new_weight;
> >> +			continue;
> >> +		}
> >> +
> >> +		if (cur_weight)
> >> +			virtnet_napi_tx_disable(&sq->napi);
> >> +
> >> +		txq = netdev_get_tx_queue(dev, qstart);
> >> +		__netif_tx_lock_bh(txq);
> >> +		sq->napi.weight = new_weight;
> >> +		__netif_tx_unlock_bh(txq);
> >> +
> >> +		if (!cur_weight)
> >> +			virtnet_napi_tx_enable(vi, sq->vq, &sq->napi);
> >> +	}
> >>   }
> >>   
> >>   static int virtnet_set_coalesce(struct net_device *dev,
> >> @@ -3577,25 +3598,11 @@ static int virtnet_set_coalesce(struct net_device *dev,
> >>   				struct netlink_ext_ack *extack)
> >>   {
> >>   	struct virtnet_info *vi = netdev_priv(dev);
> >> -	int ret, queue_number, napi_weight;
> >> -	bool update_napi = false;
> >> -
> >> -	/* Can't change NAPI weight if the link is up */
> >> -	napi_weight = ec->tx_max_coalesced_frames ? NAPI_POLL_WEIGHT : 0;
> >> -	for (queue_number = 0; queue_number < vi->max_queue_pairs; queue_number++) {
> >> -		ret = virtnet_should_update_vq_weight(dev->flags, napi_weight,
> >> -						      vi->sq[queue_number].napi.weight,
> >> -						      &update_napi);
> >> -		if (ret)
> >> -			return ret;
> >> -
> >> -		if (update_napi) {
> >> -			/* All queues that belong to [queue_number, vi->max_queue_pairs] will be
> >> -			 * updated for the sake of simplicity, which might not be necessary
> >> -			 */
> >> -			break;
> >> -		}
> >> -	}
> >> +	int ret;
> >> +
> >> +	/* Param tx_frames can be used to switch napi_tx */
> >> +	virtnet_switch_napi_tx(vi, 0, vi->max_queue_pairs,
> >> +			       ec->tx_max_coalesced_frames);
> >>   
> >>   	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_NOTF_COAL))
> >>   		ret = virtnet_send_notf_coal_cmds(vi, ec);
> >> @@ -3605,11 +3612,6 @@ static int virtnet_set_coalesce(struct net_device *dev,
> >>   	if (ret)
> >>   		return ret;
> >>   
> >> -	if (update_napi) {
> >> -		for (; queue_number < vi->max_queue_pairs; queue_number++)
> >> -			vi->sq[queue_number].napi.weight = napi_weight;
> >> -	}
> >> -
> >>   	return ret;
> >>   }
> >>   
> >> @@ -3641,19 +3643,13 @@ static int virtnet_set_per_queue_coalesce(struct net_device *dev,
> >>   					  struct ethtool_coalesce *ec)
> >>   {
> >>   	struct virtnet_info *vi = netdev_priv(dev);
> >> -	int ret, napi_weight;
> >> -	bool update_napi = false;
> >> +	int ret;
> >>   
> >>   	if (queue >= vi->max_queue_pairs)
> >>   		return -EINVAL;
> >>   
> >> -	/* Can't change NAPI weight if the link is up */
> >> -	napi_weight = ec->tx_max_coalesced_frames ? NAPI_POLL_WEIGHT : 0;
> >> -	ret = virtnet_should_update_vq_weight(dev->flags, napi_weight,
> >> -					      vi->sq[queue].napi.weight,
> >> -					      &update_napi);
> >> -	if (ret)
> >> -		return ret;
> >> +	/* Param tx_frames can be used to switch napi_tx */
> >> +	virtnet_switch_napi_tx(vi, queue, queue, ec->tx_max_coalesced_frames);
> >>   
> >>   	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_VQ_NOTF_COAL))
> >>   		ret = virtnet_send_notf_coal_vq_cmds(vi, ec, queue);
> >> @@ -3663,9 +3659,6 @@ static int virtnet_set_per_queue_coalesce(struct net_device *dev,
> >>   	if (ret)
> >>   		return ret;
> >>   
> >> -	if (update_napi)
> >> -		vi->sq[queue].napi.weight = napi_weight;
> >> -
> >>   	return 0;
> >>   }
> >>   
> >> -- 
> >> 2.19.1.6.gb485710b
> >>
> 







[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux