On Fri, Dec 22, 2023 at 4:14 PM Michael S. Tsirkin <mst@xxxxxxxxxx> wrote: > > On Fri, Dec 22, 2023 at 10:35:07AM +0800, Jason Wang wrote: > > On Thu, Dec 21, 2023 at 11:06 PM Willem de Bruijn > > <willemdebruijn.kernel@xxxxxxxxx> wrote: > > > > > > Heng Qi wrote: > > > > > > > > > > > > 在 2023/12/20 下午10:45, Willem de Bruijn 写道: > > > > > Heng Qi wrote: > > > > >> virtio-net has two ways to switch napi_tx: one is through the > > > > >> module parameter, and the other is through coalescing parameter > > > > >> settings (provided that the nic status is down). > > > > >> > > > > >> Sometimes we face performance regression caused by napi_tx, > > > > >> then we need to switch napi_tx when debugging. However, the > > > > >> existing methods are a bit troublesome, such as needing to > > > > >> reload the driver or turn off the network card. > > > > Why is this troublesome? We don't need to turn off the card, it's just > > a toggling of the interface. > > > > This ends up with pretty simple code. > > > > > So try to make > > > > >> this update. > > > > >> > > > > >> Signed-off-by: Heng Qi <hengqi@xxxxxxxxxxxxxxxxx> > > > > >> Reviewed-by: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> > > > > > The commit does not explain why it is safe to do so. > > > > > > > > virtnet_napi_tx_disable ensures that already scheduled tx napi ends and > > > > no new tx napi will be scheduled. > > > > > > > > Afterwards, if the __netif_tx_lock_bh lock is held, the stack cannot > > > > send the packet. > > > > > > > > Then we can safely toggle the weight to indicate where to clear the buffers. > > > > > > > > > > > > > > The tx-napi weights are not really weights: it is a boolean whether > > > > > napi is used for transmit cleaning, or whether packets are cleaned > > > > > in ndo_start_xmit. > > > > > > > > Right. > > > > > > > > > > > > > > There certainly are some subtle issues with regard to pausing/waking > > > > > queues when switching between modes. > > > > > > > > What are "subtle issues" and if there are any, we find them. > > > > > > A single runtime test is not sufficient to exercise all edge cases. > > > > > > Please don't leave it to reviewers to establish the correctness of a > > > patch. > > > > +1 > > > > And instead of trying to do this, it would be much better to optimize > > the NAPI performance. Then we can drop the orphan mode. > > "To address your problem, optimize our code to the level which we > couldn't achieve in more than 10 years". Last time QE didn't report any issue for TCP. For others, the code might just need some optimization if it really matters, it's just because nobody has worked on this part in the past years. The ethtool trick is just for debugging purposes, I can hardly believe it is used by any management layer software. > That's not a reasonable > requirement. Not getting an interrupt will always be better for some > workloads. So NAPI has been enabled by default for many years. And most of the NIC doesn't do orphans. Orphan has known issues like pktgen and others. Keeping two modes may result in tricky code and complicate the features like BQL [1]. We would have interrupt coalescing and dim soon. Admin can enable the heuristic or tweak it according to the workload which is much more user friendly than orphaning. I don't see a strong point to keep the orphan mode any more considering the advantages of NAPI. Thanks [1] https://lore.kernel.org/lkml/20181205225323.12555-2-mst@xxxxxxxxxx/ > > > > > > > > The napi_tx and non-napi code paths differ in how they handle at least > > > the following structures: > > > > > > 1. skb: non-napi orphans these in ndo_start_xmit. Without napi this is > > > needed as delay until the next ndo_start_xmit and thus completion is > > > unbounded. > > > > > > When switching to napi mode, orphaned skbs may now be cleaned by the > > > napi handler. This is indeed safe. > > > > > > When switching from napi to non-napi, the unbound latency resurfaces. > > > It is a small edge case, and I think a potentially acceptable risk, if > > > the user of this knob is aware of the risk. > > > > > > 2. virtqueue callback ("interrupt" masking). The non-napi path enables > > > the interrupt (disables the mask) when available descriptors falls > > > beneath a low watermark, and reenables when it recovers above a high > > > watermark. Napi disables when napi is scheduled, and reenables on > > > napi complete. > > > > > > 3. dev_queue->state (QUEUE_STATE_DRV_XOFF). if the ring falls below > > > a low watermark, the driver stops the stack for queuing more packets. > > > In napi mode, it schedules napi to clean packets. See the calls to > > > netif_xmit_stopped, netif_stop_subqueue, netif_start_subqueue and > > > netif_tx_wake_queue. > > > > > > Some if this can be assumed safe by looking at existing analogous > > > code, such as the queue stop/start in virtnet_tx_resize. > > > > > > But that all virtqueue callback and dev_queue->state transitions are > > > correct when switching between modes at runtime is not trivial to > > > establish, deserves some thought and explanation in the commit > > > message. > > > > Thanks >