>> > An issue of the referenced patch is that sndbuf could be smaller than low
>> > watermark.
> We cannot determine the low watermark properly because of not only sndbuf size
> issue but also the fact that the upper vhost-net cannot directly see how much
> descriptor is currently available at the virtio-net tx queue. It depends on
> multiqueue settings or other senders which are also using the same tx queue.
> Note that in the latter case if they constantly transmitting, the deadlock could
> not occur(*), however if it has just temporarily fulfill some portion of the
> pool in the mean time, then the low watermark cannot be helpful.
> (*: That is because it's reliable enough in the sense I mention below.)
>
> Keep in this in mind, let me briefly describe the possible deadlock I mentioned:
> (1). vhost-net on L1 guest has nothing to do sendmsg until the upper layer sets
> new descriptors, which depends only on the vhost-net zcopy callback and adding
> newly used descriptors.
> (2). vhost-net callback depends on the skb freeing on the xmit path only.
> (3). the xmit path depends (possibly only) on the vhost-net sendmsg.
> As you see, it's enough to bring about the situation above that L1 virtio-net
> reaches its limit earlier than the L0 host processing. The vhost-net pool could
> be almost full or empty, whatever.

Thanks for the context. This issue is very similar to the one that used to
exist when running out of transmit descriptors, before the removal of the
timer and the introduction of skb_orphan in start_xmit.

To make sure that I understand correctly, let me paraphrase:

A. guest socket cannot send because it exhausted its sk budget (sndbuf, tsq, ..)

B. budget is not freed up until guest receives tx completion for this flow

C. tx completion is held back on the host side in vhost_zerocopy_signal_used
   behind the completion for an unrelated skb

D. the unrelated packet, and with it its zerocopy completion, is delayed
   somewhere in the host stack,
   e.g., netem

The issue that is specific to vhost-net zerocopy is that (C) enforces strict
ordering of transmit completions, causing head-of-line blocking behind
vhost-net zerocopy callbacks.

This is a different problem from

C1. tx completion is delayed until guest sends another packet and
    triggers free_old_xmit_skb

Both in host and guest, zerocopy packets should never be able to loop
to a receive path where they can cause unbounded delay.

The obvious cases of latency are queueing, like netem. That leads
to poor performance for unrelated flows, but I don't see how this
could cause deadlock.
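
To make (C) concrete, here is a rough userspace model of in-order completion
signalling. It is only a sketch, not the vhost-net code: struct zc_ring,
zc_submit, zc_complete, zc_signal_used and RING_SIZE are hypothetical names
made up for illustration. The only thing it assumes about the real code is
what is stated above, that completions are reported to the guest strictly in
submission order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RING_SIZE 8 /* hypothetical size, chosen only for this model */

/* Toy model of the zerocopy buffers pending on one tx queue. */
struct zc_ring {
    bool done[RING_SIZE]; /* set once the lower layer has freed the skb */
    size_t done_idx;      /* oldest buffer not yet signalled to the guest */
    size_t upend_idx;     /* one past the newest submitted buffer */
};

/* Submit one buffer and return its slot. One slot is always left free. */
static size_t zc_submit(struct zc_ring *r)
{
    size_t slot = r->upend_idx;

    assert((slot + 1) % RING_SIZE != r->done_idx); /* ring must not be full */
    r->done[slot] = false;
    r->upend_idx = (slot + 1) % RING_SIZE;
    return slot;
}

/* Completion callback. Buffers may complete in any order. */
static void zc_complete(struct zc_ring *r, size_t slot)
{
    r->done[slot] = true;
}

/*
 * Report completed buffers to the guest strictly in submission order.
 * The walk stops at the first buffer that is still in flight, so later
 * buffers that are already done stay unreported. Returns the number of
 * buffers signalled.
 */
static size_t zc_signal_used(struct zc_ring *r)
{
    size_t n = 0;

    while (r->done_idx != r->upend_idx && r->done[r->done_idx]) {
        r->done[r->done_idx] = false;
        r->done_idx = (r->done_idx + 1) % RING_SIZE;
        n++;
    }
    return n;
}
```

If three buffers are submitted and only the second and third complete,
zc_signal_used reports nothing until the first one completes as well. A single
skb parked in a qdisc therefore holds back the completions of every flow
queued behind it, which is the head-of-line blocking described in (C) and (D).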