On 09/02/2017 00:17, Willem de Bruijn wrote:
This is not a 50/50 split, which implies that some packets from the
large packet flow are still converted to copying. Without the change,
the rate without the queue was 80k zerocopy vs 80k copy, so this choice
of (vq->num >> 2) appears too conservative.
However, testing with (vq->num >> 1) was not as effective at mitigating
stalls. I did not save that data, unfortunately. I can run more tests on
fine-tuning this variable, if the idea sounds good.
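
(For context, a hedged sketch, not the actual patch, of the fallback
heuristic being tuned here: a packet is sent by copying instead of
zerocopy once the number of still-outstanding zerocopy completions
exceeds a fraction of the ring size. All names are illustrative; only
the shift, >> 2 vs. >> 1, is the knob under discussion.)

#include <stdbool.h>

/*
 * Outstanding zerocopy buffers are those between done_idx and
 * upend_idx; indices wrap at ring_size.
 */
static inline int zcopy_in_flight(int upend_idx, int done_idx, int ring_size)
{
        return (upend_idx + ring_size - done_idx) % ring_size;
}

/*
 * Fall back to copying when too many completions are pending, to limit
 * head-of-line blocking behind a stalled consumer.  Illustrative only.
 */
static inline bool zcopy_allowed(int upend_idx, int done_idx,
                                 int ring_size, int vq_num)
{
        return zcopy_in_flight(upend_idx, done_idx, ring_size) <= (vq_num >> 2);
}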
It looks like there are still two cases left:
To be clear, this patch is not intended to fix all issues. It is a small
improvement to avoid HoL blocking due to queued zerocopy skbs.
Right, just want to see if there's anything left.
The trade-off is that reverting to copying in these cases increases
cycle cost. I think that is a trade-off worth making compared to the
alternative drop in throughput. It would probably be good to be able to
measure this without kernel instrumentation: export counters similar to
net->tx_zcopy_err and net->tx_packets (though without the reset to zero
that happens in vhost_net_tx_packet).
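
(A hedged sketch of what such counters might look like; the struct and
function names are hypothetical, not an existing interface. The point
is that, unlike the bookkeeping in vhost_net_tx_packet(), nothing is
ever reset, so a userspace sampler can take differences over time.)

#include <stdbool.h>
#include <stdint.h>

/* Monotonic per-queue counters; hypothetical names. */
struct tx_stats {
        uint64_t tx_zcopy_pkts;   /* packets sent with zerocopy        */
        uint64_t tx_copy_pkts;    /* packets that fell back to copying */
};

/* Called once per transmitted packet from the tx path. */
static inline void count_tx_packet(struct tx_stats *s, bool zcopy_used)
{
        if (zcopy_used)
                s->tx_zcopy_pkts++;
        else
                s->tx_copy_pkts++;
}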
I think it's acceptable to spend extra cycles if we detect HOL anyway.
1) sndbuf is not INT_MAX
You mean the case where the device stalls, later zerocopy notifications
are queued, but these are never cleaned in free_old_xmit_skbs,
because it requires a start_xmit and by now the (only) socket is out of
descriptors?
Typo, sorry. I meant out of sndbuf.
I mean e.g. for tun. If its sndbuf is smaller than, say, (vq->num >> 1) *
$pkt_size, and all packets are held by some module, a limit like
vq->num >> 1 won't work, since we hit the sndbuf limit before it.
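
(Illustrative numbers only, not from the thread, to make the point
concrete: with a small tun sndbuf the socket blocks long before the
proposed descriptor limit can trigger.)

#include <stdio.h>

int main(void)
{
        unsigned long vq_num   = 256;              /* example ring size  */
        unsigned long pkt_size = 64 * 1024;        /* example GSO packet */
        unsigned long sndbuf   = 1 * 1024 * 1024;  /* example tun sndbuf */

        unsigned long limit_bytes = (vq_num >> 1) * pkt_size;

        printf("in-flight limit reached after %lu bytes\n", limit_bytes);
        printf("tun sndbuf exhausted after    %lu bytes\n", sndbuf);
        printf("sndbuf is hit first: %s\n",
               sndbuf < limit_bytes ? "yes" : "no");
        return 0;
}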
A watchdog would help somewhat. With tx-napi, this case cannot occur,
either, as free_old_xmit_skbs no longer depends on a call to start_xmit.
2) tx napi is used for virtio-net
I am not aware of any issue specific to the use of tx-napi?
I might not have been clear here: I mean e.g. virtio_net (tx-napi) in
the guest + vhost_net (zerocopy) in the host. In this case, even if we
switch to datacopy when the ubuf count exceeds vq->num >> 1, we still
complete tx buffers in order, so the tx interrupt could be delayed
indefinitely.
1) could be a corner case, and for 2) what you suggest here may not solve
the issue, since it still does in-order completion.
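
(To spell out the mechanism, a hedged model of in-order completion as I
understand vhost_zerocopy_signal_used, not the kernel code itself:
done[i] stands in for VHOST_DMA_IS_DONE() on heads[i], and report_used()
for vhost_add_used_and_signal(). The break is the problem: one delayed
completion at done_idx holds back every later used entry, and with
tx-napi in the guest, the tx interrupt for all of them.)

#include <stdbool.h>

/* Returns the new done_idx; report_used() is a hypothetical stand-in. */
static int signal_used_in_order(const bool *done, int done_idx, int upend_idx,
                                int ring_size, void (*report_used)(int idx))
{
        int i;

        for (i = done_idx; i != upend_idx; i = (i + 1) % ring_size) {
                if (!done[i])
                        break;  /* oldest pending entry stalls all newer ones */
                report_used(i);
                done_idx = (i + 1) % ring_size;
        }
        return done_idx;
}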
Somewhat tangential, but it might also help to break the in-order
completion processing in vhost_zerocopy_signal_used. Complete
all descriptors between done_idx and upend_idx. done_idx should
then only be forwarded to the oldest still-not-completed descriptor.
In the test I ran, where the oldest descriptors are held in a queue and
all newer ones are tail-dropped,
Do you mean the descriptors were tail-dropped by vhost?
this would avoid blocking a full ring
of completions, when only a small number (or 1) is actually delayed.
Dynamic switching between copy and zerocopy using zcopy_used
already returns completions out-of-order, so this is not a huge leap.
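
(A hedged model of the out-of-order variant suggested above, not a
patch: report every completed descriptor between done_idx and
upend_idx, but advance done_idx only past the contiguous completed
prefix, i.e. up to the oldest still-outstanding descriptor. The
reported[] flag is hypothetical, there to avoid signalling the same
entry twice on a later pass.)

#include <stdbool.h>

static int signal_used_out_of_order(const bool *done, bool *reported,
                                    int done_idx, int upend_idx,
                                    int ring_size,
                                    void (*report_used)(int idx))
{
        bool prefix_done = true;
        int i;

        for (i = done_idx; i != upend_idx; i = (i + 1) % ring_size) {
                if (done[i] && !reported[i]) {
                        report_used(i);         /* used entry for the guest */
                        reported[i] = true;
                }
                if (prefix_done && done[i])
                        done_idx = (i + 1) % ring_size; /* forward past completed prefix */
                else
                        prefix_done = false;    /* oldest pending descriptor found */
        }
        return done_idx;        /* only as far as the oldest pending entry */
}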
Yes.
Thanks