Alex Williamson wrote:
> This is an attempt to improve the latency of virtio-net while not
> hurting throughput. I wanted to try moving packet TX into a different
> thread so we can quickly return to the guest after it kicks us to
> send packets out. I also switched the order of when the tx_timer
> comes into play, so we can get an initial burst of packets out, then
> wait for the timer to fire and notify us if there's more to do.
>
> Here's what it does for me (average of 5 runs each, testing to a
> remote system on a 1Gb network):
>
>     netperf TCP_STREAM: 939.22Mb/s -> 935.24Mb/s =  99.58%
>     netperf TCP_RR:     2028.72/s  -> 3927.99/s  = 193.62%
>     tbench:             92.99MB/s  -> 99.97MB/s  = 107.51%
>
> I'd be interested to hear if it helps or hurts anyone else. Thanks,
My worry with this change is that it increases CPU utilization even more than it increases bandwidth, so that our bits/cycle measure decreases. The descriptors (and perhaps the data) are likely in the vcpu's cache, and moving the transmit to the iothread will pull them over into the iothread's cache.
My preferred approach to increasing both bandwidth and bits/cycle (the latter figure is the more important one IMO; unfortunately benchmarks don't measure it) is to aio-enable tap and raw sockets. The vcpu thread would only touch the packet descriptors (not the data) and submit all packets in one io_submit() call. Unfortunately a huge amount of work is needed to pull this off.
--
error compiling committee.c: too many arguments to function