On 2018/12/14 4:12 AM, Michael S. Tsirkin wrote:
On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:
Hi:
This series tries to access virtqueue metadata through kernel virtual
address instead of copy_user() and friends, since those carry too much
overhead: checks, speculation barriers, or even hardware feature
toggling.
Tests show about a 24% improvement in TX PPS. It should benefit other
cases as well.
Please review
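A rough userspace model of the trade-off described above, for readers who want something concrete. This is only a sketch and not the code in the series: `struct region`, `range_ok()`, `map_meta()` and the counter are hypothetical names, and the single range check stands in for everything copy_user() and friends do on each call.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a memory region holding virtqueue metadata. */
struct region {
    uint8_t *base;
    size_t len;
};

/* Counts validations; stands in for the per-access overhead. */
static unsigned long range_checks;

static int range_ok(const struct region *r, size_t off, size_t size)
{
    range_checks++;
    return off <= r->len && size <= r->len - off;
}

/* Copy-style accessor: validates on every single access. */
static int read_idx_checked(const struct region *r, size_t off, uint16_t *out)
{
    if (!range_ok(r, off, sizeof(*out)))
        return -1;
    memcpy(out, r->base + off, sizeof(*out));
    return 0;
}

/* Mapped-style accessor: validate once at setup, keep a direct pointer. */
static const uint8_t *map_meta(const struct region *r, size_t off, size_t size)
{
    return range_ok(r, off, size) ? r->base + off : NULL;
}

/* Performs `reads` metadata reads and reports how many checks were paid. */
unsigned long checks_for_reads(int mapped, unsigned int reads)
{
    uint16_t storage[8] = { 0, 0, 42, 0, 0, 0, 0, 0 };
    struct region r = { (uint8_t *)storage, sizeof(storage) };
    const size_t off = 2 * sizeof(uint16_t);
    uint16_t idx = 0;

    range_checks = 0;
    if (mapped) {
        const uint8_t *p = map_meta(&r, off, sizeof(idx));

        for (unsigned int i = 0; p && i < reads; i++)
            memcpy(&idx, p, sizeof(idx));   /* no check in the hot path */
    } else {
        for (unsigned int i = 0; i < reads; i++)
            (void)read_idx_checked(&r, off, &idx);
    }
    return range_checks;
}
```

The mapped variant pays the check once no matter how many reads follow, which is the saving being discussed. The cost is that the mapping has to stay valid for as long as the pointer is used, and the model does not capture that.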
I think the idea of speeding up userspace access is a good one.
However I think that moving all checks to start is way too aggressive.
So did packet and AF_XDP. Anyway, sharing address space and access them
directly is the fastest way. Performance is the major consideration for
people choosing a backend. Compared to a userspace implementation, vhost
does not have a security advantage at any level. If vhost is still slow,
people will start to develop backends based on e.g. AF_XDP.
Instead, let's batch things up but let's not keep them
around forever.
Here are some ideas:
1. Disable preemption, process a small number of small packets
directly in an atomic context. This should cut latency
down significantly, the tricky part is to only do it
on a light load and disable this
for the streaming case otherwise it's unfair.
This might fail, if it does just bounce things out to
a thread.
I'm not sure what context you meant here. Is this for the TX path of TUN?
But a fundamental difference is that my series targets extremely heavy
load, not light load; 100% CPU for vhost is expected.
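To make idea 1 concrete, the decision it implies could be sketched as below. This is an assumption-laden illustration, not anything from the series: `INLINE_BUDGET`, `SMALL_PKT_BYTES` and `inline_prefix()` are hypothetical, and the real difficulty (detecting light load, running with preemption disabled) is not modeled at all.

```c
#include <stddef.h>

/* Hypothetical thresholds; real values would need measurement. */
enum {
    INLINE_BUDGET   = 4,    /* at most this many packets handled inline */
    SMALL_PKT_BYTES = 128   /* only packets up to this size qualify */
};

/*
 * Returns how many leading packets would be handled directly in the
 * caller's context. Everything from the first packet that is too large,
 * or from the point where the budget runs out, is bounced to a thread.
 */
size_t inline_prefix(const size_t *pkt_len, size_t n)
{
    size_t done = 0;

    while (done < n && done < INLINE_BUDGET &&
           pkt_len[done] <= SMALL_PKT_BYTES)
        done++;
    return done;
}
```

A caller would process packets `[0, inline_prefix(...))` directly and queue the remainder for the worker, so a streaming workload quickly falls back to the threaded path.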
2. Switch to unsafe_put_user/unsafe_get_user,
and batch up multiple accesses.
As I said, this won't help unless we can batch accesses to at least two
different places among the three of avail, descriptor and used. Batching
accesses to a single place such as used does not help. I'm not even sure
this can be done considering the packed virtqueue case, where we have a
single descriptor ring. Batching through the unsafe helpers may not help in
this case since it's equivalent to the safe ones. This also requires
non-trivial refactoring of vhost, and such refactoring may itself have a
noticeable impact (e.g. it may lead to regressions).
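For reference, the batching in idea 2 can be modeled in userspace roughly as follows. This is a sketch under stated assumptions: `access_begin()` and `access_end()` are hypothetical stand-ins for the window that the unsafe helpers run inside, and the counter stands in for the per-window cost (checks, barriers, feature toggling). It is not vhost code.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_SIZE 64

/* Counts how often the (modeled) user access window is opened. */
static unsigned long window_opens;

static void access_begin(void) { window_opens++; }
static void access_end(void) { /* nothing to model on the closing side */ }

/* One window per element, like calling a safe accessor in a loop. */
static void put_used_one_by_one(uint16_t *ring, const uint16_t *ids, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        access_begin();
        ring[i] = ids[i];
        access_end();
    }
}

/* One window around the whole batch, like the unsafe helpers allow. */
static void put_used_batched(uint16_t *ring, const uint16_t *ids, size_t n)
{
    access_begin();
    for (size_t i = 0; i < n; i++)
        ring[i] = ids[i];
    access_end();
}

/* Writes n entries (clamped to RING_SIZE) and reports the windows opened. */
unsigned long count_window_opens(int batched, size_t n)
{
    uint16_t ring[RING_SIZE] = { 0 };
    uint16_t ids[RING_SIZE];

    if (n > RING_SIZE)
        n = RING_SIZE;
    for (size_t i = 0; i < n; i++)
        ids[i] = (uint16_t)i;

    window_opens = 0;
    if (batched)
        put_used_batched(ring, ids, n);
    else
        put_used_one_by_one(ring, ids, n);
    return window_opens;
}
```

The model only shows a saving when several entries are written in one go. Whether vhost actually has such runs to batch under heavy load is the point under dispute above.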
3. Allow adding a fixup point manually,
such that multiple independent get_user accesses
can get a single fixup (will allow better compiler
optimizations).
So for metadata access, I don't see how what you suggest here can help in
the heavy-workload case.
For data access, this may help, but I've experimented with batching the
data copy to reduce SMAP/speculation barriers in vhost-net and I didn't
see a performance improvement.
Thanks
Jason Wang (3):
vhost: generalize adding used elem
vhost: fine grain userspace memory accessors
vhost: access vq metadata through kernel virtual address
drivers/vhost/vhost.c | 281 ++++++++++++++++++++++++++++++++++++++----
drivers/vhost/vhost.h | 11 ++
2 files changed, 266 insertions(+), 26 deletions(-)
--
2.17.1