Hi:

This series tries to access virtqueue metadata through kernel virtual
addresses instead of the copy_user() friends, since the latter carry too
much overhead: checks, speculation barriers, or even hardware feature
toggling such as SMAP. This is done by setting up kernel addresses
through the direct mapping and cooperating with VM management via MMU
notifiers.

Tests show about a 23% improvement on TX PPS. TCP_STREAM doesn't see an
obvious improvement.

Thanks

Changes from RFC V3:
- rebase to net-next
- tweak the comments

Changes from RFC V2:
- switch to use direct mapping instead of vmap()
- switch to use spinlock + RCU to synchronize MMU notifier and vhost
  data/control path
- set dirty pages in the invalidation callbacks
- always use the copy_to/from_user() friends for the archs that may need
  flush_dcache_pages()
- various minor fixes

Changes from V4:
- use invalidate_range() instead of invalidate_range_start()
- track dirty pages

Changes from V3:
- don't try to use vmap for file backed pages
- rebase to master

Changes from V2:
- fix buggy range overlapping check
- tear down MMU notifier during vhost ioctl to make sure the invalidation
  request can read the metadata userspace address and vq size without
  holding the vq mutex

Changes from V1:
- instead of pinning pages, use MMU notifier to invalidate vmaps and
  remap during metadata prefetch
- fix build warning on MIPS

Jason Wang (6):
  vhost: generalize adding used elem
  vhost: fine grain userspace memory accessors
  vhost: rename vq_iotlb_prefetch() to vq_meta_prefetch()
  vhost: introduce helpers to get the size of metadata area
  vhost: factor out setting vring addr and num
  vhost: access vq metadata through kernel virtual address

 drivers/vhost/net.c   |   4 +-
 drivers/vhost/vhost.c | 850 ++++++++++++++++++++++++++++++++++++------
 drivers/vhost/vhost.h |  38 +-
 3 files changed, 766 insertions(+), 126 deletions(-)

-- 
2.18.1