On Sat, Sep 16, 2023 at 04:09:18PM +0300, Arseniy Krasnov wrote:
This adds handling of MSG_ZEROCOPY flag on transmission path: 1) If this flag is set and zerocopy transmission is possible (enabled in socket options and transport allows zerocopy), then non-linear skb will be created and filled with the pages of user's buffer. Pages of user's buffer are locked in memory by 'get_user_pages()'. 2) Replaces way of skb owning: instead of 'skb_set_owner_sk_safe()' it calls 'skb_set_owner_w()'. Reason of this change is that '__zerocopy_sg_from_iter()' increments 'sk_wmem_alloc' of socket, so to decrease this field correctly, proper skb destructor is needed: 'sock_wfree()'. This destructor is set by 'skb_set_owner_w()'. 3) Adds new callback to 'struct virtio_transport': 'can_msgzerocopy'. If this callback is set, then transport needs extra check to be able to send provided number of buffers in zerocopy mode. Currently, the only transport that needs this callback set is virtio, because this transport adds new buffers to the virtio queue and we need to check, that number of these buffers is less than size of the queue (it is required by virtio spec). vhost and loopback transports don't need this check. Signed-off-by: Arseniy Krasnov <avkrasnov@xxxxxxxxxxxxxxxxx> --- Changelog: v5(big patchset) -> v1: * Refactorings of 'if' conditions. * Remove extra blank line. * Remove 'frag_off' field unneeded init. * Add function 'virtio_transport_fill_skb()' which fills both linear and non-linear skb with provided data. v1 -> v2: * Use original order of last four arguments in 'virtio_transport_alloc_skb()'. v2 -> v3: * Add new transport callback: 'msgzerocopy_check_iov'. It checks that provided 'iov_iter' with data could be sent in a zerocopy mode. If this callback is not set in transport - transport allows to send any 'iov_iter' in zerocopy mode. Otherwise - if callback returns 'true' then zerocopy is allowed. Reason of this callback is that in case of G2H transmission we insert whole skb to the tx virtio queue and such skb must fit to the size of the virtio queue to be sent in a single iteration (may be tx logic in 'virtio_transport.c' could be reworked as in vhost to support partial send of current skb). This callback will be enabled only for G2H path. For details pls see comment 'Check that tx queue...' below. v3 -> v4: * 'msgzerocopy_check_iov' moved from 'struct vsock_transport' to 'struct virtio_transport' as it is virtio specific callback and never needed in other transports. v4 -> v5: * 'msgzerocopy_check_iov' renamed to 'can_msgzerocopy' and now it uses number of buffers to send as input argument. I think there is no need to pass iov to this callback (at least today, it is used only by guest side of virtio transport), because the only thing that this callback does is comparison of number of buffers to be inserted to the tx queue and size of this queue. * Remove any checks for type of current 'iov_iter' with payload (is it 'iovec' or 'ubuf'). These checks left from the earlier versions where I didn't use already implemented kernel API which handles every type of 'iov_iter'. v5 -> v6: * Refactor 'virtio_transport_fill_skb()'. * Add 'WARN_ON_ONCE()' and comment on invalid combination of destination socket and payload in 'virtio_transport_alloc_skb()'. v7 -> v8: * Move '+1' addition from 'can_msgzerocopy' callback body to the caller. This addition means packet header. * In 'virtio_transport_can_zcopy()' rename 'max_to_send' argument to 'pkt_len'. * Update commit message by adding details about new 'can_msgzerocopy' callback. * In 'virtio_transport_init_hdr()' move 'len' argument directly after 'info'. * Add comment about processing last skb in tx loop. * Update comment for 'can_msgzerocopy' callback for more details. v8 -> v9: * Return and update comment for 'virtio_transport_alloc_skb()'. * Pass pointer to transport ops to 'virtio_transport_can_zcopy()', this allows to use it directly without calling virtio_transport_get_ops()'. * Remove redundant call for 'msg_data_left()' in 'virtio_transport_fill_skb()'. * Do not pass 'struct vsock_sock*' to 'virtio_transport_alloc_skb()', use same pointer from already passed 'struct virtio_vsock_pkt_info*'. * Fix setting 'end of message' bit for SOCK_SEQPACKET (add call for 'msg_data_left()' == 0). * Add 'zcopy' parameter to packet allocation trace event.
Thanks for addressing the comments!
include/linux/virtio_vsock.h | 9 + .../events/vsock_virtio_transport_common.h | 12 +- net/vmw_vsock/virtio_transport.c | 32 +++ net/vmw_vsock/virtio_transport_common.c | 250 ++++++++++++++---- 4 files changed, 241 insertions(+), 62 deletions(-)
LGTM! Reviewed-by: Stefano Garzarella <sgarzare@xxxxxxxxxx>