On 04.09.2023 11:21, Stefano Garzarella wrote: > On Sun, Sep 03, 2023 at 11:13:23AM +0300, Arseniy Krasnov wrote: >> >> >> On 01.09.2023 15:30, Stefano Garzarella wrote: >>> On Sun, Aug 27, 2023 at 11:54:36AM +0300, Arseniy Krasnov wrote: >>>> This adds handling of MSG_ZEROCOPY flag on transmission path: if this >>>> flag is set and zerocopy transmission is possible (enabled in socket >>>> options and transport allows zerocopy), then non-linear skb will be >>>> created and filled with the pages of user's buffer. Pages of user's >>>> buffer are locked in memory by 'get_user_pages()'. Second thing that >>>> this patch does is replace type of skb owning: instead of calling >>>> 'skb_set_owner_sk_safe()' it calls 'skb_set_owner_w()'. Reason of this >>>> change is that '__zerocopy_sg_from_iter()' increments 'sk_wmem_alloc' >>>> of socket, so to decrease this field correctly proper skb destructor is >>>> needed: 'sock_wfree()'. This destructor is set by 'skb_set_owner_w()'. >>>> >>>> Signed-off-by: Arseniy Krasnov <avkrasnov@xxxxxxxxxxxxxxxxx> > > [...] > >>>> >>>> -/* Returns a new packet on success, otherwise returns NULL. >>>> - * >>>> - * If NULL is returned, errp is set to a negative errno. >>>> - */ >>>> -static struct sk_buff * >>>> -virtio_transport_alloc_skb(struct virtio_vsock_pkt_info *info, >>>> - size_t len, >>>> - u32 src_cid, >>>> - u32 src_port, >>>> - u32 dst_cid, >>>> - u32 dst_port) >>>> -{ >>>> - const size_t skb_len = VIRTIO_VSOCK_SKB_HEADROOM + len; >>>> - struct virtio_vsock_hdr *hdr; >>>> - struct sk_buff *skb; >>>> +static bool virtio_transport_can_zcopy(struct virtio_vsock_pkt_info *info, >>>> + size_t max_to_send) >>> ^ >>> I'd call it `pkt_len`, `max_to_send` is confusing IMHO. I didn't >>> initially if it was the number of buffers or bytes. >>> >>>> +{ >>>> + const struct virtio_transport *t_ops; >>>> + struct iov_iter *iov_iter; >>>> + >>>> + if (!info->msg) >>>> + return false; >>>> + >>>> + iov_iter = &info->msg->msg_iter; >>>> + >>>> + if (iov_iter->iov_offset) >>>> + return false; >>>> + >>>> + /* We can't send whole iov. */ >>>> + if (iov_iter->count > max_to_send) >>>> + return false; >>>> + >>>> + /* Check that transport can send data in zerocopy mode. */ >>>> + t_ops = virtio_transport_get_ops(info->vsk); >>>> + >>>> + if (t_ops->can_msgzerocopy) { >>> >>> So if `can_msgzerocopy` is not implemented, we always return true after >>> this point. Should we mention it in the .can_msgzerocopy documentation? ^^^ Sorry, ops again. Just checked this code during comments fixing. It is correct to "return true;" Idea is: if (generic conditions for MSG_ZEROCOPY == false) return false;// can't zerocopy if (t_ops->can_msgzerocopy) //transport needs extra check return t_ops->can_msgzerocopy(); return true;//transport doesn't require extra check and generic conditions above are OK -> can zerocopy But anyway: 1) I'll add comment in 'struct virtio_transport' for '.can_msgzerocopy' that this callback is not mandatory - just additional transport specific check. 2) Add test for fallback to copy. Thanks, Arseniy >> >> Ops, this is my mistake, I must return 'false' in this case. Seems I didn't >> catch this problem with my tests, because there was no test case where >> zerocopy will fallback to copy! >> >> I'll fix it and add new test! > > yep, I agree! > >> >>> >>> Can we also mention in the commit description why this is need only for >>> virtio_tranport and not for vhost and loopback? >>> >>>> + int pages_in_iov = iov_iter_npages(iov_iter, MAX_SKB_FRAGS); >>>> + int pages_to_send = min(pages_in_iov, MAX_SKB_FRAGS); >>>> + >>>> + return t_ops->can_msgzerocopy(pages_to_send); >>>> + } >>>> + >>>> + return true; >>>> +} >>>> + > > [...] > >>>> @@ -270,6 +395,17 @@ static int virtio_transport_send_pkt_info(struct vsock_sock *vsk, >>>> break; >>>> } >>>> >>>> + /* This is last skb to send this portion of data. */ >>> >>> Sorry I didn't get it :-( >>> >>> Can you elaborate this a bit more? >> >> I mean that we iterate over user's buffer here, allocating skb on each >> iteration. And for last skb for this buffer we initialize completion >> for user (we need to allocate one completion for one syscall). > > Okay, so maybe we should explain better also in the code comment. >> >> Thanks for review, I'll fix all other comments and resend patchset when >> 'net-next' will be opened again. > > Cool, thanks! > Stefano >