Re: [RFC PATCH v1 07/12] vsock/virtio: MGS_ZEROCOPY flag support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 20, 2023 at 09:04:04AM +0000, Krasnov Arseniy wrote:
On 16.02.2023 18:16, Stefano Garzarella wrote:
On Mon, Feb 06, 2023 at 07:00:35AM +0000, Arseniy Krasnov wrote:
This adds main logic of MSG_ZEROCOPY flag processing for packet
creation. When this flag is set and user's iov iterator fits for
zerocopy transmission, call 'get_user_pages()' and add returned
pages to the newly created skb.

Signed-off-by: Arseniy Krasnov <AVKrasnov@xxxxxxxxxxxxxx>
---
net/vmw_vsock/virtio_transport_common.c | 212 ++++++++++++++++++++++--
1 file changed, 195 insertions(+), 17 deletions(-)

diff --git a/net/vmw_vsock/virtio_transport_common.c b/net/vmw_vsock/virtio_transport_common.c
index 05ce97b967ad..69e37f8a68a6 100644
--- a/net/vmw_vsock/virtio_transport_common.c
+++ b/net/vmw_vsock/virtio_transport_common.c
@@ -37,6 +37,169 @@ virtio_transport_get_ops(struct vsock_sock *vsk)
    return container_of(t, struct virtio_transport, transport);
}


I'd use bool if we don't need to return an error value in the following
new functions.

+static int virtio_transport_can_zcopy(struct iov_iter *iov_iter,
+                      size_t free_space)
+{
+    size_t pages;
+    int i;
+
+    if (!iter_is_iovec(iov_iter))
+        return -1;
+
+    if (iov_iter->iov_offset)
+        return -1;
+
+    /* We can't send whole iov. */
+    if (free_space < iov_iter->count)
+        return -1;
+
+    for (pages = 0, i = 0; i < iov_iter->nr_segs; i++) {
+        const struct iovec *iovec;
+        int pages_in_elem;
+
+        iovec = &iov_iter->iov[i];
+
+        /* Base must be page aligned. */
+        if (offset_in_page(iovec->iov_base))
+            return -1;
+
+        /* Only last element could have not page aligned size.  */
+        if (i != (iov_iter->nr_segs - 1)) {
+            if (offset_in_page(iovec->iov_len))
+                return -1;
+
+            pages_in_elem = iovec->iov_len >> PAGE_SHIFT;
+        } else {
+            pages_in_elem = round_up(iovec->iov_len, PAGE_SIZE);
+            pages_in_elem >>= PAGE_SHIFT;
+        }
+
+        /* In case of user's pages - one page is one frag. */
+        if (pages + pages_in_elem > MAX_SKB_FRAGS)
+            return -1;
+
+        pages += pages_in_elem;
+    }
+
+    return 0;
+}
+
+static int virtio_transport_init_zcopy_skb(struct vsock_sock *vsk,
+                       struct sk_buff *skb,
+                       struct iov_iter *iter,
+                       bool zerocopy)
+{
+    struct ubuf_info_msgzc *uarg_zc;
+    struct ubuf_info *uarg;
+
+    uarg = msg_zerocopy_realloc(sk_vsock(vsk),
+                    iov_length(iter->iov, iter->nr_segs),
+                    NULL);
+
+    if (!uarg)
+        return -1;
+
+    uarg_zc = uarg_to_msgzc(uarg);
+    uarg_zc->zerocopy = zerocopy ? 1 : 0;
+
+    skb_zcopy_init(skb, uarg);
+
+    return 0;
+}
+
+static int virtio_transport_fill_nonlinear_skb(struct sk_buff *skb,
+                           struct vsock_sock *vsk,
+                           struct virtio_vsock_pkt_info *info)
+{
+    struct iov_iter *iter;
+    int frag_idx;
+    int seg_idx;
+
+    iter = &info->msg->msg_iter;
+    frag_idx = 0;
+    VIRTIO_VSOCK_SKB_CB(skb)->curr_frag = 0;
+    VIRTIO_VSOCK_SKB_CB(skb)->frag_off = 0;
+
+    /* At this moment:
+     * 1) 'iov_offset' is zero.
+     * 2) Every 'iov_base' and 'iov_len' are also page aligned
+     *    (except length of the last element).
+     * 3) Number of pages in this iov <= MAX_SKB_FRAGS.
+     * 4) Length of the data fits in current credit space.
+     */
+    for (seg_idx = 0; seg_idx < iter->nr_segs; seg_idx++) {
+        struct page *user_pages[MAX_SKB_FRAGS];
+        const struct iovec *iovec;
+        size_t last_frag_len;
+        size_t pages_in_seg;
+        int page_idx;
+
+        iovec = &iter->iov[seg_idx];
+        pages_in_seg = iovec->iov_len >> PAGE_SHIFT;
+
+        if (iovec->iov_len % PAGE_SIZE) {
+            last_frag_len = iovec->iov_len % PAGE_SIZE;
+            pages_in_seg++;
+        } else {
+            last_frag_len = PAGE_SIZE;
+        }
+
+        if (get_user_pages((unsigned long)iovec->iov_base,
+                   pages_in_seg, FOLL_GET, user_pages,
+                   NULL) != pages_in_seg)
+            return -1;

Reading the get_user_pages() documentation, this should pin the user
pages, so we should be fine if we then expose them in the virtqueue.

But reading Documentation/core-api/pin_user_pages.rst it seems that
drivers should use "pin_user_pages*() for DMA-pinned pages", so I'm not
sure what we should do.

That is really interesting question for me too. IIUC 'pin_user_pages()'
sets special value to ref counter of page, so we can distinguish such
pages from the others. I've grepped for pinned pages check and found,
the it is used in mm/vmscan.c by calling 'folio_maybe_dma_pinned()' during
page lists processing. Seems 'pin_user_pages()' is more strict version of
'get_user_pages()' and it is recommended to use 'pin_' when data on these
pages will be accessed.
I think, i'll check which API is used in the TCP implementation for zerocopy
transmission.

Additional advice would be great!

Anyway, when we are done using the pages, we should call put_page() or
unpin_user_page() depending on how we pin them.

In case of 'get_user_pages()' everything is ok here: when such skb
will be released, 'put_page()' will be called for every frag page
of it, so there is no page leak.

Got it!

But in case of 'pin_user_pages()',
i will need to unpin in manually before calling 'consume_skb()'
after it is processed by virtio device. But anyway - it is not a
problem.

Yep.

Thanks,
Stefano

_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization




[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux