Re: [PATCH net 2/2] virtio-net: get build_skb() buf by data ptr

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 1 Jun 2021 14:17:27 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote:
>
> 在 2021/6/1 下午1:59, Xuan Zhuo 写道:
> > On Tue, 1 Jun 2021 11:27:12 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> >> 在 2021/6/1 上午11:08, Xuan Zhuo 写道:
> >>> On Tue, 1 Jun 2021 11:03:37 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> >>>> 在 2021/5/31 下午6:58, Xuan Zhuo 写道:
> >>>>> On Mon, 31 May 2021 14:10:55 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote:
> >>>>>> 在 2021/5/14 下午11:16, Xuan Zhuo 写道:
> >>>>>>> In the case of merge, the page passed into page_to_skb() may be a head
> >>>>>>> page, not the page where the current data is located.
> >>>>>> I don't get how this can happen?
> >>>>>>
> >>>>>> Maybe you can explain a little bit more?
> >>>>>>
> >>>>>> receive_mergeable() call page_to_skb() in two places:
> >>>>>>
> >>>>>> 1) XDP_PASS for linearized page , in this case we use xdp_page
> >>>>>> 2) page_to_skb() for "normal" page, in this case the page contains the data
> >>>>> The offset may be greater than PAGE_SIZE, because page is obtained by
> >>>>> virt_to_head_page(), not the page where buf is located. And "offset" is the offset
> >>>>> of buf relative to page.
> >>>>>
> >>>>> 	tailroom = truesize - len - offset;
> >>>>>
> >>>>> In this case, the tailroom must be less than 0. Although there may be enough
> >>>>> content on this page to save skb_shared_info.
> >>>> Interesting, I think we don't use compound pages for virtio-net. (We
> >>>> don't define SKB_FRAG_PAGE_ORDER).
> >>>>
> >>>> Am I wrong?
> >>> It seems to me that it seems to be a fixed setting, not for us to configure
> >>> independently
> >>
> >> Looks like you are right.
> >>
> >> See comments below.
> >>
> >>
> >>> Thanks.
> >>>
> >>> ==========================================
> >>>
> >>> net/sock.c
> >>>
> >>> #define SKB_FRAG_PAGE_ORDER	get_order(32768)
> >>> DEFINE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key);
> >>>
> >>> /**
> >>>    * skb_page_frag_refill - check that a page_frag contains enough room
> >>>    * @sz: minimum size of the fragment we want to get
> >>>    * @pfrag: pointer to page_frag
> >>>    * @gfp: priority for memory allocation
> >>>    *
> >>>    * Note: While this allocator tries to use high order pages, there is
> >>>    * no guarantee that allocations succeed. Therefore, @sz MUST be
> >>>    * less or equal than PAGE_SIZE.
> >>>    */
> >>> bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
> >>> {
> >>> 	if (pfrag->page) {
> >>> 		if (page_ref_count(pfrag->page) == 1) {
> >>> 			pfrag->offset = 0;
> >>> 			return true;
> >>> 		}
> >>> 		if (pfrag->offset + sz <= pfrag->size)
> >>> 			return true;
> >>> 		put_page(pfrag->page);
> >>> 	}
> >>>
> >>> 	pfrag->offset = 0;
> >>> 	if (SKB_FRAG_PAGE_ORDER &&
> >>> 	    !static_branch_unlikely(&net_high_order_alloc_disable_key)) {
> >>> 		/* Avoid direct reclaim but allow kswapd to wake */
> >>> 		pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
> >>> 					  __GFP_COMP | __GFP_NOWARN |
> >>> 					  __GFP_NORETRY,
> >>> 					  SKB_FRAG_PAGE_ORDER);
> >>> 		if (likely(pfrag->page)) {
> >>> 			pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER;
> >>> 			return true;
> >>> 		}
> >>> 	}
> >>> 	pfrag->page = alloc_page(gfp);
> >>> 	if (likely(pfrag->page)) {
> >>> 		pfrag->size = PAGE_SIZE;
> >>> 		return true;
> >>> 	}
> >>> 	return false;
> >>> }
> >>> EXPORT_SYMBOL(skb_page_frag_refill);
> >>>
> >>>
> >>>> Thanks
> >>>>
> >>>>
> >>>>> Thanks.
> >>>>>
> >>>>>> Thanks
> >>>>>>
> >>>>>>
> >>>>>>> So when trying to
> >>>>>>> get the buf where the data is located, you should directly use the
> >>>>>>> pointer(p) to get the address corresponding to the page.
> >>>>>>>
> >>>>>>> At the same time, the offset of the data in the page should also be
> >>>>>>> obtained using offset_in_page().
> >>>>>>>
> >>>>>>> This patch solves this problem. But if you don’t use this patch, the
> >>>>>>> original code can also run, because if the page is not the page of the
> >>>>>>> current data, the calculated tailroom will be less than 0, and will not
> >>>>>>> enter the logic of build_skb() . The significance of this patch is to
> >>>>>>> modify this logical problem, allowing more situations to use
> >>>>>>> build_skb().
> >>>>>>>
> >>>>>>> Signed-off-by: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx>
> >>>>>>> Acked-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
> >>>>>>> ---
> >>>>>>>      drivers/net/virtio_net.c | 8 ++++++--
> >>>>>>>      1 file changed, 6 insertions(+), 2 deletions(-)
> >>>>>>>
> >>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> >>>>>>> index 3e46c12dde08..073fec4c0df1 100644
> >>>>>>> --- a/drivers/net/virtio_net.c
> >>>>>>> +++ b/drivers/net/virtio_net.c
> >>>>>>> @@ -407,8 +407,12 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> >>>>>>>      		 * see add_recvbuf_mergeable() + get_mergeable_buf_len()
> >>>>>>>      		 */
> >>>>>>>      		truesize = PAGE_SIZE;
> >>>>>>> -		tailroom = truesize - len - offset;
> >>>>>>> -		buf = page_address(page);
> >>>>>>> +
> >>>>>>> +		/* page maybe head page, so we should get the buf by p, not the
> >>>>>>> +		 * page
> >>>>>>> +		 */
> >>>>>>> +		tailroom = truesize - len - offset_in_page(p);
> >>
> >> I wonder why offset_in_page(p) is correct? I guess it should be:
> >>
> >> tailroom = truesize - len - headroom;
> >>
> >> The reason is that the buffer is not necessarily allocated at the page
> >> boundary.
> > In my understanding, offset_in_page(p) is the offset of p in the page where it
> > is located. In this case, the two should be equal.
>
>
> I think not, if the frag is not page aligned. offset_in_page(p) doesn't
> equal to headroom.
>
> Consider the case that the frag start from page offset 1500.

Yes, you are right. I also felt a little weird just now, so I took a closer look
at add_recvbuf_mergeable().

Thank you, very much.

>
>
> >   This has nothing to do with
> > which page is allocated.
> >
> > Of course I think using headroom is a good idea, and it is semantically better.
> >
> > Thanks.
>
>
> Thanks
>
>
> >
> >> Thanks
> >>
> >>
> >>>>>>> +		buf = (char *)((unsigned long)p & PAGE_MASK);
> >>>>>>>      	} else {
> >>>>>>>      		tailroom = truesize - len;
> >>>>>>>      		buf = p;
>
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/virtualization




[Index of Archives]     [KVM Development]     [Libvirt Development]     [Libvirt Users]     [CentOS Virtualization]     [Netdev]     [Ethernet Bridging]     [Linux Wireless]     [Kernel Newbies]     [Security]     [Linux for Hams]     [Netfilter]     [Bugtraq]     [Yosemite Forum]     [MIPS Linux]     [ARM Linux]     [Linux RAID]     [Linux Admin]     [Samba]

  Powered by Linux