On Tue, 1 Jun 2021 14:17:27 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote: > > 在 2021/6/1 下午1:59, Xuan Zhuo 写道: > > On Tue, 1 Jun 2021 11:27:12 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote: > >> 在 2021/6/1 上午11:08, Xuan Zhuo 写道: > >>> On Tue, 1 Jun 2021 11:03:37 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote: > >>>> 在 2021/5/31 下午6:58, Xuan Zhuo 写道: > >>>>> On Mon, 31 May 2021 14:10:55 +0800, Jason Wang <jasowang@xxxxxxxxxx> wrote: > >>>>>> 在 2021/5/14 下午11:16, Xuan Zhuo 写道: > >>>>>>> In the case of merge, the page passed into page_to_skb() may be a head > >>>>>>> page, not the page where the current data is located. > >>>>>> I don't get how this can happen? > >>>>>> > >>>>>> Maybe you can explain a little bit more? > >>>>>> > >>>>>> receive_mergeable() call page_to_skb() in two places: > >>>>>> > >>>>>> 1) XDP_PASS for linearized page , in this case we use xdp_page > >>>>>> 2) page_to_skb() for "normal" page, in this case the page contains the data > >>>>> The offset may be greater than PAGE_SIZE, because page is obtained by > >>>>> virt_to_head_page(), not the page where buf is located. And "offset" is the offset > >>>>> of buf relative to page. > >>>>> > >>>>> tailroom = truesize - len - offset; > >>>>> > >>>>> In this case, the tailroom must be less than 0. Although there may be enough > >>>>> content on this page to save skb_shared_info. > >>>> Interesting, I think we don't use compound pages for virtio-net. (We > >>>> don't define SKB_FRAG_PAGE_ORDER). > >>>> > >>>> Am I wrong? > >>> It seems to me that it seems to be a fixed setting, not for us to configure > >>> independently > >> > >> Looks like you are right. > >> > >> See comments below. > >> > >> > >>> Thanks. > >>> > >>> ========================================== > >>> > >>> net/sock.c > >>> > >>> #define SKB_FRAG_PAGE_ORDER get_order(32768) > >>> DEFINE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key); > >>> > >>> /** > >>> * skb_page_frag_refill - check that a page_frag contains enough room > >>> * @sz: minimum size of the fragment we want to get > >>> * @pfrag: pointer to page_frag > >>> * @gfp: priority for memory allocation > >>> * > >>> * Note: While this allocator tries to use high order pages, there is > >>> * no guarantee that allocations succeed. Therefore, @sz MUST be > >>> * less or equal than PAGE_SIZE. > >>> */ > >>> bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp) > >>> { > >>> if (pfrag->page) { > >>> if (page_ref_count(pfrag->page) == 1) { > >>> pfrag->offset = 0; > >>> return true; > >>> } > >>> if (pfrag->offset + sz <= pfrag->size) > >>> return true; > >>> put_page(pfrag->page); > >>> } > >>> > >>> pfrag->offset = 0; > >>> if (SKB_FRAG_PAGE_ORDER && > >>> !static_branch_unlikely(&net_high_order_alloc_disable_key)) { > >>> /* Avoid direct reclaim but allow kswapd to wake */ > >>> pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) | > >>> __GFP_COMP | __GFP_NOWARN | > >>> __GFP_NORETRY, > >>> SKB_FRAG_PAGE_ORDER); > >>> if (likely(pfrag->page)) { > >>> pfrag->size = PAGE_SIZE << SKB_FRAG_PAGE_ORDER; > >>> return true; > >>> } > >>> } > >>> pfrag->page = alloc_page(gfp); > >>> if (likely(pfrag->page)) { > >>> pfrag->size = PAGE_SIZE; > >>> return true; > >>> } > >>> return false; > >>> } > >>> EXPORT_SYMBOL(skb_page_frag_refill); > >>> > >>> > >>>> Thanks > >>>> > >>>> > >>>>> Thanks. > >>>>> > >>>>>> Thanks > >>>>>> > >>>>>> > >>>>>>> So when trying to > >>>>>>> get the buf where the data is located, you should directly use the > >>>>>>> pointer(p) to get the address corresponding to the page. > >>>>>>> > >>>>>>> At the same time, the offset of the data in the page should also be > >>>>>>> obtained using offset_in_page(). > >>>>>>> > >>>>>>> This patch solves this problem. But if you don’t use this patch, the > >>>>>>> original code can also run, because if the page is not the page of the > >>>>>>> current data, the calculated tailroom will be less than 0, and will not > >>>>>>> enter the logic of build_skb() . The significance of this patch is to > >>>>>>> modify this logical problem, allowing more situations to use > >>>>>>> build_skb(). > >>>>>>> > >>>>>>> Signed-off-by: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx> > >>>>>>> Acked-by: Michael S. Tsirkin <mst@xxxxxxxxxx> > >>>>>>> --- > >>>>>>> drivers/net/virtio_net.c | 8 ++++++-- > >>>>>>> 1 file changed, 6 insertions(+), 2 deletions(-) > >>>>>>> > >>>>>>> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > >>>>>>> index 3e46c12dde08..073fec4c0df1 100644 > >>>>>>> --- a/drivers/net/virtio_net.c > >>>>>>> +++ b/drivers/net/virtio_net.c > >>>>>>> @@ -407,8 +407,12 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, > >>>>>>> * see add_recvbuf_mergeable() + get_mergeable_buf_len() > >>>>>>> */ > >>>>>>> truesize = PAGE_SIZE; > >>>>>>> - tailroom = truesize - len - offset; > >>>>>>> - buf = page_address(page); > >>>>>>> + > >>>>>>> + /* page maybe head page, so we should get the buf by p, not the > >>>>>>> + * page > >>>>>>> + */ > >>>>>>> + tailroom = truesize - len - offset_in_page(p); > >> > >> I wonder why offset_in_page(p) is correct? I guess it should be: > >> > >> tailroom = truesize - len - headroom; > >> > >> The reason is that the buffer is not necessarily allocated at the page > >> boundary. > > In my understanding, offset_in_page(p) is the offset of p in the page where it > > is located. In this case, the two should be equal. > > > I think not, if the frag is not page aligned. offset_in_page(p) doesn't > equal to headroom. > > Consider the case that the frag start from page offset 1500. Yes, you are right. I also felt a little weird just now, so I took a closer look at add_recvbuf_mergeable(). Thank you, very much. > > > > This has nothing to do with > > which page is allocated. > > > > Of course I think using headroom is a good idea, and it is semantically better. > > > > Thanks. > > > Thanks > > > > > >> Thanks > >> > >> > >>>>>>> + buf = (char *)((unsigned long)p & PAGE_MASK); > >>>>>>> } else { > >>>>>>> tailroom = truesize - len; > >>>>>>> buf = p; > _______________________________________________ Virtualization mailing list Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/virtualization