Re: [PATCH net] virtio-net: fix overflow inside virtnet_rq_alloc

Si-Wei Liu <si-wei.liu@xxxxxxxxxx> · Tue, 20 Aug 2024 12:44:46 -0700

On 8/20/2024 12:19 AM, Xuan Zhuo wrote:
leads to regression on VM with the sysctl value of:

- net.core.high_order_alloc_disable=1

which can see reliable crashes or scp failures (scp of a 100M file to
the VM).

The issue is that the virtnet_rq_dma takes up 16 bytes at the beginning
of a new frag. When the frag size is larger than PAGE_SIZE,
everything is fine. However, if the frag is only one page and the
total size of the buffer and virtnet_rq_dma is larger than one page, an
overflow may occur. In this case, if an overflow is possible, I adjust
the buffer size. If net.core.high_order_alloc_disable=1, the maximum
buffer size is 4096 - 16. If net.core.high_order_alloc_disable=0, only
the first buffer of the frag is affected.
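
The length adjustment described above can be modeled in plain userspace C
to check the arithmetic. This is only a sketch, not driver code:
model_adjust_len() and MODEL_RQ_DMA_SIZE are hypothetical names, and the
16-byte header size is taken from the commit message rather than from
sizeof(struct virtnet_rq_dma).

```c
#include <assert.h>

/* Assumption: 16 bytes, as stated in the commit message. */
#define MODEL_RQ_DMA_SIZE 16u

/*
 * Hypothetical userspace model of the check the patch adds to
 * add_recvbuf_mergeable(). If the buffer is the first one in the frag
 * (offset 0) and buffer + room + DMA header do not fit into the frag,
 * shrink the buffer by the header size. Otherwise leave it unchanged.
 */
static unsigned int model_adjust_len(unsigned int len, unsigned int room,
				     unsigned int frag_offset,
				     unsigned int frag_size)
{
	if (frag_offset == 0 &&
	    len + room + MODEL_RQ_DMA_SIZE > frag_size) {
		/* Guard against unsigned underflow in this model. */
		assert(len >= MODEL_RQ_DMA_SIZE);
		len -= MODEL_RQ_DMA_SIZE;
	}
	return len;
}
```

With a single 4096-byte page as the frag and no extra room,
model_adjust_len(4096, 0, 0, 4096) yields 4080, which matches the
"4096 - 16" limit quoted above.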

Fixes: f9dac92ba908 ("virtio_ring: enable premapped mode whatever use_dma_api")
Reported-by: "Si-Wei Liu" <si-wei.liu@xxxxxxxxxx>
Closes: http://lore.kernel.org/all/8b20cc28-45a9-4643-8e87-ba164a540c0a@xxxxxxxxxx
Signed-off-by: Xuan Zhuo <xuanzhuo@xxxxxxxxxxxxxxxxx>
---
  drivers/net/virtio_net.c | 12 +++++++++---
  1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index c6af18948092..e5286a6da863 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -918,9 +918,6 @@ static void *virtnet_rq_alloc(struct receive_queue *rq, u32 size, gfp_t gfp)
  	void *buf, *head;
  	dma_addr_t addr;
  
-	if (unlikely(!skb_page_frag_refill(size, alloc_frag, gfp)))
-		return NULL;
-
  	head = page_address(alloc_frag->page);
  
  	dma = head;
@@ -2421,6 +2418,9 @@ static int add_recvbuf_small(struct virtnet_info *vi, struct receive_queue *rq,
  	len = SKB_DATA_ALIGN(len) +
  	      SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
  
+	if (unlikely(!skb_page_frag_refill(len, &rq->alloc_frag, gfp)))
+		return -ENOMEM;
+
Do you want to document the assumption that, unlike the mergeable case,
the small packet case won't end up crossing the page frag boundary?
Either a comment block explaining this or a WARN_ON() check against
potential overflow would work for me.

  	buf = virtnet_rq_alloc(rq, len, gfp);
  	if (unlikely(!buf))
  		return -ENOMEM;
@@ -2521,6 +2521,12 @@ static int add_recvbuf_mergeable(struct virtnet_info *vi,
  	 */
  	len = get_mergeable_buf_len(rq, &rq->mrg_avg_pkt_len, room);
  
+	if (unlikely(!skb_page_frag_refill(len + room, alloc_frag, gfp)))
+		return -ENOMEM;
+
+	if (!alloc_frag->offset && len + room + sizeof(struct virtnet_rq_dma) > alloc_frag->size)
+		len -= sizeof(struct virtnet_rq_dma);
+
This could address my previous concern about possibly regressing every
buffer size in the mergeable case, thanks. However, I still don't see why
a small chunk has to be carved out of the page_frag to store the
virtnet_rq_dma metadata. This would cause a perf regression for certain
MTU sizes that happen to end up needing one more base page (and an extra
descriptor as well) compared to the previous code without the extra
virtnet_rq_dma content. How hard would it be to allocate a dedicated
struct to store the related information without affecting the (size of)
datapath pages?
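
To make the "one more base page" concern concrete, here is a rough
userspace sketch. It assumes each receive buffer is capped at one
4096-byte page and ignores headroom, tailroom and the virtio-net header.
model_buffers_needed() and the 4096-byte payload are hypothetical and
chosen only to show the boundary effect.

```c
#include <assert.h>

/*
 * Hypothetical model: number of receive buffers (and thus descriptors)
 * needed to hold `payload` bytes when each buffer can carry at most
 * `buf_capacity` bytes. This is a plain ceiling division.
 */
static unsigned int model_buffers_needed(unsigned int payload,
					 unsigned int buf_capacity)
{
	assert(buf_capacity > 0);
	return (payload + buf_capacity - 1) / buf_capacity;
}
```

Under these assumptions, a 4096-byte payload fits into one full-page
buffer (model_buffers_needed(4096, 4096) is 1), but needs two buffers
once 16 bytes of each page are reserved for the metadata
(model_buffers_needed(4096, 4080) is 2).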

FWIW, from a code review perspective, I've looked through the past
conversations but didn't see that a comprehensive benchmark was done
before removing the old code and making premap the sole default mode.
Granted, this immediately reduces the footprint of additional code and
the associated maintenance cost, but I would assume there should at
least have been thorough performance runs upfront to guarantee that no
regression is seen in every possible use case, or that the negative
effect is comparatively negligible even if there's a slight regression
in some limited cases. If that kind of perf measurement wasn't done
before the change was accepted/merged, I think both modes should at
least be allowed to coexist for a while so that every user can gauge the
performance effect.

Thanks,
-Siwei

  	buf = virtnet_rq_alloc(rq, len + room, gfp);
  	if (unlikely(!buf))
  		return -ENOMEM;