Patch "net: enetc: avoid buffer leaks on xdp_do_redirect() failure" has been added to the 6.0-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a note to let you know that I've just added the patch titled

    net: enetc: avoid buffer leaks on xdp_do_redirect() failure

to the 6.0-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     net-enetc-avoid-buffer-leaks-on-xdp_do_redirect-fail.patch
and it can be found in the queue-6.0 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 3c6907f6227287932291b6cfdb73d7a6de44c7b7
Author: Vladimir Oltean <vladimir.oltean@xxxxxxx>
Date:   Tue Dec 13 02:19:08 2022 +0200

    net: enetc: avoid buffer leaks on xdp_do_redirect() failure
    
    [ Upstream commit 628050ec952d2e2e46ec9fb6aa07e41139e030c8 ]
    
    Before enetc_clean_rx_ring_xdp() calls xdp_do_redirect(), each software
    BD in the RX ring between index orig_i and i can have one of 2 refcount
    values on its page.
    
    We are the owner of the current buffer that is being processed, so the
    refcount will be at least 1.
    
    If the current owner of the buffer at the diametrically opposed index
    in the RX ring (i.o.w, the other half of this page) has not yet called
    kfree(), this page's refcount could even be 2.
    
    enetc_page_reusable() in enetc_flip_rx_buff() tests for the page
    refcount against 1, and [ if it's 2 ] does not attempt to reuse it.
    
    But if enetc_flip_rx_buff() is put after the xdp_do_redirect() call,
    the page refcount can have one of 3 values. It can also be 0, if there
    is no owner of the other page half, and xdp_do_redirect() for this
    buffer ran so far that it triggered a flush of the devmap/cpumap bulk
    queue, and the consumers of those bulk queues also freed the buffer,
    all by the time xdp_do_redirect() returns the execution back to enetc.
    
    This is the reason why enetc_flip_rx_buff() is called before
    xdp_do_redirect(), but there is a big flaw with that reasoning:
    enetc_flip_rx_buff() will set rx_swbd->page = NULL on both sides of the
    enetc_page_reusable() branch, and if xdp_do_redirect() returns an error,
    we call enetc_xdp_free(), which does not deal gracefully with that.
    
    In fact, what happens is quite special. The page refcounts start as 1.
    enetc_flip_rx_buff() figures they're reusable, transfers these
    rx_swbd->page pointers to a different rx_swbd in enetc_reuse_page(), and
    bumps the refcount to 2. When xdp_do_redirect() later returns an error,
    we call the no-op enetc_xdp_free(), but we still haven't lost the
    reference to that page. A copy of it is still at rx_ring->next_to_alloc,
    but that has refcount 2 (and there are no concurrent owners of it in
    flight, to drop the refcount). What really kills the system is when
    we'll flip the rx_swbd->page the second time around. With an updated
    refcount of 2, the page will not be reusable and we'll really leak it.
    Then enetc_new_page() will have to allocate more pages, which will then
    eventually leak again on further errors from xdp_do_redirect().
    
    The problem, summarized, is that we zeroize rx_swbd->page before we're
    completely done with it, and this makes it impossible for the error path
    to do something with it.
    
    Since the packet is potentially multi-buffer and therefore the
    rx_swbd->page is potentially an array, manual passing of the old
    pointers between enetc_flip_rx_buff() and enetc_xdp_free() is a bit
    difficult.
    
    For the sake of going with a simple solution, we accept the possibility
    of racing with xdp_do_redirect(), and we move the flip procedure to
    execute only on the redirect success path. By racing, I mean that the
    page may be deemed as not reusable by enetc (having a refcount of 0),
    but there will be no leak in that case, either.
    
    Once we accept that, we have something better to do with buffers on
    XDP_REDIRECT failure. Since we haven't performed half-page flipping yet,
    we won't, either (and this way, we can avoid enetc_xdp_free()
    completely, which gives the entire page to the slab allocator).
    Instead, we'll call enetc_xdp_drop(), which will recycle this half of
    the buffer back to the RX ring.
    
    Fixes: 9d2b68cc108d ("net: enetc: add support for XDP_REDIRECT")
    Suggested-by: Lorenzo Bianconi <lorenzo.bianconi@xxxxxxxxxx>
    Signed-off-by: Vladimir Oltean <vladimir.oltean@xxxxxxx>
    Link: https://lore.kernel.org/r/20221213001908.2347046-1-vladimir.oltean@xxxxxxx
    Signed-off-by: Jakub Kicinski <kuba@xxxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/net/ethernet/freescale/enetc/enetc.c b/drivers/net/ethernet/freescale/enetc/enetc.c
index 1d8ec1b120a1..525d506797fc 100644
--- a/drivers/net/ethernet/freescale/enetc/enetc.c
+++ b/drivers/net/ethernet/freescale/enetc/enetc.c
@@ -1489,23 +1489,6 @@ static void enetc_xdp_drop(struct enetc_bdr *rx_ring, int rx_ring_first,
 	rx_ring->stats.xdp_drops++;
 }
 
-static void enetc_xdp_free(struct enetc_bdr *rx_ring, int rx_ring_first,
-			   int rx_ring_last)
-{
-	while (rx_ring_first != rx_ring_last) {
-		struct enetc_rx_swbd *rx_swbd = &rx_ring->rx_swbd[rx_ring_first];
-
-		if (rx_swbd->page) {
-			dma_unmap_page(rx_ring->dev, rx_swbd->dma, PAGE_SIZE,
-				       rx_swbd->dir);
-			__free_page(rx_swbd->page);
-			rx_swbd->page = NULL;
-		}
-		enetc_bdr_idx_inc(rx_ring, &rx_ring_first);
-	}
-	rx_ring->stats.xdp_redirect_failures++;
-}
-
 static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring,
 				   struct napi_struct *napi, int work_limit,
 				   struct bpf_prog *prog)
@@ -1527,8 +1510,8 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring,
 		int orig_i, orig_cleaned_cnt;
 		struct xdp_buff xdp_buff;
 		struct sk_buff *skb;
-		int tmp_orig_i, err;
 		u32 bd_status;
+		int err;
 
 		rxbd = enetc_rxbd(rx_ring, i);
 		bd_status = le32_to_cpu(rxbd->r.lstatus);
@@ -1615,18 +1598,16 @@ static int enetc_clean_rx_ring_xdp(struct enetc_bdr *rx_ring,
 				break;
 			}
 
-			tmp_orig_i = orig_i;
-
-			while (orig_i != i) {
-				enetc_flip_rx_buff(rx_ring,
-						   &rx_ring->rx_swbd[orig_i]);
-				enetc_bdr_idx_inc(rx_ring, &orig_i);
-			}
-
 			err = xdp_do_redirect(rx_ring->ndev, &xdp_buff, prog);
 			if (unlikely(err)) {
-				enetc_xdp_free(rx_ring, tmp_orig_i, i);
+				enetc_xdp_drop(rx_ring, orig_i, i);
+				rx_ring->stats.xdp_redirect_failures++;
 			} else {
+				while (orig_i != i) {
+					enetc_flip_rx_buff(rx_ring,
+							   &rx_ring->rx_swbd[orig_i]);
+					enetc_bdr_idx_inc(rx_ring, &orig_i);
+				}
 				xdp_redirect_frm_cnt++;
 				rx_ring->stats.xdp_redirect++;
 			}



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux