Re: [PATCH net v2] net: sfc: add missing xdp queue reinitialization


 



On 4/1/22 20:06, Martin Habets wrote:

Hi Martin,
Thank you so much for your review!

> Hi Taehee,
>
> Thanks for looking into this. Unfortunately efx_realloc_channels()
> has turned out to be quite fragile over the years, so I'm
> keen to remove it instead of patching it up all the time.

I agree with you.
efx_realloc_channels() is too complex.

>
> Could you try the patch below please?
> If it works ok for you as well we'll be able to remove
> efx_realloc_channels(). The added advantage of this approach
> is that the netdev notifiers get informed of the change.

I tested your patch and found a page reference count problem.
How to test:
1. Set up XDP_TX.
2. Start traffic.
3. Stop traffic.
4. Change the ring buffer size.
5. Repeat steps 2 to 4.

[ 87.836195][ T72] BUG: Bad page state in process kworker/u16:1 pfn:125445
[ 87.843356][ T72] page:000000003725f642 refcount:-2 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x125445
[ 87.853783][ T72] flags: 0x200000000000000(node=0|zone=2)
[ 87.859391][ T72] raw: 0200000000000000 dead000000000100 dead000000000122 0000000000000000
[ 87.867928][ T72] raw: 0000000000000000 0000000000000000 fffffffeffffffff 0000000000000000
[ 87.876569][ T72] page dumped because: nonzero _refcount
[ 87.882125][ T72] Modules linked in: af_packet sfc ixgbe mtd atlantic coretemp mdio hwmon sch_fq_codel msr bpf_prelx
[ 87.895331][ T72] CPU: 0 PID: 72 Comm: kworker/u16:1 Not tainted 5.17.0+ #62 dbf33652f22e5147659e7e2472bb962779c4833
[ 87.906350][ T72] Hardware name: ASUS System Product Name/PRIME Z690-P D4, BIOS 0603 11/01/2021
[ 87.915360][ T72] Workqueue: netns cleanup_net
[ 87.920087][ T72] Call Trace:
[ 87.923311][ T72] <TASK>
[ 87.926188][ T72] dump_stack_lvl+0x56/0x7b
[ 87.930597][ T72] bad_page.cold.125+0x63/0x93
[ 87.935288][ T72] free_pcppages_bulk+0x63c/0x6f0
[ 87.940232][ T72] free_unref_page+0x8b/0xf0
[ 87.944749][ T72] efx_fini_rx_queue+0x15f/0x210 [sfc 49c5d4f562a40c6a7ed913c25f5bd4e126bcfa4e]
[ 87.953756][ T72] efx_stop_channels+0xef/0x1b0 [sfc 49c5d4f562a40c6a7ed913c25f5bd4e126bcfa4e]
[ 87.962699][ T72] efx_net_stop+0x4d/0x60 [sfc 49c5d4f562a40c6a7ed913c25f5bd4e126bcfa4e]
[ 87.971029][ T72] __dev_close_many+0x8b/0xf0
[ 87.975618][ T72] dev_close_many+0x7d/0x120
[ ... ]
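For reference, `refcount:-2` in the splat above means the page's reference count was dropped more times than it was taken. A minimal userspace model of that imbalance, assuming a plain integer counter (`struct fake_page`, `fake_get_page()` and `fake_put_page()` are hypothetical names, not the kernel's page API):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical userspace model of a page refcount (not the kernel's struct page). */
struct fake_page {
	int refcount;  /* number of outstanding references */
	bool freed;    /* set once the count reaches zero */
};

/* Take an extra reference; taking one on a freed page would itself be a bug. */
static void fake_get_page(struct fake_page *p)
{
	assert(p->refcount > 0);
	p->refcount++;
}

/* Drop a reference and "free" the page when the last one goes away.
 * Returns false when the count goes negative, i.e. a put with no matching get. */
static bool fake_put_page(struct fake_page *p)
{
	p->refcount--;
	if (p->refcount == 0)
		p->freed = true;
	if (p->refcount < 0) {
		fprintf(stderr, "bad page state: refcount:%d\n", p->refcount);
		return false;
	}
	return true;
}
```

Starting from refcount 1 plus one extra get, two puts free the page and every further put pushes the count below zero, so two unmatched puts give -2. This only illustrates the arithmetic; it says nothing about which path in sfc drops the extra references.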


In addition, I would like to share two issues that I'm currently looking into:
1. TX DMA error
When the interface goes down/up or the ring buffer size changes, a TX DMA error can occur
because the tx_queue can be used before it is initialized.
This should be fixed by the patch below.

diff --git a/drivers/net/ethernet/sfc/tx.c b/drivers/net/ethernet/sfc/tx.c
index d16e031e95f4..6983799e1c05 100644
--- a/drivers/net/ethernet/sfc/tx.c
+++ b/drivers/net/ethernet/sfc/tx.c
@@ -443,6 +443,9 @@ int efx_xdp_tx_buffers(struct efx_nic *efx, int n, struct xdp_frame **xdpfs,
        if (unlikely(!tx_queue))
                return -EINVAL;

+       if (!tx_queue->initialised)
+               return -EINVAL;
+
        if (efx->xdp_txq_queues_mode != EFX_XDP_TX_QUEUES_DEDICATED)
                HARD_TX_LOCK(efx->net_dev, tx_queue->core_txq, cpu);

diff --git a/drivers/net/ethernet/sfc/tx_common.c b/drivers/net/ethernet/sfc/tx_common.c
index d530cde2b864..9bc8281b7f5b 100644
--- a/drivers/net/ethernet/sfc/tx_common.c
+++ b/drivers/net/ethernet/sfc/tx_common.c
@@ -101,6 +101,8 @@ void efx_fini_tx_queue(struct efx_tx_queue *tx_queue)
        netif_dbg(tx_queue->efx, drv, tx_queue->efx->net_dev,
                  "shutting down TX queue %d\n", tx_queue->queue);

+       tx_queue->initialised = false;
+
        if (!tx_queue->buffer)
                return;
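The idea of the patch above, sketched as a single-threaded userspace model (`struct model_tx_queue` and the `model_*` functions are hypothetical stand-ins for the sfc structures, not driver code): in the model, init sets the flag, fini clears it at the start of teardown, and the XDP transmit path refuses a queue whose flag is clear.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct efx_tx_queue; only the flag matters here. */
struct model_tx_queue {
	bool initialised;    /* true between init and fini */
	unsigned int queued; /* frames accepted for transmit */
};

static void model_init_tx_queue(struct model_tx_queue *q)
{
	assert(q != NULL);
	q->queued = 0;
	q->initialised = true;
}

/* As in the patch, fini clears the flag at the start of teardown. */
static void model_fini_tx_queue(struct model_tx_queue *q)
{
	assert(q != NULL);
	q->initialised = false;
}

/* As in the check added to efx_xdp_tx_buffers(): refuse a missing or
 * uninitialised queue. Returns the number of frames queued or -EINVAL. */
static int model_xdp_tx(struct model_tx_queue *q, int n)
{
	if (q == NULL)
		return -EINVAL;
	if (!q->initialised)
		return -EINVAL;
	q->queued += (unsigned int)n;
	return n;
}
```

The model ignores concurrency: a plain bool does not by itself order the XDP path against efx_fini_tx_queue(), so the real fix also depends on how the driver quiesces XDP transmit, which this sketch does not show.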

Unfortunately, with your patch applied, the patch above does not fix the ring buffer size change case;
it fixes only the interface down/up case.
I will look into this further.

2. Memory leak
There is a memory leak in the ring buffer size change logic.
Reproducer:
   while :
   do
       ethtool -G <interface name> rx 2048 tx 2048
       ethtool -G <interface name> rx 1024 tx 1024
   done
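One generic way to check that a resize path is balanced, independent of where the sfc leak actually is, is to count live allocations across the same 2048/1024 loop. A minimal userspace sketch; `counted_alloc()`, `counted_free()`, `struct model_ring` and `model_ring_resize()` are hypothetical helpers, not sfc code, and the 64 bytes per entry is arbitrary:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical counter of outstanding allocations made through the helpers below. */
static long live_allocs;

static void *counted_alloc(size_t n)
{
	void *p = calloc(n, 1);

	if (p)
		live_allocs++;
	return p;
}

static void counted_free(void *p)
{
	if (!p)
		return;
	assert(live_allocs > 0);
	live_allocs--;
	free(p);
}

/* Hypothetical ring: a descriptor buffer and its size in entries. */
struct model_ring {
	void *buf;
	unsigned int entries;
};

/* Allocate the new ring first, then release the old one. Leaving out the
 * counted_free() call would leak one buffer per resize. */
static int model_ring_resize(struct model_ring *r, unsigned int entries)
{
	void *nbuf = counted_alloc((size_t)entries * 64); /* 64 bytes per entry, arbitrary */

	if (!nbuf)
		return -ENOMEM;
	counted_free(r->buf);
	r->buf = nbuf;
	r->entries = entries;
	return 0;
}
```

After any number of iterations exactly one buffer (the current ring) should be live. If the old buffer were never freed, the count would grow by one per resize, which is the kind of steady growth the ethtool loop above exposes in the driver (for example with kmemleak).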

Thanks a lot,
Taehee Yoo

>
> Regards,
> Martin Habets <habetsm.xilinx@xxxxxxxxx>
>
> ---
>   drivers/net/ethernet/sfc/ethtool.c |   13 ++++++++++++-
>   1 file changed, 12 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/sfc/ethtool.c b/drivers/net/ethernet/sfc/ethtool.c
> index 48506373721a..8cfbe61737bb 100644
> --- a/drivers/net/ethernet/sfc/ethtool.c
> +++ b/drivers/net/ethernet/sfc/ethtool.c
> @@ -179,6 +179,7 @@ efx_ethtool_set_ringparam(struct net_device *net_dev,
>   {
>   	struct efx_nic *efx = netdev_priv(net_dev);
>   	u32 txq_entries;
> +	int rc = 0;
>
>   	if (ring->rx_mini_pending || ring->rx_jumbo_pending ||
>   	    ring->rx_pending > EFX_MAX_DMAQ_SIZE ||
> @@ -198,7 +199,17 @@ efx_ethtool_set_ringparam(struct net_device *net_dev,
>   			   "increasing TX queue size to minimum of %u\n",
>   			   txq_entries);
>
> -	return efx_realloc_channels(efx, ring->rx_pending, txq_entries);
> +	/* Apply the new settings */
> +	efx->rxq_entries = ring->rx_pending;
> +	efx->txq_entries = ring->tx_pending;
> +
> +	/* Update the datapath with the new settings if the interface is up */
> +	if (!efx_check_disabled(efx) && netif_running(efx->net_dev)) {
> +		dev_close(net_dev);
> +		rc = dev_open(net_dev, NULL);
> +	}
> +
> +	return rc;
>   }
>
>   static void efx_ethtool_get_wol(struct net_device *net_dev,
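The approach in the quoted patch can be modelled in a few lines: store the requested sizes first, then, only if the interface is running and not disabled, close and reopen it so the queues are rebuilt from the stored values. Everything here (`struct model_nic`, `model_open()`, `model_close()`, `model_set_ringparam()`, `MODEL_TXQ_MIN`) is hypothetical; `MODEL_TXQ_MIN` stands in for the TX minimum that the hunk computes into `txq_entries`, and this sketch stores the clamped value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical TX minimum, standing in for the clamp in the hunk above. */
#define MODEL_TXQ_MIN 512u

/* Hypothetical device model; only the fields this sketch needs. */
struct model_nic {
	unsigned int rxq_entries;  /* requested RX ring size */
	unsigned int txq_entries;  /* requested TX ring size */
	unsigned int active_rx;    /* RX size the running datapath uses */
	unsigned int active_tx;    /* TX size the running datapath uses */
	bool running;              /* like netif_running() */
	bool disabled;             /* like efx_check_disabled() reporting an error */
};

static void model_close(struct model_nic *nic)
{
	nic->running = false;
}

/* Opening rebuilds the queues from the stored settings. */
static int model_open(struct model_nic *nic)
{
	nic->active_rx = nic->rxq_entries;
	nic->active_tx = nic->txq_entries;
	nic->running = true;
	return 0;
}

static int model_set_ringparam(struct model_nic *nic, unsigned int rx, unsigned int tx)
{
	int rc = 0;

	assert(nic != NULL);

	/* Apply the new settings, raising TX to the minimum if needed. */
	nic->rxq_entries = rx;
	nic->txq_entries = tx < MODEL_TXQ_MIN ? MODEL_TXQ_MIN : tx;

	/* Restart the datapath only if it is up. */
	if (!nic->disabled && nic->running) {
		model_close(nic);
		rc = model_open(nic);
	}
	return rc;
}
```

When the interface is down, only the stored sizes change and they take effect on the next open, which is what lets this approach do without a separate reallocation path.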


