Re: [PATCH rdma-next v1] IB/ipoib: Fix double free of skb in case of multicast traffic in CM mode

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2020-05-27 at 16:47 +0300, Leon Romanovsky wrote:
> From: Valentine Fatiev <valentinef@xxxxxxxxxxxx>
> 
> While connected mode set and we have connected and datagram traffic
> in parallel it might end with double free of datagram skb.
> 
> Current mechanism assumes that the order in the completion queue is
> the
> same as the order of sent packets for all QPs. Order is kept only for
> specific QP, in case of mixed UD and CM traffic we have few QPs(one UD
> and few CM's) in parallel.
> 
> The problem:
> ----------------------------------------------------------
> 
> Transmit queue:
> -----------------
> UD skb pointer kept in queue itself, CM skb kept in spearate queue and
> uses transmit queue as a placeholder to count the number of total
> transmitted packets.
> 
> 0   1   2   3   4  5  6  7  8   9  10  11 12 13 .........127
> ------------------------------------------------------------
> NL ud1 UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ...........
> ------------------------------------------------------------
>     ^                                  ^
>    tail                               head
> 
> Completion queue (problematic scenario) - the order not the same as in
> the transmit queue:
> 
>   1  2  3  4  5  6  7  8  9
> ------------------------------------
>  ud1 CM1 UD2 ud3 cm2 cm3 ud4 cm4 ud5
> ------------------------------------
> 
> 1. CM1 'wc' processing
>    - skb freed in cm separate ring.
>    - tx_tail of transmit queue increased although UD2 is not freed.
>      Now driver assumes UD2 index is already freed and it could be
> used for
>      new transmitted skb.
> 
> 0   1   2   3   4  5  6  7  8   9  10  11 12 13 .........127
> ------------------------------------------------------------
> NL NL  UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ...........
> ------------------------------------------------------------
>         ^   ^                       ^
>       (Bad)tail                    head
> (Bad - Could be used for new SKB)
> 
> In this case (due to heavy load) UD2 skb pointer could be replaced by
> new transmitted packet UD_NEW, as the driver assumes its free.
> At this point we will have to process two 'wc' with same index but we
> have only one pointer to free.
> During second attempt to free the same skb we will have NULL pointer
> exception.
> 
> 2. UD2 'wc' processing
>    - skb freed according the index we got from 'wc', but it was
> already
>      overwritten by mistake. So actually the skb that was released is
> the
>      skb of the new transmitted packet and not the original one.
> 
> 3. UD_NEW 'wc' processing
>    - attempt to free already freed skb. NUll pointer exception.
> 
> The fix:
> ----------------------------------------------------------------------
> -
> The fix is to stop using the UD ring as a placeholder for CM packets,
> the cyclic ring variables tx_head and tx_tail will manage the UD
> tx_ring,
> a new cyclic variables global_tx_head and global_tx_tail are
> introduced
> for managing and counting the overall outstanding sent packets, then
> the
> send queue will be stopped and waken based on these variables only.
> 
> Note that no locking is needed since global_tx_head is updated in the
> xmit
> flow and global_tx_tail is updated in the NAPI flow only.
> A previous attempt tried to use one variable to count the outstanding
> sent
> packets, but it did not work since xmit and NAPI flows can run at the
> same
> time and the counter will be updated wrongly. Thus, we use the same
> simple
> cyclic head and tail scheme that we have today for the UD tx_ring.
> 
> Fixes: 2c104ea68350 ("IB/ipoib: Get rid of the tx_outstanding variable
> in all modes")
> Signed-off-by: Valentine Fatiev <valentinef@xxxxxxxxxxxx>
> Signed-off-by: Alaa Hleihel <alaa@xxxxxxxxxxxx>
> Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>

This seems like a pretty important fix that should go to for-rc, not
for-next.

Regardless, looks good to me.
Acked-by: Doug Ledford <dledford@xxxxxxxxxx>

> ---
> Changelog:
> v1:
>  * Alaa rewrote the patch without atomic variables
> v0: 
> https://lore.kernel.org/linux-rdma/20200212072635.682689-5-leon@xxxxxxxxxx/
> ---
>  drivers/infiniband/ulp/ipoib/ipoib.h      |  4 ++++
>  drivers/infiniband/ulp/ipoib/ipoib_cm.c   | 15 +++++++++------
>  drivers/infiniband/ulp/ipoib/ipoib_ib.c   |  9 +++++++--
>  drivers/infiniband/ulp/ipoib/ipoib_main.c | 10 ++++++----
>  4 files changed, 26 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h
> b/drivers/infiniband/ulp/ipoib/ipoib.h
> index e188a95984b5..9a3379c49541 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib.h
> +++ b/drivers/infiniband/ulp/ipoib/ipoib.h
> @@ -377,8 +377,12 @@ struct ipoib_dev_priv {
>  	struct ipoib_rx_buf *rx_ring;
> 
>  	struct ipoib_tx_buf *tx_ring;
> +	/* cyclic ring variables for managing tx_ring, for UD only */
>  	unsigned int	     tx_head;
>  	unsigned int	     tx_tail;
> +	/* cyclic ring variables for counting overall outstanding send
> WRs */
> +	unsigned int	     global_tx_head;
> +	unsigned int	     global_tx_tail;
>  	struct ib_sge	     tx_sge[MAX_SKB_FRAGS + 1];
>  	struct ib_ud_wr      tx_wr;
>  	struct ib_wc	     send_wc[MAX_SEND_CQE];
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> index c59e00a0881f..9bf0fa30df28 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c
> @@ -756,7 +756,8 @@ void ipoib_cm_send(struct net_device *dev, struct
> sk_buff *skb, struct ipoib_cm_
>  		return;
>  	}
> 
> -	if ((priv->tx_head - priv->tx_tail) == ipoib_sendq_size - 1) {
> +	if ((priv->global_tx_head - priv->global_tx_tail) ==
> +	    ipoib_sendq_size - 1) {
>  		ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net
> queue\n",
>  			  tx->qp->qp_num);
>  		netif_stop_queue(dev);
> @@ -786,7 +787,7 @@ void ipoib_cm_send(struct net_device *dev, struct
> sk_buff *skb, struct ipoib_cm_
>  	} else {
>  		netif_trans_update(dev);
>  		++tx->tx_head;
> -		++priv->tx_head;
> +		++priv->global_tx_head;
>  	}
>  }
> 
> @@ -820,10 +821,11 @@ void ipoib_cm_handle_tx_wc(struct net_device
> *dev, struct ib_wc *wc)
>  	netif_tx_lock(dev);
> 
>  	++tx->tx_tail;
> -	++priv->tx_tail;
> +	++priv->global_tx_tail;
> 
>  	if (unlikely(netif_queue_stopped(dev) &&
> -		     (priv->tx_head - priv->tx_tail) <= ipoib_sendq_size
> >> 1 &&
> +		     ((priv->global_tx_head - priv->global_tx_tail) <=
> +		      ipoib_sendq_size >> 1) &&
>  		     test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)))
>  		netif_wake_queue(dev);
> 
> @@ -1232,8 +1234,9 @@ static void ipoib_cm_tx_destroy(struct
> ipoib_cm_tx *p)
>  		dev_kfree_skb_any(tx_req->skb);
>  		netif_tx_lock_bh(p->dev);
>  		++p->tx_tail;
> -		++priv->tx_tail;
> -		if (unlikely(priv->tx_head - priv->tx_tail ==
> ipoib_sendq_size >> 1) &&
> +		++priv->global_tx_tail;
> +		if (unlikely((priv->global_tx_head - priv-
> >global_tx_tail) <=
> +			     ipoib_sendq_size >> 1) &&
>  		    netif_queue_stopped(p->dev) &&
>  		    test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))
>  			netif_wake_queue(p->dev);
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> index c332b4761816..da3c5315bbb5 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c
> @@ -407,9 +407,11 @@ static void ipoib_ib_handle_tx_wc(struct
> net_device *dev, struct ib_wc *wc)
>  	dev_kfree_skb_any(tx_req->skb);
> 
>  	++priv->tx_tail;
> +	++priv->global_tx_tail;
> 
>  	if (unlikely(netif_queue_stopped(dev) &&
> -		     ((priv->tx_head - priv->tx_tail) <=
> ipoib_sendq_size >> 1) &&
> +		     ((priv->global_tx_head - priv->global_tx_tail) <=
> +		      ipoib_sendq_size >> 1) &&
>  		     test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)))
>  		netif_wake_queue(dev);
> 
> @@ -634,7 +636,8 @@ int ipoib_send(struct net_device *dev, struct
> sk_buff *skb,
>  	else
>  		priv->tx_wr.wr.send_flags &= ~IB_SEND_IP_CSUM;
>  	/* increase the tx_head after send success, but use it for queue
> state */
> -	if (priv->tx_head - priv->tx_tail == ipoib_sendq_size - 1) {
> +	if ((priv->global_tx_head - priv->global_tx_tail) ==
> +	    ipoib_sendq_size - 1) {
>  		ipoib_dbg(priv, "TX ring full, stopping kernel net
> queue\n");
>  		netif_stop_queue(dev);
>  	}
> @@ -662,6 +665,7 @@ int ipoib_send(struct net_device *dev, struct
> sk_buff *skb,
> 
>  		rc = priv->tx_head;
>  		++priv->tx_head;
> +		++priv->global_tx_head;
>  	}
>  	return rc;
>  }
> @@ -807,6 +811,7 @@ int ipoib_ib_dev_stop_default(struct net_device
> *dev)
>  				ipoib_dma_unmap_tx(priv, tx_req);
>  				dev_kfree_skb_any(tx_req->skb);
>  				++priv->tx_tail;
> +				++priv->global_tx_tail;
>  			}
> 
>  			for (i = 0; i < ipoib_recvq_size; ++i) {
> diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> index d12e5c9c38af..3cfb682b91b0 100644
> --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c
> +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c
> @@ -1183,9 +1183,11 @@ static void ipoib_timeout(struct net_device
> *dev, unsigned int txqueue)
> 
>  	ipoib_warn(priv, "transmit timeout: latency %d msecs\n",
>  		   jiffies_to_msecs(jiffies - dev_trans_start(dev)));
> -	ipoib_warn(priv, "queue stopped %d, tx_head %u, tx_tail %u\n",
> -		   netif_queue_stopped(dev),
> -		   priv->tx_head, priv->tx_tail);
> +	ipoib_warn(priv,
> +		   "queue stopped %d, tx_head %u, tx_tail %u,
> global_tx_head %u, global_tx_tail %u\n",
> +		   netif_queue_stopped(dev), priv->tx_head, priv-
> >tx_tail,
> +		   priv->global_tx_head, priv->global_tx_tail);
> +
>  	/* XXX reset QP, etc. */
>  }
> 
> @@ -1700,7 +1702,7 @@ static int ipoib_dev_init_default(struct
> net_device *dev)
>  		goto out_rx_ring_cleanup;
>  	}
> 
> -	/* priv->tx_head, tx_tail & tx_outstanding are already 0 */
> +	/* priv->tx_head, tx_tail and global_tx_tail/head are already 0
> */
> 
>  	if (ipoib_transport_dev_init(dev, priv->ca)) {
>  		pr_warn("%s: ipoib_transport_dev_init failed\n",
> --
> 2.26.2
> 

-- 
Doug Ledford <dledford@xxxxxxxxxx>
    GPG KeyID: B826A3330E572FDD
    Fingerprint = AE6B 1BDA 122B 23B4 265B  1274 B826 A333 0E57 2FDD

Attachment: signature.asc
Description: This is a digitally signed message part


[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux