On Wed, 2020-05-27 at 16:47 +0300, Leon Romanovsky wrote: > From: Valentine Fatiev <valentinef@xxxxxxxxxxxx> > > While connected mode set and we have connected and datagram traffic > in parallel it might end with double free of datagram skb. > > Current mechanism assumes that the order in the completion queue is > the > same as the order of sent packets for all QPs. Order is kept only for > specific QP, in case of mixed UD and CM traffic we have few QPs(one UD > and few CM's) in parallel. > > The problem: > ---------------------------------------------------------- > > Transmit queue: > ----------------- > UD skb pointer kept in queue itself, CM skb kept in spearate queue and > uses transmit queue as a placeholder to count the number of total > transmitted packets. > > 0 1 2 3 4 5 6 7 8 9 10 11 12 13 .........127 > ------------------------------------------------------------ > NL ud1 UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ........... > ------------------------------------------------------------ > ^ ^ > tail head > > Completion queue (problematic scenario) - the order not the same as in > the transmit queue: > > 1 2 3 4 5 6 7 8 9 > ------------------------------------ > ud1 CM1 UD2 ud3 cm2 cm3 ud4 cm4 ud5 > ------------------------------------ > > 1. CM1 'wc' processing > - skb freed in cm separate ring. > - tx_tail of transmit queue increased although UD2 is not freed. > Now driver assumes UD2 index is already freed and it could be > used for > new transmitted skb. > > 0 1 2 3 4 5 6 7 8 9 10 11 12 13 .........127 > ------------------------------------------------------------ > NL NL UD2 CM1 ud3 cm2 cm3 ud4 cm4 ud5 NL NL NL ........... > ------------------------------------------------------------ > ^ ^ ^ > (Bad)tail head > (Bad - Could be used for new SKB) > > In this case (due to heavy load) UD2 skb pointer could be replaced by > new transmitted packet UD_NEW, as the driver assumes its free. > At this point we will have to process two 'wc' with same index but we > have only one pointer to free. > During second attempt to free the same skb we will have NULL pointer > exception. > > 2. UD2 'wc' processing > - skb freed according the index we got from 'wc', but it was > already > overwritten by mistake. So actually the skb that was released is > the > skb of the new transmitted packet and not the original one. > > 3. UD_NEW 'wc' processing > - attempt to free already freed skb. NUll pointer exception. > > The fix: > ---------------------------------------------------------------------- > - > The fix is to stop using the UD ring as a placeholder for CM packets, > the cyclic ring variables tx_head and tx_tail will manage the UD > tx_ring, > a new cyclic variables global_tx_head and global_tx_tail are > introduced > for managing and counting the overall outstanding sent packets, then > the > send queue will be stopped and waken based on these variables only. > > Note that no locking is needed since global_tx_head is updated in the > xmit > flow and global_tx_tail is updated in the NAPI flow only. > A previous attempt tried to use one variable to count the outstanding > sent > packets, but it did not work since xmit and NAPI flows can run at the > same > time and the counter will be updated wrongly. Thus, we use the same > simple > cyclic head and tail scheme that we have today for the UD tx_ring. > > Fixes: 2c104ea68350 ("IB/ipoib: Get rid of the tx_outstanding variable > in all modes") > Signed-off-by: Valentine Fatiev <valentinef@xxxxxxxxxxxx> > Signed-off-by: Alaa Hleihel <alaa@xxxxxxxxxxxx> > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx> This seems like a pretty important fix that should go to for-rc, not for-next. Regardless, looks good to me. Acked-by: Doug Ledford <dledford@xxxxxxxxxx> > --- > Changelog: > v1: > * Alaa rewrote the patch without atomic variables > v0: > https://lore.kernel.org/linux-rdma/20200212072635.682689-5-leon@xxxxxxxxxx/ > --- > drivers/infiniband/ulp/ipoib/ipoib.h | 4 ++++ > drivers/infiniband/ulp/ipoib/ipoib_cm.c | 15 +++++++++------ > drivers/infiniband/ulp/ipoib/ipoib_ib.c | 9 +++++++-- > drivers/infiniband/ulp/ipoib/ipoib_main.c | 10 ++++++---- > 4 files changed, 26 insertions(+), 12 deletions(-) > > diff --git a/drivers/infiniband/ulp/ipoib/ipoib.h > b/drivers/infiniband/ulp/ipoib/ipoib.h > index e188a95984b5..9a3379c49541 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib.h > +++ b/drivers/infiniband/ulp/ipoib/ipoib.h > @@ -377,8 +377,12 @@ struct ipoib_dev_priv { > struct ipoib_rx_buf *rx_ring; > > struct ipoib_tx_buf *tx_ring; > + /* cyclic ring variables for managing tx_ring, for UD only */ > unsigned int tx_head; > unsigned int tx_tail; > + /* cyclic ring variables for counting overall outstanding send > WRs */ > + unsigned int global_tx_head; > + unsigned int global_tx_tail; > struct ib_sge tx_sge[MAX_SKB_FRAGS + 1]; > struct ib_ud_wr tx_wr; > struct ib_wc send_wc[MAX_SEND_CQE]; > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_cm.c > b/drivers/infiniband/ulp/ipoib/ipoib_cm.c > index c59e00a0881f..9bf0fa30df28 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c > @@ -756,7 +756,8 @@ void ipoib_cm_send(struct net_device *dev, struct > sk_buff *skb, struct ipoib_cm_ > return; > } > > - if ((priv->tx_head - priv->tx_tail) == ipoib_sendq_size - 1) { > + if ((priv->global_tx_head - priv->global_tx_tail) == > + ipoib_sendq_size - 1) { > ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net > queue\n", > tx->qp->qp_num); > netif_stop_queue(dev); > @@ -786,7 +787,7 @@ void ipoib_cm_send(struct net_device *dev, struct > sk_buff *skb, struct ipoib_cm_ > } else { > netif_trans_update(dev); > ++tx->tx_head; > - ++priv->tx_head; > + ++priv->global_tx_head; > } > } > > @@ -820,10 +821,11 @@ void ipoib_cm_handle_tx_wc(struct net_device > *dev, struct ib_wc *wc) > netif_tx_lock(dev); > > ++tx->tx_tail; > - ++priv->tx_tail; > + ++priv->global_tx_tail; > > if (unlikely(netif_queue_stopped(dev) && > - (priv->tx_head - priv->tx_tail) <= ipoib_sendq_size > >> 1 && > + ((priv->global_tx_head - priv->global_tx_tail) <= > + ipoib_sendq_size >> 1) && > test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))) > netif_wake_queue(dev); > > @@ -1232,8 +1234,9 @@ static void ipoib_cm_tx_destroy(struct > ipoib_cm_tx *p) > dev_kfree_skb_any(tx_req->skb); > netif_tx_lock_bh(p->dev); > ++p->tx_tail; > - ++priv->tx_tail; > - if (unlikely(priv->tx_head - priv->tx_tail == > ipoib_sendq_size >> 1) && > + ++priv->global_tx_tail; > + if (unlikely((priv->global_tx_head - priv- > >global_tx_tail) <= > + ipoib_sendq_size >> 1) && > netif_queue_stopped(p->dev) && > test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags)) > netif_wake_queue(p->dev); > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_ib.c > b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > index c332b4761816..da3c5315bbb5 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_ib.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_ib.c > @@ -407,9 +407,11 @@ static void ipoib_ib_handle_tx_wc(struct > net_device *dev, struct ib_wc *wc) > dev_kfree_skb_any(tx_req->skb); > > ++priv->tx_tail; > + ++priv->global_tx_tail; > > if (unlikely(netif_queue_stopped(dev) && > - ((priv->tx_head - priv->tx_tail) <= > ipoib_sendq_size >> 1) && > + ((priv->global_tx_head - priv->global_tx_tail) <= > + ipoib_sendq_size >> 1) && > test_bit(IPOIB_FLAG_ADMIN_UP, &priv->flags))) > netif_wake_queue(dev); > > @@ -634,7 +636,8 @@ int ipoib_send(struct net_device *dev, struct > sk_buff *skb, > else > priv->tx_wr.wr.send_flags &= ~IB_SEND_IP_CSUM; > /* increase the tx_head after send success, but use it for queue > state */ > - if (priv->tx_head - priv->tx_tail == ipoib_sendq_size - 1) { > + if ((priv->global_tx_head - priv->global_tx_tail) == > + ipoib_sendq_size - 1) { > ipoib_dbg(priv, "TX ring full, stopping kernel net > queue\n"); > netif_stop_queue(dev); > } > @@ -662,6 +665,7 @@ int ipoib_send(struct net_device *dev, struct > sk_buff *skb, > > rc = priv->tx_head; > ++priv->tx_head; > + ++priv->global_tx_head; > } > return rc; > } > @@ -807,6 +811,7 @@ int ipoib_ib_dev_stop_default(struct net_device > *dev) > ipoib_dma_unmap_tx(priv, tx_req); > dev_kfree_skb_any(tx_req->skb); > ++priv->tx_tail; > + ++priv->global_tx_tail; > } > > for (i = 0; i < ipoib_recvq_size; ++i) { > diff --git a/drivers/infiniband/ulp/ipoib/ipoib_main.c > b/drivers/infiniband/ulp/ipoib/ipoib_main.c > index d12e5c9c38af..3cfb682b91b0 100644 > --- a/drivers/infiniband/ulp/ipoib/ipoib_main.c > +++ b/drivers/infiniband/ulp/ipoib/ipoib_main.c > @@ -1183,9 +1183,11 @@ static void ipoib_timeout(struct net_device > *dev, unsigned int txqueue) > > ipoib_warn(priv, "transmit timeout: latency %d msecs\n", > jiffies_to_msecs(jiffies - dev_trans_start(dev))); > - ipoib_warn(priv, "queue stopped %d, tx_head %u, tx_tail %u\n", > - netif_queue_stopped(dev), > - priv->tx_head, priv->tx_tail); > + ipoib_warn(priv, > + "queue stopped %d, tx_head %u, tx_tail %u, > global_tx_head %u, global_tx_tail %u\n", > + netif_queue_stopped(dev), priv->tx_head, priv- > >tx_tail, > + priv->global_tx_head, priv->global_tx_tail); > + > /* XXX reset QP, etc. */ > } > > @@ -1700,7 +1702,7 @@ static int ipoib_dev_init_default(struct > net_device *dev) > goto out_rx_ring_cleanup; > } > > - /* priv->tx_head, tx_tail & tx_outstanding are already 0 */ > + /* priv->tx_head, tx_tail and global_tx_tail/head are already 0 > */ > > if (ipoib_transport_dev_init(dev, priv->ca)) { > pr_warn("%s: ipoib_transport_dev_init failed\n", > -- > 2.26.2 > -- Doug Ledford <dledford@xxxxxxxxxx> GPG KeyID: B826A3330E572FDD Fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Attachment:
signature.asc
Description: This is a digitally signed message part