This is a note to let you know that I've just added the patch titled IPoIB: Fix send lockup due to missed TX completion to the 3.0-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: ipoib-fix-send-lockup-due-to-missed-tx-completion.patch and it can be found in the queue-3.0 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxxxxxxx> know about it. >From 1ee9e2aa7b31427303466776f455d43e5e3c9275 Mon Sep 17 00:00:00 2001 From: Mike Marciniszyn <mike.marciniszyn@xxxxxxxxx> Date: Tue, 26 Feb 2013 15:46:27 +0000 Subject: IPoIB: Fix send lockup due to missed TX completion From: Mike Marciniszyn <mike.marciniszyn@xxxxxxxxx> commit 1ee9e2aa7b31427303466776f455d43e5e3c9275 upstream. Commit f0dc117abdfa ("IPoIB: Fix TX queue lockup with mixed UD/CM traffic") attempts to solve an issue where unprocessed UD send completions can deadlock the netdev. The patch doesn't fully resolve the issue because if more than half the tx_outstanding's were UD and all of the destinations are RC reachable, arming the CQ doesn't solve the issue. This patch uses the IB_CQ_REPORT_MISSED_EVENTS on the ib_req_notify_cq(). If the rc is above 0, the UD send cq completion callback is called directly to re-arm the send completion timer. This issue is seen in very large parallel filesystem deployments and the patch has been shown to correct the issue. Reviewed-by: Dean Luick <dean.luick@xxxxxxxxx> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@xxxxxxxxx> Signed-off-by: Roland Dreier <roland@xxxxxxxxxxxxxxx> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> --- drivers/infiniband/ulp/ipoib/ipoib_cm.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) --- a/drivers/infiniband/ulp/ipoib/ipoib_cm.c +++ b/drivers/infiniband/ulp/ipoib/ipoib_cm.c @@ -753,9 +753,13 @@ void ipoib_cm_send(struct net_device *de if (++priv->tx_outstanding == ipoib_sendq_size) { ipoib_dbg(priv, "TX ring 0x%x full, stopping kernel net queue\n", tx->qp->qp_num); - if (ib_req_notify_cq(priv->send_cq, IB_CQ_NEXT_COMP)) - ipoib_warn(priv, "request notify on send CQ failed\n"); netif_stop_queue(dev); + rc = ib_req_notify_cq(priv->send_cq, + IB_CQ_NEXT_COMP | IB_CQ_REPORT_MISSED_EVENTS); + if (rc < 0) + ipoib_warn(priv, "request notify on send CQ failed\n"); + else if (rc) + ipoib_send_comp_handler(priv->send_cq, dev); } } } Patches currently in stable-queue which might be from mike.marciniszyn@xxxxxxxxx are queue-3.0/ipoib-fix-send-lockup-due-to-missed-tx-completion.patch queue-3.0/x86-64-fix-the-failure-case-in-copy_user_handle_tail.patch -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html