This is a note to let you know that I've just added the patch titled

    IB/hfi1: Prevent kernel QP post send hard lockups

to the 4.9-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     ib-hfi1-prevent-kernel-qp-post-send-hard-lockups.patch
and it can be found in the queue-4.9 subdirectory.

If you, or anyone else, feels it should not be added to the stable
tree, please let <stable@xxxxxxxxxxxxxxx> know about it.


From b6eac931b9bb2bce4db7032c35b41e5e34ec22a5 Mon Sep 17 00:00:00 2001
From: Mike Marciniszyn <mike.marciniszyn@xxxxxxxxx>
Date: Sun, 9 Apr 2017 10:16:35 -0700
Subject: IB/hfi1: Prevent kernel QP post send hard lockups

From: Mike Marciniszyn <mike.marciniszyn@xxxxxxxxx>

commit b6eac931b9bb2bce4db7032c35b41e5e34ec22a5 upstream.

The driver progress routines can call cond_resched() when a timeslice
is exhausted and irqs are enabled.

If the ULP had been holding a spin lock without disabling irqs and the
post send directly called the progress routine, the cond_resched()
could yield allowing another thread from the same ULP to deadlock on
that same lock.

Correct by replacing the current hfi1_do_send() calldown with a unique
one for post send and adding an argument to hfi1_do_send() to indicate
that the send engine is running in a thread.  If the routine is not
running in a thread, avoid calling cond_resched().

Fixes: Commit 831464ce4b74 ("IB/hfi1: Don't call cond_resched in atomic mode when sending packets")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@xxxxxxxxx>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@xxxxxxxxx>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@xxxxxxxxx>
Signed-off-by: Doug Ledford <dledford@xxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>

---
 drivers/infiniband/hw/hfi1/ruc.c   |   26 ++++++++++++++++----------
 drivers/infiniband/hw/hfi1/verbs.c |    4 ++--
 drivers/infiniband/hw/hfi1/verbs.h |    6 ++++--
 3 files changed, 22 insertions(+), 14 deletions(-)

--- a/drivers/infiniband/hw/hfi1/ruc.c
+++ b/drivers/infiniband/hw/hfi1/ruc.c
@@ -1,5 +1,5 @@
 /*
- * Copyright(c) 2015, 2016 Intel Corporation.
+ * Copyright(c) 2015 - 2017 Intel Corporation.
  *
  * This file is provided under a dual BSD/GPLv2 license.  When using or
  * redistributing this file, you may do so under either license.
@@ -833,23 +833,29 @@ void hfi1_make_ruc_header(struct rvt_qp
 /* when sending, force a reschedule every one of these periods */
 #define SEND_RESCHED_TIMEOUT (5 * HZ)  /* 5s in jiffies */
 
+void hfi1_do_send_from_rvt(struct rvt_qp *qp)
+{
+	hfi1_do_send(qp, false);
+}
+
 void _hfi1_do_send(struct work_struct *work)
 {
 	struct iowait *wait = container_of(work, struct iowait, iowork);
 	struct rvt_qp *qp = iowait_to_qp(wait);
 
-	hfi1_do_send(qp);
+	hfi1_do_send(qp, true);
 }
 
 /**
  * hfi1_do_send - perform a send on a QP
  * @work: contains a pointer to the QP
+ * @in_thread: true if in a workqueue thread
  *
  * Process entries in the send work queue until credit or queue is
 * exhausted.  Only allow one CPU to send a packet per QP.
 * Otherwise, two threads could send packets out of order.
  */
-void hfi1_do_send(struct rvt_qp *qp)
+void hfi1_do_send(struct rvt_qp *qp, bool in_thread)
 {
 	struct hfi1_pkt_state ps;
 	struct hfi1_qp_priv *priv = qp->priv;
@@ -917,8 +923,10 @@ void hfi1_do_send(struct rvt_qp *qp)
 			qp->s_hdrwords = 0;
 			/* allow other tasks to run */
 			if (unlikely(time_after(jiffies, timeout))) {
-				if (workqueue_congested(cpu,
-							ps.ppd->hfi1_wq)) {
+				if (!in_thread ||
+				    workqueue_congested(
+						cpu,
+						ps.ppd->hfi1_wq)) {
 					spin_lock_irqsave(
 						&qp->s_lock,
 						ps.flags);
@@ -931,11 +939,9 @@ void hfi1_do_send(struct rvt_qp *qp)
 						*ps.ppd->dd->send_schedule);
 					return;
 				}
-				if (!irqs_disabled()) {
-					cond_resched();
-					this_cpu_inc(
-						*ps.ppd->dd->send_schedule);
-				}
+				cond_resched();
+				this_cpu_inc(
+					*ps.ppd->dd->send_schedule);
 				timeout = jiffies + (timeout_int) / 8;
 			}
 			spin_lock_irqsave(&qp->s_lock, ps.flags);
--- a/drivers/infiniband/hw/hfi1/verbs.c
+++ b/drivers/infiniband/hw/hfi1/verbs.c
@@ -1,5 +1,5 @@
 /*
- * Copyright(c) 2015, 2016 Intel Corporation.
+ * Copyright(c) 2015 - 2017 Intel Corporation.
  *
  * This file is provided under a dual BSD/GPLv2 license.  When using or
  * redistributing this file, you may do so under either license.
@@ -1697,7 +1697,7 @@ int hfi1_register_ib_device(struct hfi1_
 	dd->verbs_dev.rdi.driver_f.qp_priv_free = qp_priv_free;
 	dd->verbs_dev.rdi.driver_f.free_all_qps = free_all_qps;
 	dd->verbs_dev.rdi.driver_f.notify_qp_reset = notify_qp_reset;
-	dd->verbs_dev.rdi.driver_f.do_send = hfi1_do_send;
+	dd->verbs_dev.rdi.driver_f.do_send = hfi1_do_send_from_rvt;
 	dd->verbs_dev.rdi.driver_f.schedule_send = hfi1_schedule_send;
 	dd->verbs_dev.rdi.driver_f.schedule_send_no_lock = _hfi1_schedule_send;
 	dd->verbs_dev.rdi.driver_f.get_pmtu_from_attr = get_pmtu_from_attr;
--- a/drivers/infiniband/hw/hfi1/verbs.h
+++ b/drivers/infiniband/hw/hfi1/verbs.h
@@ -1,5 +1,5 @@
 /*
- * Copyright(c) 2015, 2016 Intel Corporation.
+ * Copyright(c) 2015 - 2017 Intel Corporation.
  *
  * This file is provided under a dual BSD/GPLv2 license.  When using or
  * redistributing this file, you may do so under either license.
@@ -372,7 +372,9 @@ void hfi1_make_ruc_header(struct rvt_qp
 
 void _hfi1_do_send(struct work_struct *work);
 
-void hfi1_do_send(struct rvt_qp *qp);
+void hfi1_do_send_from_rvt(struct rvt_qp *qp);
+
+void hfi1_do_send(struct rvt_qp *qp, bool in_thread);
 
 void hfi1_send_complete(struct rvt_qp *qp, struct rvt_swqe *wqe,
 			enum ib_wc_status status);


Patches currently in stable-queue which might be from mike.marciniszyn@xxxxxxxxx are

queue-4.9/ib-hfi1-prevent-kernel-qp-post-send-hard-lockups.patch
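For readers tracking this backport, below is a minimal, purely illustrative sketch
of the caller-side pattern the commit message describes; it is not part of the
patch, and the ulp_lock name and ulp_issue_send() helper are hypothetical
stand-ins for a kernel ULP posting sends on an hfi1 device.

#include <linux/spinlock.h>
#include <rdma/ib_verbs.h>

/* Hypothetical ULP lock, taken without disabling irqs. */
static DEFINE_SPINLOCK(ulp_lock);

/* Hypothetical helper: post a send while holding the ULP lock. */
static int ulp_issue_send(struct ib_qp *qp, struct ib_send_wr *wr)
{
	struct ib_send_wr *bad_wr;
	int ret;

	spin_lock(&ulp_lock);		/* irqs stay enabled */
	/*
	 * ib_post_send() may run the hfi1 send engine inline; before this
	 * fix that path could hit cond_resched() and yield while ulp_lock
	 * is still held.  Another thread of the same ULP then spins on
	 * ulp_lock and neither side makes progress.
	 */
	ret = ib_post_send(qp, wr, &bad_wr);
	spin_unlock(&ulp_lock);
	return ret;
}

With the patch applied, the inline post-send calldown passes in_thread = false,
so the send engine defers to the send workqueue instead of calling
cond_resched(), and the lock holder keeps running.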