From: Mustafa Ismail <mustafa.ismail@xxxxxxxxx>
Running fio can occasionally cause a hang: when sbitmap_queue_get() fails
to return a tag in iscsit_allocate_cmd(), iscsit_wait_for_tag() is called
and never returns from schedule(). This is because the polling thread of
the CQ is the one that is suspended, so it cannot poll for the SQ
completion that would free up a tag.
Fix this by creating a separate CQ for the SQ so that send completions are
processed on a separate thread and are not blocked when the RQ CQ is
stalled.
Fixes: 10e9cbb6b531 ("scsi: target: Convert target drivers to use sbitmap")
Is this the real offending commit? What prevented this from happening
before?
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@xxxxxxxxx>
Signed-off-by: Mustafa Ismail <mustafa.ismail@xxxxxxxxx>
Signed-off-by: Shiraz Saleem <shiraz.saleem@xxxxxxxxx>
---
drivers/infiniband/ulp/isert/ib_isert.c | 33 +++++++++++++++++++++++----------
drivers/infiniband/ulp/isert/ib_isert.h | 3 ++-
2 files changed, 25 insertions(+), 11 deletions(-)
diff --git a/drivers/infiniband/ulp/isert/ib_isert.c b/drivers/infiniband/ulp/isert/ib_isert.c
index 7540488..f827b91 100644
--- a/drivers/infiniband/ulp/isert/ib_isert.c
+++ b/drivers/infiniband/ulp/isert/ib_isert.c
@@ -109,19 +109,27 @@ static int isert_sg_tablesize_set(const char *val, const struct kernel_param *kp
struct ib_qp_init_attr attr;
int ret, factor;
- isert_conn->cq = ib_cq_pool_get(ib_dev, cq_size, -1, IB_POLL_WORKQUEUE);
- if (IS_ERR(isert_conn->cq)) {
- isert_err("Unable to allocate cq\n");
- ret = PTR_ERR(isert_conn->cq);
+ isert_conn->snd_cq = ib_cq_pool_get(ib_dev, cq_size, -1,
+ IB_POLL_WORKQUEUE);
+ if (IS_ERR(isert_conn->snd_cq)) {
+ isert_err("Unable to allocate send cq\n");
+ ret = PTR_ERR(isert_conn->snd_cq);
return ERR_PTR(ret);
}
+ isert_conn->rcv_cq = ib_cq_pool_get(ib_dev, cq_size, -1,
+ IB_POLL_WORKQUEUE);
+ if (IS_ERR(isert_conn->rcv_cq)) {
+ isert_err("Unable to allocate receive cq\n");
+ ret = PTR_ERR(isert_conn->rcv_cq);
+ goto create_cq_err;
+ }
Does this have any noticeable performance implications?
Also, I wonder if there are any other assumptions in the code
about having a single context processing completions...
It'd be much easier if iscsit_allocate_cmd() could accept
a timeout and fail when it expires...
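To make the timeout idea concrete, here is a rough userspace sketch. It is not kernel code and not a proposed patch: struct tag_pool, tag_get_timeout() and the 1 ms polling interval are all made up for illustration, and a real change to iscsit_wait_for_tag() would presumably sleep with something like schedule_timeout() rather than poll.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <time.h>

#define NUM_TAGS 4

/* Hypothetical userspace tag pool; stands in for the sbitmap-backed pool. */
struct tag_pool {
	bool in_use[NUM_TAGS];
};

/* Try once to grab a free tag; returns the tag index, or -1 if none is free. */
static int tag_try_get(struct tag_pool *p)
{
	for (int i = 0; i < NUM_TAGS; i++) {
		if (!p->in_use[i]) {
			p->in_use[i] = true;
			return i;
		}
	}
	return -1;
}

/* Release a tag, as processing a send completion would. */
static void tag_put(struct tag_pool *p, int tag)
{
	assert(tag >= 0 && tag < NUM_TAGS && p->in_use[tag]);
	p->in_use[tag] = false;
}

/* Monotonic clock in milliseconds. */
static long long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Wait for a tag for at most timeout_ms milliseconds.
 * Returns the tag on success, or -ETIMEDOUT so the caller can fail the
 * command instead of sleeping forever.
 */
static int tag_get_timeout(struct tag_pool *p, int timeout_ms)
{
	const struct timespec nap = { .tv_sec = 0, .tv_nsec = 1000000 }; /* 1 ms */
	long long deadline = now_ms() + timeout_ms;
	int tag;

	while ((tag = tag_try_get(p)) < 0) {
		if (now_ms() >= deadline)
			return -ETIMEDOUT;
		nanosleep(&nap, NULL);
	}
	return tag;
}
```

With a bounded wait like this, the caller would treat -ETIMEDOUT as an ordinary allocation failure, and the context that was waiting would presumably get control back and could go on to process the send completions that free tags.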
CCing target-devel and Mike.