When using the write()/read() interface for submitting commands, the
SCSI generic driver does not call blk_put_request() on a completed SCSI
command until userspace calls read() to get the command completion.
Since scsi-mq uses a fixed number of preallocated requests, this makes
it possible for userspace to exhaust the entire preallocated supply of
requests, leading to deadlock with the user process stuck in a permanent
unkillable I/O wait in sg_write() -> ... -> blk_get_request() -> ... ->
bt_get().  Note that this deadlock can happen only if scsi-mq is
enabled.  Prevent the deadlock by calling blk_put_request() as soon as
the SCSI command completes instead of waiting for userspace to call
read().

Cc: <stable@xxxxxxxxxxxxxxx> # 3.17+
Signed-off-by: Tony Battersby <tonyb@xxxxxxxxxxxxxxx>
---

For inclusion in kernel 3.20.

I encountered this problem using mptsas (can_queue == 127) and 8 disks
connected via an expander.  I have a test program called cydiskbench
that spawns multiple threads, opens multiple /dev/sg* file descriptors,
and sends multiple disk read/write commands to each /dev/sg* file
descriptor.  I can vary the # of disks being tested and the command
queue depth per disk.  Whenever I chose test parameters such that
(n_disks * queue_depth_per_disk) > shost->can_queue, the test deadlocked
as described when scsi-mq was enabled but worked just fine with scsi-mq
disabled.

I will send a separate patch to fix the same problem in the bsg driver.

--- linux-3.19.0/drivers/scsi/sg.c.orig	2015-02-08 21:54:22.000000000 -0500
+++ linux-3.19.0/drivers/scsi/sg.c	2015-02-09 17:40:00.000000000 -0500
@@ -1350,6 +1350,17 @@ sg_rq_end_io(struct request *rq, int upt
 	}
 	/* Rely on write phase to clean out srp status values, so no "else" */
 
+	/*
+	 * Free the request as soon as it is complete so that its resources
+	 * can be reused without waiting for userspace to read() the
+	 * result.  But keep the associated bio (if any) around until
+	 * blk_rq_unmap_user() can be called from user context.
+	 */
+	srp->rq = NULL;
+	if (rq->cmd != rq->__cmd)
+		kfree(rq->cmd);
+	__blk_put_request(rq->q, rq);
+
 	write_lock_irqsave(&sfp->rq_list_lock, iflags);
 	if (unlikely(srp->orphan)) {
 		if (sfp->keep_orphan)
@@ -1777,10 +1788,10 @@ sg_finish_rem_req(Sg_request *srp)
 	SCSI_LOG_TIMEOUT(4, sg_printk(KERN_INFO, sfp->parentdp,
 				      "sg_finish_rem_req: res_used=%d\n",
 				      (int) srp->res_used));
+	if (srp->bio)
+		ret = blk_rq_unmap_user(srp->bio);
+
 	if (srp->rq) {
-		if (srp->bio)
-			ret = blk_rq_unmap_user(srp->bio);
-
 		if (srp->rq->cmd != srp->rq->__cmd)
 			kfree(srp->rq->cmd);
 		blk_put_request(srp->rq);